Hi,

four people have checked the spam web form submissions concerning
debian-project. More background can be found at [1].

Thanks to Bas Wijnen, Paul Wise, and Richard Hecker for reviewing!
(Of course, a special mention to Y Giridhar Appaji Nag who already
looked through debian-devel, but that isn't ripe for action yet.)

Proposal
--------

I propose to remove the 436 messages unanimously classified "spam" from
the web archive.[2]

Note that these will remain available to Developers on master.debian.org,
and messages will be reincluded if complaints about an erroneous removal
are received by the Listmaster, as discussed at [1] (Policy corner
stones).

Some statistics
---------------

Number of messages by range of classification responses (the four
possible responses are explained at [1]):

 839 submissions reviewed
 436 spam
 225 not spam
   6 inapp
   1 unknown
  68 unknown, spam
  33 unknown, not spam
  18 inapp, spam
   9 unknown, inapp
   3 not spam, inapp
  17 unknown, spam, inapp
   8 unknown, not spam, spam
   5 not spam, spam
   2 spam, not spam, inapp
   4 unknown, not spam, inapp
   4 unknown, inapp, not spam, spam

Analysis of the debian-project review
-------------------------------------

We should be most concerned about the messages with (detected) errors,
namely those where the answers contain both "spam" and "not spam", so
below are the message-ids (best used in conjunction with [3]) and some
analysis of the nature of these messages.

While an error estimate would be nice to have, the naive approach is
based on an independence assumption that seems to be very wrong in our
case. I think that improved tools (quicker access to the web pages with
the "next in thread" links or using the web page, in particular),
experience for the corner cases, and triple review (including some
experienced spam-checker) are a good balance of reliability and effort.
(I would even claim that there is nothing of particular value that
received two spam votes, but we want to be sure and lose as little as
possible.)
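
As a rough illustration of how the counts above and the conflict list
below can be derived from the raw reviews, here is a minimal Python
sketch. It is not the script mentioned in [2]. The input layout (a dict
from message-id to a list of reviewer responses), the function names, and
the sample ids are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical input: message-id -> one response per reviewer.
# The ids and votes below are invented for illustration only.
votes = {
    "msg-a@example.org": ["spam", "spam", "spam"],
    "msg-b@example.org": ["not spam", "not spam"],
    "msg-c@example.org": ["spam", "unknown", "not spam", "inapp"],
    "msg-d@example.org": ["unknown", "spam", "spam"],
}


def response_range(responses):
    """Return the set of distinct responses a message received."""
    return frozenset(responses)


def tally(votes):
    """Count messages per set of distinct responses."""
    return Counter(response_range(r) for r in votes.values())


def unanimous_spam(votes):
    """Message-ids for which every response was "spam"."""
    return sorted(m for m, r in votes.items()
                  if response_range(r) == {"spam"})


def conflicts(votes):
    """Message-ids that received both "spam" and "not spam"."""
    return sorted(m for m, r in votes.items()
                  if {"spam", "not spam"} <= response_range(r))


if __name__ == "__main__":
    for responses, count in tally(votes).most_common():
        print(count, ", ".join(sorted(responses)))
    print("proposed for removal:", unanimous_spam(votes))
    print("needs a closer look:", conflicts(votes))
```

Only messages returned by unanimous_spam() would be candidates for
removal under the proposal. Anything returned by conflicts() corresponds
to the detected errors discussed in the analysis.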

hecker    pabs      tviehmann wijnen

--- one spam vote
not spam  inapp     unknown   spam      [EMAIL PROTECTED]
    a request to remove stuff from the archive
spam      not spam  not spam  inapp     [EMAIL PROTECTED]
    a German user complaining about Debian CDs he bought elsewhere
spam      unknown   not spam  unknown   [EMAIL PROTECTED]
    an Italian user question
not spam  unknown   unknown   spam      [EMAIL PROTECTED]
    someone complaining about ICQ spam matching some list spam
spam      unknown   not spam  inapp     [EMAIL PROTECTED]
    a German user looking for a translation program
spam      not spam  not spam  not spam  [EMAIL PROTECTED]
    a complaint about IRC in response to a DWN article
spam      unknown   not spam  inapp     [EMAIL PROTECTED]
    a Portuguese user question
spam      not spam  not spam  inapp     [EMAIL PROTECTED]
    a German (Swiss) request to be sent a t-shirt to match the swirl
    on his motor scooter
spam      not spam  not spam  not spam  [EMAIL PROTECTED]
    a French and English user question
spam      not spam  unknown   not spam  [EMAIL PROTECTED]
    start of a troll thread
spam      not spam  unknown   not spam  [EMAIL PROTECTED]
    further down that troll thread
not spam  not spam  not spam  spam      [EMAIL PROTECTED]
    an offer to redesign our web site, possibly serious
spam      unknown   not spam  inapp     [EMAIL PROTECTED]
    a Spanish user question
not spam  unknown   unknown   spam      [EMAIL PROTECTED]
    a Linux portal announcement at least bordering spam

--- two spam votes
spam      unknown   not spam  spam      [EMAIL PROTECTED]
    a Polish user question
spam      unknown   not spam  spam      [EMAIL PROTECTED]
    someone looking (in a strange way) for someone with the same name
    as a Debian contributor who has some 256 posts on our English
    language lists between 1999/09 and 2001/10
spam      spam      not spam  unknown   [EMAIL PROTECTED]
    a Spanish unsolicited software survey not directly related to
    Debian

--- three spam votes
spam      not spam  spam      spam      [EMAIL PROTECTED]
    a Croatian (one-liner) user question

--- unquestionably spam
not spam  spam      spam      spam      [EMAIL PROTECTED]
    link request spam

Kind regards

Thomas

1. http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam
   and originally, with followups, on this mailing list
   http://lists.debian.org/debian-project/2007/11/msg00012.html
2. In master.d.o:~tviehmann/spam-removals/ you will find "reports" and
   "proposed" removals and the python (>=2.4) script comparing them.
   The .spam files actually used reside with the mbox archives on
   master:/org/lists.debian.org/lists/, presently only four
   Listmaster-removed spams.
3. http://lists.debian.org/msgid-search/
   use http://lists.debian.org/msgid-search/%s for quick bookmarks
-- 
Thomas Viehmann, http://thomas.viehmann.net/