Hi,

Four people have checked the spam web form submissions concerning
debian-project. More background can be found at [1]. Thanks to Bas
Wijnen, Paul Wise, and Richard Hecker for reviewing! (Of course, a
special mention to Y Giridhar Appaji Nag who already looked through
debian-devel, but that isn't ripe for action yet.)

Proposal
--------

I propose to remove the 436 messages unanimously classified "spam" from
the web archive.[2]

Note that these will remain available to Developers on
master.debian.org, and messages will be reincluded if complaints about
an erroneous removal are received by the Listmaster, as discussed at [1]
(Policy corner stones).

Some statistics
---------------

Number of messages by the set of distinct classification responses they
received (the four possible responses are explained at [1]):

839 submissions reviewed

436 spam
225 not spam
  6 inapp
  1 unknown
 68 unknown, spam
 33 unknown, not spam
 18 inapp, spam
  9 unknown, inapp
  3 not spam, inapp
 17 unknown, spam, inapp
  8 unknown, not spam, spam
  5 not spam, spam
  2 spam, not spam, inapp
  4 unknown, not spam, inapp
  4 unknown, inapp, not spam, spam
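
As a cross-check of the table above, here is a minimal Python sketch
that re-adds the counts. It assumes that each row is keyed by the set of
distinct responses a message received; the names COUNTS and
is_conflicting are made up for this illustration and are not taken from
the script mentioned in [2].

```python
# Counts copied from the table above; each key is the set of distinct
# responses that a message received from the reviewers.
COUNTS = {
    frozenset(["spam"]): 436,
    frozenset(["not spam"]): 225,
    frozenset(["inapp"]): 6,
    frozenset(["unknown"]): 1,
    frozenset(["unknown", "spam"]): 68,
    frozenset(["unknown", "not spam"]): 33,
    frozenset(["inapp", "spam"]): 18,
    frozenset(["unknown", "inapp"]): 9,
    frozenset(["not spam", "inapp"]): 3,
    frozenset(["unknown", "spam", "inapp"]): 17,
    frozenset(["unknown", "not spam", "spam"]): 8,
    frozenset(["not spam", "spam"]): 5,
    frozenset(["spam", "not spam", "inapp"]): 2,
    frozenset(["unknown", "not spam", "inapp"]): 4,
    frozenset(["unknown", "inapp", "not spam", "spam"]): 4,
}


def is_conflicting(responses):
    """True if the responses contain both "spam" and "not spam"."""
    return {"spam", "not spam"} <= set(responses)


total = sum(COUNTS.values())
unanimous_spam = COUNTS[frozenset(["spam"])]
conflicting = sum(n for r, n in COUNTS.items() if is_conflicting(r))

print(total, unanimous_spam, conflicting)  # prints: 839 436 19
```

The rows add up to the 839 reviewed submissions, and 19 messages
received both a "spam" and a "not spam" response.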

Analysis of the debian-project review
-------------------------------------

We should be most concerned about the messages with (detected) errors,
namely those where the answers contain both "spam" and "not spam", so
below are the message-ids (best used in conjunction with [3]) and some
analysis of the nature of these messages.

While an error estimate would be nice to have, the naive approach is
based on an independence assumption that seems to be very wrong in our case.
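
To make the independence assumption concrete, here is one possible
reading of the naive approach, as a sketch only. It assumes that every
reviewer mislabels a legitimate message as spam with the same
probability p, independently of the others. The function names and the
value of p are hypothetical and chosen purely for illustration; p is not
measured from the review.

```python
def naive_unanimous_error(p, reviewers):
    """Probability that all reviewers mislabel one legitimate message
    as spam, assuming independent errors with per-reviewer rate p."""
    return p ** reviewers


def expected_false_removals(p, reviewers, legit_messages):
    """Expected number of legitimate messages that every reviewer
    marks as spam, under the same independence assumption."""
    return legit_messages * naive_unanimous_error(p, reviewers)


# Hypothetical per-reviewer error rate (an assumption, not a measurement).
p = 0.05
# 225 is the number of messages classified "not spam" by everyone; it is
# used here only as a stand-in for the number of legitimate messages.
print(naive_unanimous_error(p, 4))
print(expected_false_removals(p, 4, 225))
```

Under independence the estimate shrinks geometrically with the number
of reviewers. If the same messages tend to mislead several reviewers at
once, the errors are correlated and this figure would be too
optimistic.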

I think that improved tools (in particular, quicker access to the web
pages with the "next in thread" links, or using the web page),
experience with the corner cases, and triple review (including at least
one experienced spam-checker) together strike a good balance between
reliability and effort. (I would even claim that nothing of particular
value received two spam votes, but we want to be sure and lose as
little as possible.)

  hecker       pabs  tviehmann     wijnen
--- one spam vote
not spam      inapp    unknown       spam
        [EMAIL PROTECTED]
        a request to remove stuff from the archive
    spam   not spam   not spam      inapp
        [EMAIL PROTECTED]
        a German user complaining about Debian CDs he bought elsewhere
    spam    unknown   not spam    unknown
        [EMAIL PROTECTED]
        an Italian user question
not spam    unknown    unknown       spam
        [EMAIL PROTECTED]
        someone complaining about ICQ spam matching some list spam
    spam    unknown   not spam      inapp
        [EMAIL PROTECTED]
        a German user looking for a translation program
    spam   not spam   not spam   not spam
        [EMAIL PROTECTED]
        a complaint about IRC in response to a DWN article
    spam    unknown   not spam      inapp
        [EMAIL PROTECTED]
        a Portuguese user question
    spam   not spam   not spam      inapp
        [EMAIL PROTECTED]
        a German (Swiss) request to be sent a t-shirt to match the swirl
        on his motor scooter
    spam   not spam   not spam   not spam
        [EMAIL PROTECTED]
        a French and English user question
    spam   not spam    unknown   not spam
        [EMAIL PROTECTED]
        start of a troll thread
    spam   not spam    unknown   not spam
        [EMAIL PROTECTED]
        further down that troll thread
not spam   not spam   not spam       spam
        [EMAIL PROTECTED]
        an offer to redesign our web site, possibly serious
    spam    unknown   not spam      inapp
        [EMAIL PROTECTED]
        a Spanish user question
not spam    unknown    unknown       spam
        [EMAIL PROTECTED]
        a Linux portal announcement at least bordering on spam
--- two spam votes
    spam    unknown   not spam       spam
        [EMAIL PROTECTED]
        a Polish user question
    spam    unknown   not spam       spam
        [EMAIL PROTECTED]
        someone looking (in a strange way) for someone with the same
        name as a Debian contributor who has some 256 posts on our
        English language lists between 1999/09 and 2001/10
    spam       spam   not spam    unknown
        [EMAIL PROTECTED]
        a Spanish unsolicited software survey not directly related to
        Debian
--- three spam votes
    spam   not spam       spam       spam
        [EMAIL PROTECTED]
        a Croatian (one-liner) user question
--- unquestionably spam
not spam       spam       spam       spam
        [EMAIL PROTECTED]
        link request spam

Kind regards

Thomas

1. http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam
   and originally, with followups, on this mailing list
   http://lists.debian.org/debian-project/2007/11/msg00012.html
2. In master.d.o:~tviehmann/spam-removals/ you will find
   "reports" and "proposed" removals and the Python (>=2.4) script
   comparing them. The .spam files actually used reside with the
   mbox archives on master:/org/lists.debian.org/lists/,
   presently only four Listmaster-removed spams.
3. http://lists.debian.org/msgid-search/
   use http://lists.debian.org/msgid-search/%s for quick bookmarks
-- 
Thomas Viehmann, http://thomas.viehmann.net/

