Hi Manoj,

Manoj Srivastava wrote:
> Hmm. I'll be happy to help automate some of the decision making
> using my Spam classification mechanisms; please look at
> http://www.golden-gryphon.com/software/spam/crm114_accuracy.html
> to see the lower bound on accuracy I get from (mostly) Debian email.
> Adding SA to the CRM114 results above gives about 99.92% accuracy
> overall -- and crm114 has had 100% accuracy in identifying Spam in the
> last two years I have been using it.

If you have suggestions for how automatic testing can be incorporated
into a spam-removal process in a way that is acceptable to the project,
I'd be very happy to see them discussed here. However, I'm not sure that
the bias that we (six people are currently seeing how things work)
impose in our manual review can be implemented very well in software.
What should be done with "sponsorship request spam" from people claiming
to be students or clans? What about foreign-language spam that people
reply to with a translation and the explanation "ignore, this is spam",
and what about the reply itself?

> It would be interesting to see how many messages escape my
> filters, and give me an opportunity to further train them. All I need
> would be the mbox file; and for me to setup a process to feed the email
> to the filters, and classify the result -- and then send back the
> message ID's of Ham and Spam back to Debian.

There are a couple of almost-mboxes linked from [1]. Before the first
"From " line there is an mbox-like header, but from there on each file
is a regular mbox archive consisting of the nominations. Preliminary
results indicate that around 2/3 of the submissions for debian-project
are actually removal candidates (based on review by pabs and me; others
are looking at the same things).

The information in the initial headers should be fairly
self-explanatory: the number beside the year, month, and message number
is the number of times a message was reported as spam.

I can easily put up more of these, of course; just tell me what you
want. (There are ca. 90000 nominated messages; it is unclear to me
whether old data is as usable as newer data.)

Kind regards

Thomas

1. http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam
--
Thomas Viehmann, http://thomas.viehmann.net/
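
A minimal sketch of how one of the almost-mbox files described above
could be read, assuming only what the message states: everything before
the first line starting with "From " is an extra header, and the rest is
a regular mbox. The function names split_preamble and message_ids are
hypothetical, and the layout of the extra header is not parsed because
it is not specified here.

```python
import mailbox
import os
import tempfile
from typing import List, Tuple


def split_preamble(raw: bytes) -> Tuple[bytes, bytes]:
    """Split file contents into (preamble, mbox_part).

    Assumption: the mbox part starts at the first line beginning
    with b"From "; everything before it is the extra header.
    """
    if raw.startswith(b"From "):
        return b"", raw
    idx = raw.find(b"\nFrom ")
    if idx == -1:
        # No mbox separator found: treat everything as preamble.
        return raw, b""
    return raw[: idx + 1], raw[idx + 1 :]


def message_ids(raw: bytes) -> List[str]:
    """Return the Message-ID header of each message after the preamble."""
    _, mbox_part = split_preamble(raw)
    # mailbox.mbox works on a path, so write the mbox part to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".mbox", delete=False) as tmp:
        tmp.write(mbox_part)
        path = tmp.name
    try:
        box = mailbox.mbox(path, create=False)
        try:
            return [msg.get("Message-ID", "") for msg in box]
        finally:
            box.close()
    finally:
        os.unlink(path)
```

The resulting message IDs could then be paired with a classifier's
Ham/Spam verdict and sent back, as proposed in the quoted text.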