At 01:10 PM 8/16/02 -0700, you wrote:
>Hey, all.
>
>Take a look at this --
>
><http://www.paulgraham.com/spam.html>
>
>It's a new technique for identifying spam. The more I look into the details,
>the more I think we have the "anti-spam killer app", because it tunes itself
>to the individual (or site), adapts as the anti-spammers adapt, and the
>technique used is fairly easy to implement and damn difficult for a spammer
>to avoid....
>
>Damn, I wish I'd thought of this.
>
>(I've dropped a pointer to it at
>http://www.chuqui.com/cgi-bin/mwf/topic_show.pl?tid=389)

Yeah, I read that... It really started wheels turning in my head. Now, if I'd coded in Lisp more recently than 20 years ago it would have been a little easier to follow; I haven't taken stats any more recently either, and couldn't remember Bayesian analysis to save my soul...

Of course, it still requires that you either accept someone else's starting corpus or have someone tag the spam as it arrives.

But it wouldn't be *that* hard to add to mailman... Add a button to the admindb page for "this was spam". Ideally, add a button to the pipermail interface (yeah, I know, change pipermail; ugh feh) that the admin can use that says "This message is spam. Treat it as such".

Both would make the message go away and add it to the list's spam corpus, correspondingly deleting it from the list's good-corpus stats if it was in the archives (why? because if it made it that far then it was considered valid email, and we need to get its keywords *out* of that database). We'd also archive the entire note for future reference.

At some point, when we had enough data to trust the corpus sample size, the list admin would be given the option of turning on the spam filter, which could just return the same results that the moderation checks do: discard, hold, pass, etc.

Meanwhile, back at the ranch, the site owner could run a site-level filter if they liked, independent of the list-level one, which could be used to seed new lists with a better starting corpus. That's why we save the messages: it lets the site admin review messages tagged as spam and agree or disagree on their inclusion, preventing the psycho-moderator syndrome.

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman-21/listinfo/mailman-developers
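
To make the corpus bookkeeping proposed above concrete, here is a minimal sketch in Python. It only covers the "this was spam" button moving a message's tokens from the good corpus to the spam corpus, the archive of tagged messages, and the "enough data to trust" check. Everything named here (ListCorpus, add_good, mark_spam, ready, min_messages) is hypothetical and not part of Mailman or pipermail, the tokenizer is a crude assumption, and the threshold of 100 messages is an arbitrary placeholder.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits, and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message into lowercase word-like tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class ListCorpus:
    """Hypothetical per-list store of token counts; not a Mailman class."""

    def __init__(self, min_messages=100):
        self.good = Counter()   # token -> occurrences in accepted mail
        self.spam = Counter()   # token -> occurrences in tagged spam
        self.n_good = 0         # number of messages in the good corpus
        self.n_spam = 0         # number of messages in the spam corpus
        self.spam_archive = []  # full text, kept for site-admin review
        self.min_messages = min_messages  # assumed trust threshold

    def add_good(self, text):
        """Record a message that passed moderation and was archived."""
        self.good.update(tokenize(text))
        self.n_good += 1

    def mark_spam(self, text, was_archived=False):
        """The 'this was spam' button: move the message to the spam corpus."""
        tokens = Counter(tokenize(text))
        if was_archived:
            # It was counted as good mail earlier, so back its tokens out.
            self.good.subtract(tokens)
            self.good = +self.good  # unary plus drops zero/negative counts
            self.n_good = max(0, self.n_good - 1)
        self.spam.update(tokens)
        self.n_spam += 1
        self.spam_archive.append(text)

    def ready(self):
        """True once both corpora are large enough to trust."""
        return min(self.n_good, self.n_spam) >= self.min_messages
```

A site-level corpus could be a second ListCorpus fed from the spam_archive of each list once the site admin has agreed with the tagging; actual scoring and the mapping to discard, hold, or pass are left out of this sketch.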
