Steve,

Thanks for the information! I looked into Bayesian filtering, and I can easily test it on the distributed system, since it maps naturally onto map/reduce.
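This is a rough illustration of why Bayesian filtering fits map/reduce: training reduces to counting (label, token) pairs, which is essentially a word-count job. The following is a minimal single-process sketch. It is not the implementation from the linked blog post and not Hadoop's API. The names `mapper`, `reducer`, `run_job` and `NaiveBayes` are hypothetical. The model is assumed to be plain multinomial naive Bayes with add-one smoothing, and the toy corpus is made up.

```python
import math
import re
from itertools import groupby
from operator import itemgetter

# Sentinel token used to count messages per label. The tokenizer below never
# emits underscores or upper-case letters, so it cannot collide with a real word.
DOC_KEY = "__DOC__"


def tokenize(text):
    """Crude tokenizer: lower-cased runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mapper(label, text):
    """Map step: emit ((label, token), 1) per token occurrence, plus one
    ((label, DOC_KEY), 1) per message so the reducer also yields class priors."""
    yield (label, DOC_KEY), 1
    for tok in tokenize(text):
        yield (label, tok), 1


def reducer(key, values):
    """Reduce step: sum every count emitted under the same key."""
    return key, sum(values)


def run_job(corpus):
    """Simulate map -> shuffle/sort -> reduce in one process (hypothetical helper)."""
    mapped = [kv for label, text in corpus for kv in mapper(label, text)]
    mapped.sort(key=itemgetter(0))  # stands in for the framework's shuffle/sort
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, built from job output."""

    def __init__(self, counts):
        self.counts = counts
        self.labels = sorted({label for label, _ in counts})
        self.vocab = {tok for _, tok in counts if tok != DOC_KEY}
        self.n_docs = sum(counts[(label, DOC_KEY)] for label in self.labels)
        self.n_tokens = {
            label: sum(c for (lab, tok), c in counts.items()
                       if lab == label and tok != DOC_KEY)
            for label in self.labels
        }

    def log_score(self, label, text):
        # log P(label) + sum over tokens of log P(token | label)
        score = math.log(self.counts[(label, DOC_KEY)] / self.n_docs)
        denom = self.n_tokens[label] + len(self.vocab)
        for tok in tokenize(text):
            score += math.log((self.counts.get((label, tok), 0) + 1) / denom)
        return score

    def classify(self, text):
        return max(self.labels, key=lambda label: self.log_score(label, text))


# Hypothetical toy corpus, for illustration only.
corpus = [
    ("spam", "cheap pills buy now"),
    ("spam", "win money now click here"),
    ("ham", "meeting notes for the hadoop project"),
    ("ham", "can we discuss the hama matrix code"),
]
model = NaiveBayes(run_job(corpus))
print(model.classify("buy cheap now"))            # spam
print(model.classify("hadoop meeting tomorrow"))  # ham
```

Because the reduce step is just a sum, which is associative, the same function could also serve as a combiner on a real cluster. The only state the classifier needs afterwards is the table of per-(label, token) counts.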
See http://blog.udanax.org/2008/10/parallel-bayesian-spam-filtering-using.html

/Edward

On Mon, Sep 22, 2008 at 7:21 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>>
>> Hi all,
>>
>> To reduce the manual administration effort for a planet-scale
>> mail service, I'm considering statistical spam filtering with
>> the SpamAssassin, Hadoop (distributed computing), and Hama (parallel matrix
>> computing) projects.
>>
>> Any advice (or experience) would be appreciated!
>
> Have you spoken to SpamAssassin? They'd probably love to get involved in a
> streams-based filtering system. One thing to know there is that a lot of
> their test data is private, as they have to include lots of legitimate email
> alongside the spam, so their big datasets aren't always that public.
>
> Talk to Justin Mason and the SpamAssassin developers.
>
> -steve

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org