Steve, thanks for the information!

I looked into Bayesian filtering, and it is easy to test on a
distributed system, since it maps naturally onto map/reduce.

See http://blog.udanax.org/2008/10/parallel-bayesian-spam-filtering-using.html
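To illustrate why the training step fits map/reduce, here is a minimal single-process sketch. It assumes a multinomial naive Bayes model with Laplace smoothing. The mapper emits ((label, token), 1) pairs, the shuffle groups them by key, and the reducer sums them. The function names and the "__DOCS__" key are hypothetical. They are not taken from the blog post above or from Hadoop's API. On a real cluster, the mapper and reducer would run as Hadoop tasks rather than in one loop.

```python
import math
import re
from collections import defaultdict

# Hypothetical reserved key for per-label document counts; the tokenizer
# below never produces underscores, so it cannot collide with a real token.
DOC_KEY = "__DOCS__"

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def mapper(label, text):
    # Count one document for this label, then one occurrence per token.
    yield (label, DOC_KEY), 1
    for token in tokenize(text):
        yield (label, token), 1

def reducer(key, values):
    # Sum all partial counts that share the same (label, token) key.
    return key, sum(values)

def run_mapreduce(messages):
    # Shuffle phase: group mapper output by key, then reduce each group.
    groups = defaultdict(list)
    for label, text in messages:
        for key, value in mapper(label, text):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

def classify(counts, text):
    labels = sorted({label for (label, _) in counts})
    vocab = {tok for (_, tok) in counts if tok != DOC_KEY}
    total_docs = sum(counts[(l, DOC_KEY)] for l in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        # Total token occurrences seen in training for this label.
        n_tokens = sum(c for (lab, tok), c in counts.items()
                       if lab == label and tok != DOC_KEY)
        score = math.log(counts[(label, DOC_KEY)] / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            c = counts.get((label, token), 0)
            # Laplace-smoothed log P(token | label)
            score += math.log((c + 1) / (n_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, after `counts = run_mapreduce([("spam", "buy cheap pills now"), ("ham", "hama matrix project meeting")])`, calling `classify(counts, "cheap pills")` picks the label with the higher smoothed log-likelihood. Only the counting is distributed. Classification just reads the summed counts.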

/Edward

On Mon, Sep 22, 2008 at 7:21 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>>
>> Hi all,
>>
>> To reduce the effort of manual management for a planet-scale
>> mail service, I'm considering statistical spam filtering with
>> the SpamAssassin, Hadoop (distributed computing), and Hama (parallel
>> matrix computing) projects.
>>
>> Any advice (or experience) would be appreciated!
>
> Have you spoken to SpamAssassin? They'd probably love to get involved in a
> streams-based filtering system. One thing to know there is that a lot of
> their test data is private, as they have to include lots of legitimate email
> alongside the spam, so their big datasets aren't always that public.
>
> Talk to Justin Mason and the SpamAssassin developers.
>
> -steve
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
