Hi Mar,

On Tue, Aug 9, 2011 at 02:22, qwerty qwerty <qwery_asdf...@yahoo.com> wrote:
> Hi,
> I came across dspam while investigating products for anti spam. A slight
> wrinkle is that I want to integrate the anti spam in a non-email system.
> Examples include using libdspam on user comments, instant messages, tweets,
> etc.

Interesting. Are you trying to make a service, or do you want to embed
libdspam in an application?
I myself am pondering a web service scenario (JSON-RPC) for blog
comments. A naive Bayesian classifier is easy enough to come by, but
then you discover you also need a proper tokenizer, auto-whitelisting,
maybe noise reduction, different accounts, token expiry, training
modes, a client-server model, performance, maybe even inoculation,
etc. These are all things that dspam already has and does very well.

> I do not have the traditional email header and body. I still would like
> to use libdspam for tokenization and implementation of spam bayes algorithm
>  rather than rolling my own implementation to catch the traditional spam.

If you just want dspam to classify content in a 'service' scenario, as
I am planning on doing, then an obvious Improper Hack is to convert
content snippets (e.g. a blog comment) to the email message format on
the fly, possibly making up some header fields, and submit the result
to dspam, run with something such as '--deliver=stdout'. You parse the
response and store the verdict and dspam signature together with the
piece of content. If you want scalability, run dspam in server mode.

There's no need to keep the email interim format around or to actually
send email through MTAs.
So save for the "bad taste" the transient email message format leaves
us with, from a practical perspective such a hack could be a pretty
good solution with minimal code maintenance. You or your system
engineers can probably whip up a proof of concept over tea ;-)
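
For what it's worth, here is a rough sketch of the conversion step in
Python, using only the standard library. Everything in it is an
assumption for illustration: the function names, the made-up header
values and the ".invalid" domain are hypothetical, and the command
line used to call dspam is deliberately left to the caller, since the
exact flags should be checked against the dspam documentation.

```python
import subprocess
from email.headerregistry import Address
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

# Hypothetical placeholder domain; ".invalid" is reserved and never resolves.
FAKE_DOMAIN = "comments.invalid"


def comment_to_email(text, author="anonymous", content_id=None):
    """Wrap a piece of unstructured text in a throwaway email message."""
    msg = EmailMessage()
    # All header values are made up on the fly; nothing is ever mailed.
    msg["From"] = Address(display_name=author, username="commenter",
                          domain=FAKE_DOMAIN)
    msg["To"] = Address(username="classifier", domain=FAKE_DOMAIN)
    msg["Subject"] = "blog comment"
    msg["Date"] = formatdate(localtime=False)
    idstring = None if content_id is None else str(content_id)
    msg["Message-ID"] = make_msgid(idstring=idstring, domain=FAKE_DOMAIN)
    msg.set_content(text)
    return msg.as_bytes()


def run_classifier(raw_message, argv):
    """Pipe the message to a classifier command and return its stdout.

    argv is supplied by the caller (for example a dspam invocation);
    no particular flags are assumed here.
    """
    done = subprocess.run(argv, input=raw_message,
                          capture_output=True, check=True)
    return done.stdout.decode("utf-8", errors="replace")
```

You would call comment_to_email() for each comment, hand the bytes to
run_classifier() together with your dspam command line, and then parse
whatever dspam prints to get at the verdict and signature. The interim
message only ever lives in memory.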

> What do you think is the easiest way to supply an alternate message decoder
> and hopefully in a way that I can apply libdspam code updates easily?

An alternate message decoder is the more elegant solution. But as long
as you want to use dspam to classify *unstructured text*, which you
do, then the email message format can do the job. Why not?

Cheers, Wicher.

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user