Hi Mar,

On Tue, Aug 9, 2011 at 02:22, qwerty qwerty <qwery_asdf...@yahoo.com> wrote:
> Hi,
> I came across dspam while investigating products for anti spam. A slight
> wrinkle is that I want to integrate the anti spam in a non-email system.
> Examples include using libdspam on user comments, instant messages, tweets,
> etc.

Interesting. Are you trying to make a service, or do you want to embed libdspam in an application? I myself am pondering a web service scenario (JSON-RPC) for blog comments. A naive Bayesian classifier is easy enough to come by, but then you discover you also need a proper tokenizer, auto-whitelisting, maybe noise reduction, different accounts, token expiry, training modes, a client-server model, performance, maybe even inoculation, etc. These are all things that dspam already has and does very well.

> I do not have the traditional email header and body. I still would like
> to use libdspam for tokenization and implementation of spam bayes algorithm
> rather than rolling my own implementation to catch the traditional spam.

If you just want dspam to classify content in a 'service' scenario, as I am planning on doing, then an obvious Improper Hack is to convert content snippets (e.g. a blog comment) to the email message format on the fly, possibly making up some header fields, and submit the result to dspam, with dspam doing something such as '--deliver=stdout'. You parse the response and store the verdict and the dspam signature together with the piece of content. If you want scalability, run dspam in server mode.

There's no need to keep the interim email format around or to actually send email through MTAs. So, save for the "bad taste" the transient email message format leaves us with, from a practical perspective such a hack could be a pretty good solution with minimal code maintenance. You or your system engineers can probably whip up a proof of concept over tea ;-)

> What do you think is the easiest way to supply an alternate message decoder
> and hopefully in a way that I can apply libdspam code updates easily?

An alternate message decoder is the more elegant solution. But as long as you want to use dspam to classify *unstructured text*, which you do, the email message format can do the job. Why not.

Cheers,
Wicher.
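
P.S. To make the conversion hack above a bit more concrete, here is a minimal Python sketch of it, not a tested integration. The function names, the made-up header values and the example.invalid addresses are all my own placeholders. The command line used to reach dspam is deliberately left to the caller, since the exact flags should come from the dspam documentation for your version.

```python
import subprocess
from email.message import EmailMessage
from email.utils import formataddr, formatdate, make_msgid


def comment_to_email(text, author="anonymous", subject="blog comment"):
    """Wrap a piece of unstructured text in a minimal email message.

    All header values are invented on the fly; nothing is ever sent
    through an MTA.
    """
    msg = EmailMessage()
    # Placeholder addresses (assumption): example.invalid never resolves.
    msg["From"] = formataddr((author, "commenter@example.invalid"))
    msg["To"] = "comments@example.invalid"
    msg["Subject"] = subject
    msg["Date"] = formatdate()
    msg["Message-ID"] = make_msgid(domain="example.invalid")
    msg.set_content(text)
    return msg


def submit(msg, argv):
    """Pipe the message to a classifier command and return its stdout.

    `argv` is the full command line as a list. It is supplied by the
    caller because the right dspam invocation depends on your setup.
    """
    result = subprocess.run(
        argv, input=msg.as_bytes(), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

You would call comment_to_email() on each comment, hand the result to submit() together with your dspam command line, then parse the returned text for the verdict and signature and store both next to the comment.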
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user