Re: Why not Neural networks ??

Mark Martinec Thu, 16 Apr 2015 08:40:03 -0700

Sarang Shrivastava wrote:

Yes, indeed CRM114 has a lot of criteria for categorizing data, and that
can be done via a host of methods, including regexes, approximate regexes,
a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW, Correlation,
KNN/Hyperspace, or Bit Entropy.
We can take ideas from them and develop our own plugin that has the
capability to compete with CRM114. After all, there is no place like home.
I look forward to working on these, provided that my proposal gets
accepted.


As with Kevin, AI is not my field of expertise either, although in
a nearby laboratory at our institute there is quite a strong group of
researchers working in that area (but with less interest in open-source
projects than myself). I think such algorithms could bring a substantial
breath of fresh air to SpamAssassin and are well worth exploring.

A thought to consider: does using custom plugins hinder the performance of
SA in terms of speed? No doubt CRM114 is good at classifying spam and
ham, but does it hamper the speed at all?
What do you guys say about including these in SA itself, if possible?


A SpamAssassin plugin is just a perl module, loaded with the rest of
the SpamAssassin framework. There is no additional overhead in calling
methods in a plugin, it's just a normal perl subroutine call. In fact most
of the existing content-checking methods that come with SpamAssassin
are even now packaged as plugins. The rest of SpamAssassin is utility
routines, parsing and interfacing code and such.

If a plugin (i.e. a perl module) cannot do all the work by itself
but needs to invoke some external service, e.g. run some program
or connect to some service or use a database, that does add some
overhead. Done carefully such overhead can still be relatively small
and manageable.
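
To make the difference concrete, here is a rough sketch in Python (not
SpamAssassin code, and not Perl) that times the same trivial check done
in-process and done by starting an external program for every message.
The function names and the keyword test are made up for illustration;
the absolute numbers will vary by machine, and a real plugin would
normally keep a long-running service or connection open rather than
spawn a process per message.

```python
import subprocess
import sys
import time


def check_in_process(text):
    """Hypothetical stand-in for a check done inside the plugin itself."""
    return "viagra" in text.lower()


def check_via_external_program(text):
    """Same check, delegated to a freshly started external process."""
    code = "import sys; sys.exit(0 if 'viagra' in sys.stdin.read().lower() else 1)"
    result = subprocess.run([sys.executable, "-c", code], input=text, text=True)
    return result.returncode == 0


def time_call(func, text, repeats):
    """Average wall-clock seconds per call over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(text)
    return (time.perf_counter() - start) / repeats


message = "Buy cheap VIAGRA now"
in_process = time_call(check_in_process, message, 1000)
external = time_call(check_via_external_program, message, 5)
print(f"in-process: {in_process * 1e6:.1f} us per call")
print(f"external:   {external * 1e3:.1f} ms per call")
```

The per-call cost of the external variant is dominated by process
startup, which is the kind of overhead that careful design (persistent
connections, batching) tries to avoid.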

If some database is needed, currently SpamAssassin can use a couple
of them, from file-based (e.g. BerkeleyDB), to SQL, LDAP, Redis.
Btw, Quanah Gibson-Mount (from Zimbra) is a strong advocate for trying
LMDB (http://symas.com/mdb/), which recently can also be used by the
Postfix mailer, so it may be worth considering when an in-memory database
(such as Redis or Memcache) may not suffice.

Some databases (like SQL or Redis) offer server-side code execution
(stored procedures, or Lua in Redis), which can significantly reduce
the number of round-trip query/response accesses to a database, which
can be valuable for performance.
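
A toy model of the round-trip saving, in Python. This is only a
simulation: FakeServer, get, run_script and sum_counts are invented
names, not the Redis or SQL API, and the token counts are made-up data.
With real Redis the server-side part would be a Lua script sent with the
EVAL command.

```python
class FakeServer:
    """Toy in-memory key/value server that counts client round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One request/response exchange per key looked up.
        self.round_trips += 1
        return self.data.get(key, 0)

    def run_script(self, script, keys):
        # One exchange in total; the script runs next to the data.
        self.round_trips += 1
        return script(self.data, keys)


def sum_counts(data, keys):
    """Server-side logic: add up the stored counts for all given keys."""
    return sum(data.get(k, 0) for k in keys)


tokens = ["free", "offer", "meeting", "unknown"]
counts = {"free": 40, "offer": 25, "meeting": 2}

# Client-side approach: fetch every token count separately, sum locally.
client_side = FakeServer(counts)
total_a = sum(client_side.get(t) for t in tokens)

# Server-side approach: ship the logic once, get back only the result.
server_side = FakeServer(counts)
total_b = server_side.run_script(sum_counts, tokens)

print(total_a, client_side.round_trips)  # 67 4
print(total_b, server_side.round_trips)  # 67 1
```

Both approaches compute the same total, but the first needs one round
trip per token while the second needs a single one, and a message
typically has far more than four tokens.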

  Mark
