Yes, indeed, CRM114 offers many criteria for categorizing data, and the classification can be done via a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW, Correlation, KNN/Hyperspace, or Bit Entropy.
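As a rough illustration of one of these methods, the sketch below shows how Orthogonal Sparse Bigram (OSB) features are commonly described: each token is paired with every earlier token inside a short sliding window, and the gap between the two is recorded. This is only a minimal sketch under stated assumptions, not CRM114's actual implementation. The function name `osb_features`, the `<skip>` placeholder, and the whitespace tokenization in the usage note are all hypothetical, and CRM114's own tokenizing, hashing, and weighting are not reproduced here.

```python
def osb_features(tokens, window=5):
    """Return OSB-style features for a list of tokens.

    Assumption: a feature is the pair (earlier token, current token)
    with one "<skip>" marker per token skipped between them. A window
    of 5 is the value usually quoted for OSB; it is a parameter here.
    """
    features = []
    for i, current in enumerate(tokens):
        # Pair the current token with up to (window - 1) earlier tokens.
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break  # ran off the start of the message
            skipped = dist - 1  # tokens lying between the two paired tokens
            features.append(f"{tokens[j]} {'<skip> ' * skipped}{current}")
    return features
```

For example, `osb_features("buy cheap pills now".split(), window=3)` pairs "pills" with both "cheap" (adjacent) and "buy" (one token skipped). A plugin could feed such features into any of the learners listed above in place of single-word tokens.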
We can take ideas from them and develop our own plugin that can compete with CRM114. After all, there is no place like home. I look forward to working on these, provided my proposal gets accepted.

One thing to consider: does using custom plugins hurt SA's performance in terms of speed? No doubt CRM114 is good at classifying spam and ham, but does it slow scanning down at all? What do you guys think about including these methods in SA itself, if possible?

On Thu, Apr 16, 2015 at 4:38 AM, Mark Martinec <mark.martinec...@ijs.si> wrote:

> Quanah Gibson-Mount wrote:
>
>> --On Wednesday, April 15, 2015 3:34 PM +0200 Mark Martinec wrote:
>>
>>> Don't know. It might be worth taking a look at the CRM114 classifier,
>>> which implements a number of methods. The CRM114 can be used as a
>>> plugin to SpamAssassin, or can be called from Amavis and results
>>> combined with those from SpamAssassin. We had good results from CRM114,
>>> and some friends of mine are very enthusiastic about it. Certainly
>>> it is a good complement to SpamAssassin's naive bayes classifier,
>>>
>>
>> I was looking at adding CRM-114 to Zimbra to integrate with SA and/or
>> Amavis as a dspam, but it seems to have the same issue dspam has --
>> Abandoned. The last mail to the announcement list was in 2007 about
>> some issue with the software, and prior to that was an email in July
>> 2006 about a new RC. It's been dead silent since. There's some minor
>> traffic on the users list.
>> It also depends on a program called tre, which also appears to be
>> abandoned, although it at least had a commit in the last yearish.
>>
>
> That is unfortunately true, it seems to be abandoned. It also uses
> some unorthodox implementation language. But it does work as advertised.
> I haven't noticed any malfunctions, and its results are comparable
> to SA's Bayes: when they agree it adds valuable score points to marginal
> spam, and when they disagree it is not unusual that one or the other
> saves us from a false positive.
>
> CRM114 can be a rich source of ideas and AI algorithms, which is why
> I mentioned it in this thread - worth looking into its approaches.
>
>> I would be interested to hear more about what results your friends
>> are seeing and how they are integrating it into their workflow.
>>
>
> Don't have any statistics on that. It's been in use as either the
> SpamAssassin plugin or as Amavis external spam filter - either way
> it does its job, and one or the other approach has some benefits and
> drawbacks. I believe auto-learning-on-error is in use (after initial
> training) through learn_ham and learn_spam fields in an @spam_scanners
> entry in amavisd.conf.
>
>> I was looking at adding CRM-114 to Zimbra to integrate with SA and/or
>> Amavis as a dspam, but it seems to have the same issue dspam has
>>
>
> It may not be the best match for an unattended operation at some
> remote customer location, I agree. It uses a fixed size database
> which may need occasional resizing. For a managed site with a
> knowledgeable administrator it can be a good fit - it certainly
> does well for some sites. But as with all auto-learning approaches:
> the quality of results depends on the quality of input. Better classical
> rules (DNS, regexps, ...) yield better auto-learning, and the more
> homogeneous a use base is (with their mail content), the better
> results are.
>
> Mark
>

--
*Sarang Shrivastava*
*Computer Science & Engineering*
*MNNIT Allahabad*