Yes, CRM114 indeed offers many ways to categorize data, including
regexes, approximate regexes, a Hidden Markov Model, Orthogonal Sparse
Bigrams, WINNOW, Correlation, KNN/Hyperspace, or Bit Entropy.
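
To make one of these concrete, here is a rough sketch of how Orthogonal
Sparse Bigram features could be generated from a token stream. It is only
an illustration of the general idea (pair each token with each of the next
few tokens and record the gap), not CRM114's actual code; the function
name, the "<skip>" marker and the default window size are my own
assumptions.

```python
def osb_features(tokens, window=5):
    """Sketch of OSB-style feature generation (illustrative, not CRM114's code).

    Each token is paired with each of the next `window - 1` tokens. The
    number of "<skip>" markers records how many tokens lie between the two,
    so "buy cheap" and "buy <skip> pills" count as different features.
    """
    features = []
    for i, first in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break  # ran off the end of the message
            skips = " <skip>" * (gap - 1)
            features.append(f"{first}{skips} {tokens[j]}")
    return features


if __name__ == "__main__":
    for feature in osb_features(["buy", "cheap", "pills", "now"], window=3):
        print(feature)
```

A classifier (Bayes, Winnow, ...) would then be trained on these features
instead of on single words.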

We can take ideas from them and develop our own plugin that can compete
with CRM114. After all, there is no place like home. I look forward to
working on this if my proposal is accepted.

One thought: do custom plugins hurt SA's performance in terms of speed?
CRM114 is no doubt good at classifying spam and ham, but does running it
slow scanning down at all?

What do you all think about including these methods in SA itself, if
possible?

On Thu, Apr 16, 2015 at 4:38 AM, Mark Martinec <mark.martinec...@ijs.si>
wrote:

> Quanah Gibson-Mount wrote:
>
>> --On Wednesday, April 15, 2015 3:34 PM +0200 Mark Martinec wrote:
>>
>>> Don't know. It might be worth taking a look at the CRM114 classifier,
>>> which implements a number of methods. The CRM114 can be used as a
>>> plugin to SpamAssassin, or can be called from Amavis and results
>>> combined with those from SpamAssassin. We had good results from CRM114,
>>> and some friends of mine are very enthusiastic about it. Certainly
>>> it is a good complement to SpamAssassin's naive bayes classifier,
>>>
>>
>> I was looking at adding CRM-114 to Zimbra to integrate with SA and/or
>> Amavis as a dspam, but it seems to have the same issue dspam has --
>> Abandoned.  The last mail to the announcement list was in 2007 about
>> some issue with the software, and prior to that was an email in July
>> 2006 about a new RC.  It's been dead silent since.  There's some minor
>> traffic on the users list.
>> It also depends on a program called tre, which also appears to be
>> abandoned, although it at least had a commit in the last yearish.
>>
>
> That is unfortunately true, it seems to be abandoned. It also uses
> some unorthodox implementation language. But it does work as advertised.
> I haven't noticed any malfunctions, and its results are comparable
> to SA's Bayes: when they agree it adds valuable score points to marginal
> spam, and when they disagree it is not unusual that one or the other
> saves us from a false positive.
>
> CRM114 can be a rich source of ideas and AI algorithms, which is why
> I mentioned it in this thread - worth looking into its approaches.
>
>
>> I would be interested to hear more about what results your friends
>> are seeing and how they are integrating it into their workflow.
>>
>
> Don't have any statistics on that. It's been in use as either the
> SpamAssassin plugin or as Amavis external spam filter - either way
> it does its job, and one or the other approach has some benefits and
> drawbacks. I believe auto-learning-on-error is in use (after initial
> training) through learn_ham and learn_spam fields in an @spam_scanners
> entry in amavisd.conf.
>
>> I was looking at adding CRM-114 to Zimbra to integrate with SA and/or
>> Amavis as a dspam, but it seems to have the same issue dspam has
>>
>
> It may not be the best match for an unattended operation at some
> remote customer location, I agree. It uses a fixed size database
> which may need occasional resizing. For a managed site with a
> knowledgeable administrator it can be a good fit - it certainly
> does well for some sites. But as with all auto-learning approaches:
> the quality of results depends on the quality of input. Better classical
> rules (DNS, regexps, ...) yield better auto-learning, and the more
> homogeneous a use base is (with their mail content), the better
> results are.
>
>   Mark
>
>


-- 
*Sarang Shrivastava*
*Computer Science & Engineering*
*MNNIT Allahabad*
