Re: SA efficency degrades quickly

2005-06-21 Thread Loren Wilton
> but after few days, the
> efficiency of SpamAssassin degrades from >90% of spam correctly
> identified to a 60%... I tried to learn it again with new, not

Something must be badly wrong here.  SA's accuracy does degrade with time,
but over months, not days, and only by around 10%.

You don't say what kind of learning you are doing, Bayes or AWL.  I will
assume it is probably Bayes, but maybe you are doing both.

You also don't show an example spam that didn't get marked, so we don't know
what rules it hit.  So all we can really do is make guesses rather than
telling you what the real problem is.

Rather than guess, I'd ask you to post an excerpt of a spam that failed to
be marked, including the rules that hit on it.

Loren



Re: SA efficency degrades quickly

2005-06-21 Thread Robert Menschel
Hello Mailing,

Tuesday, June 21, 2005, 10:48:44 AM, you wrote:

MLANC> Hi!
MLANC>  I have a little problem with spam recognition. I have re-learned
MLANC> SpamAssassin (deleting old file from ".spamassassin" directory, to clear
MLANC> old information) and it worked really nice... but after few days, the
MLANC> efficiency of SpamAssassin degrades from >90% of spam correctly
MLANC> identified to a 60%... I tried to learn it again with new, not
MLANC> recognized spam (and with all new ham, to respect a 1:1 - about - ratio
MLANC> of spam:ham) but without any result.

My experience is the opposite -- after wiping a Bayes database SA is
initially 70%-80% accurate, and then rises steadily to 95% and better
(better = with SARE rules).

I'm guessing you may have auto-learn enabled with the default limits,
and spam that sneaks by with a score of 0.0 or 0.1 is being learned as
non-spam, polluting your database.
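
To make the pollution mechanism concrete, here is a minimal sketch of a
threshold-based auto-learn decision. It is a simplification, not SA's actual
code: the function name auto_learn_decision is hypothetical, the defaults of
0.1 and 12.0 are assumed to match the stock thresholds, and the extra checks
the real auto-learner performs are left out.

```python
from typing import Optional


def auto_learn_decision(
    score: float,
    ham_threshold: float = 0.1,    # assumed stock non-spam threshold
    spam_threshold: float = 12.0,  # assumed stock spam threshold
) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn) for a message score.

    Simplified illustration only; the function and its name are hypothetical.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


# A spam that hit no rules at all scores 0.0 and is learned as ham.
missed_spam_score = 0.0
print(auto_learn_decision(missed_spam_score))                      # ham
# With a negative ham threshold the same message is no longer learned.
print(auto_learn_decision(missed_spam_score, ham_threshold=-0.5))  # None
```

Every missed spam learned this way pushes its tokens toward "ham", which
would explain accuracy falling within days rather than months.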

If you have reliable negative-scoring ham rules (which generally are
domain- or user-specific), then set your auto-learn ham threshold to
some negative score (-0.2 or -0.5 or something like that).  If you
have no reliable negative-scoring ham rules, then turn off auto-learn
and ONLY use sa-learn manually as you describe above.
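
As a rough illustration, the two options might look like this in local.cf
(or user_prefs). The -0.5 value is only an example taken from the range
suggested above, and 12.0 is assumed to be the stock spam threshold; check
the option names and defaults against the documentation for your SA version.

```
# Option 1: keep auto-learn, but only learn ham from clearly negative scores
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -0.5
bayes_auto_learn_threshold_spam 12.0

# Option 2: no reliable negative-scoring rules, so disable auto-learn
# bayes_auto_learn 0
```

Use one option or the other, not both.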

That may take care of your problem.

Alternatively, are you using SARE rules?  Start with the most reliable
SARE rules files, expand slowly, and they'll probably help you avoid
Bayes degradation.

Bob Menschel