autolearn vs sa-learn / Bayes

2008-02-21 Thread Diego Pomatta

Hello list.

Does the bayes system use a separate db for the "autolearn" mode?

Today I noticed that my SA bayes has 50 spam and 45 ham mails learned, 
when I thought the db had a lot more, because bayes IS being used.


# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0 50  0  non-token data: nspam
0.000  0 45  0  non-token data: nham

# spamassassin -D --lint
...
[7896] dbg: bayes: found bayes db version 3
[7896] dbg: bayes: DB journal sync: last sync: 0
[7896] dbg: bayes: not available for scanning, only 50 spam(s) in bayes DB < 200

...
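
[Editor's sketch, not part of the original message] The check behind that debug line can be approximated from the `sa-learn --dump magic` output quoted above. This is a minimal Python sketch: `parse_magic` and `bayes_ready` are hypothetical helper names, the 200 minimum is taken from the "< 200" in the debug line, and applying the same minimum to ham is an assumption based on the stock defaults, not something shown in this log.

```python
# Minimal sketch: read nspam/nham from `sa-learn --dump magic` text and
# compare them with the minimum that the debug line mentions.

MIN_LEARNED = 200  # from "only 50 spam(s) in bayes DB < 200"; ham minimum assumed equal


def parse_magic(dump_text):
    """Return a dict such as {'nspam': 50, 'nham': 45} from dump-magic output."""
    counts = {}
    for line in dump_text.splitlines():
        # Drop stray '*' emphasis markers that mail clients add around bold text.
        fields = line.replace("*", " ").split()
        # In the output quoted above the count is the third column and the
        # entry name is the last word of the line.
        if len(fields) >= 5 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


def bayes_ready(counts, minimum=MIN_LEARNED):
    """True only when both the spam and the ham counts reach the minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum


SAMPLE = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0 50  0  non-token data: nspam
0.000  0 45  0  non-token data: nham
"""

if __name__ == "__main__":
    counts = parse_magic(SAMPLE)
    print(counts, "ready" if bayes_ready(counts) else "not ready")
```

With the numbers from this post, both counts fall short of the minimum, which matches the "not available for scanning" message for this particular database.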

In the beginning, after setting up SA, bayes was not being used.
I had not trained it with anything yet, but my local.cf had:
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1

Reading the logs I noticed that it was only autolearning spam, not ham.
So I added
bayes_auto_learn_threshold_nonspam 0.5
and it started learning ham.
I monitored the logs and at some point incoming mails started triggering 
the BAYES_20, BAYES_50, BAYES_00, BAYES_95 and BAYES_99 rules.
So I figured it had autolearned the minimum needed amount of ham and spam 
(200) to start working.
Every now and then I use sa-learn to feed some spam and ham to bayes, 
and I thought I was contributing to the same db. Those must be the 50 
spam and 45 ham mails.


So what's the deal? :)
/Regards



Re: autolearn vs sa-learn / Bayes

2008-02-21 Thread Luis Hernán Otegui
Hello, Diego

2008/2/21, Diego Pomatta <[EMAIL PROTECTED]>:
> Hello list.
>
>  Does the bayes system use a separate db for the "autolearn" mode?
>
>  Today I noticed that my SA bayes has 50 spam and 45 ham mails learned,
>  when I thought the db had a lot more, because bayes IS being used.
>
>  # sa-learn --dump magic
>  0.000  0  3  0  non-token data: bayes db version
>  *0.000  0 50  0  non-token data: nspam
>  0.000  0 45  0  non-token data: nham*
>
>  # spamassassin -D --lint
>  ...
>  [7896] dbg: bayes: found bayes db version 3
>  [7896] dbg: bayes: DB journal sync: last sync: 0
>  *[7896] dbg: bayes: not available for scanning, only 50 spam(s) in bayes
>  DB < 200*
>  ...
>
>  In the beginning, after setting up SA, bayes was not being used.
>  I had not trained it with anything yet, but my local.cf had:
>  *use_bayes 1
>  use_bayes_rules 1
>  bayes_auto_learn 1*
>
>  Reading the logs I noticed that it was only autolearning spam, not ham.
>  So I added
>  *bayes_auto_learn_threshold_nonspam 0.5*
>  and it started learning ham.
>  I monitored the logs and at some point incoming mails started triggering
>  the BAYES_20, BAYES_50, BAYES_00, BAYES_95 and BAYES_99 rules.
>  So I figured it had autolearned the minimum needed amount of ham and spam
>  (200) to start working.
>  Every now and then I use sa-learn to feed some spam and ham to bayes,
>  and I thought I was contributing to the same db. Those must be the 50
>  spam and 45 ham mails.
>
>  So what's the deal? :)
>  /Regards
>
>

Well, a few questions should be answered first: How do you call
SA? Under which user does SA run? Are you learning those mails under
the right user? Which version are you running? Do you use sa-update?

With those questions asked, let's move to the core of this issue: as you
said, you only have 50 spams and 45 hams learned. You should feed more
data to SA to make the Bayes scores kick in. Normally, Bayes scores
help SA filter better (at least, they do here, and I suspect
they'll help you too: since you work in Argentina, your main locale
should be Spanish, and you'll be getting mostly Argentinian spam).

Regards,

Luis
-- 
-
GNU-GPL: "May The Source Be With You...
Linux Registered User #448382.
When I grow up, I wanna be like Theo...
-


Re: autolearn vs sa-learn / Bayes

2008-02-21 Thread Diego Pomatta

Hey Luis. I forgot to add that info, duh.

The setup here is
qmail 3.05
simscan 1.3.1
SpamAssassin 3.2.1 (spamd/spamc)
sa-update is cron'ed to run daily (no parameters = default channel -> 
updates.spamassassin.org, right?)


Simscan calls spamc as the user "simscan".
I did the manual feeding with sa-learn as root.
So... ummm. I guess root has a separate database and I've been running 
sa-learn as the wrong user...?
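
[Editor's sketch, not part of the original message] The guess above can be illustrated in Python under two assumptions: that `bayes_path` is left at its default of `~/.spamassassin/bayes`, and that scanning uses the home directory of the "simscan" user. The home directory `/home/simscan` and the mail folder path are placeholders, and `default_bayes_prefix` and `learn_command` are hypothetical helper names.

```python
# Minimal sketch: show that root and the scanning user resolve to different
# default Bayes locations, and build an sa-learn call that trains as the
# scanning user instead of root.
import os.path
import shlex


def default_bayes_prefix(home_dir):
    """Default bayes_path (~/.spamassassin/bayes) for a given home directory."""
    return os.path.join(home_dir, ".spamassassin", "bayes")


def learn_command(scan_user, kind, mail_path):
    """Build an argv list that runs sa-learn as scan_user via sudo."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # sudo -H sets HOME to scan_user's home, so ~/.spamassassin resolves there.
    return ["sudo", "-H", "-u", scan_user, "sa-learn", "--" + kind, mail_path]


if __name__ == "__main__":
    # /home/simscan is a placeholder; check the real home of the simscan user.
    print(default_bayes_prefix("/root"))
    print(default_bayes_prefix("/home/simscan"))
    print(shlex.join(learn_command("simscan", "spam", "/path/to/spam-folder")))
```

Running `sa-learn --dump magic` the same way (as simscan rather than root) would show whether the autolearned counts live in that second database.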

OK, time to remove head from butt, and insert foot in mouth *lol*

Regards
Where are you from Luis?