On 26/8/2016 9:48 PM, Dino Edwards wrote:

My first question: how many ham and spam messages have you used to train Bayes? It needs at least 200 of each before it even starts working. Also, a BAYES_00 score of -1.9 usually points to poor training of the Bayes database. Basically, it means that those spam messages look a lot like what SpamAssassin has been told are ham messages.

How are you training the database?


Thank you Dino for your reply.

I collect spam messages in .eml format from users and upload them (usually via FTP) into a dedicated, initially *empty* directory (/root/reported-spam) on the server.

After each upload of new messages, I run:

   # sa-learn --spam /root/reported-spam
   Learned tokens from 18 message(s) (18 message(s) examined)

After running this command, I empty the directory (/root/reported-spam) so that it is ready for the next upload of spam messages.
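
For reference, the routine above could be sketched as a small shell function. This is only an illustration, not what I actually run: the function name train_and_clear and the SPAM_DIR / SA_LEARN variables are made up, and the find options (-maxdepth, -delete) assume GNU find. It empties the directory only if sa-learn exits successfully.

```shell
# Hypothetical wrapper around the manual steps: learn, then empty the directory.
SPAM_DIR="${SPAM_DIR:-/root/reported-spam}"   # directory holding reported spam
SA_LEARN="${SA_LEARN:-sa-learn}"              # learner command (overridable)

train_and_clear() {
    dir="$1"
    # Nothing to do if the directory holds no regular files.
    if [ -z "$(find "$dir" -maxdepth 1 -type f | head -n 1)" ]; then
        echo "no messages in $dir"
        return 0
    fi
    # Train on every message in the directory; keep the files on failure.
    "$SA_LEARN" --spam "$dir" || return 1
    # Training succeeded, so remove the messages that were just learned.
    find "$dir" -maxdepth 1 -type f -delete
}
```

It would be called as:

   # train_and_clear "$SPAM_DIR"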

I do not train on ham. I once did so in the past, when some messages were misclassified as spam.

Today I tried adding in /etc/mail/spamassassin/local.cf:

   bayes_min_ham_num   0
   bayes_min_spam_num  0

to make sure that these settings do not stop bayesian filtering.

Finally, I also increased the logging level (in /etc/amavisd.conf):

   $log_level = 3;
   $sa_debug = 'bayes';

while trying to find more details on what is happening, and I noticed messages like:

   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_toks
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_seen
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: found bayes db version 3
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: corpus size: nspam = 3440, nham = 717405
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: untie-ing
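
To put the corpus size reported in the log into perspective, here is a small sketch that extracts nspam and nham and prints the ham-to-spam ratio. The function name corpus_ratio is made up for illustration, and the sketch assumes that each entry occupies a single line in the actual log file.

```shell
# Hypothetical helper: read amavis debug log lines on stdin and print the
# nspam/nham counters plus the ham-to-spam ratio for each corpus size entry.
corpus_ratio() {
    awk '
        /bayes: corpus size:/ {
            nspam = 0; nham = 0
            # Pull the two counters out of "nspam = N, nham = M".
            if (match($0, /nspam = [0-9]+/))
                nspam = substr($0, RSTART + 8, RLENGTH - 8) + 0
            if (match($0, /nham = [0-9]+/))
                nham = substr($0, RSTART + 7, RLENGTH - 7) + 0
            if (nspam > 0)
                printf "nspam=%d nham=%d ham_per_spam=%.1f\n", nspam, nham, nham / nspam
            else
                printf "nspam=%d nham=%d ham_per_spam=n/a\n", nspam, nham
        }
    '
}
```

Run as:

   # corpus_ratio < /var/log/amavisd.log

For the entry above it prints nspam=3440 nham=717405 ham_per_spam=208.5, i.e. roughly 208 ham messages for every spam message in the database.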

Interestingly, all Bayesian scores are extremely low:

   # grep 'bayes: score =' /var/log/amavisd.log

   Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
   Aug 26 20:14:49 mailgw3 amavis[24794]: (24794-02) SA dbg: bayes: score = 0
   Aug 26 20:35:50 mailgw3 amavis[24794]: (24794-14) SA dbg: bayes: score = 0
   Aug 26 20:39:41 mailgw3 amavis[24794]: (24794-15) SA dbg: bayes: score = 1.11022302462516e-16
   Aug 26 20:49:38 mailgw3 amavis[24794]: (24794-19) SA dbg: bayes: score = 5.55111512312578e-17
   Aug 26 21:19:55 mailgw3 amavis[25627]: (25627-12) SA dbg: bayes: score = 0
   Aug 26 21:35:48 mailgw3 amavis[26085]: (26085-03) SA dbg: bayes: score = 1.11022302462516e-16
   Aug 26 21:35:54 mailgw3 amavis[26087]: (26087-03) SA dbg: bayes: score = 0
   Aug 26 21:46:37 mailgw3 amavis[26085]: (26085-09) SA dbg: bayes: score = 1.77635683940025e-15
   Aug 26 21:53:17 mailgw3 amavis[26085]: (26085-12) SA dbg: bayes: score = 1.88737914186277e-15
   Aug 26 22:07:24 mailgw3 amavis[26087]: (26087-15) SA dbg: bayes: score = 5.32351940307763e-14
   Aug 26 22:49:35 mailgw3 amavis[26691]: (26691-16) SA dbg: bayes: score = 0
   Aug 26 23:01:04 mailgw3 amavis[27067]: (27067-02) SA dbg: bayes: score = 5.55111512312578e-17
   Aug 26 23:13:18 mailgw3 amavis[27065]: (27065-05) SA dbg: bayes: score = 0
   Aug 26 23:30:51 mailgw3 amavis[27065]: (27065-11) SA dbg: bayes: score = 0
   Aug 26 23:35:49 mailgw3 amavis[27065]: (27065-12) SA dbg: bayes: score = 2.22044604925031e-16
   Aug 26 23:56:13 mailgw3 amavis[27067]: (27067-20) SA dbg: bayes: score = 2.43476726314862e-05
   Aug 26 23:59:58 mailgw3 amavis[27673]: (27673-02) SA dbg: bayes: score = 0
   Aug 27 00:02:39 mailgw3 amavis[27707]: (27707-02) SA dbg: bayes: score = 2.22044604925031e-16
   Aug 27 00:04:33 mailgw3 amavis[27707]: (27707-03) SA dbg: bayes: score = 1.11022302462516e-16
   Aug 27 00:05:25 mailgw3 amavis[27673]: (27673-04) SA dbg: bayes: score = 1.11022302462516e-16
   Aug 27 00:06:45 mailgw3 amavis[27673]: (27673-05) SA dbg: bayes: score = 5.55111512312578e-17
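
Rather than reading the scores by eye, they could be summarised with a short awk sketch like the one below. The function name summarize_bayes_scores is made up for illustration, and the sketch assumes that each entry occupies a single line in /var/log/amavisd.log with the score as the last field. The 0.01 threshold is my understanding of the upper bound of the BAYES_00 range in the stock rules.

```shell
# Hypothetical helper: count Bayes score entries on stdin, how many of them
# fall below 0.01, and report the highest score seen.
summarize_bayes_scores() {
    awk '
        /bayes: score = / {
            # The score is the last whitespace-separated field on the line.
            score = $NF + 0
            total++
            if (score < 0.01) low++
            if (score > max) max = score
        }
        END {
            printf "entries=%d below_0.01=%d max=%g\n", total, low, max
        }
    '
}
```

Run as:

   # summarize_bayes_scores < /var/log/amavisd.log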

Any ideas?

Nick
