Re: Bayes auto-learn - Solved
On Fri, 11 Aug 2017, John Hardin wrote:
> On Fri, 11 Aug 2017, Scott wrote:
>> I'm chicken. :D I don't have much (almost no) experience overriding those yum packages.
> It's pretty simple, just "yum install {local_filename}".
>> And those warnings I got when I rebuilt from source made me nervous.
> I suppose I could publish the Centos 7 x86_64 RPMs I build and use on my website. They wouldn't be signed...

http://www.impsec.org/~jhardin/antispam/centos7/

You do need the epel-release package installed to use this.
-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
I am not Charlie. I am armed.
---
4 days until the 72nd anniversary of the end of World War II
Re: Bayes auto-learn - Solved
On Fri, 11 Aug 2017, Scott wrote:
> I'm chicken. :D I don't have much (almost no) experience overriding those yum packages.
It's pretty simple, just "yum install {local_filename}".
> And those warnings I got when I rebuilt from source made me nervous.
I suppose I could publish the Centos 7 x86_64 RPMs I build and use on my website. They wouldn't be signed...
-- 
John Hardin KA7OHZ
---
USMC Rules of Gunfighting #6: If you can choose what to bring to a gunfight, bring a long gun and a friend with a long gun.
Re: Bayes auto-learn - Solved
I'm chicken. :D I don't have much (almost no) experience overriding those yum packages. And those warnings I got when I rebuilt from source made me nervous. Maybe when the dust settles... -- View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138316.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Bayes auto-learn - Solved
On Fri, 11 Aug 2017, Scott wrote:
> Centos7 (selinux disabled at the time of testing)
> Spamassassin 3.4.0
Next on your plate: upgrading to 3.4.1...
https://dl.fedoraproject.org/pub/fedora/linux/releases/25/Everything/source/tree/Packages/s/spamassassin-3.4.1-9.fc25.src.rpm
It works jes' fine here.
...ooo, time to update:
https://dl.fedoraproject.org/pub/fedora/linux/releases/26/Everything/source/tree/Packages/s/spamassassin-3.4.1-12.fc26.src.rpm
-- 
John Hardin KA7OHZ
---
...for a nation to tax itself into prosperity is like a man standing in a bucket and trying to lift himself up by the handle. -- Winston Churchill
Re: Bayes auto-learn - Solved
Restored my last copy of my manually learned bayes database from the "bad" directory to the new location and let it cook for a day. Of the messages that made it through postscreen, RBLs, etc. since my logs rotated early this morning, 93% were autolearn=no, 2% autolearn=spam and 5% autolearn=ham. ZERO autolearn=unavailable. This is solved.

Summary for posterity, for anyone who runs into the same problem.

Some tags for searching: Centos7 (selinux disabled at the time of testing), Postfix 3.2.2, Amavisd-new amavisd amavis 2.11.0, Spamassassin 3.4.0, bayes, autolearn=unavailable

I strongly suspected bayes auto-learn was not functioning; read the thread for the evidence. In local.cf I had the bayes path set to /etc/mail/bayes/bayes. I don't remember whether it came packaged that way or whether I followed someone else's "guide" and ended up with that bad location. I do see one well-written guide that specified that folder, but honestly I'm not sure. No matter. The end result was that any message that would have been autolearned got "autolearn=unavailable" and was not learned.

The fix for the setup listed above was NOT to keep the database in a directory under /etc. Even with wide-open (777) write permissions, amavisd/SA was apparently unable to write there. I moved the bayes database to /var/spool/amavisd/bayes and everything now functions properly. Note that the default, if no path is specified, is /var/spool/amavisd/.spamassassin/bayes IIRC (assuming /var/spool/amavisd is the home directory of the amavis user). I tested both, and it appears happy with either.

local.cf:
bayes_path /var/spool/amavisd/bayes/bayes

I now have a journal file, FWIW:
[root@mail2 root]# ls -la /var/spool/amavisd/bayes
total 4280
drwx-- 2 amavis amavis 4096 Aug 11 16:07 .
drwxr-xr-x 7 amavis amavis 4096 Aug 10 22:18 ..
-rw-rw-rw- 1 amavis amavis 81888 Aug 11 16:07 bayes_journal
-rw-rw-rw- 1 amavis amavis 86016 Aug 11 16:07 bayes_seen
-rw-rw-rw- 1 amavis amavis 5267456 Aug 11 16:07 bayes_toks

And my database is happy:
[root@mail2 root]# su amavis -c 'sa-learn --dump magic'
0.000 0 3 0 non-token data: bayes db version
0.000 0 359 0 non-token data: nspam
0.000 0 494 0 non-token data: nham
0.000 0 149970 0 non-token data: ntokens

I now know way more about amavisd-new and SpamAssassin than I did when I started. Guess that's the silver lining to a few days of hair pulling.

Thanks,
Scott
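A breakdown like the percentages above can be produced by counting the autolearn= field in the mail log. The sketch below is a minimal Python example, assuming each relevant log line carries an `autolearn=<status>` token as in the amavis log lines quoted in this thread; the function name and the sample lines are illustrative only.

```python
import re
from collections import Counter

# "autolearn=ham", "autolearn=no", ... as written into the log.
# "autolearn_force=" and "autolearnscore=" do not match, because the
# character right after "autolearn" must be "=".
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_breakdown(lines):
    """Return {status: whole-number percentage} over lines that have an autolearn= field."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total) for status, n in counts.items()}


# Illustrative, shortened log lines (not real log output).
sample = [
    "... Passed SPAM ... autolearn=no autolearn_force=no",
    "... Passed CLEAN ... autolearn=ham autolearn_force=no, autolearnscore=-2.299",
    "... Passed CLEAN ... autolearn=no autolearn_force=no",
    "... Passed SPAM ... autolearn=no autolearn_force=no",
    "postfix/smtpd[450]: disconnect from unknown[208.110.82.116]",
]
print(autolearn_breakdown(sample))  # {'no': 75, 'ham': 25}
```

On the server you would pass it the lines of the mail log file; where that log lives depends on the syslog setup.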
Re: Bayes auto-learn - not happening, tentative success....
Tom: re selinux: Yes, once I discovered the fix, I considered that it could have been the cause. FWIW I'm not using it and it's disabled, so it *shouldn't* hose anything. But I would not be surprised if it were the culprit.
Re: Bayes auto-learn - not happening, tentative success....
On 11-08-17 17:05, Scott wrote:
> I'm going to go back and look at my build notes, but I think that directory got created for me. It's just as possible I followed some "guide". I am positive I did not think it up on my own LOL. I remember more than one set of instructions with that path setting, and it very well could be the related Centos7 package. Glad I found the cause, though, regardless of the source.
>
> In the FWIW department, as shown above, I still don't have it in the default location (I know, risks...), but why it is happy there and not under /etc I don't know. And really don't care at this point.

I had to go way back in the thread to look it up, but I noticed you're running Centos, which has selinux. Maybe your custom path is disallowed under the amavis/spamd/whatever role? And manual testing when su'ing from the root role will not have the same impact as running amavis via an init system.

Kind regards,
Tom
Re: Bayes auto-learn - not happening, tentative success....
I'm going to go back and look at my build notes, but I think that directory got created for me. It's just as possible I followed some "guide". I am positive I did not think it up on my own LOL. I remember more than one set of instructions with that path setting, and it very well could be the related Centos7 package. Glad I found the cause, though, regardless of the source.

In the FWIW department, as shown above, I still don't have it in the default location (I know, risks...), but why it is happy there and not under /etc I don't know. And really don't care at this point.
Re: Bayes auto-learn - not happening, tentative success....
On Fri, 11 Aug 2017 16:22:50 +0200 Matus UHLAR - fantomas wrote:
> don't set the path, that way it should work OOTB.

Maybe amavis is different and has its own internal default location, but the equivalent for spamd relies on the packager giving the spamd user a unix home directory. I once saw a Bayes howto that recommended:

mkdir /nonexistent
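To illustrate why the default depends on the account's home directory: in the common case SpamAssassin's default bayes_path resolves to ~/.spamassassin/bayes for the user running the scan, which is consistent with Scott's note that amavis defaulted to /var/spool/amavisd/.spamassassin/bayes. The Python sketch below assumes a POSIX system; the helper names are hypothetical, and on the server `pwd.getpwnam('amavis').pw_dir` would supply the home directory.

```python
import os


def default_bayes_prefix(home_dir):
    """Where a ~/.spamassassin/bayes default would point for this home directory.

    The result is a filename prefix: the DB files are <prefix>_toks, <prefix>_seen, ...
    """
    return os.path.join(home_dir, ".spamassassin", "bayes")


def home_is_usable(home_dir):
    """A default under the home directory only helps if that directory actually exists."""
    return os.path.isdir(home_dir)


print(default_bayes_prefix("/var/spool/amavisd"))
# /var/spool/amavisd/.spamassassin/bayes
```

A home directory such as /nonexistent makes `home_is_usable()` return False, which is the situation the "mkdir /nonexistent" howto was papering over.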
Re: Bayes auto-learn - not happening, tentative success....
On 10.08.17 20:15, Scott wrote:
> About the only difference in my old, functioning box and this new "clean" install was the location of the bayes files.
> Old box: /var/spool/amavisd/.spamassassin/
> New box: /etc/mail/bayes

On 11.08.17 16:22, Matus UHLAR - fantomas wrote:
> Did you change the bayes path in the first place?

I mean *why* did you change it, of course.

> [deleted]
> don't set the path, that way it should work OOTB.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Spam is for losers who can't get business any other way.
Re: Bayes auto-learn - not happening, tentative success....
On 10.08.17 20:15, Scott wrote:
> About the only difference in my old, functioning box and this new "clean" install was the location of the bayes files.
> Old box: /var/spool/amavisd/.spamassassin/
> New box: /etc/mail/bayes

Did you change the bayes path in the first place? amavis is the only one that processes the database; there's no need to change it and play with permissions (which might be the reason why it does not work). You can still train as root:

sa-learn --dbpath /var/spool/amavis/.spamassassin/ ...

> Finally to the path setting: I tried setting the default path, and changing the path filename suffix of bayes to mybayes for curiosity...
> bayes_path /var/spool/amavisd/.spamassassin/mybayes
> Upon sending a test message, SA promptly auto-learned as ham and created the 2 new files starting with "mybayes" instead of bayes. So changing the filename didn't hurt autolearn

don't set the path, that way it should work OOTB.
-- 
Matus UHLAR - fantomas
The only substitute for good manners is fast reflexes.
Re: Bayes auto-learn - not happening, tentative success....
Yeah, I don't know who the culprit is. sa-learn always worked; autolearn did not. So far this morning it's looking good: an expected spread of autolearn=no, spam and ham, and not a single unavailable. I'll check this afternoon and expect to call this done, with a summary for other googlers to follow.
Re: Bayes auto-learn - not happening, tentative success....
On Thu, 10 Aug 2017 20:15:48 -0700 (MST) Scott wrote:
> For reasons beyond my skill set, SA will not auto-learn to a bayes db in a folder in /etc/mail/bayes. Regardless of wide open permissions on everything except /etc. And the user's confirmed ability to write to the folder.

But sa-learn was working. It seems more likely that the difference is between the ordinary SA scripts and amavis rather than between auto and manual training. Amavis uses SA libraries from its own perl code, and it's free to behave very differently.
Re: Bayes auto-learn - not happening - tentative success
Aug 10, 2017; 10:15pm, Scott wrote:

Well, here's a development...

About the only difference between my old, functioning box and this new "clean" install was the location of the bayes files.
Old box: /var/spool/amavisd/.spamassassin/
New box: /etc/mail/bayes

The other detail that caught my attention was that on the old box, the ONLY bayes setting explicitly set was the path, which was:
bayes_path /home/amavis/.spamassassin/bayes

On this (new) box, I similarly commented out most of the bayes settings. Restarted amavis/SA and got some errors about no bayes db. Ignored them and sent an email anyway. Guess what? autolearn=ham right out of the gate, and it created the database files. The very next message received was also shown as autolearn=ham.

** That eliminates any doubt: a minimum corpus is NOT necessary for autolearn to happen (whether that's a good practice is a different topic).

I cleared the dbs with sa-learn --clear. I received a new message: autolearn=ham, again. Result:

[root@mail2 .spamassassin]# su amavis -c 'sa-learn --dump magic'
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 1 0 non-token data: nham

[root@mail2 root]# ll /var/spool/amavisd/.spamassassin
total 28
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_seen
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_toks
-rw-r--r-- 1 amavis amavis 1869 Aug 8 15:32 user_prefs

I gradually restored all the bayes settings in local.cf, restarting amavis each time, clearing the db and sending that same test message. Each time the result was autolearn=ham.

Finally, the path setting. I tried setting the default path, and for curiosity changed the path filename suffix from bayes to mybayes:
bayes_path /var/spool/amavisd/.spamassassin/mybayes
Upon sending a test message, SA promptly auto-learned it as ham and created the 2 new files starting with "mybayes" instead of "bayes". So changing the filename didn't hurt autolearn.

Next I kept the new name and changed the folder back to where it was:
/etc/mail/bayes/mybayes
Sent a test message. This time, NO new file was created; it apparently cannot write there. Permissions on mail and on bayes are both amavis:amavis and 777. autolearn=unavailable (as Matus expected).

Next, I log on as amavis, cd to /etc/mail/bayes, create a file, edit it, and then delete it. User amavis CAN write there. I verify I'm amavis: I try to cd to some other user's folder and get "permission denied". Check.

Finally, because I don't like the hidden directory anyway, I try to move the bayes folder away from the default. I configure the bayes path to:
bayes_path /var/spool/amavisd/bayes/bayes
Sent my test message: voila, db files created, and autolearn=ham.

Success! (tentative, cautiously optimistic) I hope I have solved the mystery.

For reasons beyond my skill set, SA will not auto-learn to a bayes db in a folder in /etc/mail/bayes, regardless of wide-open permissions on everything except /etc, and despite the user's confirmed ability to write to the folder. Bug, I guess?

SHIT! What a PITA to figure out. I'm gonna let this cook overnight and see how it does. Will report back.
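The manual check described above (log on as amavis, create a file, delete it) can also be scripted. Below is a minimal Python sketch, assuming it is run as the account in question: it confirms that every ancestor directory is searchable and the target directory is writable, using `os.access()`. As Tom points out in the thread, a pass here only speaks for the context the check itself runs in (its real uid/gid and the shell's security context), not for the context the init system gives the amavis daemon. The function name is hypothetical.

```python
import os


def can_create_in(directory):
    """Return True if the calling process could create files in `directory`.

    Every ancestor must be searchable (x) and the directory itself writable and
    searchable (w+x), as answered by os.access() for this process's real
    uid/gid. A daemon started by the init system may run in a different context.
    """
    directory = os.path.abspath(directory)
    chain = [directory]
    while os.path.dirname(chain[-1]) != chain[-1]:
        chain.append(os.path.dirname(chain[-1]))
    # chain is [directory, parent, ..., "/"]; check the ancestors from the root down.
    for ancestor in reversed(chain[1:]):
        if not os.access(ancestor, os.X_OK):
            return False
    return os.access(directory, os.W_OK | os.X_OK)


print(can_create_in("/etc/mail/bayes"))
```

In Scott's case a check like this would most likely have passed for /etc/mail/bayes, which is exactly why it cannot rule out a restriction that only applies to the daemon.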
Re: Bayes auto-learn - not happening
Trying to check for any locking issues, I ran sa-learn in debug mode:

su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'

It appears to be creating and dropping lock files. Nothing is left over after running.

Aug 10 16:48:39.109 [7524] dbg: bayes: expiry starting
Aug 10 16:48:39.110 [7524] dbg: locker: mode is 438
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: created /etc/mail/bayes/bayes.lock.mail2.myserver.com.7524
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: trying to get lock on /etc/mail/bayes/bayes with 0 retries
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: link to /etc/mail/bayes/bayes.lock: link ok
Aug 10 16:48:39.110 [7524] dbg: bayes: tie-ing to DB file R/W /etc/mail/bayes/bayes_toks
Aug 10 16:48:39.110 [7524] dbg: bayes: tie-ing to DB file R/W /etc/mail/bayes/bayes_seen
Aug 10 16:48:39.111 [7524] dbg: bayes: found bayes db version 3
Aug 10 16:48:39.111 [7524] dbg: locker: refresh_lock: refresh /etc/mail/bayes/bayes.lock
Aug 10 16:48:39.111 [7524] dbg: bayes: expiry completed
Aug 10 16:48:39.111 [7524] dbg: archive-iterator: _set_default_message_selection_opts After: Scanprob[1], want_date[0], cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]
Aug 10 16:48:39.115 [7524] dbg: archive-iterator: _run_mailbox /home/mail/onespam, ofs 0, limit 262144
Aug 10 16:48:39.118 [7524] info: archive-iterator: skipping large message: 1277 lines, 262241 bytes, limit 262144 bytes
Learned tokens from 0 message(s) (0 message(s) examined)
Aug 10 16:48:39.118 [7524] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x29bb798) implements 'learner_close', priority 0
Aug 10 16:48:39.118 [7524] dbg: bayes: untie-ing
Aug 10 16:48:39.119 [7524] dbg: bayes: files locked, now unlocking lock
Aug 10 16:48:39.119 [7524] dbg: locker: safe_unlock: unlink /etc/mail/bayes/bayes.lock
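The debug output above shows a link-based lock: a unique file named bayes.lock.<host>.<pid> is created, hard-linked to bayes.lock ("link ok"), and bayes.lock is unlinked at the end. The "mode is 438" line is decimal for octal 0666, the same value as the bayes_file_mode 0666 in Scott's local.cf. Below is a minimal Python sketch of that pattern, assuming a POSIX filesystem that supports hard links; the function names are hypothetical, and it leaves out the retries and stale-lock handling a real locker needs.

```python
import os
import socket
import tempfile

LOCK_MODE = 0o666  # the debug line "mode is 438" is this value written in decimal


def acquire_lock(prefix):
    """Try once to take <prefix>.lock; return the lock path, or None if it is already held.

    Create a unique <prefix>.lock.<host>.<pid> file, hard-link it to
    <prefix>.lock (os.link fails if that name already exists), then remove
    the unique file so nothing is left behind except the lock itself.
    """
    lock_path = prefix + ".lock"
    unique = f"{lock_path}.{socket.gethostname()}.{os.getpid()}"
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, LOCK_MODE)
    try:
        os.write(fd, f"{socket.gethostname()}.{os.getpid()}\n".encode())
    finally:
        os.close(fd)
    try:
        os.link(unique, lock_path)
        return lock_path
    except FileExistsError:
        return None
    finally:
        os.unlink(unique)


def release_lock(lock_path):
    os.unlink(lock_path)


def demo_lock_cycle():
    """Take, contend for, release and retake a lock in a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "bayes")
        first = acquire_lock(prefix) is not None
        second = acquire_lock(prefix) is not None  # already held, so None
        release_lock(prefix + ".lock")
        third = acquire_lock(prefix) is not None
        release_lock(prefix + ".lock")
        leftovers = os.listdir(workdir)  # as in the log: nothing left over
    return first, second, third, leftovers


print(demo_lock_cycle())  # (True, False, True, [])
```

The hard link is what makes the step atomic: only one process can create the bayes.lock name, and each process can tell from the link result whether it won.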
Re: Bayes auto-learn - not happening
Scouring the differences between this and my old server, I see this on the old server:

-rw--- 1 amavis amavis 83472 Aug 10 15:51 bayes_journal
-rw--- 1 amavis amavis 1986 Aug 10 15:51 bayes.mutex
-rw--- 1 amavis amavis 328491008 Aug 10 15:51 bayes_seen
-rw--- 1 amavis amavis 5443584 Aug 10 15:51 bayes_toks

I gathered the journal may very well not always be there, and maybe that's OK. But from what I could tell by googling, the bayes.mutex file is a lock file (for others: http://lists.mailscanner.info/pipermail/mailscanner/2004-November/043067.html). Is missing IT a problem? Is this a hint? (fingers crossed)
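Because bayes_path is a filename prefix rather than a directory, comparing the two boxes comes down to which <prefix><suffix> files exist. The sketch below lists candidates for a given prefix. The suffix list is an assumption assembled from the files seen in this thread (bayes_toks, bayes_seen, bayes_journal, bayes.mutex, and the transient bayes.lock from the debug log), not an exhaustive list from SpamAssassin.

```python
import os

# Suffixes observed in this thread for a bayes_path prefix; an assumption, not a complete list.
SUFFIXES = ("_toks", "_seen", "_journal", ".mutex", ".lock")


def bayes_files(prefix):
    """Candidate file names for a bayes_path prefix such as /etc/mail/bayes/mybayes."""
    return [prefix + suffix for suffix in SUFFIXES]


def existing_bayes_files(prefix):
    """The subset of candidate files that currently exist on disk."""
    return [path for path in bayes_files(prefix) if os.path.exists(path)]


for path in existing_bayes_files("/var/spool/amavisd/bayes/bayes"):
    print(path)
```

Running it for the old and new prefixes side by side gives the same comparison as the two directory listings, without having to remember which files belong to which prefix.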
Re: Bayes auto-learn - not happening
> IMHO you need 100 ham and 100 spam for auto-learning to work. Do manual learning.

See my earlier post today. I've got it loaded up, right?

[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000 0 3 0 non-token data: bayes db version
0.000 0 349 0 non-token data: nspam
0.000 0 478 0 non-token data: nham
0.000 0 166030 0 non-token data: ntokens
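For a quick sanity check of output like the above, here is a small parser. It is a sketch that assumes each "non-token data" line has four whitespace-separated columns with the count in the third, as in the `--dump magic` output quoted in this thread. The 200/200 minimums are taken to be SpamAssassin's bayes_min_spam_num and bayes_min_ham_num defaults (the "200 threshold" mentioned in the original question). Per the tests later in this thread, they gate whether Bayes contributes to scoring; autolearn happened even with an empty database.

```python
import re

# e.g. "0.000 0 349 0 non-token data: nspam" -> ("349", "nspam")
MAGIC_RE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$")


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer count."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_has_enough(values, min_spam=200, min_ham=200):
    """True once both nspam and nham reach the assumed 200/200 defaults."""
    return values.get("nspam", 0) >= min_spam and values.get("nham", 0) >= min_ham


output = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 349 0 non-token data: nspam
0.000 0 478 0 non-token data: nham
0.000 0 166030 0 non-token data: ntokens
"""
counts = parse_dump_magic(output)
print(counts["nspam"], counts["nham"], bayes_has_enough(counts))  # 349 478 True
```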
Re: Bayes auto-learn - not happening
IMHO you need 100 ham and 100 spam for auto-learning to work. Do manual learning.

On 08.08.2017 8:20 PM "Scott Techlist" <techlis...@msws.org> wrote:
> Centos7
> Postfix 3.2.2
> Amavisd-new 2.11.0
> Spamassassin 3.4.0
> Site-wide configuration
>
> This is a new box and I've configured some conservative values for auto-learn. I've enabled it properly AFAIK, but I can't see any sign of it working.
>
> I have these set in local.cf:
> use_bayes 1
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -1.7
> bayes_auto_learn_threshold_spam 10.0
> # this is a filename prefix, not a directory per se
> bayes_path /etc/mail/bayes/bayes
> bayes_file_mode 0666
>
> -bayes prep
> Start fresh for troubleshooting:
> su amavis -c 'sa-learn --clear'
>
> Add one spam manually and check tokens:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 1 0 non-token data: nspam
> 0.000 0 0 0 non-token data: nham
> 0.000 0 2157 0 non-token data: ntokens
>
> -amavisd prep
>
> Restart amavisd/spamassassin just to be sure all configs are read.
>
> --- ready to process ---
>
> The next high-scoring spam arrives; it was sent to my spam mailbox. It did NOT autolearn. Nor did several others.
>
> To troubleshoot, I took one that did not autolearn and learned it manually with:
> su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'
>
> Even though this message was slightly over the size limit, the log says it learned anyway:
> -D log snippet:
> -
> Aug 8 12:37:27.216 [13198] info: archive-iterator: skipping large message: 858 lines, 262203 bytes, limit 262144 bytes
>
> Learned tokens from 1 message(s) (1 message(s) examined)
> -
>
> Verified it learned:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 2 0 non-token data: nspam
>
> Partial header from that message:
>
> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *
> X-Spam-Status: Yes, score=17.374 tag=- tag2=5 kill=6.31
> tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
> RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5,
> RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
> URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
> URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no
>
> Why aren't my spams getting auto-learned? If sa-learn "ate" it, shouldn't auto-learn too?
>
> I know there is a default 200 threshold before Bayes starts tagging anything, but I understand it should learn without issue.
>
> Can't figure out what's wrong...
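The thresholds in the local.cf above can be read as a simple decision rule. The sketch below is a simplified model of the documented threshold-based autolearn behaviour, not SpamAssassin's code. Below bayes_auto_learn_threshold_nonspam a message is a ham candidate. At or above bayes_auto_learn_threshold_spam it is a spam candidate only if it also has at least 3 header points and 3 body points, unless an autolearn_force rule hit. The header/body split is passed in by hand because the headers above don't show it, so the values in the example are hypothetical. The model has no "unavailable" outcome: it only decides whether learning is attempted.

```python
def autolearn_decision(score, header_points, body_points,
                       threshold_nonspam=-1.7, threshold_spam=10.0,
                       forced=False):
    """Simplified model of threshold-based autolearn; returns "ham", "spam" or "no".

    `score` is the autolearn score (computed without Bayes), which is not
    necessarily the X-Spam-Score. Threshold defaults are the local.cf values
    from the message above, not SpamAssassin's shipped defaults. `forced`
    stands in for a hit on a rule flagged autolearn_force.
    """
    if score < threshold_nonspam:
        return "ham"
    if score >= threshold_spam and (forced or (header_points >= 3 and body_points >= 3)):
        return "spam"
    return "no"


# Hypothetical header/body splits for a message scoring 17.374:
print(autolearn_decision(17.374, header_points=4.0, body_points=4.0))  # spam
print(autolearn_decision(17.374, header_points=2.0, body_points=6.0))  # no
```

The second call shows how a high-scoring spam can still end up autolearn=no: the total clears the spam threshold, but one side of the 3/3 requirement does not.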
Re: Bayes auto-learn - not happening
OK, so I don't think auto-learn works on spam. What about HAM?

I've raised the ham auto-learn threshold to 1. Before anyone gives me any grief, it's just for testing; I'll rebuild the bayes db from a corpus when I get it working.

So SPAM needs at least 3 points from the header and 3 from the body. But what about HAM? Since the default ham autolearn threshold is much lower than 6, I presume this same limitation does not apply, so hams should be free to auto-learn with any score below my threshold (1).

Like this one, right? (assuming it is not already learned):

Aug 10 14:21:46 mail2 amavis[3231]: (03231-06) Passed CLEAN {RelayedInbound}, [168.100.1.7]:43757 [173.167.109.218] ESMTP/LMTP <owner-postfix-us...@postfix.org> -> <techlis...@myvirt.com>, (ESMTPS://[168.100.1.7]:43757 < ESMTPS://173.203.187.85 < 173.167.109.218), Queue-ID: 5EE363BF5, Message-ID: <018901d3120d$c8473120$58d59360$@mefox.org>, mail_id: JkAvl418yTui, b: ZbU4iXvCD, Hits: -5.799, size: 5533, queued_as: BD81F3EE7, Subject: "RE: reloading postfix with systemd", From: <n...@mefox.org>, X-Mailer: Microsoft_Outlook_16.0, helo=english-breakfast.cloud9.net, Tests: [AM.WBL=-3,BAYES_05=-0.5,HEADER_FROM_DIFFERENT_DOMAINS=0.001,RCVD_IN_DNSWL_MED=-2.3], autolearn=unavailable autolearn_force=no, autolearnscore=-2.299, 4376 ms

Now this sender and a similar message would have been in my corpus, so I don't expect IT to learn, but I'd expect others to.
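The log line above carries both Hits: -5.799 and autolearnscore=-2.299. Dropping the BAYES_05 hit and the AM.WBL entry (which looks like an amavis-side whitelist/blacklist adjustment rather than an SA rule) from the Tests list reproduces the second number exactly. That fits SpamAssassin computing the autolearn score without Bayes. The Python sketch below does that arithmetic. Treating "BAYES_* and AM.*" as the exclusion list is an assumption that happens to match this message, not SpamAssassin's actual logic, which also skips rules flagged noautolearn.

```python
import re

# Matches NAME=score pairs in "Tests: [A=1,B=-2.3]" or "tests=[A=1, B=-2.3]".
TEST_RE = re.compile(r"([A-Za-z0-9_.]+)=(-?\d+(?:\.\d+)?)")


def parse_tests(bracketed):
    """Turn the bracketed test list (only the [...] part) into {rule: score}."""
    return {name: float(score) for name, score in TEST_RE.findall(bracketed)}


def autolearn_estimate(tests):
    """Sum of hits, leaving out BAYES_* rules and amavis AM.* entries (assumed exclusions)."""
    return sum(score for name, score in tests.items()
               if not name.startswith(("BAYES_", "AM.")))


tests = parse_tests("[AM.WBL=-3,BAYES_05=-0.5,HEADER_FROM_DIFFERENT_DOMAINS=0.001,RCVD_IN_DNSWL_MED=-2.3]")
print(round(sum(tests.values()), 3))        # -5.799 (the Hits: value)
print(round(autolearn_estimate(tests), 3))  # -2.299 (the autolearnscore= value)
```

Either way, -2.299 is well below the test threshold of 1, so this message should have been a ham candidate; autolearn=unavailable means learning was attempted but did not happen. The same parser accepts the tests=[...] list from an X-Spam-Status header. For the 23.904-point spam quoted elsewhere in the thread it gives 23.904 in total and about 20.2 without the two BAYES rules.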
RE: Bayes auto-learn - not happening
> surely, it makes no sense to blow up the database with already 100% classified samples - you don't even do that unconditionally with a hand-trained database (at least not forever; at the beginning it makes sense to get additional tokens)

I think you misunderstood my question. I meant: as I look at messages to judge whether they should have been learned or not, would one that Bayes rates as 90-100% likely spam be one that is being skipped because it is already recognized? I am not asking why it isn't learning one like that. Or maybe I misunderstood your answer.

> but training every single message which is already classified as expected would lead to a lot of useless load, blow up the database, and make Bayes poisoning and the need to purge the whole database and start from scratch (with, thanks to autotraining, no available corpus) more likely than autolearning on its own does

Agreed. And I understand that is not how it is designed.

> the question of Bayes poisoning is not "if", it's "when and how often", and hence after 10 years of experience I stopped that nonsense and keep a corpus of eml files, currently 12 messages large (HAM AND SPAM)

Not arguing the pros and cons of IF one should use it. I only want to make it work, or better said, verify that it IS working. Then I can decide if I want to keep using it. Right now, I've never seen it work, hence my strong suspicion that it is not working. One thing is for sure: it hasn't found a single spam or ham to auto-learn yet, which seems unlikely if it were functioning properly. The output of "unavailable" is too ambiguous for me to devise a way to troubleshoot, but I'm not an expert with SA. Thus the plea for assistance in seeing if it is working. If auto-learn isn't working, my expectation is that auto-anything-else isn't working either: journal maintenance, etc.
Re: Bayes auto-learn - not happening
If any particular message has a

* 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
* [score: 1.]

is it safe to assume that that spam, or one close to it, has already been learned, and so it would not be a candidate for auto-learn? Maybe I'm not being patient enough.
Re: Bayes auto-learn - not happening
FYI, here are the verbose headers for that same one that flowed above:

X-Spam-Flag: YES
X-Spam-Score: 23.904
X-Spam-Level: ***
X-Spam-Status: Yes, score=23.904 tag=- tag2=5 kill=6.4
 tests=[BAYES_999=0.2, BAYES_99=3.5, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.293,
 HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=0.001,
 HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723, MISSING_MID=0.497,
 NORMAL_HTTP_TO_IP=0.001, OBFUSCATING_COMMENT=0.723,
 RAZOR2_CF_RANGE_51_100=0.5, RAZOR2_CF_RANGE_E8_51_100=1.886,
 RAZOR2_CHECK=2.5, RCVD_IN_BRBL_LASTEXT=1.449, RDNS_NONE=0.793,
 SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3, T_HTML_TAG_BALANCE_CENTER=0.01,
 URIBL_ABUSE_SURBL=1.25] autolearn=unavailable autolearn_force=no
X-Spam-Report:
 * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 *      [score: 1.]
 * 1.4 RCVD_IN_BRBL_LASTEXT RBL: No description available.
 *      [208.110.82.116 listed in bb.barracudacentral.org]
 * 1.2 URIBL_ABUSE_SURBL Contains an URL listed in the ABUSE SURBL
 *      blocklist
 *      [URIs: 154.16.37.73]
 * 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail
 *      domains are different
 * 3.0 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
 * 3.0 SPF_HELO_SOFTFAIL SPF: HELO does not match SPF record (softfail)
 * 0.0 NORMAL_HTTP_TO_IP URI: URI host has a public dotted-decimal IPv4
 *      address
 * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
 *      [score: 1.]
 * 0.0 HTML_MESSAGE BODY: HTML included in message
 * 0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 * 3.2 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
 * 2.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
 * 1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
 *      above 50%
 *      [cf: 100]
 * 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
 *      [cf: 100]
 * 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
 * 0.4 HTML_MIME_NO_HTML_TAG HTML-only message, but there is no HTML tag
 * 0.5 MISSING_MID Missing Message-Id: header
 * 0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
 * 0.7 OBFUSCATING_COMMENT HTML comments which obfuscate text
 * 0.0 T_HTML_TAG_BALANCE_CENTER Malformatted HTML
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
-- View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138254.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
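One way to sanity-check the autolearn threshold by hand is to pull the rule scores out of the `tests=[...]` list and total them without the Bayes hits. This is a rough sketch under stated assumptions, not SpamAssassin's own calculation: it assumes `BAYES_*` rules do not count toward the autolearn score, and it reuses the scores amavisd printed, whereas SpamAssassin may recompute them with a different (non-Bayes) score set, so the real figure can differ. `parse_tests` and `non_bayes_total` are hypothetical helper names.

```python
import re
from typing import Dict


def parse_tests(status: str) -> Dict[str, float]:
    """Map rule name -> score from the tests=[...] part of an X-Spam-Status header."""
    match = re.search(r"tests=\[(.*?)\]", status, re.S)
    if match is None:
        return {}
    scores: Dict[str, float] = {}
    for item in match.group(1).split(","):
        # Each item looks like "RULE_NAME=1.25"; strip() also removes folded-line whitespace.
        name, _, value = item.strip().partition("=")
        if name and value:
            scores[name] = float(value)
    return scores


def non_bayes_total(scores: Dict[str, float]) -> float:
    """Sum all rule scores except BAYES_* hits (assumed not to count for autolearn)."""
    total = sum(score for name, score in scores.items() if not name.startswith("BAYES_"))
    return round(total, 3)
```

For the header above, dropping BAYES_999 (0.2) and BAYES_99 (3.5) from 23.904 leaves 20.204, still well above the `bayes_auto_learn_threshold_spam 10.0` set in this thread's local.cf, so the overall threshold alone does not obviously explain `autolearn=unavailable`.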
Re: Bayes auto-learn - not happening
Here is a debug log for one that just flowed. I don't see anything about why auto-learn was unavailable, but I think it shows it's talking to the db anyway. Is there a way to set autolearn_force to yes? The log format makes one think it's a global setting, but everything I can find makes it look like a per-rule setting. It would be easier to troubleshoot if I could relax it to look for any 6 points instead of 3/3.

Aug 10 11:03:39 mail2 amavis[377]: (00377-01) LMTP :10024 /var/spool/amavisd/tmp/amavis-20170810T110339-00377-JQiRqEtF: <cont...@qq.com> -> <myu...@myvirt.com> SIZE=175613 BODY=8BITMIME ENVID=671416;675610;322132;sachin2 Received: from tn2.myserver.com ([127.0.0.1]) by localhost (tn2.myserver.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP for <myu...@myvirt.com>; Thu, 10 Aug 2017 11:03:39 -0500 (CDT)
Aug 10 11:03:39 mail2 postfix/smtpd[450]: disconnect from unknown[208.110.82.116] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) Checking: KmK8jCUCqcuq [208.110.82.116] <cont...@qq.com> -> <myu...@myvirt.com>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB file R/O /etc/mail/bayes/bayes_toks
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB file R/O /etc/mail/bayes/bayes_seen
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: found bayes db version 3
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB journal sync: last sync: 0
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: corpus size: nspam = 349, nham = 478
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *p = "U*contact D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for X-Amavis-PolicyBank = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for X-Amavis-MessageSize = "174380"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for MIME-Version = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *F = "U*myuser D*myvirt.com D*org U*yt5r4e3 D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for To = "U*myuser D*myvirt.com D*org"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *c = "/html;"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for x-spam-relays-external = " [ ip=208.110.82.116 rdns= helo=qq.com by=tn2.myserver.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident= envfrom=cont...@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for x-spam-relays-internal = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *RT = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *RU = " [ ip=208.110.82.116 rdns= helo=qq.com by=tn2.myserver.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident= envfrom=cont...@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <myu...@myvirt.com>; envelope- <cont...@qq.com>)"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <myu...@myvirt.com>; envelope- <cont...@qq.com>) qq.com (unknown [208.110.82 ip*208.110.82.116 ]) by tn2.myserver.com (Postfix) <myu...@myvirt.com>; "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token '4001' => 0.999898854265489
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'H*F:U*myuser' => 0.999772898574472
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'corresponde' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'newcastle' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: score = 1
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB expiry: tokens in DB: 166030, Expiry max size: 15, Oldest atime: 1501594564, Newest atime: 1502289189, Last expire: 1502304550, Current time: 1502381019
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: opportunistic call found expiry due
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal sync starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal sync completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry completed
Aug 10 11:03:39
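To check a debug log like this for signs of a learning attempt, it can help to count the Bayes database operations it records. A minimal sketch, assuming one log entry per line: it counts read-only and read/write `tie-ing to DB file` messages and `bayes: learned` lines (the format seen in the sa-learn output elsewhere in this thread), and extracts the corpus size. The `R/W` wording is an assumption modelled on the `R/O` lines shown here, so confirm it against your own logs; `summarize_bayes_dbg` is a hypothetical name.

```python
import re
from typing import Dict, Iterable

CORPUS_RE = re.compile(r"corpus size: nspam = (\d+), nham = (\d+)")


def summarize_bayes_dbg(lines: Iterable[str]) -> Dict[str, object]:
    """Count Bayes DB opens and learn events in SpamAssassin debug output."""
    summary: Dict[str, object] = {"ro_ties": 0, "rw_ties": 0, "learned": 0, "corpus": None}
    for line in lines:
        if "bayes: tie-ing to DB file R/O" in line:
            summary["ro_ties"] += 1
        elif "bayes: tie-ing to DB file R/W" in line:  # assumed wording for writable opens
            summary["rw_ties"] += 1
        elif "bayes: learned '" in line:
            summary["learned"] += 1
        match = CORPUS_RE.search(line)
        if match:
            summary["corpus"] = (int(match.group(1)), int(match.group(2)))
    return summary
```

Fed the log above (one entry per line), this should report two read-only opens, no read/write opens, no `learned` lines and a corpus of (349, 478), which matches the observation that the scan only read the database.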
RE: Bayes auto-learn - not happening
Here's a verbose log of amavis/SpamAssassin processing another high-scoring message that just came through. I don't see a peep about auto-learn, but it was unavailable too. (Posting via Nabble; apologies if it wraps.)

Aug 10 11:03:39 mail2 amavis[377]: (00377-01) LMTP :10024 /var/spool/amavisd/tmp/amavis-20170810T110339-00377-JQiRqEtF: <cont...@qq.com> -> <shor...@myvirt.com> SIZE=175613 BODY=8BITMIME ENVID=671416;675610;322132;sachin2 Received: from tn2.companypostoffice.com ([127.0.0.1]) by localhost (tn2.companypostoffice.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP for <shor...@myvirt.com>; Thu, 10 Aug 2017 11:03:39 -0500 (CDT)
Aug 10 11:03:39 mail2 postfix/smtpd[450]: disconnect from unknown[208.110.82.116] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) Checking: KmK8jCUCqcuq [208.110.82.116] <cont...@qq.com> -> <shor...@myvirt.com>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB file R/O /etc/mail/bayes/bayes_toks
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB file R/O /etc/mail/bayes/bayes_seen
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: found bayes db version 3
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB journal sync: last sync: 0
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: corpus size: nspam = 349, nham = 478
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *p = "U*contact D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for X-Amavis-PolicyBank = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for X-Amavis-MessageSize = "174380"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for MIME-Version = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *F = "U*shorton D*myvirt.com D*org U*yt5r4e3 D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for To = "U*shorton D*myvirt.com D*org"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *c = "/html;"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for x-spam-relays-external = " [ ip=208.110.82.116 rdns= helo=qq.com by=tn2.companypostoffice.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident= envfrom=cont...@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for x-spam-relays-internal = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *RT = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *RU = " [ ip=208.110.82.116 rdns= helo=qq.com by=tn2.companypostoffice.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident= envfrom=cont...@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <shor...@myvirt.com>; envelope- <cont...@qq.com>)"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <shor...@myvirt.com>; envelope- <cont...@qq.com>) qq.com (unknown [208.110.82 ip*208.110.82.116 ]) by tn2.companypostoffice.com (Postfix) <shor...@myvirt.com>; "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token '4001' => 0.999898854265489
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'H*F:U*shorton' => 0.999772898574472
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'corresponde' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token 'newcastle' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: score = 1
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB expiry: tokens in DB: 166030, Expiry max size: 15, Oldest atime: 1501594564, Newest atime: 1502289189, Last expire: 1502304550, Current time: 1502381019
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: opportunistic call found expiry due
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal sync starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal sync completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: untie-ing

Then

Aug 10 11:03:44 mail2 amavis[377]: (00377-01) KmK8jCUCqcuq(KmK8jCUCqcuq) SEN
Re: Bayes auto-learn - not happening
> why this?
> When you run from amavisd, you only need permission for the amavis user, not for
> anyone.

To be sure that is not the problem. I can tighten it up once it's working. I understand this is what one would normally use in a multi-user environment, but it can't hurt for testing, right?

> Is /etc/mail/bayes writeable by amavisd?

Yes, see "3c" in my list above: /etc/mail/bayes is wide open right now.

[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks

-- View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138251.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Bayes auto-learn - not happening
On 08/10/2017 10:06 AM, techlist06 wrote:
Update: Still NOT working, but I'm giving it hell trying to figure out why :)

First, a couple of answers to others' questions:

- John, others: not an ISP. "High" is relative, I'm sure, but the volume is much higher than I can duplicate and review every flagged message. Right now it's running at about 10%, before I migrate one of my larger domains. Mail is relayed to Exchange servers. Users do not have IMAP accounts on the box; a few local users have POP only. I don't configure or allow anyone to submit messages for training directly.

- Re no, or careful, auto-training: I get it. I'm migrating from a server that has run for years with auto-learn on, set at conservative learn values, and never had any trouble with it, thank goodness. As I look at the messages that would be autolearned, I've never found one in my corpus that was learned when it should not have been. The volume would just be too high to personally go through each one of them myself. I have had "problem" users that get a lot of spam misses, and I plan to set up a way for them to submit their spam to me (not autolearn) for review and manual training as needed.

- Matus, re: "autolearn=unavailable apparently due to not accessible bayes database [due to permissions]": I hope you are right. That would make sense to me. See below, please. I think I listed them all. Config and permissions look good to me; I'm grateful to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything, works great, so auto-learn in general has been a good fit for my environment. I get that it's not for everyone, but at least it SHOULD work, and let me choose to tweak it or turn it off. As far as I can tell it is not working at all.

So here's where I am:

1. I stepped back and went through all my configurations carefully. SpamAssassin is being run via amavisd, as the amavis user. Site-wide config; no other users have direct access. POP accounts and relay accounts only.

2. From prior research before asking for help, I understood no spam was necessary for auto-learn to work, but one person here said I had to be at the minimum (200 default) before it would. So, to rule that out as the issue, I manually fed it plenty of spam and ham. For others who might read this thread in the archive: I was having trouble getting enough learned due to the default size limit my version of SA/sa-learn had. With some digging I found out how to raise that limit, and then I had plenty of spam to feed:

su amavis -c 'sa-learn -D --spam --showdots --max-size=100 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime

3. Next up were questions about the config and permissions. I checked my setup and it looked OK, but I even opened some directories up to 777 for testing. This is my config; I'd be grateful if anyone who sees anything wrong would point it out. I include the amavis parts just to show it is running and invoked as, and by, the amavis user.

3a. amavis in /usr/lib/systemd/system/amavisd.service:
User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf
The amavis user's home dir per /etc/passwd is /var/spool/amavisd (verified with cd ~amavis).

3b. local.cf
My SpamAssassin local.cf is at /etc/mail/spamassassin/local.cf. I verified this is the one being used by putting an error line in it and restarting amavisd; it complains about the error. (Fixed, of course.) In local.cf I have these related settings:
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path /etc/mail/bayes/bayes
bayes_file_mode 0777

3c. bayes
For troubleshooting I set the permissions to 777 on /etc/mail/bayes and its files. This is the only occurrence of the "bayes" files on the server:
[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks

3d. amavis spamassassin folder settings
For amavis, which calls SpamAssassin via its Perl libraries (I am not running spamd), the related configuration parts are:
$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";        # working directory, needs to exist, -T
$ENV{TMPDIR} = $TEMPBASE
Re: Bayes auto-learn - not happening
On 10.08.17 10:06, techlist06 wrote:
Update: Still NOT working, but I'm giving it hell trying to figure out why :)
- Matus: re: "autolearn=unavailable apparently due to not accessible bayes database [due to permissions]". I hope you are right. That would make sense to me. See below please. I think I listed them all. Config and permissions look good to me, I'm grateful to have anything I missed pointed out by an experienced eye.

here it is:
bayes_path /etc/mail/bayes/bayes
bayes_file_mode 0777

why this?
When you run from amavisd, you only need permission for the amavis user, not for anyone.
Is /etc/mail/bayes writeable by amavisd?

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The early bird may get the worm, but the second mouse gets the cheese.
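Matus's question can be answered directly by testing access as the user that runs the scans. A minimal sketch, assuming `bayes_path` is a filename prefix to which SpamAssassin appends `_toks`, `_seen` and `_journal` (consistent with the `bayes_toks` and `bayes_seen` files listed in this thread). Run it as the amavis user, for example through `su amavis -c '...'` as elsewhere in the thread: `os.access` checks the real user ID of the process, so a run as root says little. `check_bayes_access` is a hypothetical name.

```python
import os
from typing import Dict

# Suffixes assumed to be appended to bayes_path by the DBM Bayes store.
BAYES_SUFFIXES = ("_toks", "_seen", "_journal")


def check_bayes_access(bayes_path: str) -> Dict[str, bool]:
    """Return {path: usable} for the Bayes directory and any existing DB files."""
    directory = os.path.dirname(bayes_path) or "."
    # The directory needs write and search permission so new files (journal, locks) can be created.
    report = {directory: os.access(directory, os.W_OK | os.X_OK)}
    for suffix in BAYES_SUFFIXES:
        path = bayes_path + suffix
        if os.path.exists(path):
            report[path] = os.access(path, os.R_OK | os.W_OK)
    return report


if __name__ == "__main__":
    for path, ok in check_bayes_access("/etc/mail/bayes/bayes").items():
        print(("OK      " if ok else "DENIED  ") + path)
```

Checking only files that already exist avoids false alarms for a journal that has not been created yet; the directory check covers whether it could be.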
RE: Bayes auto-learn - not happening
Update: Still NOT working, but I'm giving it hell trying to figure out why :)

First, a couple of answers to others' questions:

- John, others: not an ISP. "High" is relative, I'm sure, but the volume is much higher than I can duplicate and review every flagged message. Right now it's running at about 10%, before I migrate one of my larger domains. Mail is relayed to Exchange servers. Users do not have IMAP accounts on the box; a few local users have POP only. I don't configure or allow anyone to submit messages for training directly.

- Re no, or careful, auto-training: I get it. I'm migrating from a server that has run for years with auto-learn on, set at conservative learn values, and never had any trouble with it, thank goodness. As I look at the messages that would be autolearned, I've never found one in my corpus that was learned when it should not have been. The volume would just be too high to personally go through each one of them myself. I have had "problem" users that get a lot of spam misses, and I plan to set up a way for them to submit their spam to me (not autolearn) for review and manual training as needed.

- Matus, re: "autolearn=unavailable apparently due to not accessible bayes database [due to permissions]": I hope you are right. That would make sense to me. See below, please. I think I listed them all. Config and permissions look good to me; I'm grateful to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything, works great, so auto-learn in general has been a good fit for my environment. I get that it's not for everyone, but at least it SHOULD work, and let me choose to tweak it or turn it off. As far as I can tell it is not working at all.

So here's where I am:

1. I stepped back and went through all my configurations carefully. SpamAssassin is being run via amavisd, as the amavis user. Site-wide config; no other users have direct access. POP accounts and relay accounts only.

2. From prior research before asking for help, I understood no spam was necessary for auto-learn to work, but one person here said I had to be at the minimum (200 default) before it would. So, to rule that out as the issue, I manually fed it plenty of spam and ham. For others who might read this thread in the archive: I was having trouble getting enough learned due to the default size limit my version of SA/sa-learn had. With some digging I found out how to raise that limit, and then I had plenty of spam to feed:

su amavis -c 'sa-learn -D --spam --showdots --max-size=100 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime

3. Next up were questions about the config and permissions. I checked my setup and it looked OK, but I even opened some directories up to 777 for testing. This is my config; I'd be grateful if anyone who sees anything wrong would point it out. I include the amavis parts just to show it is running and invoked as, and by, the amavis user.

3a. amavis in /usr/lib/systemd/system/amavisd.service:
User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf
The amavis user's home dir per /etc/passwd is /var/spool/amavisd (verified with cd ~amavis).

3b. local.cf
My SpamAssassin local.cf is at /etc/mail/spamassassin/local.cf. I verified this is the one being used by putting an error line in it and restarting amavisd; it complains about the error. (Fixed, of course.) In local.cf I have these related settings:
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path /etc/mail/bayes/bayes
bayes_file_mode 0777

3c. bayes
For troubleshooting I set the permissions to 777 on /etc/mail/bayes and its files. This is the only occurrence of the "bayes" files on the server:
[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks

3d. amavis spamassassin folder settings
For amavis, which calls SpamAssassin via its Perl libraries (I am not running spamd), the related configuration parts are:
$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";        # working directory, needs to exist, -T
$ENV{TMPD
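The `sa-learn --dump magic` output above can also be checked programmatically. A minimal sketch, assuming the layout shown here (the third numeric column holds the value) and using the 200-message minimum mentioned in the thread, which matches SpamAssassin's `bayes_min_spam_num` and `bayes_min_ham_num` defaults. That minimum governs when Bayes scores are used; whether it also gates autolearning is exactly what is being debated here. `parse_dump_magic` and `has_minimum_corpus` are hypothetical names.

```python
from typing import Dict


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Map 'non-token data' labels to their values from `sa-learn --dump magic`."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        numbers, sep, label = line.partition("non-token data:")
        fields = numbers.split()
        # Only lines with the marker and at least three leading columns are data rows.
        if sep and len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values


def has_minimum_corpus(values: Dict[str, int], minimum: int = 200) -> bool:
    """True when both nspam and nham reach the minimum (200 by default, as discussed)."""
    return values.get("nspam", 0) >= minimum and values.get("nham", 0) >= minimum
```

With nspam = 349 and nham = 478, the corpus in this thread is past the 200/200 mark, so the minimum-count explanation can be set aside.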
Re: Bayes auto-learn - not happening
On 08/08/2017 08:02 PM, Ian Zimmerman wrote:
> On 2017-08-08 15:20, Scott wrote:
>> Another new one big score, auto-learn disabled. This one is fairly small.
>>
>> X-Spam-Status: Yes, score=29.428 tag=- tag2=5 kill=6.4
>> tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
>> DIGEST_MULTIPLE=0.001,
>> FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
>> HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
>> HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
>> HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
>> NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
>> RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
>> RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
>> SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
>> T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
>> WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no
>>
>> Can you tell if this one has the 3 point match?
>
> Scott, when I tried to use the autolearn feature I was as confused as you are. As far as I remember, the 3 points each from header and body is not the only requirement; the full truth is that some rules are "privileged" and can contribute to autolearning while others cannot. I found it opaque in the extreme and essentially unpredictable, so I stopped autolearning and hacked up some scripts that put a duplicate of each ham message into a folder, which is then processed by sa-learn from a cronjob with sufficient delay that I can review the contents and remove any false negatives; and similarly with spam, excluding the utterly horrible category, which just goes to /dev/null. It may not be possible for you to adopt such a process if your volume is high, but OTOH in that case you probably have users to help you :) I think this is what RW is telling you, too.
>
> FWIW, this is documented (sort of) by:
> perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

Same here. I had a little success with autolearn. When I started splitting out messages into a spam folder and a ham folder and using a cron script to train explicitly, the BAYES hits became very accurate and helped with zero-hour spam, which is the hardest to block.

I set up an iRedMail server on a local-only subdomain and send/BCC copies of messages over to it. Then I can use simple Inbox rules to sort or discard them. Then I cron'd spam and ham training based on the Maildir "cur" folders. This requires me to do a quick scan of the unread messages; when I mark them as read, they get sa-learn'd. It takes a few minutes a day and drastically improved the mail filtering.

A side effect is that it lets me easily spot new spam campaigns and messages that score just below the block threshold, so I can add them to local custom rules. Sometimes these are legit senders with good opt-out, so I add them to a whitelist_auth entry.

-- 
David Jones
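David's "train what I've marked as read" workflow can be sketched against the Maildir layout, where a message's flags follow `:2,` in its filename and `S` means Seen. This is a minimal sketch, not David's actual script (which is not shown): it only selects the files and builds an `sa-learn` argument list; the folder paths and the helper names `seen_messages` and `sa_learn_command` are hypothetical.

```python
import os
from typing import List


def seen_messages(maildir: str) -> List[str]:
    """Return files in <maildir>/cur whose Maildir info section includes the S (Seen) flag."""
    cur = os.path.join(maildir, "cur")
    picked = []
    for name in sorted(os.listdir(cur)):
        # Maildir names look like "<unique>:2,<flags>"; files without ":2," carry no flags.
        _, sep, flags = name.rpartition(":2,")
        if sep and "S" in flags:
            picked.append(os.path.join(cur, name))
    return picked


def sa_learn_command(kind: str, paths: List[str]) -> List[str]:
    """Build an sa-learn argument list; kind must be 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind] + paths
```

A cron job could then run something like `subprocess.run(sa_learn_command("spam", files), check=True)` as the user that owns the Bayes database, skipping the call when `files` is empty. Since sa-learn records learned message IDs in its seen database, re-feeding a message that stays in `cur` should normally be skipped rather than counted twice.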
Re: Bayes auto-learn - not happening
On 08.08.17 14:38, Scott wrote:
Brand new spam arrives. It gets autolearn=unavailable.
[...]
su amavis -c 'sa-learn -D --spam --showdots --max-size=600 --mbox /home/mail/twospam'
Aug 8 16:35:23.567 [18045] dbg: bayes: learned '419769464db0fabb0f1220f9ae0cf12931ad7076@sa_generated', atime: 1502226537
Learned tokens from 1 message(s) (1 message(s) examined)
And it learned it. So autolearn=unavailable was NOT due to the token already being there.

autolearn=unavailable, apparently due to an inaccessible bayes database. Try running "ls -la ~amavis/.spamassassin/" - apparently permissions make the directory or the files in it unwritable for the amavis user.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Atheism is a non-prophet organization.
Re: Bayes auto-learn - not happening
On Tue, 8 Aug 2017, Ian Zimmerman wrote:
> I stopped autolearning and hacked up some scripts that put a duplicate of each ham message into a folder which is then processed by sa-learn from a cronjob, with sufficient delay that I can review the contents and remove any false negatives; and similarly with spam, excluding the utterly horrible category which just goes to /dev/null.

This is generally a good idea, unless you have a really high-volume environment - are you an ISP? Keeping your training corpora around lets you review them for misclassifications and retrain very easily if things go off the rails.

Autolearn may be useful once you have done the initial manual training. Then you can focus on manually training the FPs and FNs.

It's also important to be careful what you train with. If you allow users to submit messages for training (particularly to a global Bayes database), then you either need to have strong trust in those users' judgement, or you need to review what they submit before training with it.

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Joan Peterson is like that: you expect at least a pseudological argument, but instead you get the weird ramblings of a woman with the critical thinking abilities of an 18th century peasant. -- Ken
---
7 days until the 72nd anniversary of the end of World War II
Re: Bayes auto-learn - not happening
On 2017-08-08 15:20, Scott wrote:
> Another new one big score, auto-learn disabled. This one is fairly small.
>
> X-Spam-Status: Yes, score=29.428 tag=- tag2=5 kill=6.4
> tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
> DIGEST_MULTIPLE=0.001,
> FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
> HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
> HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
> HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
> NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
> RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
> RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
> SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
> T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
> WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no
>
> Can you tell if this one has the 3 point match?

Scott, when I tried to use the autolearn feature I was as confused as you are. As far as I remember, the 3 points each from header and body is not the only requirement; the full truth is that some rules are "privileged" and can contribute to autolearning while others cannot. I found it opaque in the extreme and essentially unpredictable, so I stopped autolearning and hacked up some scripts that put a duplicate of each ham message into a folder, which is then processed by sa-learn from a cronjob with sufficient delay that I can review the contents and remove any false negatives; and similarly with spam, excluding the utterly horrible category, which just goes to /dev/null. It may not be possible for you to adopt such a process if your volume is high, but OTOH in that case you probably have users to help you :) I think this is what RW is telling you, too.

FWIW, this is documented (sort of) by:
perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

-- 
Please don't Cc: me privately on mailing lists and Usenet, if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.
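Ian's delayed-review approach boils down to training only the files that have sat in the review folder long enough for a human to pull out mistakes. A minimal sketch, assuming a file's modification time marks when it was queued; the folder layout, the delay value and the function name `ready_for_training` are hypothetical.

```python
import os
import time
from typing import List, Optional


def ready_for_training(folder: str, delay_hours: float, now: Optional[float] = None) -> List[str]:
    """Return regular files in folder last modified at least delay_hours ago."""
    current = time.time() if now is None else now
    cutoff = current - delay_hours * 3600
    ready = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and os.path.getmtime(path) <= cutoff:
            ready.append(path)
    return ready
```

A cron job could pass the result to `sa-learn --ham` (or `--spam` for the spam queue) and then move the trained files out of the queue, so anything removed during review is simply never learned.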
Re: Bayes auto-learn - not happening
Another new one big score, auto-learn disabled. This one is fairly small.

X-Spam-Status: Yes, score=29.428 tag=- tag2=5 kill=6.4
 tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001,
 FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
 HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
 HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
 HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
 NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
 RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
 RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
 SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
 T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
 WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no

Can you tell if this one has the 3 point match?
-- View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138085.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
RE: Bayes auto-learn - not happening
> you need to train your bayes *by hand* to start with - how do you expect
> bayes classification with no hints after purging the database - train 200
> ham and spam mails and *after that* look further

Reindl: Thanks. I want to use some auto-training with very conservative thresholds set. All of the messages I've checked would have been classified correctly via autolearn, comfortably within those ranges. The 200 threshold is for USING Bayes, not an auto-learning requirement; or that was my clear understanding from many posts. I saw several old threads where others suggested something similar but were corrected. Maybe they changed it, dunno. My concern is that auto-learn is not functioning properly. I use amavisd, which calls SpamAssassin and has its own issues. I'm trying to make sure my system is operating properly, and it appears to me that it is not. No hint should be necessary for it to learn a spam, only for Bayes to score anything. I get that. No?
Re: Bayes auto-learn - not happening
I was getting my commands mixed up; I've been looking at this too long. When I ran

su amavis -c 'spamassassin -D 2>&1 -t onespam'

that caused it to LEARN the spam. The database went from not existing to one message learned. Auto-learn, apparently. That's what it should have done when the message arrived.

Brand new spam arrives. It gets autolearn=unavailable.

X-Spam-Status: Yes, score=20.704 tag=- tag2=5 kill=6.4
 tests=[DATE_IN_PAST_06_12=1.103, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001,
 HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.635,
 MIME_HTML_ONLY=1.105, MISSING_MID=0.14, NORMAL_HTTP_TO_IP=0.001,
 RAZOR2_CF_RANGE_51_100=0.365, RAZOR2_CF_RANGE_E8_51_100=2.43,
 RAZOR2_CHECK=2.5, RDNS_NONE=1.274, SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3,
 URIBL_ABUSE_SURBL=1.948] autolearn=unavailable autolearn_force=no

As I understand it, that implies no auto-learn because the token already exists (or something else). So I try to learn that one spam again. I had to increase the size limit via:

su amavis -c 'sa-learn -D --spam --showdots --max-size=600 --mbox /home/mail/twospam'
Aug 8 16:35:23.567 [18045] dbg: bayes: learned '419769464db0fabb0f1220f9ae0cf12931ad7076@sa_generated', atime: 1502226537
Learned tokens from 1 message(s) (1 message(s) examined)

And it learned it. So autolearn=unavailable was NOT due to the token already being there. Is there a size limit built into autolearn?
-- View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138082.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Bayes auto-learn - not happening
Benny: re tflags
> tflags foo-rule-name noautolearn
> and you can force autolearn based on rulename
> https://lists.gt.net/spamassassin/users/184996
> there is a long thread there that explains it more
> and all conditions must be met for learning
I read the thread. There's nothing concrete enough there for me to latch onto. I get the gist of it, but there are no details on how to look at my tests and see whether I have the requisite 3 parts.
Re: Bayes auto-learn - not happening
Cleared the database, then ran the following on the same message:
su amavis -c 'spamassassin -D 2>&1 -t onespam' | less
I didn't see any errors obvious to me. It recreated the databases and added this message as expected. I don't know how to tell why it would not have auto-learned. Can you tell (or teach) me from this?
Content analysis details:   (17.7 points, 5.0 required)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.9 URIBL_ABUSE_SURBL      Contains an URL listed in the ABUSE SURBL blocklist
                            [URIs: 145.239.41.28]
 0.0 SUBJ_DOLLARS           Subject starts with dollar amount
 3.0 SPF_HELO_SOFTFAIL      SPF: HELO does not match SPF record (softfail)
 1.1 DATE_IN_PAST_03_06     Date: is 3 to 6 hours before Received: date
 0.0 NORMAL_HTTP_TO_IP      URI: URI host has a public dotted-decimal IPv4 address
 0.0 HTML_EXTRA_CLOSE       BODY: HTML contains far too many close tags
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 3.2 DCC_CHECK              Detected as bulk mail by DCC (dcc-servers.net)
 2.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 2.4 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level above 50%
                            [cf: 100]
 0.4 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
 0.6 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
 0.1 MISSING_MID            Missing Message-Id: header
 1.3 RDNS_NONE              Delivered to internal network by a host with no rDNS
Aug 8 15:47:11.098 [17077] dbg: check: tagrun - tag DKIMDOMAIN is still blocking action 0
Aug 8 15:47:11.105 [17077] dbg: plugin: Mail::SpamAssassin::Plugin::MIMEHeader=HASH(0x2ccc328) implements 'finish_tests', priority 0
Aug 8 15:47:11.105 [17077] dbg: plugin: Mail::SpamAssassin::Plugin::Check=HASH(0x2e04e38) implements 'finish_tests', priority 0
Aug 8 15:47:11.116 [17077] dbg: netset: cache trusted_networks hits/attempts: 15/17, 88.2 %
Re: Bayes auto-learn - not happening
Scott wrote on 2017-08-08 22:19:
> Does this one have the requisite 3-point match? I don't understand how to tell yet.
spamassassin -D 2>&1 -t mail.msg | less
should show why.
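With `-D`, SpamAssassin prints an `auto-learn?` debug line carrying the thresholds and the body/head point counts (the GTUBE thread in this archive shows a real one). The helper below is hypothetical, not part of SpamAssassin: a minimal sketch that pulls those numbers out of captured `-D` output and flags which side misses the 3-point minimum described in this thread.

```python
import re

# Matches "key=number" pairs such as "body-points=0" or "spam=12".
_PAIR = re.compile(r"([a-z-]+)=(-?\d+(?:\.\d+)?)")


def parse_autolearn_dbg(line: str) -> dict:
    """Extract thresholds and point counters from an 'auto-learn?' dbg line."""
    _, sep, tail = line.partition("auto-learn?")
    if not sep:
        raise ValueError("not an auto-learn? dbg line")
    return {key: float(val) for key, val in _PAIR.findall(tail)}


def spam_point_shortfalls(fields: dict, required: float = 3.0) -> list:
    """Name the body/head counters below the assumed 3-point minimum."""
    return [name for name in ("body-points", "head-points")
            if fields.get(name, 0.0) < required]


line = ("nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, "
        "body-points=0, head-points=0, learned-points=0")
fields = parse_autolearn_dbg(line)
print(fields)
print(spam_point_shortfalls(fields))  # ['body-points', 'head-points']
```

In practice you would feed it the line grepped from `spamassassin -D ... 2>&1` output rather than a pasted string.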
Re: Bayes auto-learn - not happening
On Tue, 8 Aug 2017 13:04:16 -0700 (MST)
Scott wrote:
> The "3 points" criteria does not apply to manually learning
No, it's just a sanity check to reduce mistraining.
If you can, don't use autotraining at all.
Re: Bayes auto-learn - not happening
Apologies, I meant sa-learn. Brain fart. Thanks for the clarification on the 3-point rule. I've had a bunch of them come through. They all get autolearn=no, or a few say "unavailable", like the sample below. From trying to figure it out myself, I gather that "unavailable" may mean the message was already learned, or something else (whatever that may be), per the wiki. But if the database is empty, "already learned" can't be the reason for "unavailable" in this case, anyway.
X-Spam-Status: Yes, score=20.678 tag=- tag2=5 kill=6.4 tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001, HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14, NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365, RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5, RDNS_NONE=1.274, SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3, SUBJ_DOLLARS=0.001, URIBL_ABUSE_SURBL=1.948] autolearn=unavailable autolearn_force=no
Does this one have the requisite 3-point match? I don't understand how to tell yet. I've cleared the db again and will let it run to see if it learns *anything*. So far I have not seen that happen. Surely something will get a 3-way match.
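Part of why the header alone can't answer the 3-point question: the amavisd-style `X-Spam-Status` only lists each rule's points, and their sum is simply the live score. It does not say which rules are header or body rules, or which carry `noautolearn`. The hypothetical parser below (a sketch, not an amavisd or SpamAssassin API) makes that concrete using Scott's header.

```python
import re


def parse_spam_status(header: str):
    """Split an amavisd-style X-Spam-Status value into (score, {rule: points})."""
    score_m = re.search(r"score=(-?\d+(?:\.\d+)?)", header)
    tests_m = re.search(r"tests=\[(.*?)\]", header, re.S)
    score = float(score_m.group(1)) if score_m else None
    tests = {}
    if tests_m:
        for item in tests_m.group(1).split(","):
            name, _, points = item.strip().partition("=")
            tests[name] = float(points)
    return score, tests


header = (
    "Yes, score=20.678 tag=- tag2=5 kill=6.4 tests=[DATE_IN_PAST_03_06=1.076, "
    "DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001, HTML_EXTRA_CLOSE=0.001, "
    "HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, "
    "MISSING_MID=0.14, NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365, "
    "RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5, RDNS_NONE=1.274, "
    "SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3, SUBJ_DOLLARS=0.001, "
    "URIBL_ABUSE_SURBL=1.948] autolearn=unavailable autolearn_force=no"
)
score, tests = parse_spam_status(header)
# The live score is just the sum of the listed rule points.
print(score, round(sum(tests.values()), 3))  # 20.678 20.678
```

The per-area (body/head) split needed for the 3-point check has to come from the `-D` debug output instead.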
Re: Bayes auto-learn - not happening
Scott wrote on 2017-08-08 22:06:
> Better, what test flags in general disable auto-learn?
tflags foo-rule-name noautolearn
and you can force autolearn based on rulename:
https://lists.gt.net/spamassassin/users/184996
There is a long thread there that explains it more, and all conditions must be met for learning.
Re: Bayes auto-learn - not happening
Scott wrote on 2017-08-08 22:04:
> The "3 points" criteria does not apply to manually learning via sa-update then?
Typo? sa-update does not learn; it just updates rules. You meant sa-learn? When sa-learn is used, it's not autolearn, so the limits are not applied.
Re: Bayes auto-learn - not happening
> some of the listed tags have tflags that disable autolearn
> there is nothing to fix here
Benny: Will you elaborate for me, please, so I can understand and help myself? Better, what test flags in general disable auto-learn?
Re: Bayes auto-learn - not happening
The "3 points" criteria does not apply to manually learning via sa-update then?
Re: Bayes auto-learn - not happening
On Tue, 8 Aug 2017 13:06:26 -0500
Scott Techlist wrote:
> [...]
> Why aren't my spams getting auto-learned? If sa-learn "ate" it,
> shouldn't auto-learn too?
To autolearn spam you need 3 points from the body and 3 from headers.
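The spam-side conditions mentioned in this thread (autolearn score over the spam threshold, at least 3 body points, at least 3 header points, plus the sanity check against the live verdict) can be collected into one checklist. This is a simplified, hypothetical model: the 3-point minimums come from this thread, while the -1.0 "learner indicated ham" cutoff is an assumption based on the AutoLearnThreshold plugin's debug messages. The debug output reports a single reason, whereas this sketch lists every failed condition.

```python
from dataclasses import dataclass


@dataclass
class AutolearnInputs:
    autolearn_score: float  # score recomputed for autolearn, not the header score
    body_points: float      # points from body rules
    head_points: float      # points from header rules
    learned_points: float   # points contributed by Bayes itself
    live_is_spam: bool      # the normal spam/ham verdict on the message


def spam_autolearn_blockers(m: AutolearnInputs, spam_threshold: float = 12.0,
                            required_area_points: float = 3.0,
                            learner_ham_cutoff: float = -1.0) -> list:
    """Return every reason the message would not be learned as spam (empty = learn)."""
    reasons = []
    if m.autolearn_score < spam_threshold:
        reasons.append("autolearn score below spam threshold")
    if m.body_points < required_area_points:
        reasons.append("too few body points")
    if m.head_points < required_area_points:
        reasons.append("too few head points")
    if m.learned_points < learner_ham_cutoff:
        reasons.append("learner indicated ham")
    if not m.live_is_spam:
        reasons.append("scored as ham but autolearn wanted spam")
    return reasons


# Hypothetical numbers: plenty of header points, few body points.
# spam_threshold=10.0 matches the bayes_auto_learn_threshold_spam in Scott's local.cf.
example = AutolearnInputs(autolearn_score=17.0, body_points=2.5,
                          head_points=9.0, learned_points=0.0, live_is_spam=True)
print(spam_autolearn_blockers(example, spam_threshold=10.0))  # ['too few body points']
```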
Re: Bayes auto-learn - not happening
Scott Techlist wrote on 2017-08-08 20:06:
> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *
> X-Spam-Status: Yes, score=17.374 tag=- tag2=5 kill=6.31
> tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5, RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497, URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5, URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no
> autolearn_force=no
> Can't figure out what's wrong...
Some of the listed tags have tflags that disable autolearn. There is nothing to fix here.
Bayes auto-learn - not happening
Centos7
Postfix 3.2.2
Amavisd-new 2.11.0
Spamassassin 3.4.0
Site-wide configuration
This is a new box and I've configured some conservative values for auto-learn. I've enabled it properly AFAIK, but I can't see any sign of it working.
I have these set in local.cf:
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
# this is a filename prefix, not a directory per se
bayes_path /etc/mail/bayes/bayes
bayes_file_mode 0666
-bayes prep
Start fresh for troubleshooting:
su amavis -c 'sa-learn --clear'
Add one spam manually and check tokens:
[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0       2157          0  non-token data: ntokens
-amavisd prep
Restart amavisd/spamassassin just to be sure all configs are read.
--- ready to process ---
The next high-scoring spam arrived and was sent to my spam mailbox. It did NOT autolearn. Nor did several others.
To troubleshoot, I took one that did not autolearn and learned it manually:
su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'
Even though this message was slightly over the size limit, the log says it learned anyway. -D log snippet:
Aug 8 12:37:27.216 [13198] info: archive-iterator: skipping large message: 858 lines, 262203 bytes, limit 262144 bytes
Learned tokens from 1 message(s) (1 message(s) examined)
Verified it learned:
[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
Partial header from that message:
X-Spam-Flag: YES
X-Spam-Score: 17.374
X-Spam-Level: *
X-Spam-Status: Yes, score=17.374 tag=- tag2=5 kill=6.31 tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5, RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497, URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5, URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no
Why aren't my spams getting auto-learned? If sa-learn "ate" it, shouldn't auto-learn too? I know there is a default 200 threshold before Bayes starts tagging anything, but I understand it should learn without issue. Can't figure out what's wrong...
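The `archive-iterator` line shows how narrowly this message missed sa-learn's size limit: 262144 bytes is 256 × 1024. A small, hypothetical helper like the one below (not part of SpamAssassin) can pull those numbers out of `-D` logs when scanning many runs for skipped messages; it assumes the log wording stays exactly as shown in this post.

```python
import re

_LARGE = re.compile(
    r"skipping large message: (\d+) lines, (\d+) bytes, limit (\d+) bytes")


def parse_skip_line(line: str):
    """Return size details from an archive-iterator 'skipping large message' line, else None."""
    m = _LARGE.search(line)
    if m is None:
        return None
    lines, size, limit = map(int, m.groups())
    return {"lines": lines, "bytes": size, "limit": limit, "over_by": size - limit}


line = ("Aug 8 12:37:27.216 [13198] info: archive-iterator: skipping large message: "
        "858 lines, 262203 bytes, limit 262144 bytes")
print(parse_skip_line(line))
# {'lines': 858, 'bytes': 262203, 'limit': 262144, 'over_by': 59}
```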
Re: auto-learn? no: scored as spam but autolearn wanted ham
On 6 Nov 2015, at 1:52, Matthias Apitz wrote:
> On Thursday, November 05, 2015 at 04:24:04PM +0100, John Wilcock wrote:
>> On 05/11/2015 15:54, Matthias Apitz wrote:
>>> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
>>> X-Spam-Flag: YES
>>> X-Spam-Level: **
>>> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
>>> NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
>>> X-Spam-Report: ++
>>> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
>>> * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
>>> * -0.0 NO_RECEIVED Informational: message has no Received
>>> * headers
>>> ...
>>> Why auto-learn wants the mail as HAM?
>> Because autolearning ignores rules with the noautolearn, userconf or
>> learn tflags set (and uses the scores from scoreset 0 or 1).
>> ...
> Thanks for all explanations. I now have a better understanding of the
> autolearning process. Please, can someone forward me off-list (gzip'ed
> with complete header lines) a SPAM which resulted in autolearn=spam.
You may have a long wait for that...
A lot of mail systems do not retain mail that is determined to be spam, or even accept delivery of it. Since the autolearn threshold (at least by default) is generally much higher than the simple spam threshold, even sites that accept and deliver spam (i.e. tagged or to a spam mailbox) often don't bother keeping spam that scores so high.
Beyond that, many people who do retain spam (e.g. for analytic purposes) are averse to sharing their data. There is a long history of spamtraps and fingerprints becoming useless soon after being shared with seemingly trustworthy audiences.
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
On November 5, 2015 3:54:25 PM Matthias Apitz <g...@unixarea.de> wrote:
> ...
> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> * -0.0 NO_RECEIVED Informational: message has no Received
> * headers
> ...
> Why auto-learn wants the mail as HAM?
Where did you see this? GTUBE disables autolearn.
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
On Thursday, November 05, 2015 at 04:24:04PM +0100, John Wilcock wrote:
> On 05/11/2015 15:54, Matthias Apitz wrote:
> > X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> > X-Spam-Flag: YES
> > X-Spam-Level: **
> > X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> > NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> > X-Spam-Report: ++
> > * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> > * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> > * -0.0 NO_RECEIVED Informational: message has no Received
> > * headers
> > ...
> >
> > Why auto-learn wants the mail as HAM?
>
> Because autolearning ignores rules with the noautolearn, userconf or
> learn tflags set (and uses the scores from scoreset 0 or 1).
>
> ...
Thanks for all explanations. I now have a better understanding of the autolearning process. Please, can someone forward me off-list (gzip'ed with complete header lines) a SPAM which resulted in autolearn=spam. Thanks in advance.
matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ ☎ +49-176-38902045
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
* 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
https://en.wikipedia.org/wiki/GTUBE
On 05.11.2015 at 15:54, Matthias Apitz wrote:
> This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the sample file:
> $ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt
> it says on STDERR:
> ...
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998, computed score for autolearn: 0
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=0, head-points=0, learned-points=0
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but autolearn wanted ham
> nov 5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3
> ...
> and returns the mail with this header:
> ...
> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> * -0.0 NO_RECEIVED Informational: message has no Received
> * headers
> ...
> Why auto-learn wants the mail as HAM?
why: auto-learn? no: scored as spam but autolearn wanted ham
Hello,
This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the sample file:
$ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt
it says on STDERR:
...
nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998, computed score for autolearn: 0
nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=0, head-points=0, learned-points=0
nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but autolearn wanted ham
nov 5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3
...
and returns the mail with this header:
...
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
 NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
X-Spam-Report: ++
 * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
 * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
 * -0.0 NO_RECEIVED Informational: message has no Received
 * headers
...
Why auto-learn wants the mail as HAM?
matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ ☎ +49-176-38902045
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
On Thursday, November 05, 2015 at 03:57:01PM +0100, Reindl Harald wrote:
> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> https://en.wikipedia.org/wiki/GTUBE
Maybe because you are top-posting you have not read my question; at least you have not answered it.
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but
> autolearn wanted ham
> nov 5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3
>
> > X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> > NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> > X-Spam-Report: ++
> > * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> > * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> > * -0.0 NO_RECEIVED Informational: message has no Received
> > * headers
> > ...
> >
> > Why auto-learn wants the mail as HAM?
Again, why does it want to declare the SPAM message as autolearn=ham?
matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ ☎ +49-176-38902045
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
Matthias Apitz wrote:
> This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the
> sample file:
>
> $ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt
> Why auto-learn wants the mail as HAM?
> it says on STDERR:
> ...
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998,
> computed score for autolearn: 0
This line reports the score used to decide which direction to autolearn. There are a number of conditions that mean the "normal" score on the message is not the one used to decide on autolearn.
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12,
> body-points=0, head-points=0, learned-points=0
This line reports the current thresholds for autolearn. 0 < 0.1, so if the message is to be autolearned, it should be learned as ham.
> nov 5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but
> autolearn wanted ham
This line reports that the live score (note: not the score used to decide how to autolearn) scored as spam, so the message will not be autolearned at all.
See the man page for Mail::SpamAssassin::Plugin::AutoLearnThreshold for the full set of details.
-kgd
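kgd's three log lines describe two separate steps: pick a direction from the autolearn score, then refuse if the live verdict disagrees. A minimal sketch of just those two steps follows. It is hypothetical, not the plugin's code; it leaves out the body/head point and Bayes "learner" checks, and treating the ham threshold as strict (`<`) is an assumption. Only the two "scored as ..." strings mirror actual debug text.

```python
from typing import Optional


def autolearn_direction(autolearn_score: float, ham_threshold: float = 0.1,
                        spam_threshold: float = 12.0) -> Optional[str]:
    """Which way autolearn leans, judged only on the autolearn score."""
    if autolearn_score < ham_threshold:
        return "ham"
    if autolearn_score >= spam_threshold:
        return "spam"
    return None  # between the thresholds: no autolearn


def autolearn_verdict(autolearn_score: float, live_is_spam: bool,
                      ham_threshold: float = 0.1, spam_threshold: float = 12.0) -> str:
    """Apply the sanity check: the live spam/ham verdict must agree with the direction."""
    wanted = autolearn_direction(autolearn_score, ham_threshold, spam_threshold)
    if wanted is None:
        return "no: score between thresholds"
    if wanted == "ham" and live_is_spam:
        return "no: scored as spam but autolearn wanted ham"
    if wanted == "spam" and not live_is_spam:
        return "no: scored as ham but autolearn wanted spam"
    return "learn as " + wanted


# The GTUBE sample: autolearn score 0 (GTUBE ignored), live verdict spam (999.998).
print(autolearn_verdict(0.0, live_is_spam=True))
# no: scored as spam but autolearn wanted ham
```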
Re: why: auto-learn? no: scored as spam but autolearn wanted ham
On 05/11/2015 15:54, Matthias Apitz wrote:
> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> * -0.0 NO_RECEIVED Informational: message has no Received
> * headers
> ...
> Why auto-learn wants the mail as HAM?
Because autolearning ignores rules with the noautolearn, userconf or learn tflags set (and uses the scores from scoreset 0 or 1). Without GTUBE, this message would have had a score below the default autolearn ham threshold of 0.1 and would thus have been learnt as ham. For safety, however, SA checks the autolearn score against the actual classification before it goes ahead with the learning process.
-- 
John
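John's explanation of why the "computed score for autolearn" is 0 can be sketched as a filtered sum. This is a simplified, hypothetical model: the set of ignored tflags is taken from his message, the tflags assigned to GTUBE below are an assumption (check the rule definition in your installed rules), and it uses the displayed rule points, whereas the real autolearn score uses the scoreset 0 or 1 values.

```python
# Tflags that, per this thread, exclude a rule's points from the autolearn score.
IGNORED_FOR_AUTOLEARN = {"noautolearn", "userconf", "learn"}


def autolearn_points(hits: dict, tflags: dict) -> float:
    """Sum rule points, skipping rules that carry an autolearn-ignored tflag.

    hits:   rule name -> points it scored
    tflags: rule name -> set of tflags (hypothetical input; real values come
            from the rule definitions)
    """
    return sum(points for rule, points in hits.items()
               if not (tflags.get(rule, set()) & IGNORED_FOR_AUTOLEARN))


hits = {"GTUBE": 1000.0, "NO_RELAYS": -0.0, "NO_RECEIVED": -0.0}
tflags = {"GTUBE": {"userconf", "noautolearn"}}  # assumed flags for GTUBE
print(autolearn_points(hits, tflags))  # 0.0
```

A result of 0.0 falls below the default ham threshold of 0.1, which is exactly why the log says autolearn "wanted ham" before the live verdict vetoed it.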
auto-learn
Since having to wipe my bayes db I've thought about going back to having 'auto-learn' setup for awhile. It's been so long since I did this I have a fairly dumb question. Do I need the two below lines to be set and if so is this the correct setting? Anything here about a score of 5 is considered spam.
# bayes_auto_learn_threshold_nonspam 0.1
# bayes_auto_learn_threshold_spam 12.0
Thanks
Chris
-- 
Chris KeyID 0xE372A7DA98E6705C 31.11°N 97.89°W (Elev. 1092 ft) 21:38:18 up 7 days, 6:08, 1 user, load average: 0.53, 0.45, 0.34 Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb
Re: auto-learn
On Mon, 2014-06-09 at 21:40 -0500, Chris wrote:
> Since having to wipe my bayes db I've thought about going back to having
> 'auto-learn' setup for awhile. It's been so long since I did this I have a
> fairly dumb question. Do I need the two below lines to be set and if so is
> this the correct setting? Anything here about a score of 5 is considered spam.
> # bayes_auto_learn_threshold_nonspam 0.1
> # bayes_auto_learn_threshold_spam 12.0
Answering the direct questions first: Yes, that is correct syntax. No, you don't need them (commented out); they are the defaults.
An auto-learning setup generally isn't a bad idea, and is actually the default. Depending on your amount of messages, you might want to have a look at the recent train-on-error option.
If (since) there was any need to wipe your old Bayes DB and start fresh, I seriously recommend continued manual training. And in any case, always (manually) train spam with a low-ish Bayes probability. Likewise for ham that doesn't already have a very low Bayes probability. In non-high-volume environments, there's hardly any downside to training the extremes, too. Learning hand-confirmed non-extremes is always worth it.
[1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
-- 
char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: auto-learn
On Tue, 2014-06-10 at 05:13 +0200, Karsten Bräckelmann wrote:
> [...]
Thanks very much Karsten for the quick reply.
-- 
Chris KeyID 0xE372A7DA98E6705C 31.11°N 97.89°W (Elev. 1092 ft) 22:18:08 up 7 days, 6:48, 1 user, load average: 0.56, 0.49, 0.61 Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb
Auto Learn Spam
I noticed when reviewing headers today that there was a section for 'autolearn=no' and was wondering what exactly this means, and wouldn't autolearn be a good thing? I use Amavisd-new, which calls out to SpamAssassin modules, but I don't have the spamd daemon running physically. The Amavisd-new daemon simply loads the modules for spamd and does the scoring directly, saving my mail server from running more daemons and using more system resources than it needs to. So below are the headers:
X-Spam-Status: No, score=2.808 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=no
The last line is what I am confused about.
-Carlos
Re: Auto Learn Spam
On 4/28/10 11:53 AM, Carlos Mennens wrote:
> I noticed when reviewing headers today that there was a section for 'autolearn=no'
It's a SpamAssassin thing (google it). It means the score was either not high enough for SA to learn it as spam (Bayes and/or AWL) or not low enough to learn it as ham. You should set the triggers high and low enough that you don't accidentally learn a sneaky spam as ham, etc.
-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259 | SECNAP Network Security Corporation
Re: Auto Learn Spam
On Wed, 2010-04-28 at 11:53 -0400, Carlos Mennens wrote:
> I noticed when reviewing headers today that there was a section for 'autolearn=no' and was wondering what exactly does this mean and wouldn't autolearn be a good thing? I use Amavisd-new which calls out to SpamAssassin modules but I don't have the spamd daemon running physically. The Amavisd-new daemon simply loads the modules for spamd and does the scoring directly saving my mail server from running more daemon's and system resources that it needs to. So below are the headers:
Autolearn kicks in at certain scores. I believe the default is 12.0 for spam and 0.1 for ham. You can customize those settings in your local.cf file:
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -3.0
bayes_auto_learn_threshold_spam 12.0
I changed the default value for nonspam because the majority of my users don't train Bayes, so the default value could cause Bayes to learn incorrectly if a spam message scored low (maybe no network rules or URI rules triggered the first few times).
> X-Spam-Status: No, score=2.808 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=no
This particular message scored 2.808, so it's not high or low enough for Bayes to know which way it should learn the message.
--Dennis
Re: Auto Learn Spam
On Wed, Apr 28, 2010 at 12:10 PM, Dennis B. Hopp dh...@coreps.com wrote:
> Autolearn kicks in at certain scores. I believe the default is 12.0 for spam and 0.1 for ham. You can customize those settings in your local.cf file.
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -3.0
> bayes_auto_learn_threshold_spam 12.0
I checked /etc/mail/spamassassin/local.cf just now and found only the following:
required_hits 5
report_safe 0
rewrite_header Subject [SPAM]
However, I don't know if Amavisd-new is looking at local.cf, because I see parameters for SpamAssassin in my amavisd.conf file:
$sa_tag_level_deflt  = -999.0; # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 5.0;    # add 'spam detected' headers at that level
$sa_kill_level_deflt = 8.0;    # triggers spam evasive actions (e.g. blocks mail)
$sa_dsn_cutoff_level = 10;     # spam level beyond which a DSN is not sent
$sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
$penpals_bonus_score = 8;      # (no effect without a @storage_sql_dsn database)
$penpals_threshold_high = $sa_kill_level_deflt;  # don't waste time on hi spam
$sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
$sa_local_tests_only = 0;      # only tests which do not require internet access?
[...]
$sa_spam_subject_tag = '***SPAM*** ';
$defang_virus  = 1;  # MIME-wrap passed infected mail
$defang_banned = 1;  # MIME-wrap passed mail containing banned name
# for defanging bad headers only turn on certain minor contents categories:
$defang_by_ccat{+CC_BADH.",3"} = 1;  # NUL or CR character in header
$defang_by_ccat{+CC_BADH.",5"} = 1;  # header line longer than 998 characters
When I get a spam message that was scored by SA, it says ***SPAM*** and not [SPAM], which leads me to believe that SA parameters are being fed from the amavisd.conf file. Does this make sense to you guys?
Re: Auto Learn Spam
On Wed, 2010-04-28 at 12:38 -0400, Carlos Mennens wrote:
> I checked /etc/mail/spamassassin/local.cf just now and found only the following:
> required_hits 5
> report_safe 0
> rewrite_header Subject [SPAM]
> However I don't know if Amavisd-new is looking at local.cf because I show parameters in my amavisd.conf file for SpamAssassin:
> $sa_tag_level_deflt = -999.0; # add spam info headers if at, or above that level
> $sa_tag2_level_deflt = 5.0; # add 'spam detected' headers at that level
> $sa_kill_level_deflt = 8.0; # triggers spam evasive actions (e.g. blocks mail)
> $sa_dsn_cutoff_level = 10; # spam level beyond which a DSN is not sent
> $sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
> $penpals_bonus_score = 8; # (no effect without a @storage_sql_dsn database)
> $penpals_threshold_high = $sa_kill_level_deflt; # don't waste time on hi spam
These settings are for amavisd-new and not SpamAssassin. Amavisd-new is the glue between your MTA and SpamAssassin (and virus scanners). Most of the behavior of SpamAssassin is still controlled through local.cf (although some settings can be defined in both places, and the amavisd.conf file will take precedence).
> $sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
> $sa_local_tests_only = 0; # only tests which do not require internet access?
> [...]
> $sa_spam_subject_tag = '***SPAM*** ';
> $defang_virus = 1; # MIME-wrap passed infected mail
> $defang_banned = 1; # MIME-wrap passed mail containing banned name
> # for defanging bad headers only turn on certain minor contents categories:
> $defang_by_ccat{+CC_BADH.",3"} = 1; # NUL or CR character in header
> $defang_by_ccat{+CC_BADH.",5"} = 1; # header line longer than 998 characters
> When I get a spam message that was scored by SA, it says ***SPAM*** and not [SPAM] so that leaves me to believe that SA parameters are being fed from amavisd.conf file. Does this make sense to you guys?
This is just the setting in amavisd.conf taking precedence. If you were to comment out $sa_spam_subject_tag, I *believe* the value in your local.cf would then be used.
Re: Auto Learn Spam
Carlos Mennens wrote:
On Wed, Apr 28, 2010 at 12:10 PM, Dennis B. Hopp dh...@coreps.com wrote:
Autolearn kicks in at certain scores. I believe the default is 12.0 for spam and 0.1 for ham. You can customize those settings in your local.cf file:

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -3.0
bayes_auto_learn_threshold_spam 12.0

I checked /etc/mail/spamassassin/local.cf just now and found only the following:

required_hits 5
report_safe 0
rewrite_header Subject [SPAM]

However, I don't know if Amavisd-new is looking at local.cf, because I see parameters for SpamAssassin in my amavisd.conf file:

$sa_tag_level_deflt = -999.0; # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 5.0; # add 'spam detected' headers at that level
$sa_kill_level_deflt = 8.0; # triggers spam evasive actions (e.g. blocks mail)
$sa_dsn_cutoff_level = 10; # spam level beyond which a DSN is not sent
$sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
$penpals_bonus_score = 8; # (no effect without a @storage_sql_dsn database)
$penpals_threshold_high = $sa_kill_level_deflt; # don't waste time on hi spam
$sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
$sa_local_tests_only = 0; # only tests which do not require internet access?
[...]
$sa_spam_subject_tag = '***SPAM*** ';
$defang_virus = 1; # MIME-wrap passed infected mail
$defang_banned = 1; # MIME-wrap passed mail containing banned name
# for defanging bad headers only turn on certain minor contents categories:
$defang_by_ccat{+CC_BADH.",3"} = 1; # NUL or CR character in header
$defang_by_ccat{+CC_BADH.",5"} = 1; # header line longer than 998 characters

When I get a spam message that was scored by SA, it says ***SPAM*** and not [SPAM], which leads me to believe that SA parameters are being fed from the amavisd.conf file. Does this make sense to you guys?
There are a few differences when you run SA through Amavis:

1) Required scores for tagging or rejecting messages are set in the Amavis config (SA settings are ignored).
2) Settings for adding headers/markup to the email are set via Amavis.
3) amavisd loads the SA libraries internally, so it is not necessary to run spamd.

So your required_hits, report_safe, and rewrite_header options will not be used by amavis. However, the Bayes settings, along with rules, scores, etc., ARE read from the normal SA configs, so if you want to change the Bayes learning behavior, you can add the settings given above to your local.cf file and then restart amavisd.

Keep in mind that the settings shown above are more conservative than the default, so fewer messages will be learned automatically, but it is less likely to learn messages incorrectly (spam being learned as ham or ham being learned as spam).

--
Bowie
Auto-Learn Thresholds (was: lottery message scored hammy by bayes)
On Tue, 2009-08-25 at 22:13 -0400, Alex wrote:
If you're using autolearning, what are your learning thresholds?
What do you recommend for thresholds? I'm considering using autolearning, but I'm very concerned about corrupting the database. I think I would use something like +15 for spam.

I generally recommend the defaults, unless you *do* know you need something else. That's why they are defaults. That's 0.1 for ham and 12.0 for spam.

Keep in mind these scores are calculated using a non-Bayes score set, so they generally differ from the overall score of the message. Also, the scores of certain specific rules, like Bayes and AWL, are not taken into account. Plus some more esoteric constraints. See the docs. [1]

There are FNs on occasion in the 2.x range with low Bayes numbers (or BAYES_50) that I wouldn't want to be tagged as ham. Should that be a concern?

No. Bayes auto-learning is *not* self-feeding. Any overall score of about 2 (with Bayes) is *very* unlikely to cross either threshold when using the respective non-Bayes score set.

Moreover, your concern is with a Bayes probability at or below 50%, and thus a negative score for the BAYES hit. This hit is not considered for auto-learning, though, so as a first rule of thumb, subtract that score again, which yields a slightly higher score. Still nowhere even close to the thresholds.

Even mail that has been whitelisted could also contain spam, so would a ham threshold of something like -100 work, or present the same problem?

60_whitelist.cf:
tflags USER_IN_WHITELIST userconf nice noautolearn

Again, as per the docs [1], whitelisting will not be considered for the decision whether to auto-learn or not.

guenther

[1] http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
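guenther's rule of thumb (take the displayed score and subtract the BAYES_* contribution again) can be applied mechanically to a header such as the amavisd-new X-Spam-Status line quoted in this thread. This is a rough sketch under two assumptions: the header uses amavisd-new's tests=[NAME=score, ...] format (SpamAssassin's own X-Spam-Status normally lists rule names without scores), and the result ignores the switch to the non-Bayes score set, so it is only an approximation of the real autolearn score.

```python
import re
from typing import Dict


def rule_hits(x_spam_status: str) -> Dict[str, float]:
    """Extract RULE=score pairs from the tests=[...] part of the header."""
    m = re.search(r"tests=\[([^\]]*)\]", x_spam_status)
    if not m:
        return {}
    hits = {}
    for item in m.group(1).split(","):
        name, _, value = item.strip().partition("=")
        if name:
            hits[name] = float(value) if value else 0.0
    return hits


def score_without_bayes(x_spam_status: str) -> float:
    """Rule of thumb: the total score minus any BAYES_* contribution."""
    hits = rule_hits(x_spam_status)
    return sum(v for k, v in hits.items() if not k.startswith("BAYES_"))


header = ("No, score=2.808 tagged_above=-999 required=5 tests=[BAYES_50=0.8, "
          "HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.377, "
          "MIME_HTML_ONLY=0.723, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, "
          "T_RP_MATCHES_RCVD=-0.01] autolearn=no")

print(round(score_without_bayes(header), 3))  # 2.008
```

Either way, the figure stays far from a 0.1 / 12.0 threshold pair, which is guenther's point.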
auto learn threshold
Clip of /etc/mail/spamassassin/local.cf
__
required_score 7
ifplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 10.0
endif
__

Some messages with an SA score of 10 or higher are auto-learned as spam and some are not. Any suggestions? What is the default? Perhaps my bayes_auto_learn_threshold_spam isn't being used. The results seem to be random. Is there a certain rule that is ignored when determining the score SA uses for autolearn?

Examples:
score: 11.6 autolearn=no
score: 12.7 autolearn=no
score: 33.9 autolearn=spam
score: 15.9 autolearn=no
score: 19.0 autolearn=no
score: 19.6 autolearn=spam
score: 18.4 autolearn=spam

Thanks,
Dan Schaefer
Web Developer/Systems Analyst
Performance Administration Corp.
Re: auto learn threshold
On Tuesday 21 July 2009 16:16:53 Dan Schaefer wrote:
Clip of /etc/mail/spamassassin/local.cf
__
required_score 7
ifplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 10.0
endif
__

Some messages with an SA score of 10 or higher are auto-learned as spam and some are not. Any suggestions? What is the default? Perhaps my bayes_auto_learn_threshold_spam isn't being used. The results seem to be random. Is there a certain rule that is ignored when determining the score SA uses for autolearn?

Examples:
score: 11.6 autolearn=no
score: 12.7 autolearn=no
score: 33.9 autolearn=spam
score: 15.9 autolearn=no
score: 19.0 autolearn=no
score: 19.6 autolearn=spam
score: 18.4 autolearn=spam

Thanks,
Dan Schaefer
Web Developer/Systems Analyst
Performance Administration Corp.

Maybe this?

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.

Best regards,
Nuno Fernandes
Re: auto learn threshold
Dan Schaefer wrote:
Clip of /etc/mail/spamassassin/local.cf
__
required_score 7
ifplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 10.0
endif
__

Some messages with an SA score of 10 or higher are auto-learned as spam and some are not. Any suggestions? What is the default? Perhaps my bayes_auto_learn_threshold_spam isn't being used. The results seem to be random. Is there a certain rule that is ignored when determining the score SA uses for autolearn?

Examples:
score: 11.6 autolearn=no
score: 12.7 autolearn=no
score: 33.9 autolearn=spam
score: 15.9 autolearn=no
score: 19.0 autolearn=no
score: 19.6 autolearn=spam
score: 18.4 autolearn=spam

$ man Mail::SpamAssassin::Plugin::AutoLearnThreshold

Note that certain tests are ignored when determining whether a message should be trained upon:

* rules with tflags set to 'learn' (the Bayesian rules)
* rules with tflags set to 'userconf' (user configuration)
* rules with tflags set to 'noautolearn'

Also note that auto-learning occurs using scores from either scoreset 0 or 1, depending on what scoreset is used during message check. It is likely that the message check and auto-learn scores will be different.

[snip]

Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.

So, Bayes rules and certain other rules are ignored. The score is also determined using the scores as they would be if Bayes were disabled. The final score that you see in the email may be significantly different from the score Bayes is using to determine auto-learn.

--
Bowie
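Bowie's explanation (skip rules flagged learn, userconf or noautolearn, and score with the non-Bayes score set) can be modelled as below. The score-set numbering (bit 0 = network tests, bit 1 = Bayes) and the "clear bit 1" step follow the PerMsgStatus.pm code quoted later in this thread; the HYPOTHETICAL_* rule names and every score here are made up for illustration, and only the tflags strings mirror real rules.

```python
from typing import Dict, List

# Assumed score-set layout: bit 0 = network tests on, bit 1 = Bayes on,
# so sets 0..3 are (neither), (network), (Bayes), (network + Bayes).


def autolearn_scoreset(current: int) -> int:
    """Clear the Bayes bit, like `$orig_scoreset & ~2` in PerMsgStatus.pm."""
    return current & ~2


IGNORED_TFLAGS = ("learn", "userconf", "noautolearn")


def autolearn_points(hits: List[str], tflags: Dict[str, str],
                     scoresets: List[Dict[str, float]], current_set: int) -> float:
    """Sum the scores that count toward the autolearn decision."""
    scores = scoresets[autolearn_scoreset(current_set)]
    total = 0.0
    for rule in hits:
        flags = tflags.get(rule, "").split()
        if any(flag in flags for flag in IGNORED_TFLAGS):
            continue  # Bayes, whitelist and noautolearn rules are skipped
        total += scores.get(rule, 0.0)
    return total


# Hypothetical scores for sets 0..3.
scoresets = [
    {"HYPOTHETICAL_HDR": 2.0, "HYPOTHETICAL_BODY": 3.0, "USER_IN_WHITELIST": -100.0},
    {"HYPOTHETICAL_HDR": 1.5, "HYPOTHETICAL_BODY": 2.5, "USER_IN_WHITELIST": -100.0},
    {"HYPOTHETICAL_HDR": 2.0, "HYPOTHETICAL_BODY": 3.0, "USER_IN_WHITELIST": -100.0,
     "BAYES_99": 3.5},
    {"HYPOTHETICAL_HDR": 1.5, "HYPOTHETICAL_BODY": 2.5, "USER_IN_WHITELIST": -100.0,
     "BAYES_99": 3.5},
]
tflags = {"BAYES_99": "learn", "USER_IN_WHITELIST": "userconf nice noautolearn"}
hits = ["HYPOTHETICAL_HDR", "HYPOTHETICAL_BODY", "BAYES_99", "USER_IN_WHITELIST"]

print(autolearn_scoreset(3))                         # 1
print(autolearn_points(hits, tflags, scoresets, 3))  # 4.0
```

The displayed score for this made-up message (set 3) would include the BAYES_99 and whitelist scores, while the autolearn score is just 4.0, which is why the two numbers can look unrelated.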
Re: auto learn threshold
Is there a certain rule that is ignored when determining the score SA uses for autolearn?

Maybe this?
perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.

This is very possible. I checked a few of my examples and this turned out to be true. Thanks to both Nuno and Bowie for the answer. This is another one of my questions that could have been answered with RTFMP. I am, however, a semi-rookie system admin and I'm still learning the tricks.

--
Dan Schaefer
Web Developer/Systems Analyst
Performance Administration Corp.
Re: Bayes Auto Learn
Daniel Aquino wrote:
Is spam assassin smart enough to not auto-learn (bayesian) spam if the default tests already detect it as spam...?

No, in fact, that's exactly what you DO NOT want to do. Bayes training is not applicable to just one message. Bits learned from one spam get applied to other spams.

What I'm wondering is, if the other tests have already deemed it to be spam, then why would you want to increase the size of your bayesian db...

You won't increase the size of the Bayes DB. SA automatically prunes tokens that haven't been used recently in order to keep the token count below a specified limit (see the conf docs).

Bayesian, I believe, would be better applied to messages that appear to be slipping past the other tests...

That is purely misguided. It is certainly more important to get around to training the messages that are missed, but at the same time it is also important to train fresh spam that is caught.

You have to consider that spam is a mutating thing. Even if a spam is caught, and even if it already hits BAYES_99, it can still contain new tokens caused by these mutations. So, if you avoid training the new mutations and wait until there are enough of them that the family of spam starts getting missed, you'll have to play catch-up. On the other hand, if you consistently train spam, then as it mutates it will continue to get high Bayes scores and likely never get missed at all.
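To make the "mutating spam" argument concrete, here is a toy comparison of a caught spam and a later variant. The tokenizer is a deliberately crude stand-in (lowercase words of three or more letters), not SpamAssassin's Bayes tokenizer, and both sample messages are invented.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Crude tokenizer: lowercase words of 3+ letters (illustration only)."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two messages' token sets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


caught = "Cheap replica watches, limited offer, visit our store today"
mutated = "Cheap replica watches, exclusive offer, visit our new store now"

new_tokens = tokens(mutated) - tokens(caught)
print(sorted(new_tokens))                  # ['exclusive', 'new', 'now']
print(round(overlap(caught, mutated), 2))  # 0.58
```

Most of the variant's tokens were already present in the caught message, so training on the caught spam still helps with the next mutation; the handful of new tokens is what gets missed if caught spam is never trained.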
Bayes Auto Learn
Is spam assassin smart enough to not auto-learn (bayesian) spam if the default tests already detect it as spam...? What I'm wondering is, if the other tests have already deemed it to be spam, then why would you want to increase the size of your bayesian db... Bayesian, I believe, would be better applied to messages that appear to be slipping past the other tests...
Re: Bayes Auto Learn
Daniel Aquino wrote:
Is spam assassin smart enough to not auto-learn (bayesian) spam if the default tests already detect it as spam...? What I'm wondering is, if the other tests have already deemed it to be spam, then why would you want to increase the size of your bayesian db... Bayesian, I believe, would be better applied to messages that appear to be slipping past the other tests...

It has to know which is which, so you would (ideally) train equally on both. If you trained nothing but ham, it would think everything in the world was ham, and the other way around for spam.

--
Thanks,
James
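A toy per-token probability shows why James says both classes need training. It uses a simple Graham-style frequency ratio, which is not the combining method SpamAssassin actually uses, and the token counts are invented. (SpamAssassin itself will not apply Bayes until a minimum number of each class has been learned; see bayes_min_ham_num and bayes_min_spam_num in the Conf docs.)

```python
from collections import Counter
from typing import Optional


def token_spam_prob(token: str, spam_counts: Counter, ham_counts: Counter,
                    n_spam: int, n_ham: int) -> Optional[float]:
    """Graham-style estimate of how spammy one token is.

    Returns None when either class has no training messages, or the token
    was never seen, since there is nothing to compare against.
    """
    if n_spam == 0 or n_ham == 0:
        return None
    s = spam_counts[token] / n_spam  # share of spam messages containing the token
    h = ham_counts[token] / n_ham    # share of ham messages containing the token
    if s + h == 0:
        return None
    return s / (s + h)


# Invented training data: number of messages containing each token.
spam_counts = Counter({"replica": 9, "meeting": 1})
ham_counts = Counter({"meeting": 8})

print(token_spam_prob("replica", spam_counts, ham_counts, 10, 10))  # 1.0
print(token_spam_prob("meeting", spam_counts, ham_counts, 10, 10))  # about 0.11
print(token_spam_prob("replica", Counter(), ham_counts, 0, 10))     # None
```

Every estimate is a comparison between the two classes, so with only one side trained there is nothing to compare against.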
Re: Bayes Auto Learn
Daniel Aquino wrote:
Is spam assassin smart enough to not auto-learn (bayesian) spam if the default tests already detect it as spam...? What I'm wondering is, if the other tests have already deemed it to be spam, then why would you want to increase the size of your bayesian db... Bayesian, I believe, would be better applied to messages that appear to be slipping past the other tests...

Because you might get a similar message that doesn't trip the same SA tests and doesn't score 5 points. Maybe the exact wording SA looked for only hits one variation of the message, but other parts are substantially similar from one run to the next. Maybe the first message came from a source that triggers a whole mess of RBLs, but the second one comes from a clean source. Maybe the spammer rotates in a new URL with the same sales pitch, and the new URL hasn't made it into any SURBLs yet.

--
Kelson Vibber
SpeedGate Communications
www.speed.net
Re: auto-learn learned_points
ram01 wrote:
"auto-learn? no: scored as spam but learner indicated ham" is given if

if ($learned_points < $learner_said_ham_points)

where $learner_said_ham_points = -1.0. What exactly is learned_points?

It is a recalculation of the message score, based on the following changes from the normal score calculation:

1) All userconf tests are disabled, i.e. whitelists/blacklists. This is to prevent an errant whitelist_from from poisoning the autolearning.
2) All learning subsystems are disabled, i.e. Bayes and AWL. This is to prevent self-feedback.
3) The score set is changed, because Bayes is disabled.
Re: [2] auto-learn learned_points
Thanks for the reply, but I think that you are referring to autolearn_points, as computed in PerMsgStatus.pm and used in AutoLearnThreshold.pm. The two are computed in the same function, but they are not the same. Notice that in get_autolearn_points, autolearn_points is $score, whereas learned_points is

$self->{learned_points} += $self->{conf}->{scoreset}->[$orig_scoreset]->{$test};

which is inside a loop and a conditional. I am not very familiar with Perl and was kind of lost in the syntax of the for and the if, but I assume that += means the same as in, say, C/C++, so this is some kind of cumulative sum of something.

On one run of sa-learn in debug mode I got the following numbers back:

[28135] dbg: learn: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1
[28135] dbg: learn: auto-learn: message score: 10.955, computed score for autolearn: 12.011
[28135] dbg: learn: auto-learn? ham=12, spam=1, body-points=0, head-points=10.813, learned-points=-2.599

So it is definitely not the same score, but what is it?

Here's a snippet of AutoLearnThreshold.pm:

sub autolearn_discriminator {
  my ($self, $params) = @_;

  my $scan = $params->{permsgstatus};
  my $conf = $scan->{conf};

  # Figure out min/max for autolearning.
  # Default to specified auto_learn_threshold settings
  my $min = $conf->{bayes_auto_learn_threshold_nonspam};
  my $max = $conf->{bayes_auto_learn_threshold_spam};

  # Find out what score we should consider this message to have ...
  my $score = $scan->get_autolearn_points();
  my $body_only_points = $scan->get_body_only_points();
  my $head_only_points = $scan->get_head_only_points();
  my $learned_points = $scan->get_learned_points();

  dbg("learn: auto-learn? ham=$min, spam=$max, ".
      "body-points=".$body_only_points.", ".
      "head-points=".$head_only_points.", ".
      "learned-points=".$learned_points);

  my $isspam;
  if ($score < $min) {
    $isspam = 0;
  } elsif ($score >= $max) {
    $isspam = 1;
  } else {
    dbg("learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam");
    return;
  }

  my $learner_said_ham_points = -1.0;
  my $learner_said_spam_points = 1.0;

  if ($isspam) {
    my $required_body_points = 3;
    my $required_head_points = 3;

    if ($body_only_points < $required_body_points) {
      dbg("learn: auto-learn? no: scored as spam but too few body points (".
          $body_only_points." < ".$required_body_points.")");
      return;
    }
    if ($head_only_points < $required_head_points) {
      dbg("learn: auto-learn? no: scored as spam but too few head points (".
          $head_only_points." < ".$required_head_points.")");
      return;
    }
    if ($learned_points < $learner_said_ham_points) {
      dbg("learn: auto-learn? no: scored as spam but learner indicated ham (".
          $learned_points." < ".$learner_said_ham_points.")");
      return;
    }

    if (!$scan->is_spam()) {
      dbg("learn: auto-learn? no: scored as ham but autolearn wanted spam");
      return;
    }
  } else {
    if ($learned_points > $learner_said_spam_points) {
      dbg("learn: auto-learn? no: scored as ham but learner indicated spam (".
          $learned_points." > ".$learner_said_spam_points.")");
      return;
    }

    if ($scan->is_spam()) {
      dbg("learn: auto-learn? no: scored as spam but autolearn wanted ham");
      return;
    }
  }

  dbg("learn: auto-learn? yes, ".($isspam?"spam ($score > $max)":"ham ($score < $min)"));
  return $isspam;
}

Here's a snippet of PerMsgStatus.pm:

sub _get_autolearn_points {
  my ($self) = @_;

  return if (exists $self->{autolearn_points});
  # ensure it only gets computed once, even if we return early
  $self->{autolearn_points} = 0;

  # This function needs to use use sum($score[scoreset % 2]) not just {score}.
  # otherwise we shift what we autolearn on and it gets really wierd.  - tvd
  my $orig_scoreset = $self->{conf}->get_score_set();
  my $new_scoreset = $orig_scoreset;
  my $scores = $self->{conf}->{scores};

  if (($orig_scoreset & 2) == 0) { # we don't need to recompute
    dbg("learn: auto-learn: currently using scoreset $orig_scoreset");
  }
  else {
    $new_scoreset = $orig_scoreset & ~2;
    dbg("learn: auto-learn: currently using scoreset $orig_scoreset, recomputing score based on scoreset $new_scoreset");
    $scores = $self->{conf}->{scoreset}->[$new_scoreset];
  }

  my $tflags = $self->{conf}->{tflags};
  my $points = 0;

  # Just in case this function is called multiple times, clear out the
  # previous calculated values
  $self->{learned_points} = 0;
  $self->{body_only_points} = 0;
  $self->{head_only_points} = 0;

  foreach my $test (@{$self->{test_names_hit}}) {
    # According to the documentation, noautolearn, userconf, and learn
    # rules are ignored for autolearning.
    if (exists $tflags->{$test}) {
      next if $tflags->{$test} =~ /\bnoautolearn\b/;
      next if $tflags->{$test} =~ /\buserconf\b/;

      # Keep track of the learn points for an additional
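Reading the two snippets together, learned_points appears to be a running sum, in the original score set, of the scores of the rules flagged `learn` (the BAYES_* rules), so a negative value such as the -2.599 in the debug log means the Bayes rules that fired were leaning toward ham. Below is a Python transcription of the autolearn_discriminator logic quoted above, offered as an illustration rather than SpamAssassin code; the function signature and constant names are my own, and the 0.1 / 12.0 thresholds in the example are the defaults mentioned in this thread.

```python
from typing import Optional

REQUIRED_BODY_POINTS = 3
REQUIRED_HEAD_POINTS = 3
LEARNER_SAID_HAM_POINTS = -1.0
LEARNER_SAID_SPAM_POINTS = 1.0


def autolearn_discriminator(score: float, body_points: float, head_points: float,
                            learned_points: float, min_ham: float, max_spam: float,
                            is_spam: bool) -> Optional[bool]:
    """True = learn as spam, False = learn as ham, None = do not learn."""
    if score < min_ham:
        wants_spam = False
    elif score >= max_spam:
        wants_spam = True
    else:
        return None  # inside the auto-learn thresholds

    if wants_spam:
        if body_points < REQUIRED_BODY_POINTS or head_points < REQUIRED_HEAD_POINTS:
            return None  # too few body or head points
        if learned_points < LEARNER_SAID_HAM_POINTS:
            return None  # the Bayes rules leaned ham
        if not is_spam:
            return None  # the final verdict was ham
    else:
        if learned_points > LEARNER_SAID_SPAM_POINTS:
            return None  # the Bayes rules leaned spam
        if is_spam:
            return None  # the final verdict was spam
    return wants_spam


# Numbers from the debug log above, with default thresholds assumed:
print(autolearn_discriminator(12.011, 0, 10.813, -2.599, 0.1, 12.0, True))  # None
```

In the logged case the message would be refused for having no body points before the learned-points check is even reached.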
auto-learn learned_points
"auto-learn? no: scored as spam but learner indicated ham" is given if

if ($learned_points < $learner_said_ham_points)

where $learner_said_ham_points = -1.0. What exactly is learned_points?

-- View this message in context: http://www.nabble.com/auto-learn-learned_points-tf3353775.html#a9326859 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
RE: local.cf auto learn configs and defaults?
Email Lists wrote:
-
- You can clear the AWL for a sender like this:
-
- spamassassin --remove-addr-from-whitelist [EMAIL PROTECTED]
-
- ([EMAIL PROTECTED] is the sender)
-
- Make sure you do this as the user who is having the problem.
-
- Thanks and kind regards
-
- If this doesn't help, post the headers from one of the messages so
- that we can see which rules are hitting.
-
- --
- Bowie

Can this removal be a wildcard? [EMAIL PROTECTED] Remember, the test rule I created was for a whole functional domain.

I think it has to be done for each address (and also for each recipient). The good news is that the AWL will gradually fix itself. Once these emails are no longer receiving high scores (before the AWL rule), the AWL will start lowering its score back to reasonable levels.

--
Bowie
local.cf auto learn configs and defaults?
I played with some rules some time back because I didn't like to see list emails from this one person with very poor judgement and taste in his signature line decisions...

It looked like this, and I added it to my local.cf:

#
header LOCAL_DEMONSTRATION_ALL ALL =~ /thatjerksdomsin\.com/i
score LOCAL_DEMONSTRATION_ALL 9.9
#

I did a test domain first and it worked. Then I went live with the real domain.

After a while I removed all of it and restarted everything, yet the test domain I did this with at first is still getting really high spam scores, and that is causing me a problem because it is a live secondary mail account domain.

Also, in my local.cf:

# Use Bayesian classifier (default: 1)
#
# use_bayes 1

# Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

Please notice that they are commented out and have never been put in service.

What I am wondering, though, is how do I check, besides here, whether bayes or auto_learn is turned on somewhere else?

Would I just look at the headers? Is that the only way and the only other place to look?

I know something is learned and stored somewhere.

How do I clear this? Can I do it selectively, or does it all have to be dusted? I never knowingly turned on any learning.

Let me know if you need more info...

Thanks and kind regards

- rh

--
Robert - Abba Communications Computer Internet Services (509) 624-7159 - www.abbacomm.net
RE: local.cf auto learn configs and defaults?
- I played with some rules some time back because I didn't like to see list
- emails from this one person with very poor judgement and taste in his
- signature line decisions...
-
- It looked like this, and I added it to my local.cf:
-
- #
- header LOCAL_DEMONSTRATION_ALL ALL =~ /thatjerksdomsin\.com/i
- score LOCAL_DEMONSTRATION_ALL 9.9
- #
-
- I did a test domain first and it worked. Then I went live with the real
- domain.
-
- After a while I removed all of it and restarted everything, yet the test
- domain I did this with at first is still getting really high spam scores,
- and that is causing me a problem because it is a live secondary mail
- account domain.
-
- Also, in my local.cf:
-
- # Use Bayesian classifier (default: 1)
- #
- # use_bayes 1
-
- # Bayesian classifier auto-learning (default: 1)
- #
- # bayes_auto_learn 1
-
- Please notice that they are commented out and have never been put in
- service.
-
- What I am wondering, though, is how do I check, besides here, whether
- bayes or auto_learn is turned on somewhere else?
-
- Would I just look at the headers? Is that the only way and the only other
- place to look?
-
- I know something is learned and stored somewhere.
-
- How do I clear this? Can I do it selectively, or does it all have to be
- dusted? I never knowingly turned on any learning.
-
- Let me know if you need more info...
-
- Thanks and kind regards
-
- - rh

I usually do not reply to my own posts, but I have more data/info for you. In /home/spamd/.spamassassin it looks like this:

-rw------- 1 spamd spamd 10473472 Sep 28 10:20 auto-whitelist
-rw------- 1 spamd spamd     3624 Sep 28 10:20 bayes_journal
-rw------- 1 spamd spamd  5177344 Sep 28 10:20 bayes_seen
-rw------- 1 spamd spamd  5386240 Sep 28 10:20 bayes_toks

So obviously something is happening. I have used SpamAssassin for a long time; it is just now that I am trying to learn more and get into the nuts and bolts of everything involved. Any pointers on what to search for on Google or where to make changes would be appreciated.
I know where the SpamAssassin site is; I am just not familiar with the terminology, so any pointers would help me do better searching and researching, please.

Thanks again

- rh

--
Robert - Abba Communications Computer Internet Services (509) 624-7159 - www.abbacomm.net
Re: local.cf auto learn configs and defaults?
On Thu, September 28, 2006 1:08 pm, Email Lists said:
# Use Bayesian classifier (default: 1)
#
# use_bayes 1

# Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

Please notice that they are commented out and have never been put in service.

Since those are the default values, commenting them out doesn't have much effect. You would have to uncomment them and change them to deactivate those features.

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
Re: local.cf auto learn configs and defaults?
After a while I removed all of it and restarted everything, yet the test domain I did this with at first is still getting really high spam scores and is causing me a problem because it is a secondary mail account live domain etc.

It's probably an AWL score, but without showing us a list of the tests hit on one of these emails, all we can do is throw straws in the air and guess.

Loren
RE: local.cf auto learn configs and defaults?
-
- It's probably an AWL score, but without showing us a list of the tests hit
- on one of these emails, all we can do is throw straws in the air and guess.
-
- Loren
-

OK, a box of straws will be on the way immediately... Any special colors? ;-)

I appreciate your time and that of Daniel T. Staal so far... as it confirms what I thought... yet I still needed to ask.

What is AWL? :-) Yeah, I'll search, yet I am certainly looking for insight.

Oh, how do I properly blow away (from the command line) any saved settings that SA or sa-learn or whatever has learned, without frying my systems?

Thanks and kind regards

- rh

--
Robert - Abba Communications Computer Internet Services (509) 624-7159 - www.abbacomm.net
RE: local.cf auto learn configs and defaults?
Email Lists wrote:
It's probably an AWL score, but without showing us a list of the tests hit on one of these emails, all we can do is throw straws in the air and guess.

Loren

OK, a box of straws will be on the way immediately... Any special colors? ;-) I appreciate your time and that of Daniel T. Staal so far... as it confirms what I thought... yet I still needed to ask.

What is AWL? :-) Yeah, I'll search, yet I am certainly looking for insight.

AWL is the Auto White List (although it would be more properly called a score averager). What it does is weight the spam scores towards the sender's previous scores. In this case, it may be adding a rather high positive score to the emails, since your rule caused him to have high scores previously. If you look at the message headers, you should see AWL listed if this is what is causing the high score.

Oh, how do I properly blow away (from the command line) any saved settings that SA or sa-learn or whatever has learned, without frying my systems?

You can clear the AWL for a sender like this:

spamassassin --remove-addr-from-whitelist [EMAIL PROTECTED]

([EMAIL PROTECTED] is the sender)

Make sure you do this as the user who is having the problem.

Thanks and kind regards

If this doesn't help, post the headers from one of the messages so that we can see which rules are hitting.

--
Bowie
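The "score averager" behaviour, and the "gradually fix itself" remark elsewhere in this thread, can be simulated with the regression formula documented for SpamAssassin's auto_whitelist_factor option (final = score + (mean - score) * factor, factor 0.5 by default). This is a simplified sketch: it assumes each message's pre-AWL score is then added to the sender's history, it ignores that the real AWL keys its history on address plus IP block, and the history numbers are hypothetical.

```python
from typing import List


def awl_adjust(score: float, mean: float, factor: float = 0.5) -> float:
    """Pull a message score toward the sender's historical mean score."""
    return score + (mean - score) * factor


def simulate_recovery(total: float, count: int, new_scores: List[float],
                      factor: float = 0.5) -> List[float]:
    """Final scores for a run of new messages from one sender.

    Assumption: each message's pre-AWL score is added to the sender's
    history after it is adjusted, so the mean drifts toward recent mail.
    """
    finals = []
    for score in new_scores:
        mean = total / count
        finals.append(awl_adjust(score, mean, factor))
        total += score
        count += 1
    return finals


# Hypothetical history: 5 earlier messages averaging 12.0 because of the
# removed local rule, followed by clean mail that now scores 0.0 before AWL.
print(simulate_recovery(60.0, 5, [0.0, 0.0, 0.0, 0.0]))
# roughly [6.0, 5.0, 4.29, 3.75]
```

Each clean message drags the mean down a little, so the inflated AWL contribution fades over time instead of disappearing at once; clearing the address with --remove-addr-from-whitelist resets it immediately.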
RE: local.cf auto learn configs and defaults?
-
- You can clear the AWL for a sender like this:
-
- spamassassin --remove-addr-from-whitelist [EMAIL PROTECTED]
-
- ([EMAIL PROTECTED] is the sender)
-
- Make sure you do this as the user who is having the problem.
-
- Thanks and kind regards
-
- If this doesn't help, post the headers from one of the messages so
- that we can see which rules are hitting.
-
- --
- Bowie

Can this removal be a wildcard? [EMAIL PROTECTED] Remember, the test rule I created was for a whole functional domain.

- rh

--
Robert - Abba Communications Computer Internet Services (509) 624-7159 - www.abbacomm.net
Re: Bayeys: auto-learn vs. manual training
On Sunday, 23 April 2006 04:02, Gaute Lund wrote:
So, I was hoping to get a different opinion here.

I use Bayes per server, not per user or domain. I've set autolearn with everything at 8+ points as spam and below +1 as ham:

bayes_auto_learn_threshold_spam 8.00
bayes_auto_learn_threshold_nonspam 1.0

In addition, there are several honeypot spam harvesters, which are explicitly excluded from scanning via amavis and are semi-automatically fed to SA and learned as spam.

This is working well for me, although I don't have stats telling me explicitly when Bayes made the difference between a message being tagged or not. I'd like to have that, but haven't found a good tool for it yet. I can say that I get near-zero spam, but I have many more tools than SA alone, for example greylisting and such.

Best regards,
zmi

--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
AWL and Auto Learn Bayes
Since finding out about the trusted_network issue, I question the rest of my local.cf setup. Right now I have AWL turned off and auto-learning for Bayes turned off. My question is: does SA benefit from turning those two back on? Of course I would clear out AWL and Bayes and start from scratch if I did. But would it make it easier for Bayes to be poisoned if I turned auto-learn on? I'm on SA 3.0.1.

Thanks
Robert
Re: AWL and Auto Learn Bayes
Robert Bartlett wrote:
Since finding out the trusted_network issue I question the rest of my local.cf setup. Right now I have AWL turned off and auto learning for bayes turned off. My question is does SA benefit from turning those 2 back on? Of course I would clear out AWL and bayes and start from scratch if I did. But would it make it easier for bayes to be poisoned if I turned auto learn on? I'm on SA 3.0.1.

Opinions vary. Here's mine:

AWL: I use it, and it has never caused me pain; however, it does sometimes assign positive, i.e. bad, scores to ham senders (esp. on this list). I'm considering turning it off because it doesn't add *enough* value to my setup to justify the CPU cycles it burns.

Bayes: Round here, BAYES_99 hit 92.77% of spam and 0.40% of ham in the last year; BAYES_00 hit 80.31% of ham and 0.41% of spam in the same period. I'd say a well-trained Bayes DB is *very* worthwhile.

As for Bayes poison, I don't think it exists in the sense it's being used here. Almost every spam I've gotten with long, random paragraphs in it is scored by Bayes just fine. My Bayes FP rate doesn't get pushed up either.

Regards,
C.
--
Craig McLean http://fukka.co.uk [EMAIL PROTECTED] Where the fun never starts
Powered by FreeBSD, and GIN!
Re: AWL and Auto Learn Bayes
...local.cf setup. Right now I have AWL turned off and auto learning for bayes turned off. My question is does SA benefit from turning those 2 back on? [...]

You are in opinion territory. Some people like AWL and some don't much care for it. I personally have it turned off, and don't feel any lack. However, it might be of some help for you. Or it might cause problems, depending on the sort of mail people send there.

Bayes autolearning *by itself* is almost always the wrong decision. However, if you have a moderately well-trained Bayes database to start with (hand-trained), AND you have moderately clean ham and spam feeds, then it can potentially help. Again, I have it turned off, and only hand-train now and then when it seems necessary.

If you are using autolearning, you do have to watch your Bayes scores to catch any tendency for Bayes to start getting the wrong ideas, and do something about it before it becomes a problem.

Loren
Re: AWL and Auto Learn Bayes
Robert Bartlett wrote:
Since finding out the trusted_network issue I question the rest of my local.cf setup. Right now I have AWL turned off and auto learning for bayes turned off. My question is does SA benefit from turning those 2 back on? Of course I would clear out AWL and bayes and start from scratch if I did. But would it make it easier for bayes to be poisoned if I turned auto learn on? I'm on SA 3.0.1.

Warning: 3.0.1 is subject to a DoS vulnerability. Unless you're using a distro port which has backported fixes, consider upgrading to 3.0.5 or 3.1.0.

Personally, I don't like the AWL, but it does have its uses. The nice thing about the AWL is you'll avoid FPs from people who frequently mail you. At the same time, people who frequently spam you will be less likely to produce an FN, but since spam addresses change constantly this is less common. The AWL can be poisoned by a slightly clever spammer, but at best this gets them half off your score for the real spam. For this reason, I keep it disabled.

Bayes autolearning can be very useful, but I'd suggest adjusting the ham learning threshold. The default of 0.5 is too high for my liking and can sometimes cause problems. I run mine at -0.01 and have added a bunch of simple rules with small negative scores (-0.01 to -0.1) such that common business-related ham messages will get autolearned.

That said, several people, like Justin Mason, run the Bayes system on autolearning only, with the default settings, and have no problems.

Really, most of the sites I've seen where the autolearner went awry started off with no manual training. While most of the time this goes OK, sometimes the Bayes DB can start off on the wrong foot due to some low-scoring spam. This seems to be the biggest risk of the Bayes autolearner.
Re: AWL and Auto Learn Bayes
Today (12.01.2006/20:04) Matt Kettler ([EMAIL PROTECTED]) wrote: The AWL can be poisoned by a slightly clever spammer, but at best this gets them half off your score for the real spam. For this reason, I keep it disabled. Bayes autolearning can be very useful, but I'd suggest adjusting the ham learning threshold. The default of 0.5 is too high for my liking and can sometimes cause problems. How can I change the default learning threshold? I use SA v3.1.0. I run mine at -0.01 and have added a bunch of simple rules with small negative scores (-0.01 to -0.1) so that common business-related ham messages will get autolearned. That said, several people, such as Justin Mason, run the bayes system on autolearning only, with the default settings, and have no problems. Really, most of the sites I've seen where the autolearner went awry started off with no manual training. While most of the time this goes OK, sometimes the bayes DB can start off on the wrong foot due to some low-scoring spam. That seems to be the biggest risk of the bayes autolearner. -- Best regards, Jim Knuth [EMAIL PROTECTED] ICQ #277289867 - VoIP: +49 (0) 322 212 044 67 Key ID: 0x1F78066F -- Random quote -- The difficulty lies in the fact that we humans not only solve problems but also create them. (Edward Teller) -- This text has nothing to do with the recipient of the mail -- Virus free. Checked by NOD32 Version 1.1363 Build 6612 12.01.2006
Re: AWL and Auto Learn Bayes
Jim Knuth wrote: [snip] How can I change the default learning threshold? I use SA v3.1.0. http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#learning_options Specifically, bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam C. -- Craig McLean http://fukka.co.uk [EMAIL PROTECTED] Where the fun never starts Powered by FreeBSD, and GIN!
Re: AWL and Auto Learn Bayes
Jim Knuth wrote: Today (12.01.2006/20:04) Matt Kettler ([EMAIL PROTECTED]) wrote: The AWL can be poisoned by a slightly clever spammer, but at best this gets them half off your score for the real spam. For this reason, I keep it disabled. Bayes autolearning can be very useful, but I'd suggest adjusting the ham learning threshold. The default of 0.5 is too high for my liking and can sometimes cause problems. How can I change the default learning threshold? I use SA v3.1.0. See the docs for the autolearner: http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html and look at the bayes_auto_learn_threshold_nonspam option. (That assumes you did not disable the "loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold" line in your v310.pre; otherwise you'll have to turn it back on first.)
Re: AWL and Auto Learn Bayes
Craig McLean wrote: Jim Knuth wrote: [snip] How can I change the default learning threshold? I use SA v3.1.0. http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#learning_options Specifically, bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam Craig, Those are 3.0.x docs, not 3.1.x docs. One thing to note is that these options were removed from Mail::SpamAssassin::Conf in 3.1.0, because the autolearner is now a plugin and can actually be removed from SA completely. The threshold options are now documented in the AutoLearnThreshold plugin docs. So the options in the docs you are referencing will work in a default 3.1.0 config, but they become syntax errors in 3.1.0 if someone has turned the plugin off.
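A rough Python sketch of the pitfall described above: it reports bayes_auto_learn_threshold_* lines in local.cf when no uncommented loadplugin line for AutoLearnThreshold appears in the given .pre text. It is only a simple line scan under stated assumptions (it ignores include files, ifplugin blocks, multiple .pre files and escaped hash characters), and the function names are hypothetical; running spamassassin --lint remains the authoritative check.

```python
from typing import Iterator, List

PLUGIN = "Mail::SpamAssassin::Plugin::AutoLearnThreshold"
# Options provided by the AutoLearnThreshold plugin in 3.1.x.
THRESHOLD_OPTIONS = {
    "bayes_auto_learn_threshold_nonspam",
    "bayes_auto_learn_threshold_spam",
}


def active_lines(text: str) -> Iterator[str]:
    """Yield config lines with '#' comments and surrounding whitespace removed, skipping blanks."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            yield line


def plugin_loaded(pre_text: str) -> bool:
    """True if an uncommented 'loadplugin ...AutoLearnThreshold' line is present."""
    return any(line.split()[:2] == ["loadplugin", PLUGIN]
               for line in active_lines(pre_text))


def orphaned_threshold_options(pre_text: str, cf_text: str) -> List[str]:
    """Return local.cf lines setting threshold options that would not be recognized."""
    if plugin_loaded(pre_text):
        return []
    return [line for line in active_lines(cf_text)
            if line.split()[0] in THRESHOLD_OPTIONS]


if __name__ == "__main__":
    v310_pre = "# loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold\n"
    local_cf = "required_score 5.0\nbayes_auto_learn_threshold_nonspam -0.01\n"
    for line in orphaned_threshold_options(v310_pre, local_cf):
        print("would not parse without the plugin:", line)
```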
Re: AWL and Auto Learn Bayes
Jim Knuth wrote: Today (12.01.2006/20:24) Matt Kettler ([EMAIL PROTECTED]) wrote: Bayes autolearning can be very useful, but I'd suggest adjusting the ham learning threshold. The default of 0.5 I see 0.1 as the default. Yeah, my bad. I thought it was 0.5, but it's not that bad. I still dislike having a positive threshold. http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html So this is what I put in my local.cf? Yep. If you've got the plugin loaded, those are ordinary SA config options.
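A simplified Python model of the threshold comparison, using the 3.1.x defaults of 0.1 (bayes_auto_learn_threshold_nonspam) and 12.0 (bayes_auto_learn_threshold_spam): a score below the nonspam threshold is learned as ham, a score at or above the spam threshold is learned as spam, and anything in between is not learned. The real plugin applies further conditions (among other things, it computes the score without the Bayes rules and requires a minimum number of header and body points before learning spam), which this sketch leaves out; the function name is hypothetical.

```python
from typing import Optional

# Defaults for bayes_auto_learn_threshold_nonspam / _spam in SpamAssassin 3.1.x.
DEFAULT_NONSPAM = 0.1
DEFAULT_SPAM = 12.0


def autolearn_decision(score: float,
                       nonspam_threshold: float = DEFAULT_NONSPAM,
                       spam_threshold: float = DEFAULT_SPAM) -> Optional[str]:
    """Return "ham", "spam", or None (message not autolearned)."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    print(autolearn_decision(0.05))                           # ham (default 0.1)
    print(autolearn_decision(0.05, nonspam_threshold=-0.01))  # None
    # With a hypothetical -0.1 "business ham" rule, 0.05 becomes -0.05:
    print(autolearn_decision(0.05 - 0.1, nonspam_threshold=-0.01))  # ham
    print(autolearn_decision(13.5))                           # spam
```

This is the trade-off Matt describes: a threshold of -0.01 stops slightly positive (possibly low-scoring spam) messages from being learned as ham, and the small negative rules bring known-good business mail back under the threshold.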
Re: AWL and Auto Learn Bayes
Today (12.01.2006/20:51) Matt Kettler ([EMAIL PROTECTED]) wrote: Jim Knuth wrote: Today (12.01.2006/20:24) Matt Kettler ([EMAIL PROTECTED]) wrote: Bayes autolearning can be very useful, but I'd suggest adjusting the ham learning threshold. The default of 0.5 I see 0.1 as the default. Yeah, my bad. I thought it was 0.5, but it's not that bad. I still dislike having a positive threshold. ;) I was already starting to think I was being stupid. http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html So this is what I put in my local.cf? Yep. If you've got the plugin loaded, those are ordinary SA config options. Cool, thank you. -- Best regards, Jim Knuth
RE: AWL and Auto Learn Bayes
Need to upgrade, eh? I installed it from source and am very wary of doing any updates, as it's a production server. But I guess I don't have a choice. Are there any documents you'd suggest reading on the steps to upgrade it? Thanks, Robert -----Original Message----- From: Matt Kettler [mailto:[EMAIL PROTECTED] Sent: Thursday, January 12, 2006 12:04 PM To: Robert Bartlett Cc: users@spamassassin.apache.org Subject: Re: AWL and Auto Learn Bayes Robert Bartlett wrote: [snip] I'm on SA 3.0.1. Warning: 3.0.1 is subject to a DoS vulnerability. Unless you're using a distro port that has backported fixes, consider upgrading to 3.0.5 or 3.1.0. [snip]
RE: AWL and Auto Learn Bayes
Hello, Thanks for the help. The upgrade seems to have been painless, but the headers still show 3.0.1, while starting SA in debug mode reports 3.0.5. I vaguely remember a file I had to edit to show the proper version in the email headers? Maybe I'm thinking of something else? I tried restarting SA; it still shows 3.0.1. Or does it really matter? I just want to make sure 3.0.5 is running. Robert -----Original Message----- From: Matt Kettler [mailto:[EMAIL PROTECTED] Sent: Thursday, January 12, 2006 2:16 PM To: Robert Bartlett Cc: users@spamassassin.apache.org Subject: Re: AWL and Auto Learn Bayes Robert Bartlett wrote: Need to upgrade, eh? I installed it from source and am very wary of doing any updates, as it's a production server. But I guess I don't have a choice. Are there any documents you'd suggest reading on the steps to upgrade it? Since it's a production box, I'd suggest upgrading to 3.0.5 instead of 3.1.0. This will fix the significant problems, but doesn't change any configuration options, requirements or APIs; it's just a basic maintenance release, and in theory should go very smoothly with no hitches. Upgrading to 3.1.0 could be more work, as some config items changed in the process. You can get 3.0.5 from the archive: http://archive.apache.org/dist/spamassassin/ You should be able to go through the normal procedure for installing SA to do the upgrade, and you shouldn't need to change any configuration. Just untar it, then run:

perl Makefile.PL
make
make install

It should be quite painless. The only problem I've heard of with the 3.0.5 release is that those who use rpmbuild to make custom RPMs from it have to fix the spec file (because the tarball name in the spec file wasn't updated). However, if you're just installing from source this isn't relevant.
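One way to confirm which version actually scanned a given message is to read the version= field that SpamAssassin appends to the X-Spam-Status header. The Python sketch below parses that field and compares it against 3.0.5; it assumes the header value has already been extracted and unfolded, and the function names are illustrative, not part of SpamAssassin.

```python
import re
from typing import Optional, Tuple

# The X-Spam-Status header ends with a field such as "version=3.0.5".
VERSION_RE = re.compile(r"\bversion=(\d+(?:\.\d+)+)")


def reported_version(x_spam_status: str) -> Optional[Tuple[int, ...]]:
    """Return the version tuple from an X-Spam-Status value, e.g. (3, 0, 5), or None."""
    match = VERSION_RE.search(x_spam_status)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def scanned_by_at_least(x_spam_status: str, minimum: Tuple[int, ...] = (3, 0, 5)) -> bool:
    """True if the header reports a version >= `minimum` (tuples compare element-wise)."""
    version = reported_version(x_spam_status)
    return version is not None and version >= minimum


if __name__ == "__main__":
    header = "No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.1"
    print(reported_version(header), scanned_by_at_least(header))
```

Checking a freshly delivered message this way shows what the process doing the scanning reports, which is the version that matters for the DoS fix.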