Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 2011-08-16 at 22:29 +0930, Rodney Baker wrote:
> On Tue, 16 Aug 2011 05:02:20 John Hardin wrote:
> > Just as a test, if you comment that bit out of your personal .procmailrc
> > does everything work the way you'd expect (i.e. one SA pass, the correct
> > score in the X- headers)?
>
> Yep, that was the first thing that I did. Somehow spamassassin is still
> checking the messages, even though they're not being piped through spamc via
> procmail. I'm sure that fetchmail isn't doing it, so that leaves sendmail,
> dovecot or kmail. So begins the process of elimination (or maybe I just leave
> it out of procmailrc and be done with it...).

If you don't use Delivery Control Options with fetchmail (see that section
in the man pages) like an explicit MDA or SMTP, this should not be where
SA gets invoked. You don't, do you? The default is to pass it on to port
25, which should just be your Sendmail.

A site-wide procmail configuration doesn't exist, as you mentioned in
another reply to this thread.

Dovecot will not filter messages. It's an IMAP server that serves what
has been delivered already. The dovecot MDA could, but you seem to use
procmail for direct delivery into the Maildir store. Another one to rule
out.

Kmail as an MUA must not modify delivered mail (and doesn't), so while it
could call SA again, you won't see SA headers.

Both Dovecot and Kmail are after the procmail recipe you initially showed
anyway, so there's no chance they could cause the matching issues you
reported.

Leaves us with Sendmail in the chain to dig further... After all, procmail
already sees SA headers, without a filter. What you're hunting for is
before procmail in the chain.

Regarding "leaving it out of procmail" and being done with it -- maybe.
This is likely to bite later, though. If it is before procmail, odds are
it's using a site-wide user. Which implies Bayes training has to be done
as that user, not the recipient...
-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On 8/16/2011 8:55 AM, Rodney Baker wrote:
> On Tue, 16 Aug 2011 07:36:05 Karsten Bräckelmann wrote:
>
>> After you fixed your mail processing chain to not have SA chew twice on
>> the spam -- you should manually train Bayes, feeding it a lot of hand
>> classified spam, and possibly ham. Check your 'sa-learn --dump magic'
>> numbers. The Bayes score of 0.1 is way out of line.
>
> Agreed. I do run sa-learn --spam (actually now have it scheduled to run
> weekly on a folder into which I drop all the non-classified spam messages)
> and --ham (on a folder with messages that were false-positives).

When you are trying to fix a Bayes problem, it can be useful to feed it
as much as possible. Put *all* your ham and *all* your spam (properly
classified or not) into those folders and let Bayes learn from it.

-- 
Bowie
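
A minimal sketch of the retraining pass discussed in this message, assuming Python is available on the mail host. The two folder paths are hypothetical placeholders and do not come from this thread; the only sa-learn options used are the ones already named here (--spam, --ham, --dump magic). The script only prints the commands. As noted elsewhere in the thread, training has to happen as whichever user SpamAssassin actually runs under, so they are left to be run by hand.

```python
import shlex


def sa_learn_commands(spam_dir, ham_dir):
    """Build the sa-learn invocations for one full retraining pass."""
    return [
        ["sa-learn", "--spam", spam_dir],   # learn every message in the spam folder
        ["sa-learn", "--ham", ham_dir],     # learn every message in the ham folder
        ["sa-learn", "--dump", "magic"],    # show the Bayes counters afterwards
    ]


# Hypothetical folder locations, for illustration only.
commands = sa_learn_commands("/home/user/Maildir/.Spam/cur",
                             "/home/user/Maildir/cur")

for cmd in commands:
    # Print instead of executing, so nothing is learned by accident.
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell for the right account, or passed to subprocess.run() once the paths have been checked.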
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 16 Aug 2011 05:02:20 John Hardin wrote:
> On Tue, 16 Aug 2011, Rodney Baker wrote:
> > :0fw: spamassassin.lock
> > | spamc
>
> Just as a test, if you comment that bit out of your personal .procmailrc
> does everything work the way you'd expect (i.e. one SA pass, the correct
> score in the X- headers)?

Yep, that was the first thing that I did. Somehow spamassassin is still
checking the messages, even though they're not being piped through spamc via
procmail. I'm sure that fetchmail isn't doing it, so that leaves sendmail,
dovecot or kmail. So begins the process of elimination (or maybe I just leave
it out of procmailrc and be done with it...).

Thanks,
Rodney.

-- 
== Rodney Baker rod...@jeremiah31-10.net web: www.jeremiah31-10.net ==
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 16 Aug 2011 07:36:05 Karsten Bräckelmann wrote:
> On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> > On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> > > > * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > > > $HOME/Maildir/.Spam//
> > > >
> > > > I'm attempting to filter on the modified subject line (which for some
> > > > reason isn't working - that rule never seems to match and spam never
> > > > gets moved into the Spam folder, even though I've tested the regex
> > > > manually). I thought of filtering on the X-Spam-Status header
> > > > instead, but when I had a look at a message that was marked as Spam
> > > > (according to the subject line) I found something rather strange...
>
> Yes, filtering on the SA X-Spam Status or Level headers is the way to
> go. After you found and fixed where SA gets called a second time
> (actually the first time), these won't be harmed and overwritten -- and
> useful for filtering.
>
> Anyway, the secret why the above procmail recipe doesn't work is simply,
> because procmail uses a rather limited sub-set of REs and its own
> flavor. It's not PCRE.
>
> In particular procmail does not understand {x,y} range quantifiers, but
> treats that part as a plain string to match. Which doesn't.
> (Caveat: From memory, not actually looked it up again for verification.)

Ah, thank you. Despite googling for lots of stuff on procmail I've not been
able to find a definitive reference for what can and can't be used in a
procmail recipe. Maybe I just haven't used the right search terms (or maybe I
just haven't understood what I've read). Anyway, thanks for the clarification.

> > > >  3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> > > >  3.0 IMPOTENCE              BODY: Impotence cure
> > > > -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> > > >                             [score: 0.1050]
> > > >  2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> > > >  1.2 RDNS_NONE              Delivered to internal network by a
> > > >                             host with no rDNS
>
> Oh, yeah, these do ring quite some bells... ;)
>
> After you fixed your mail processing chain to not have SA chew twice on
> the spam -- you should manually train Bayes, feeding it a lot of hand
> classified spam, and possibly ham. Check your 'sa-learn --dump magic'
> numbers. The Bayes score of 0.1 is way out of line.

Agreed. I do run sa-learn --spam (actually now have it scheduled to run weekly
on a folder into which I drop all the non-classified spam messages) and --ham
(on a folder with messages that were false-positives).

> Note though, that a previous site-wide SA filter might use a site-wide
> user, not the one owning the procmail recipe. Thus Bayes scores might
> suddenly change once it's run per user. Check the numbers and
> performance for the user you'll use after fixing the chain issue.
>
> > > You need to fix whatever is causing the message to be scanned twice.
> >
> > OK - that makes sense. Now I'm wondering if there is a global mail config
> > somewhere that is routing the message through SA, and then my local
> > .procmailrc is doing it again. Time to go digging...
>
> Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
> even something like Amavis in the chain?

There is no /etc/procmailrc and no milter that I'm aware of; I'm running
fetchmail/sendmail/dovecot. This machine doubles as my home mail server/file
server and desktop machine. The only reason I'm running IMAP is so that I can
access the same mail from my laptop or netbook when I need to (and I used to
run squirrelmail to allow access remotely via https webmail, but not any
more).

> > That then leaves the question as to why my procmail recipe isn't
> > triggering on the rewritten subject, but that is probably not for this
> > list.
>
> It's sufficiently related. ;) See above.

Thanks again. :-)

-- 
== Rodney Baker rod...@jeremiah31-10.net web: www.jeremiah31-10.net ==
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 2011-08-16 at 01:07 +0930, Rodney Baker wrote:
> On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> > > * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > > $HOME/Maildir/.Spam//
> > >
> > > I'm attempting to filter on the modified subject line (which for some
> > > reason isn't working - that rule never seems to match and spam never
> > > gets moved into the Spam folder, even though I've tested the regex
> > > manually). I thought of filtering on the X-Spam-Status header instead,
> > > but when I had a look at a message that was marked as Spam (according to
> > > the subject line) I found something rather strange...

Yes, filtering on the SA X-Spam Status or Level headers is the way to
go. After you found and fixed where SA gets called a second time
(actually the first time), these won't be harmed and overwritten -- and
useful for filtering.

Anyway, the secret why the above procmail recipe doesn't work is simply,
because procmail uses a rather limited sub-set of REs and its own
flavor. It's not PCRE.

In particular procmail does not understand {x,y} range quantifiers, but
treats that part as a plain string to match. Which doesn't.
(Caveat: From memory, not actually looked it up again for verification.)

> > >  3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> > >  3.0 IMPOTENCE              BODY: Impotence cure
> > > -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> > >                             [score: 0.1050]
> > >  2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> > >  1.2 RDNS_NONE              Delivered to internal network by a host
> > >                             with no rDNS

Oh, yeah, these do ring quite some bells... ;)

After you fixed your mail processing chain to not have SA chew twice on
the spam -- you should manually train Bayes, feeding it a lot of hand
classified spam, and possibly ham. Check your 'sa-learn --dump magic'
numbers. The Bayes score of 0.1 is way out of line.

Note though, that a previous site-wide SA filter might use a site-wide
user, not the one owning the procmail recipe. Thus Bayes scores might
suddenly change once it's run per user. Check the numbers and
performance for the user you'll use after fixing the chain issue.

> > You need to fix whatever is causing the message to be scanned twice.
>
> OK - that makes sense. Now I'm wondering if there is a global mail config
> somewhere that is routing the message through SA, and then my local
> .procmailrc is doing it again. Time to go digging...

Site-wide /etc/procmailrc, SMTP server milter, transport or similar, or
even something like Amavis in the chain?

> That then leaves the question as to why my procmail recipe isn't triggering
> on the rewritten subject, but that is probably not for this list.

It's sufficiently related. ;) See above.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
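
The point about {x,y} quantifiers can be illustrated with a rough sketch. Python's re module is used here only as a stand-in; it is not procmail's matcher. The "literal_braces" pattern is an assumption that emulates an engine with no range quantifiers by escaping the braces, and the "portable" pattern swaps the range for "+", which procmail's documented syntax does include. As with the caveat in the message itself, this is an illustration and not a verified description of procmail internals.

```python
import re

subject = "Subject: SPAM(10.1)"

# What the recipe author intended: one to three digits, a dot, one digit.
intended = re.compile(r"^Subject.*SPAM\([0-9]{1,3}\.[0-9]\)")

# Emulation (assumption) of an engine without range quantifiers: the
# braces are escaped, so the text "{1,3}" must appear literally.
literal_braces = re.compile(r"^Subject.*SPAM\([0-9]\{1,3\}\.[0-9]\)")

# Rewrite that avoids the range quantifier altogether.
portable = re.compile(r"^Subject.*SPAM\([0-9]+\.[0-9]\)")

results = {
    "intended": bool(intended.search(subject)),
    "literal_braces": bool(literal_braces.search(subject)),
    "portable": bool(portable.search(subject)),
}

for name, matched in results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```

Under these assumptions only the literal-braces reading fails to match, which is consistent with the recipe testing fine by hand in another regex tool yet never firing in procmail.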
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 16 Aug 2011, Rodney Baker wrote:

> :0fw: spamassassin.lock
> | spamc

Just as a test, if you comment that bit out of your personal .procmailrc
does everything work the way you'd expect (i.e. one SA pass, the correct
score in the X- headers)?

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...for a nation to tax itself into prosperity is like a man
  standing in a bucket and trying to lift himself up by the handle.
                                                -- Winston Churchill
---
 Today: the 66th anniversary of the end of World War II
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 16 Aug 2011 01:15:11 Walter Hurry wrote:
> On Mon, 15 Aug 2011 11:18:13 -0400, Bowie Bailey wrote:
> > On 8/15/2011 10:57 AM, Rodney Baker wrote:
> >> :0
> >> * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> >> $HOME/Maildir/.Spam//
> >
> > This message is going through SA twice.
>
> Indeed. And by the way, for what it is worth, my .procmailrc says (inter
> alia)
>
> :0:
> * ^X-Spam-Status: Yes
> # The trailing slashdot means do it as MH
> # instead of MBOX (the default)
> junk/.
>
> # Otherwise it falls through
>
> May I suggest that that's rather simpler than the regex which you are
> using?

Of course, and that's what I wanted to do, except that if you have a look at
my X-Spam-Status header it says "No", which is the opposite of what I expect
for a message marked as spam (apparently due, as already suggested, to
spamassassin processing the message twice).

> In addition, should I in the future decide for some reason to change or
> revoke the subject rewriting, I won't need to change .procmailrc.

Of course, if I can just get the message flagged as Spam in the headers, I'll
be able to do the same. ;-)

-- 
== Rodney Baker rod...@jeremiah31-10.net web: www.jeremiah31-10.net ==
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Mon, 15 Aug 2011 11:18:13 -0400, Bowie Bailey wrote:
> On 8/15/2011 10:57 AM, Rodney Baker wrote:
>> :0
>> * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
>> $HOME/Maildir/.Spam//
>
> This message is going through SA twice.

Indeed. And by the way, for what it is worth, my .procmailrc says (inter
alia)

:0:
* ^X-Spam-Status: Yes
# The trailing slashdot means do it as MH
# instead of MBOX (the default)
junk/.

# Otherwise it falls through

May I suggest that that's rather simpler than the regex which you are
using?

In addition, should I in the future decide for some reason to change or
revoke the subject rewriting, I won't need to change .procmailrc.
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On Tue, 16 Aug 2011 00:48:13 Bowie Bailey wrote:
> On 8/15/2011 10:57 AM, Rodney Baker wrote:
> > Hi all. I'm running spamassassin 3.3.1 on my openSuse 11.2 box at home.
> > Mail is collected from multiple ISP mail accounts via fetchmail and
> > delivered to local IMAP mail folders via procmail. My user account
> > .procmailrc file begins thus:
> >
> > LOGFILE=$HOME/pm.log
> >
> > :0fw: spamassassin.lock
> > | spamc
> >
> > :0
> > * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> > $HOME/Maildir/.Spam//
> >
> > I'm attempting to filter on the modified subject line (which for some
> > reason isn't working - that rule never seems to match and spam never
> > gets moved into the Spam folder, even though I've tested the regex
> > manually). I thought of filtering on the X-Spam-Status header instead,
> > but when I had a look at a message that was marked as Spam (according to
> > the subject line) I found something rather strange...
> >
> > X-Virus-Flag: no
> > X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
> > X-Spam-Level: *
> > X-Spam-Status: No, score=1.5 required=6.5
> >         tests=BAYES_00,IMPOTENCE,NO_RELAYS
> >         autolearn=no version=3.3.1
> > X-Spam-Virus: No
> > Received: from localhost by
> >         with SpamAssassin (version 3.3.1);
> >         Mon, 15 Aug 2011 18:58:01 +0930
> > From: "Adele Key"
> > To: another.u...@iinet.net.au
> > Subject: SPAM(10.1)
> > Date: Mon, 15 Aug 2011 18:12:48 +0900
> > Message-Id: <165971112.54106003786840@spamdomain.removed>
> > MIME-Version: 1.0
> > Content-Type: multipart/mixed;
> >         boundary="--=_4E48E6A1.127A41A2"
> > X-Length: 7330
> > X-UID: 83487
> > X-KMail-Filtered: 61220
> > Status: R
> > X-Status: N
> > X-KMail-EncryptionState:
> > X-KMail-SignatureState:
> > X-KMail-MDN-Sent:
> >
> > Spam detection software, running on the system , has
> > identified this incoming email as possible spam. The original message
> > has been attached to this so you can view it (if it isn't spam) or
> > label similar future email. If you have any questions, see
> > postmaster for details.
> >
> > Content preview: [...]
> >
> > Content analysis details: (10.1 points, 6.5 required)
> >
> >  pts rule name              description
> > ---- ---------------------- ----------------------------------------
> >  3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
> >  3.0 IMPOTENCE              BODY: Impotence cure
> > -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
> >                             [score: 0.1050]
> >  2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
> >  1.2 RDNS_NONE              Delivered to internal network by a host
> >                             with no rDNS
> >
> > I don't get it - the content analysis shows a score of 10.1, the modified
> > subject line shows 10.1, but the X-Spam-Status header shows 1.5! What
> > have I messed up in my configuration?
>
> This message is going through SA twice.
>
> The first time, it is marked as spam and the message is re-written per
> your "report_safe" setting. This generates the analysis shown in the
> body itself.
>
> The second time, the re-written message is scanned by SA. This time,
> all of the incriminating stuff has been hidden by the rewrite, so it is
> not marked as spam. This is the analysis shown in the header.
>
> You need to fix whatever is causing the message to be scanned twice.

OK - that makes sense. Now I'm wondering if there is a global mail config
somewhere that is routing the message through SA, and then my local
.procmailrc is doing it again. Time to go digging...

That then leaves the question as to why my procmail recipe isn't triggering on
the rewritten subject, but that is probably not for this list.

Thanks for the pointer.

Rodney.

-- 
== Rodney Baker rod...@jeremiah31-10.net web: www.jeremiah31-10.net ==
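
The double-scan symptom in this message (one score in the rewritten Subject, a different one in X-Spam-Status) can be checked for mechanically. The following is a minimal sketch, assuming Python's standard email parser and the SPAM(_SCORE_) subject tag from the poster's local.cf; the function name and the trimmed sample message are illustrative and not part of any SpamAssassin tooling.

```python
import re
from email import message_from_string

# Score inside the rewritten subject tag, e.g. "SPAM(10.1)".
SUBJECT_SCORE = re.compile(r"SPAM\((-?\d+(?:\.\d+)?)\)")
# Score reported in the X-Spam-Status header, e.g. "score=1.5".
STATUS_SCORE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def find_score_mismatch(raw_message):
    """Return (subject_score, status_score) if the two differ, else None."""
    msg = message_from_string(raw_message)
    subject = SUBJECT_SCORE.search(msg.get("Subject", ""))
    status = STATUS_SCORE.search(msg.get("X-Spam-Status", ""))
    if not subject or not status:
        return None  # one of the two markers is missing, nothing to compare
    subject_score = float(subject.group(1))
    status_score = float(status.group(1))
    if subject_score != status_score:
        return subject_score, status_score
    return None


# Headers trimmed from the message quoted in this thread.
sample = (
    "X-Spam-Status: No, score=1.5 required=6.5\n"
    "\ttests=BAYES_00,IMPOTENCE,NO_RELAYS\n"
    "\tautolearn=no version=3.3.1\n"
    "Subject: SPAM(10.1)\n"
    "\n"
    "body\n"
)

print(find_score_mismatch(sample))
```

A non-None result only indicates that the subject and the header were written by different scans; it does not say which component in the delivery chain ran the extra one.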
Re: Inconsistent spam scores between spam headers and rewritten subject line.
On 8/15/2011 10:57 AM, Rodney Baker wrote:
> Hi all. I'm running spamassassin 3.3.1 on my openSuse 11.2 box at home. Mail
> is collected from multiple ISP mail accounts via fetchmail and delivered to
> local IMAP mail folders via procmail. My user account .procmailrc file begins
> thus:
>
> LOGFILE=$HOME/pm.log
>
> :0fw: spamassassin.lock
> | spamc
>
> :0
> * ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
> $HOME/Maildir/.Spam//
>
> I'm attempting to filter on the modified subject line (which for some reason
> isn't working - that rule never seems to match and spam never gets moved into
> the Spam folder, even though I've tested the regex manually). I thought of
> filtering on the X-Spam-Status header instead, but when I had a look at a
> message that was marked as Spam (according to the subject line) I found
> something rather strange...
>
> X-Virus-Flag: no
> X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
> X-Spam-Level: *
> X-Spam-Status: No, score=1.5 required=6.5
>         tests=BAYES_00,IMPOTENCE,NO_RELAYS
>         autolearn=no version=3.3.1
> X-Spam-Virus: No
> Received: from localhost by
>         with SpamAssassin (version 3.3.1);
>         Mon, 15 Aug 2011 18:58:01 +0930
> From: "Adele Key"
> To: another.u...@iinet.net.au
> Subject: SPAM(10.1)
> Date: Mon, 15 Aug 2011 18:12:48 +0900
> Message-Id: <165971112.54106003786840@spamdomain.removed>
> MIME-Version: 1.0
> Content-Type: multipart/mixed;
>         boundary="--=_4E48E6A1.127A41A2"
> X-Length: 7330
> X-UID: 83487
> X-KMail-Filtered: 61220
> Status: R
> X-Status: N
> X-KMail-EncryptionState:
> X-KMail-SignatureState:
> X-KMail-MDN-Sent:
>
> Spam detection software, running on the system , has
> identified this incoming email as possible spam. The original message
> has been attached to this so you can view it (if it isn't spam) or label
> similar future email. If you have any questions, see
> postmaster for details.
>
> Content preview: [...]
>
> Content analysis details: (10.1 points, 6.5 required)
>
>  pts rule name              description
> ---- ---------------------- ------------------------------------------
>  3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
>  3.0 IMPOTENCE              BODY: Impotence cure
> -0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
>                             [score: 0.1050]
>  2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
>  1.2 RDNS_NONE              Delivered to internal network by a host
>                             with no rDNS
>
> I don't get it - the content analysis shows a score of 10.1, the modified
> subject line shows 10.1, but the X-Spam-Status header shows 1.5! What have I
> messed up in my configuration?

This message is going through SA twice.

The first time, it is marked as spam and the message is re-written per
your "report_safe" setting. This generates the analysis shown in the
body itself.

The second time, the re-written message is scanned by SA. This time,
all of the incriminating stuff has been hidden by the rewrite, so it is
not marked as spam. This is the analysis shown in the header.

You need to fix whatever is causing the message to be scanned twice.

-- 
Bowie
Inconsistent spam scores between spam headers and rewritten subject line.
Hi all. I'm running spamassassin 3.3.1 on my openSuse 11.2 box at home. Mail
is collected from multiple ISP mail accounts via fetchmail and delivered to
local IMAP mail folders via procmail. My user account .procmailrc file begins
thus:

LOGFILE=$HOME/pm.log

:0fw: spamassassin.lock
| spamc

:0
* ^Subject.*SPAM\([0-9]{1,3}\.[0-9]\).*
$HOME/Maildir/.Spam//

I'm attempting to filter on the modified subject line (which for some reason
isn't working - that rule never seems to match and spam never gets moved into
the Spam folder, even though I've tested the regex manually). I thought of
filtering on the X-Spam-Status header instead, but when I had a look at a
message that was marked as Spam (according to the subject line) I found
something rather strange...

X-Virus-Flag: no
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
X-Spam-Level: *
X-Spam-Status: No, score=1.5 required=6.5
        tests=BAYES_00,IMPOTENCE,NO_RELAYS
        autolearn=no version=3.3.1
X-Spam-Virus: No
Received: from localhost by
        with SpamAssassin (version 3.3.1);
        Mon, 15 Aug 2011 18:58:01 +0930
From: "Adele Key"
To: another.u...@iinet.net.au
Subject: SPAM(10.1)
Date: Mon, 15 Aug 2011 18:12:48 +0900
Message-Id: <165971112.54106003786840@spamdomain.removed>
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="--=_4E48E6A1.127A41A2"
X-Length: 7330
X-UID: 83487
X-KMail-Filtered: 61220
Status: R
X-Status: N
X-KMail-EncryptionState:
X-KMail-SignatureState:
X-KMail-MDN-Sent:

Spam detection software, running on the system , has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email. If you have any questions, see
postmaster for details.

Content preview: [...]

Content analysis details: (10.1 points, 6.5 required)

 pts rule name              description
---- ---------------------- ------------------------------------------
 3.8 KB_DATE_CONTAINS_TAB   KB_DATE_CONTAINS_TAB
 3.0 IMPOTENCE              BODY: Impotence cure
-0.0 BAYES_20               BODY: Bayes spam probability is 5 to 20%
                            [score: 0.1050]
 2.0 KB_FAKED_THE_BAT       KB_FAKED_THE_BAT
 1.2 RDNS_NONE              Delivered to internal network by a host
                            with no rDNS

I don't get it - the content analysis shows a score of 10.1, the modified
subject line shows 10.1, but the X-Spam-Status header shows 1.5! What have I
messed up in my configuration?

My /etc/mail/spamassassin/local.cf looks like this:

# Add your own customisations to this file. See 'man Mail::SpamAssassin::Conf'
# for details of what can be tweaked.
#

# do not change the subject
# to change the subject, e.g. use
# rewrite_header Subject SPAM(_SCORE_)
rewrite_header subject SPAM(_SCORE_)

# Set the score required before a mail is considered spam.
# required_score 5.00

# uncomment, if you do not want spamassassin to create a new message
# in case of detecting spam
# report_safe 0

# Enhance the uridnsbl_skip_domain list with some usefull entries
# Do not block the web-sites of Novell and SUSE
ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
uridnsbl_skip_domain suse.de opensuse.org suse.com suse.org
uridnsbl_skip_domain novell.com novell.org novell.ru novell.de novell.hu novell.co.uk
uridnsbl_skip_domain kernel.org
endif # Mail::SpamAssassin::Plugin::URIDNSBL

# Everything above this line is as per the installed openSuSE default

ok_languages en

#The combination of SpamAssassin + The Bat! as mail client can cause false positives.
#The reason for the high spam rating is the Reply-To header inserted by mailman,
#which seems to have more quoting than The Bat! can do.
#If you have such problem activate the next two lines
#header IS_MAILMAN exists:X-Mailman-Version
#score IS_MAILMAN -2

required_score 6.5
whitelist_from [...]
use_bayes 1
report_header 1
fold_headers 1
report_safe 2

Thanks in advance.

Rodney.
-- == Rodney Baker rod...@jeremiah31-10.net web: www.jeremiah31-10.net ==