Re: Again AWL confusion
On 5-Aug-2009, at 02:15, a...@exys.org wrote:
> The point is that scores below 2 are never spam,

Er... that's certainly not true.

--
*** AgentSmith sets mode: +m
Re: Again AWL confusion
On Wed, 05 Aug 2009 10:15:00 +0200
a...@exys.org wrote:

> 2 to 5 is the sweetspot. That message in question actually proved it
> is working, since the URIBL hits came later. Then it scores >10 so
> it gets rejected.

I noticed earlier that you were greylisting for only 60s; that seems like a fairly short delay for new blocklist listings to show up.

> I think that setup is fairly smart,

I don't run my own MTA, but I do something analogous: I do an initial test with Bogofilter and use the result to delay spam by up to 24 hours before it's processed with SA. If I were doing greylisting, I think I might use Bogofilter's ham result to bypass it, and the unsure/spam results to set a short or long delay.

> excluding the problem that i train SA with wrong information.

I think if Bayes is being mistrained, you have the autolearn thresholds wrong. And in your situation it's not going to be possible to reverse it properly, since the signature will change with the Received headers.
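The Bogofilter-gated delay suggested above could be wired up roughly as follows. This is a minimal Python sketch, not anyone's actual setup: it assumes Bogofilter's documented exit statuses (0 = spam, 1 = ham, 2 = unsure, 3 = error), and the function name `greylist_delay` and both delay values are hypothetical choices for illustration.

```python
# Bogofilter exit statuses: 0 = spam, 1 = ham (non-spam), 2 = unsure, 3 = error
BOGO_SPAM, BOGO_HAM, BOGO_UNSURE, BOGO_ERROR = 0, 1, 2, 3

# Hypothetical delays; pick whatever suits your traffic
SHORT_DELAY_S = 5 * 60    # "unsure" mail
LONG_DELAY_S = 60 * 60    # "spam" mail


def greylist_delay(bogo_exit_code):
    """Return how many seconds to greylist a message (0 = bypass greylisting)."""
    if bogo_exit_code == BOGO_HAM:
        return 0
    if bogo_exit_code == BOGO_SPAM:
        return LONG_DELAY_S
    # "Unsure" and errors both get the short delay rather than a bypass
    return SHORT_DELAY_S


if __name__ == "__main__":
    for code in (BOGO_HAM, BOGO_UNSURE, BOGO_SPAM, BOGO_ERROR):
        print(code, greylist_delay(code))
```

Treating a Bogofilter error like "unsure" is a design choice: it errs towards a short delay instead of letting unclassified mail skip greylisting.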
Re: Again AWL confusion
On Wed, 2009-08-05 at 22:21 +0200, Matus UHLAR - fantomas wrote:
> turning off AWL and autolearn (optionally only when run at SMTP time) would
> help you here. Although using such setup you loose much of advantages (like
> AWL ;-) and especially personalising...
>
There are cases where AWL is a menace. In my case, I run SA as part of the 'pipeline'[*] between fetchmail and Postfix, because there's a bad interaction between the way Postfix runs SA as a subservice and its always_bcc directive. I found that in my set-up AWL was consistently giving unhelpful scores, so it's been turned off for quite a while now.

[*] 'pipeline' because fetchmail's mda option feeds a pipeline leading to Postfix's sendmail utility, which passes the mail to Postfix.

Martin
Re: Again AWL confusion
>> On 05.08.09 00:31, Martin Gregorie wrote:
>>> If, for some (very) odd reason you run greylisting after SA then *of
>>> course* your host has (a) seen the mail and (b) passed it through SA.
>>> How else can the mail get to the greylister?
>>>
>>> Would you care to explain why you put a greylister behind SA? Do you
>>> know how a greylister works and why it was designed to work that
>>> way?

> Matus UHLAR - fantomas wrote:
>> He already explained that he greylists only mail that scores above a limit.

On 05.08.09 10:15, a...@exys.org wrote:
> exactly. The point is that scores below 2 are never spam, so i avoid
> greylisting. Thats my whitelist (you usually need for greylisting) at
> the same time, since i whitelist some hosts in SA.
>
>> In that case we can assume the spam scored high even before so it got
>> greylisted. In such case I doubt it was learned as ham, unless the
>> greylisting check is broken...
> above 2. The njabl hit would have been enough to hit that. It didn't
> score above 10, because that would have been rejected at smtp time.
>
> My guess is that it scored 2 on the first try, then later it would have
> scored above 10 due to surbl listings, but awl kicks in and lowers the
> score thinking the greylisted mail was an independent message.

That's it! You can look at the spamd logs and search for the same Message-ID.

>>> And where else did greylisted mail appear in the log? For the
>>> mail to be logged as rejected by a greylister *after* its been
>>> through SA it must also have been inspected by AWL and therefore it did
>>> affect the AWL database.
> oh right, i could look at the SA log, but i already know it passed SA 3
> times.

While repeatedly learning the same message does not affect Bayes, I think this doesn't apply to AWL.

>> the question is, why it scored hammy? aep, how did it score before
>> greylisting? Are you sure you do not have bug in your greylisting code?
> see above. i'm pretty sure the "bug" is passing the same message to SA
> multiple times.

>> Btw, I'm not sure if it should not be low scoring messages (spams) for which
>> greylisting is very good, since you won't become that early recipient...
> 2 to 5 is the sweetspot. That message in question actually proved it is
> working, since the URIBL hits came later. Then it scores >10 so it gets
> rejected.
> I think that setup is fairly smart, excluding the problem that i train
> SA with wrong information.
>
> I wonder if i could ask SA to score a message without learning it,
> although exim-sa propably doesnt support that.

Turning off AWL and autolearn (optionally only when SA runs at SMTP time) would help you here, although with such a setup you lose many of the advantages (like AWL ;-) and especially personalisation...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".
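To make the point about repeated passes concrete, here is a toy Python model of the mechanism being discussed; it is not SpamAssassin's code. It assumes the AWL keeps a running total and count per sender key, applies delta = (mean - score) * factor once there is history (0.5 being the default auto_whitelist_factor), and then records the current score. The sender key and the per-pass scores are hypothetical, chosen so that the last pass reproduces the -1.9 AWL hit in the report.

```python
from collections import defaultdict

AWL_FACTOR = 0.5  # SpamAssassin's default auto_whitelist_factor


class ToyAWL:
    """Toy AWL store: running total and count of scores per sender key."""

    def __init__(self, factor=AWL_FACTOR):
        self.factor = factor
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def check_and_add(self, key, score):
        """Return the AWL adjustment for this pass, then record the score."""
        delta = 0.0
        if self.counts[key] > 0:
            mean = self.totals[key] / self.counts[key]
            delta = (mean - score) * self.factor
        # Every pass is recorded, even a retry of the very same message
        self.totals[key] += score
        self.counts[key] += 1
        return delta


if __name__ == "__main__":
    awl = ToyAWL()
    # Hypothetical key: (From: address, first two octets of the sending IP)
    key = ("sender@example.invalid", "91.199")
    # Hypothetical pre-AWL scores: two greylisted attempts, then the retry
    # that arrives after the URIBL/SURBL listings have appeared
    for pre_awl in (4.1, 4.1, 7.9):
        delta = awl.check_and_add(key, pre_awl)
        print(f"pre-AWL {pre_awl:4.1f}  AWL {delta:+.1f}  final {pre_awl + delta:4.1f}")
```

With these made-up numbers the first two passes get no adjustment, and the third is pulled from 7.9 down to 6.0 purely by the history its own earlier attempts created.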
Re: Again AWL confusion
a...@exys.org wrote:
> exactly. The point is that scores below 2 are never spam, so i avoid
> greylisting. Thats my whitelist (you usually need for greylisting) at
> the same time, since i whitelist some hosts in SA.

Interesting set-up, although I don't think it would be suitable for a high-volume server. So what do you use to do this? exim-sa, and what greylisting software?

> above 2. The njabl hit would have been enough to hit that. It didn't
> score above 10, because that would have been rejected at smtp time.
>
> My guess is that it scored 2 on the first try, then later it would have
> scored above 10 due to surbl listings, but awl kicks in and lowers the
> score thinking the greylisted mail was an independent message.

With most greylisting systems, the temporary reject happens before the DATA phase (which helps save bandwidth), so it's hard to know whether it's two attempts to deliver the same message or two independent messages. Not so in your case, however. What is auto_whitelist_factor set to?

>
>>> And where else did greylisted mail appear in the log? For the
>>> mail to be logged as rejected by a greylister *after* its been
>>> through SA it must also have been inspected by AWL and therefore it did
>>> affect the AWL database.
>>
> oh right, i could look at the SA log, but i already know it passed SA 3
> times.

Worth doing.

>> the question is, why it scored hammy? aep, how did it score before
>> greylisting? Are you sure you do not have bug in your greylisting code?
>>
> see above. i'm pretty sure the "bug" is passing the same message to SA
> multiple times.

Well, by definition that isn't an SA bug. Or are you suggesting AWL should check whether the same Message-ID has been seen before and, if so, neither score nor learn? That would be an extra database lookup, and it would mean AWL would also be disabled for valid mail that had been delayed by greylisting (maybe OK, because it presumably hasn't been seen before).

Bayes *shouldn't* allow learning the same message more than once (it doesn't if you train it manually), but maybe autolearn doesn't update bayes_seen (??).

I think the simplest solution for your config is just:

    use_auto_whitelist 0
    bayes_auto_learn 0

Setting 'tflags URIBL_BLACK noautolearn' etc. on the remote tests would probably make the AWL decrease smaller, because AWL would then just be smoothing out the scores from the local tests.

None of this sounds very efficient in terms of minimising DNS lookups and reducing carbon footprints...

CK
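CK's hypothetical Message-ID check could look roughly like the Python sketch below. To be clear, this is not something SpamAssassin's AWL does; the class name, the in-memory set of seen IDs, the sender key and the scores are all illustrative assumptions. It also shows the trade-off CK mentions: a retried message gets no AWL adjustment at all, in either direction.

```python
from collections import defaultdict


class DedupAWL:
    """Hypothetical AWL variant that ignores repeat passes of a Message-ID."""

    def __init__(self, factor=0.5):
        self.factor = factor
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)
        self.seen_ids = set()  # the extra lookup CK mentions

    def final_score(self, key, message_id, pre_awl):
        if message_id in self.seen_ids:
            # Retry of a message already scored: neither adjust nor learn
            return pre_awl
        self.seen_ids.add(message_id)
        delta = 0.0
        if self.counts[key]:
            delta = (self.totals[key] / self.counts[key] - pre_awl) * self.factor
        self.totals[key] += pre_awl
        self.counts[key] += 1
        return pre_awl + delta


if __name__ == "__main__":
    awl = DedupAWL()
    key = ("sender@example.invalid", "91.199")  # hypothetical sender key
    for pre_awl in (4.1, 4.1, 7.9):             # same message, three attempts
        print(awl.final_score(key, "<same-id@example.invalid>", pre_awl))
```

Here the final retry keeps its full 7.9, but only because AWL is skipped entirely for it; a genuinely new message from the same sender would still be averaged against the recorded history.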
Re: Again AWL confusion
Matus UHLAR - fantomas wrote:
> On 05.08.09 00:31, Martin Gregorie wrote:
>> If, for some (very) odd reason you run greylisting after SA then *of
>> course* your host has (a) seen the mail and (b) passed it through SA.
>> How else can the mail get to the greylister?
>>
>> Would you care to explain why you put a greylister behind SA? Do you
>> know how a greylister works and why it was designed to work that
>> way?
>
> He already explained that he greylists only mail that scores above a limit.

Exactly. The point is that scores below 2 are never spam, so I avoid greylisting for them. That's also my whitelist (which you usually need for greylisting), since I whitelist some hosts in SA.

> In that case we can assume the spam scored high even before so it got
> greylisted. In such case I doubt it was learned as ham, unless the
> greylisting check is broken...

Above 2. The NJABL hit alone would have been enough for that. It didn't score above 10, because then it would have been rejected at SMTP time.

My guess is that it scored 2 on the first try, and later it would have scored above 10 due to the SURBL listings, but AWL kicked in and lowered the score, treating the greylisted attempts as independent messages.

>> And where else did greylisted mail appear in the log? For the
>> mail to be logged as rejected by a greylister *after* its been
>> through SA it must also have been inspected by AWL and therefore it did
>> affect the AWL database.

Oh right, I could look at the SA log, but I already know it passed through SA 3 times.

> the question is, why it scored hammy? aep, how did it score before
> greylisting? Are you sure you do not have bug in your greylisting code?

See above. I'm pretty sure the "bug" is passing the same message to SA multiple times.

> Btw, I'm not sure if it should not be low scoring messages (spams) for
> which greylisting is very good, since you won't become that early
> recipient...

2 to 5 is the sweet spot. The message in question actually proved that it works, since the URIBL hits came later; by then it scores >10, so it gets rejected.

I think that setup is fairly smart, apart from the problem that I train SA with wrong information.

I wonder if I could ask SA to score a message without learning from it, although exim-sa probably doesn't support that.
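The score-gated greylisting aep describes can be modelled roughly as below. This is a Python sketch, not aep's exim configuration: the thresholds (below 2 accept, above 10 reject) and the 60-second delay come from the thread, while the class, the triplet key, the exact boundary handling and the addresses are assumptions. The point it shows is that SA runs on every attempt, because the decision needs the scanned message.

```python
ACCEPT_BELOW = 2.0       # from the thread: below 2 skips greylisting
REJECT_ABOVE = 10.0      # from the thread: above 10 is rejected at SMTP time
GREYLIST_SECONDS = 60    # from the thread's exim log


class ScoreGatedGreylist:
    """Toy model: greylist after DATA, but only for mid-scoring mail."""

    def __init__(self):
        self.first_seen = {}  # (client IP, envelope sender, recipient) -> time
        self.sa_runs = 0

    def decide(self, triplet, sa_score, now):
        # SA has already scanned the full message by the time we get here
        self.sa_runs += 1
        if sa_score > REJECT_ABOVE:
            return "reject"
        if sa_score < ACCEPT_BELOW:
            return "accept"
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= GREYLIST_SECONDS:
            return "accept"
        return "defer"


if __name__ == "__main__":
    gl = ScoreGatedGreylist()
    # Hypothetical triplet and timings; scores are illustrative
    triplet = ("91.199.51.231", "bounce@example.invalid", "user@example.invalid")
    for now, score in ((0, 4.1), (30, 4.1), (90, 6.0)):
        print(now, gl.decide(triplet, score, now))
    print("SA runs:", gl.sa_runs)
```

Three delivery attempts mean three SA runs, and hence three AWL updates for what is really one message.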
Re: Again AWL confusion
> On Wed, 2009-08-05 at 00:37 +0200, a...@exys.org wrote:
> > Matus UHLAR - fantomas wrote:
> > > On 04.08.09 20:09, a...@exys.org wrote:
> > >> I have obviously never received any mail from that sender, so why does
> > >> it hit?
> > >>
> > > in later mail you mention that you run SA before greylisting.

On 05.08.09 00:31, Martin Gregorie wrote:
> If, for some (very) odd reason you run greylisting after SA then *of
> course* your host has (a) seen the mail and (b) passed it through SA.
> How else can the mail get to the greylister?
>
> Would you care to explain why you put a greylister behind SA?
> Do you know how a greylister works and why it was designed to work that
> way?

He already explained that he greylists only mail that scores above a limit. In that case we can assume the spam already scored high before, which is why it got greylisted. Then I doubt it was learned as ham, unless the greylisting check is broken...

> > nope. i grepped the global log. the only time that sender ever ocurs it
> > was temporary rejected due to greylisting.
> And where else did greylisted mail appear in the log?
>
> For the mail to be logged as rejected by a greylister *after* its been
> through SA it must also have been inspected by AWL and therefore it did
> affect the AWL database.

The question is why it scored hammy. aep, how did it score before greylisting? Are you sure you don't have a bug in your greylisting code?

BTW, I wonder whether it isn't precisely the low-scoring messages (spams) that greylisting helps most with, since it keeps you from being one of the early recipients...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Again AWL confusion
On Wed, 2009-08-05 at 00:37 +0200, a...@exys.org wrote:
> Matus UHLAR - fantomas wrote:
> > On 04.08.09 20:09, a...@exys.org wrote:
> >> I have obviously never received any mail from that sender, so why does
> >> it hit?
> >>
> > in later mail you mention that you run SA before greylisting.
>
If, for some (very) odd reason, you run greylisting after SA, then *of course* your host has (a) seen the mail and (b) passed it through SA. How else can the mail get to the greylister?

Would you care to explain why you put a greylister behind SA? Do you know how a greylister works and why it was designed to work that way?

> nope. i grepped the global log. the only time that sender ever ocurs it
> was temporary rejected due to greylisting.
>
And where else did greylisted mail appear in the log? For the mail to be logged as rejected by a greylister *after* it's been through SA, it must also have been inspected by AWL, and therefore it did affect the AWL database.

Martin
Re: Again AWL confusion
Matus UHLAR - fantomas wrote:
> On 04.08.09 20:09, a...@exys.org wrote:
>> See the below message parts
>> (the complete message does not pass the MLs filter)
>> Notably both bayes and AWL are wrong.
>> while I understand why bayes might have done that, i dont understand
>> what AWL is doing here.
>> I have obviously never received any mail from that sender, so why does
>> it hit?
>
> in later mail you mention that you run SA before greylisting.
> Do you use per-use config at that time?

Nope.

> Isn't it possible that someone other at your system received mail from
> the same sender that didn't score much?

Nope. I grepped the global log. The only time that sender ever occurs, it was temporarily rejected due to greylisting.

> Since most of those checks are RBLs that is collaborative checks, it's
> possible that similar message received in the past by early recipient
> scored very low, and if it was autolearned as ham, it could explain the
> BAYES_00 too.

I suspect Bayes liked the content. I receive a large amount of non-spam with similar content. Correctly spelled German is rare in spam, especially well-formatted, text-only UTF-8.

(Oh right, I didn't include the content. The list's SA spam filter wouldn't let me, because of the URIBL hits :-/ )

Return-path:
Envelope-to: a...@exys.org
Received: from host231.dhms-domainmanagement.net ([91.199.51.231])
Subject: Virenwarnung - Ihr PC ist=?UTF-8?Q?=20ungesch=C3=BCtzt?=
Content-Type: text/plain; charset="UTF-8"
Message-ID:
To: a...@exys.org

X-Spam-Report:
Content analysis details: (6.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.1 RCVD_IN_NJABL_SPAM     RBL: NJABL: sender is confirmed spam source
                            [91.199.51.231 listed in combined.njabl.org]
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.]
 2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.9 URIBL_AB_SURBL         Contains an URL listed in the AB SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 0.2 SARE_SUB_ENC_UTF8      Message uses character set often used in spam
-1.9 AWL                    AWL: From: address is in the auto white-list
X-Spam-Flag: YES
Re: Again AWL confusion
On 04.08.09 20:09, a...@exys.org wrote:
> See the below message parts
> (the complete message does not pass the MLs filter)
> Notably both bayes and AWL are wrong.
> while I understand why bayes might have done that, i dont understand
> what AWL is doing here.
> I have obviously never received any mail from that sender, so why does
> it hit?

In a later mail you mention that you run SA before greylisting. Do you use per-user config at that point?

Isn't it possible that someone else on your system received mail from the same sender that didn't score much?

Since most of those checks are RBLs, i.e. collaborative checks, it's possible that a similar message received in the past by an early recipient scored very low, and if it was autolearned as ham, that could explain the BAYES_00 too.

> Return-path:
> Envelope-to: a...@exys.org
> Received: from host231.dhms-domainmanagement.net ([91.199.51.231])
> Subject: Virenwarnung - Ihr PC ist=?UTF-8?Q?=20ungesch=C3=BCtzt?=
> Content-Type: text/plain; charset="UTF-8"
> Message-ID:
> To: a...@exys.org
>
> X-Spam-Report:
> Content analysis details: (6.0 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  2.1 RCVD_IN_NJABL_SPAM     RBL: NJABL: sender is confirmed spam source
>                             [91.199.51.231 listed in combined.njabl.org]
> -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
>                             [score: 0.]
>  2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.9 URIBL_AB_SURBL         Contains an URL listed in the AB SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  0.2 SARE_SUB_ENC_UTF8      Message uses character set often used in spam
> -1.9 AWL                    AWL: From: address is in the auto white-list
> X-Spam-Flag: YES

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Again AWL confusion
> Please do not quote me out of context.

Sorry, I didn't find an appropriate way to respond to two statements in one sentence.

> Again, the greylisting prior to receiving this spam is not the reason.
> SA, or more specifically AWL, does not know about that.

It is. I forgot to mention that I run SA before greylisting, so I only greylist when the message exceeds a threshold.

> As for unlearning: Sure! :) See the spamassassin-run [1] man-page, in
> particular for the --remove-addr-from-whitelist option. Or maybe the
> --add-addr-to-blacklist option, which fakes an entry with a score of 50.

Thanks
Re: Again AWL confusion
On Tue, 2009-08-04 at 21:18 +0200, a...@exys.org wrote:
> > (missing in your paste)
>
> the received header was not missing. just stripped.

Please do not quote me out of context. I said "From: header address (missing in your paste)". I have re-inserted it in the quote below, where you ripped it out.

> > This assumption is wrong. You did receive a message from the From:
> > header address (missing in your paste) and the same originating
> > net-block in the past.
>
> True I did, due to greylisting.

Which SA doesn't know about. Unless, of course, the message or the parts already received have been fed to SA anyway, despite greylisting.

> r...@samir:~$grep 91.199.51.231 /var/log/exim/ -r
> ...
> F= temporarily rejected after DATA: greylisted for 60 seconds
> F= temporarily rejected after DATA: greylisted for 60 seconds
> <= virenwarndie...@virenschutz-downloaden.info H=host231.dhms-domainmanagement.net [91.199.51.231] P=esmtp S=3223 id=knuula.a6m...@localhost

I guess that's the envelope-from? AWL looks at the From: header.

Also, SA doesn't need to have seen that From: / net-block combination recently. Any time in the past would do.

> Greylisting is rather pointless when SA is going to remove the scoring
> gained through later listing again. Should I disable AWL, or can i
> unlearn it?

That's the *exact* point of AWL. It is a historical score averager. [2]

(The name is an artifact of the fact that humans tend to send from the same From: address, and mostly from the same sending net-block, whereas spammers forge the sender and usually distribute the origin widely. Hence the "white": to protect humans who occasionally send spammy mail. However, since it really is just an averaging system, it actually works both ways.)

Again, the greylisting prior to receiving this spam is not the reason. SA, or more specifically AWL, does not know about that.

As for unlearning: Sure! :) See the spamassassin-run [1] man-page, in particular the --remove-addr-from-whitelist option. Or maybe the --add-addr-to-blacklist option, which fakes an entry with a score of 50.

guenther

[1] http://spamassassin.apache.org/full/3.2.x/doc/spamassassin-run.html
[2] http://wiki.apache.org/spamassassin/AutoWhitelist
    and http://wiki.apache.org/spamassassin/AwlWrongWay

--
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
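To illustrate guenther's point that the AWL keys on the From: header rather than on the envelope sender shown in the exim log, here is a small Python sketch using the standard `email` package. The key layout (From: address plus the first two octets of the sending IPv4 address) is how I understand the AWL groups senders, per the AutoWhitelist wiki page [2]; treat it as an assumption. The function name and the addresses are made up.

```python
from email import message_from_string
from email.utils import parseaddr


def awl_style_key(raw_message, origin_ip):
    """Hypothetical AWL-style key: From: header address + first two IPv4 octets."""
    msg = message_from_string(raw_message)
    # The envelope sender (Return-Path / exim's "<=" address) is not used here
    _, from_addr = parseaddr(msg.get("From", ""))
    net_block = ".".join(origin_ip.split(".")[:2])
    return (from_addr.lower(), net_block)


raw = (
    "Return-Path: <bounce@example.invalid>\n"
    "From: Virus Warning <warn@example.invalid>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(awl_style_key(raw, "91.199.51.231"))  # ('warn@example.invalid', '91.199')
```

Grepping the mail log for the envelope sender therefore does not tell you whether the AWL has history for the key it actually uses.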
Re: Again AWL confusion
On Tue, 2009-08-04 at 21:18 +0200, a...@exys.org wrote:
> > This assumption is wrong. You did receive a message from the From:
> > header address and the same originating
> > net-block in the past.
> >
>
> Should I disable AWL, or can i
> unlearn it?

Apparently you previously (maybe not this week) received one that scored 4.2. The averaging tool brought the score half-way between the new score and the old score.

The effect is that if a good guy tends to send mail that scores 1.0, then one bad message (scoring, say, 7.0) won't get tossed. Similarly, if a brigand tends to send mail scoring 20, and he manages to miss all of the rules for one message, it will still be dropped.

Don't get hung up on the "white" part of AWL. It's just as much an automatic black list.

--
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
www.austinenergy.com
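Daniel's two examples can be checked against the averaging rule described on the AutoWhitelist wiki page: the final score moves from the message's own score towards the sender's historical mean by auto_whitelist_factor (default 0.5, which gives the "half-way" behaviour). A minimal Python sketch, assuming the default required score of 5.0; the function name is illustrative.

```python
AWL_FACTOR = 0.5       # default auto_whitelist_factor
REQUIRED_SCORE = 5.0   # SpamAssassin's default spam threshold


def awl_final_score(message_score, historical_mean, factor=AWL_FACTOR):
    """Move the message score towards the sender's historical mean."""
    return message_score + (historical_mean - message_score) * factor


if __name__ == "__main__":
    # Good guy who usually scores 1.0 sends one message scoring 7.0
    good = awl_final_score(7.0, 1.0)
    # Brigand who usually scores 20 sends one message that hits no rules
    bad = awl_final_score(0.0, 20.0)
    print(good, good >= REQUIRED_SCORE)  # 4.0 False
    print(bad, bad >= REQUIRED_SCORE)    # 10.0 True
```

The same rule pulls scores up as readily as down, which is why the "white" in the name is misleading.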
Re: Again AWL confusion
> (missing in your paste)

The Received header was not missing, just stripped:

Received: from host231.dhms-domainmanagement.net ([91.199.51.231])

> This assumption is wrong. You did receive a message from the From:
> header address and the same originating net-block in the past.

True, I did, due to greylisting.

r...@samir:~$grep 91.199.51.231 /var/log/exim/ -r
...
F= temporarily rejected after DATA: greylisted for 60 seconds
F= temporarily rejected after DATA: greylisted for 60 seconds
<= virenwarndie...@virenschutz-downloaden.info H=host231.dhms-domainmanagement.net [91.199.51.231] P=esmtp S=3223 id=knuula.a6m...@localhost

Greylisting is rather pointless when SA is going to remove the score gained through later listings again. Should I disable AWL, or can I unlearn it?
Re: Again AWL confusion
On Tue, 2009-08-04 at 20:09 +0200, a...@exys.org wrote:
> See the below message parts
> (the complete message does not pass the MLs filter)
> Notably both bayes and AWL are wrong.
> while I understand why bayes might have done that, i dont understand
> what AWL is doing here.
> I have obviously never received any mail from that sender, so why does
> it hit?

This assumption is wrong. You did receive a message from the From: header address (missing in your paste) and the same originating net-block in the past.

> X-Spam-Report:
> Content analysis details: (6.0 points, 5.0 required)
> -1.9 AWL                    AWL: From: address is in the auto white-list