Re: Again AWL confusion

2009-08-06 Thread LuKreme

On 5-Aug-2009, at 02:15, a...@exys.org wrote:

> The point is that scores below 2 are never spam,



Er... that's certainly not true.

--
*** AgentSmith sets mode: +m



Re: Again AWL confusion

2009-08-05 Thread RW
On Wed, 05 Aug 2009 10:15:00 +0200
a...@exys.org wrote:


> 2 to 5 is the sweetspot.  That message in question actually proved it
> is working, since the URIBL hits came later. Then it scores >10  so
> it gets rejected.

I noticed earlier that you were greylisting for only 60s; that seems
like a fairly short delay to make any difference to blocklist listings.

> I think that setup is fairly smart,  

I don't run my own MTA, but I do something analogous, in that I do an
initial test with Bogofilter and use the result to delay spam up to 24
hours before it's processed with SA.

I think if I were doing greylisting I might use Bogofilter's
ham result to bypass it, and the unsure/spam results to set a short
or long delay.
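
A minimal sketch of that idea, assuming the pre-filter verdict is already available as one of the strings "ham", "unsure" or "spam". The function name and the delay values are illustrative assumptions, not something Bogofilter or any greylister provides:

```python
# Hypothetical delays in seconds; the actual values are a local policy choice.
GREYLIST_DELAYS = {
    "ham": 0,          # bypass greylisting entirely
    "unsure": 5 * 60,  # short delay
    "spam": 60 * 60,   # long delay, giving blocklists time to catch up
}


def greylist_delay(verdict: str) -> int:
    """Map a pre-filter verdict to a greylisting delay in seconds."""
    try:
        return GREYLIST_DELAYS[verdict]
    except KeyError:
        raise ValueError(f"unknown verdict: {verdict!r}")
```

The lookup table keeps the policy in one place, so changing a delay does not touch the decision code.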

> excluding the problem that i train SA with wrong information.

I think if Bayes is being mistrained, you have the autolearn thresholds
wrong. And in your situation, it's not going to be possible to reverse
it properly since the signature will change with the received headers.



Re: Again AWL confusion

2009-08-05 Thread Martin Gregorie
On Wed, 2009-08-05 at 22:21 +0200, Matus UHLAR - fantomas wrote:

> turning off AWL and autolearn (optionally only when run at SMTP time) would
> help you here. Although using such a setup you lose many of the advantages (like
> AWL ;-) and especially personalising...
> 
There are cases where AWL is a menace. In my case I run SA as part of
the 'pipeline'[*] between fetchmail and Postfix because there's a bad
interaction between the way Postfix runs SA as a subservice and its
always_bcc directive. I found that in my set-up AWL was consistently
giving unhelpful scores, so it's been turned off for quite a while now.

[*] 'pipeline' because fetchmail's mda option feeds a pipeline leading
to the Postfix sendmail utility that passes the mail to Postfix.


Martin




Re: Again AWL confusion

2009-08-05 Thread Matus UHLAR - fantomas
>> On 05.08.09 00:31, Martin Gregorie wrote:
>>> If, for some (very) odd reason you run greylisting after SA then *of
>>> course* your host has (a) seen the mail and (b) passed it through SA.
>>> How else can the mail get to the greylister?
>>>
>>> Would you care to explain why you put a greylister behind SA? Do you 
>>> know how a greylister works and why it was designed to work that
>>> way?

> Matus UHLAR - fantomas wrote:
>> He already explained that he greylists only mail that scores above a limit.

On 05.08.09 10:15, a...@exys.org wrote:
> exactly. The point is that scores below 2 are never spam, so i avoid  
> greylisting. Thats my whitelist (you usually need for greylisting)  at  
> the same time, since i whitelist some hosts in SA.
>
>> In that case we can assume the spam scored high even before so it got
>> greylisted. In such case I doubt it was learned as ham, unless the
>> greylisting check is broken...

> above 2. The njabl hit would have been enough to hit that. It didn't  
> score above 10, because that would have been rejected at smtp time.
>
> My guess is that it scored 2 on the first try, then later it would have  
> scored above 10 due to surbl listings, but awl kicks in and lowers the  
> score thinking the greylisted mail was an independent message.

That's it! You can look at the spamd logs and search for the same Message-ID.

>>> And where else did greylisted mail appear in the log? For the 
>>> mail to be logged as rejected by a greylister *after* it's been
>>> through SA it must also have been inspected by AWL and therefore it did
>>> affect the AWL database.

> oh right, i could look at the SA log, but i already know it passed SA 3  
> times.

While repeated learning of the same message does not affect Bayes, I think
that doesn't hold for AWL.
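
To illustrate the effect, here is a rough sketch of AWL-style averaging when one message is scored on every delivery attempt. It assumes the adjustment is (mean of earlier scores - current score) * factor with a factor of 0.5, and that every attempt is recorded under the same sender. The class name and the three raw scores are made up for the example:

```python
class AwlEntry:
    """Running total and count of pre-AWL scores for one sender."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def adjust(self, score: float, factor: float = 0.5) -> float:
        """Return the adjusted score, then record the raw score."""
        delta = 0.0
        if self.count > 0:
            mean = self.total / self.count
            delta = (mean - score) * factor
        self.total += score
        self.count += 1
        return score + delta


entry = AwlEntry()
# Hypothetical raw scores: two early attempts before the URIBL listings
# existed, then a third attempt after they appeared.
attempts = [2.0, 2.0, 10.5]
results = [entry.adjust(s) for s in attempts]
```

With these invented numbers the third attempt comes out at 6.25 instead of 10.5, because the two earlier attempts of the very same message are treated as sender history.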

>> the question is, why it scored hammy?  aep, how did it score before
>> greylisting? Are you sure you do not have bug in your greylisting code?

> see above. i'm pretty sure the "bug" is passing the same message to SA  
> multiple times.

>> Btw, I'm not sure if it should not be low scoring messages (spams) for which
>> greylisting is very good, since you won't become that early recipient...

> 2 to 5 is the sweetspot.  That message in question actually proved it is  
> working, since the URIBL hits came later. Then it scores >10  so it gets  
> rejected.
> I think that setup is fairly smart,  excluding the problem that i train  
> SA with wrong information.
>
> I wonder if i could ask SA to score a message without learning it,  
> although exim-sa probably doesn't support that.

Turning off AWL and autolearn (optionally only when SA is run at SMTP time)
would help you here, although with such a setup you lose many of the
advantages (like AWL ;-) and especially personalisation...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".


Re: Again AWL confusion

2009-08-05 Thread Cedric Knight
a...@exys.org wrote:
> exactly. The point is that scores below 2 are never spam, so i avoid
> greylisting. Thats my whitelist (you usually need for greylisting)  at
> the same time, since i whitelist some hosts in SA.

Interesting set-up, although I don't think it would be suitable for a
high-volume server.  So what do you use to do this?  exim-sa and what
greylisting software?

> above 2. The njabl hit would have been enough to hit that. It didn't
> score above 10, because that would have been rejected at smtp time.
> 
> My guess is that it scored 2 on the first try, then later it would have
> scored above 10 due to surbl listings, but awl kicks in and lowers the
> score thinking the greylisted mail was an independent message.

With most greylisting systems, the temporary reject is before the data
section (which helps save bandwidth), so it's hard to know if it's two
attempts to deliver the same message, or two independent messages.  Not
so in your case, however.
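
For reference, the post-DATA decision described in this thread could be sketched as follows. The thresholds 2 and 10 come from the thread; the function name, the `seen_before` flag (standing in for "this is a retry after the greylisting delay has elapsed") and the returned strings are assumptions for illustration only:

```python
def smtp_action(sa_score: float, seen_before: bool,
                pass_below: float = 2.0, reject_above: float = 10.0) -> str:
    """Decide what to do with a message after DATA, given its SA score."""
    if sa_score > reject_above:
        return "reject"   # permanent rejection at SMTP time
    if sa_score < pass_below:
        return "accept"   # low scores skip greylisting altogether
    # Middle band: defer the first attempt, accept a later retry.
    return "accept" if seen_before else "defer"
```

Note that every call implies a full SA scan of the message, which is exactly why the same message reaches SA more than once in this set-up.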

What is auto_whitelist_factor set at?

> 
>>> And where else did greylisted mail appear in the log? For the
>>> mail to be logged as rejected by a greylister *after* it's been
>>> through SA it must also have been inspected by AWL and therefore it did
>>> affect the AWL database.
>>   
> oh right, i could look at the SA log, but i already know it passed SA 3
> times.

Worth doing.

>> the question is, why it scored hammy?  aep, how did it score before
>> greylisting? Are you sure you do not have bug in your greylisting code?
>>   
> see above. i'm pretty sure the "bug" is passing the same message to SA
> multiple times.

Well, by definition that isn't an SA bug.  Or are you suggesting AWL
should check to see if the same Message-ID has been seen before, and if
it has, not score or learn?  That would be an extra database lookup, and
it would mean AWL would also be disabled for valid mail that had been
delayed by greylisting (maybe OK, because it presumably hasn't been seen
before).
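
A sketch of what such a check might look like, purely as a thought experiment: this is not how AWL behaves, and the class, its in-memory storage and the example addresses are all hypothetical. It assumes the usual averaging with a factor of 0.5:

```python
class MessageIdAwareAwl:
    """Toy score averager that ignores repeats of a known Message-ID."""

    def __init__(self, factor: float = 0.5) -> None:
        self.factor = factor
        self.history = {}      # sender -> (total, count)
        self.seen_ids = set()  # Message-IDs already scored

    def adjust(self, sender: str, message_id: str, score: float) -> float:
        if message_id in self.seen_ids:
            # Retry of a known message: neither adjust nor record.
            return score
        self.seen_ids.add(message_id)  # the extra lookup mentioned above
        total, count = self.history.get(sender, (0.0, 0))
        delta = (total / count - score) * self.factor if count else 0.0
        self.history[sender] = (total + score, count + 1)
        return score + delta


awl = MessageIdAwareAwl()
first = awl.adjust("sender@example.com", "<id1@example.com>", 2.0)
retry = awl.adjust("sender@example.com", "<id1@example.com>", 10.5)
other = awl.adjust("sender@example.com", "<id2@example.com>", 10.5)
```

Here the retry keeps its raw score of 10.5, while a genuinely new message from the same sender is still averaged against the history.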

Bayes *shouldn't* allow learning of the same message more than once
(it doesn't if you train it manually), but maybe autolearn doesn't
update bayes_seen (??).

I think the simplest solution for your config is just:
use_auto_whitelist 0
bayes_auto_learn 0

Setting 'tflags URIBL_BLACK noautolearn' etc. on the remote tests would
probably mean the AWL decrease would be less, because AWL is then just
smoothing out the scores from the local tests.  None of this sounds very
efficient when it comes to minimising DNS lookups and reducing carbon footprints...

CK


Re: Again AWL confusion

2009-08-05 Thread aep

Matus UHLAR - fantomas wrote:
> On 05.08.09 00:31, Martin Gregorie wrote:
>> If, for some (very) odd reason you run greylisting after SA then *of
>> course* your host has (a) seen the mail and (b) passed it through SA.
>> How else can the mail get to the greylister?
>>
>> Would you care to explain why you put a greylister behind SA?
>> Do you know how a greylister works and why it was designed to work that
>> way?
>
> He already explained that he greylists only mail that scores above a limit.

Exactly. The point is that scores below 2 are never spam, so I avoid
greylisting. That's my whitelist (which you usually need for greylisting) at
the same time, since I whitelist some hosts in SA.

> In that case we can assume the spam scored high even before so it got
> greylisted. In such case I doubt it was learned as ham, unless the
> greylisting check is broken...

Above 2. The njabl hit would have been enough to hit that. It didn't
score above 10, because that would have been rejected at SMTP time.

My guess is that it scored 2 on the first try, then later it would have
scored above 10 due to SURBL listings, but AWL kicks in and lowers the
score, thinking the greylisted mail was an independent message.

>> And where else did greylisted mail appear in the log?
>>
>> For the mail to be logged as rejected by a greylister *after* it's been
>> through SA it must also have been inspected by AWL and therefore it did
>> affect the AWL database.

Oh right, I could look at the SA log, but I already know it passed SA 3
times.

> the question is, why it scored hammy?  aep, how did it score before
> greylisting? Are you sure you do not have bug in your greylisting code?

See above. I'm pretty sure the "bug" is passing the same message to SA
multiple times.

> Btw, I'm not sure if it should not be low scoring messages (spams) for which
> greylisting is very good, since you won't become that early recipient...

2 to 5 is the sweet spot.  The message in question actually proved it is
working, since the URIBL hits came later. Then it scores >10, so it gets
rejected.
I think that setup is fairly smart, excluding the problem that I train
SA with wrong information.

I wonder if I could ask SA to score a message without learning it,
although exim-sa probably doesn't support that.






Re: Again AWL confusion

2009-08-05 Thread Matus UHLAR - fantomas
> On Wed, 2009-08-05 at 00:37 +0200, a...@exys.org wrote:
> > Matus UHLAR - fantomas wrote:
> > > On 04.08.09 20:09, a...@exys.org wrote:
> > >> I have obviously never received any mail from that sender, so why does
> > >> it hit?
> > >>
> 
> > in later mail you mention that you run SA before greylisting.

On 05.08.09 00:31, Martin Gregorie wrote:
> If, for some (very) odd reason you run greylisting after SA then *of
> course* your host has (a) seen the mail and (b) passed it through SA.
> How else can the mail get to the greylister?
> 
> Would you care to explain why you put a greylister behind SA? 
> Do you know how a greylister works and why it was designed to work that
> way?

He already explained that he greylists only mail that scores above a limit.

In that case we can assume the spam scored high even before so it got
greylisted. In such case I doubt it was learned as ham, unless the
greylisting check is broken...

> > nope. i grepped the global log. the only time that sender ever occurs it
> > was temporarily rejected due to greylisting.

> And where else did greylisted mail appear in the log? 
> 
> For the mail to be logged as rejected by a greylister *after* it's been
> through SA it must also have been inspected by AWL and therefore it did
> affect the AWL database.

The question is, why did it score hammy?  aep, how did it score before
greylisting? Are you sure you do not have a bug in your greylisting code?

Btw, I suspect it is precisely low-scoring messages (spam) for which
greylisting works very well, since you are then no longer that early recipient...
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Posli tento mail 100 svojim znamim - nech vidia aky si idiot
Send this email to 100 your friends - let them see what an idiot you are


Re: Again AWL confusion

2009-08-04 Thread Martin Gregorie
On Wed, 2009-08-05 at 00:37 +0200, a...@exys.org wrote:
> Matus UHLAR - fantomas wrote:
> > On 04.08.09 20:09, a...@exys.org wrote:
> >> I have obviously never received any mail from that sender, so why does
> >> it hit?
> >>

> in later mail you mention that you run SA before greylisting.

If, for some (very) odd reason you run greylisting after SA then *of
course* your host has (a) seen the mail and (b) passed it through SA.
How else can the mail get to the greylister?

Would you care to explain why you put a greylister behind SA? 
Do you know how a greylister works and why it was designed to work that
way?

> nope. i grepped the global log. the only time that sender ever occurs it
> was temporarily rejected due to greylisting.
>
And where else did greylisted mail appear in the log? 

For the mail to be logged as rejected by a greylister *after* it's been
through SA it must also have been inspected by AWL and therefore it did
affect the AWL database.


Martin





Re: Again AWL confusion

2009-08-04 Thread aep

Matus UHLAR - fantomas wrote:
> On 04.08.09 20:09, a...@exys.org wrote:
>> See the below message parts
>> (the complete message does not pass the MLs filter)
>> Notably both bayes and AWL  are wrong.
>> while I understand  why bayes might have done that, i dont understand
>> what AWL is doing here.
>> I have obviously never received any mail from that sender, so why does
>> it hit?
>
> in later mail you mention that you run SA before greylisting. Do you use
> per-user config at that time?

Nope.

> Isn't it possible that someone other at your
> system received mail from the same sender that didn't score much?

Nope. I grepped the global log. The only time that sender ever occurs, it
was temporarily rejected due to greylisting.

> Since most of those checks are RBLs that is collaborative checks, it's
> possible that similar message received in the past by early recipient scored
> very low, and if it was autolearned as ham, it could explain the BAYES_00
> too.

I suspect Bayes liked the content. I receive a large amount of non-spam
with similar content. Correctly spelled German in spam is rare,
especially well-formatted, text-only and UTF-8.

(Oh right, I didn't include the content. SA's spam filter wouldn't let me
because of the URIBL hits :-/ )

Return-path: 
Envelope-to: a...@exys.org
Received: from host231.dhms-domainmanagement.net ([91.199.51.231])
Subject: Virenwarnung - Ihr PC ist=?UTF-8?Q?=20ungesch=C3=BCtzt?=
Content-Type: text/plain; charset="UTF-8"
Message-ID: 
To: a...@exys.org

X-Spam-Report:
Content analysis details:   (6.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.1 RCVD_IN_NJABL_SPAM     RBL: NJABL: sender is confirmed spam source
                            [91.199.51.231 listed in combined.njabl.org]
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.]
 2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.9 URIBL_AB_SURBL         Contains an URL listed in the AB SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 1.5 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
                            [URIs: virenschutz-downloadenDOTinfo]
 0.2 SARE_SUB_ENC_UTF8      Message uses character set often used in spam
-1.9 AWL                    AWL: From: address is in the auto white-list
X-Spam-Flag: YES



  




Re: Again AWL confusion

2009-08-04 Thread Matus UHLAR - fantomas
On 04.08.09 20:09, a...@exys.org wrote:
> See the below message parts
> (the complete message does not pass the MLs filter)
> Notably both bayes and AWL  are wrong.
> while I understand  why bayes might have done that, i dont understand
> what AWL is doing here.
> I have obviously never received any mail from that sender, so why does
> it hit?

In a later mail you mention that you run SA before greylisting. Do you use
per-user config at that time? Isn't it possible that someone else on your
system received mail from the same sender that didn't score much?

Since most of those checks are RBLs, that is, collaborative checks, it's
possible that a similar message received in the past by an early recipient
scored very low, and if it was autolearned as ham, that could explain the
BAYES_00 too.

> Return-path: 
> Envelope-to: a...@exys.org
> Received: from host231.dhms-domainmanagement.net ([91.199.51.231])
> Subject: Virenwarnung - Ihr PC ist=?UTF-8?Q?=20ungesch=C3=BCtzt?=
> Content-Type: text/plain; charset="UTF-8"
> Message-ID: 
> To: a...@exys.org
>
> X-Spam-Report:
> Content analysis details:   (6.0 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  2.1 RCVD_IN_NJABL_SPAM     RBL: NJABL: sender is confirmed spam source
>                             [91.199.51.231 listed in combined.njabl.org]
> -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
>                             [score: 0.]
>  2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.9 URIBL_AB_SURBL         Contains an URL listed in the AB SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  1.5 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
>                             [URIs: virenschutz-downloadenDOTinfo]
>  0.2 SARE_SUB_ENC_UTF8      Message uses character set often used in spam
> -1.9 AWL                    AWL: From: address is in the auto white-list
> X-Spam-Flag: YES

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I don't have lysdexia. The Dog wouldn't allow that.


Re: Again AWL confusion

2009-08-04 Thread aep


> Please do not quote me out of context.

Sorry. I didn't find an appropriate way to respond to two statements in one
sentence.

> Again, the greylisting prior to receiving this spam is not the reason.
> SA, or more specifically AWL, does not know about that.

It is.  I forgot to mention I run SA prior to greylisting, so I only
greylist when the message exceeds a threshold.

> As for unlearning: Sure! :)  See the spamassassin-run [1] man-page, in
> particular for the --remove-addr-from-whitelist option. Or maybe the
> --add-addr-to-blacklist option, which fakes an entry with a score of 50.

Thanks


Re: Again AWL confusion

2009-08-04 Thread Karsten Bräckelmann
On Tue, 2009-08-04 at 21:18 +0200, a...@exys.org wrote:
> >  (missing in your paste)
> 
> the received header was not missing.  just stripped.

Please do not quote me out of context. I said "From: header address
(missing in your paste)". Inserted in the quote below where you ripped
it out.

> > This assumption is wrong. You did receive a message from the From:
> > header address (missing in your paste) and the same originating
> > net-block in the past.
> 
> True I did, due to greylisting.

Which SA doesn't know about. Unless, of course, the message or already
received parts has been fed to SA anyway, despite greylisting.

> r...@samir:~$grep 91.199.51.231 /var/log/exim/ -r 
> ...
> F= temporarily rejected after 
> DATA: greylisted for 60 seconds
> F= temporarily rejected after 
> DATA: greylisted for 60 seconds
> <= virenwarndie...@virenschutz-downloaden.info 
> H=host231.dhms-domainmanagement.net [91.199.51.231] P=esmtp S=3223 
> id=knuula.a6m...@localhost

I guess that's the Envelope-From? AWL looks at the From: header.

Also, SA need not have seen that From address / net-block recently.
Any time in the past would do.

> Greylisting is rather pointless when SA is going to remove the scoring 
> gained through later listing again.  Should I disable AWL, or can i 
> unlearn it?

That's the *exact* point of AWL. It is a historical score averager. [2]

(The name is an artifact due to the fact that humans tend to frequently
send using the same From address, and mostly the same sending net-block.
Whereas spammers forge the sender, and usually distribute the origin
widely. Hence the "white", to protect humans occasionally sending spammy
mail. However, since it really is just an averaging system, it actually
works both ways.)

Again, the greylisting prior to receiving this spam is not the reason.
SA, or more specifically AWL, does not know about that.


As for unlearning: Sure! :)  See the spamassassin-run [1] man-page, in
particular for the --remove-addr-from-whitelist option. Or maybe the
--add-addr-to-blacklist option, which fakes an entry with a score of 50.

  guenther


[1] http://spamassassin.apache.org/full/3.2.x/doc/spamassassin-run.html
[2] http://wiki.apache.org/spamassassin/AutoWhitelist and
http://wiki.apache.org/spamassassin/AwlWrongWay

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Again AWL confusion

2009-08-04 Thread McDonald, Dan
On Tue, 2009-08-04 at 21:18 +0200, a...@exys.org wrote:

> > This assumption is wrong. You did receive a message from the From:
> > header address  and the same originating
> > net-block in the past.
> >
> >   
>  Should I disable AWL, or can i 
> unlearn it?

Apparently you previously (maybe not this week) received one that scored
4.2.  The averaging tool brought the score half-way between the new
score and the old score.  The effect is that if a good guy tends to send
mail that scores 1.0, then one bad message (scoring, say, 7.0) won't get
tossed.  Similarly, if a brigand tends to send mail scoring 20, and he
manages to miss all of the rules for one message, it will still be
dropped.
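
The two cases above can be worked through in a few lines. This is only a sketch of the averaging arithmetic, assuming the adjustment moves the score half-way toward the sender's historical mean (a factor of 0.5); the function name is made up for the example:

```python
def awl_adjusted(score: float, mean: float, factor: float = 0.5) -> float:
    """Pull the current score toward the sender's historical mean."""
    return score + (mean - score) * factor


# A good guy who usually scores 1.0 sends one message scoring 7.0.
good_guy = awl_adjusted(score=7.0, mean=1.0)
# A brigand who usually scores 20 sends one message that hits no rules.
brigand = awl_adjusted(score=0.0, mean=20.0)
```

With a required score of 5.0, the good guy's message ends up at 4.0 and is delivered, while the brigand's ends up at 10.0 and is still caught.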

Don't get hung up on the "white" part of AWL.  It's just as much an
automatic black list.


-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
www.austinenergy.com




Re: Again AWL confusion

2009-08-04 Thread aep

> (missing in your paste)

The Received header was not missing, just stripped.

Received: from host231.dhms-domainmanagement.net ([91.199.51.231])

> This assumption is wrong. You did receive a message from the From:
> header address and the same originating net-block in the past.

True, I did, due to greylisting.

r...@samir:~$grep 91.199.51.231 /var/log/exim/ -r
...
F= temporarily rejected after
DATA: greylisted for 60 seconds
F= temporarily rejected after
DATA: greylisted for 60 seconds
<= virenwarndie...@virenschutz-downloaden.info
H=host231.dhms-domainmanagement.net [91.199.51.231] P=esmtp S=3223
id=knuula.a6m...@localhost

Greylisting is rather pointless when SA is going to remove the scoring
gained through later listing again.  Should I disable AWL, or can I
unlearn it?






Re: Again AWL confusion

2009-08-04 Thread Karsten Bräckelmann
On Tue, 2009-08-04 at 20:09 +0200, a...@exys.org wrote:
> See the below message parts
> (the complete message does not pass the MLs filter)
> Notably both bayes and AWL  are wrong.
> while I understand  why bayes might have done that, i dont understand
> what AWL is doing here.
> I have obviously never received any mail from that sender, so why does
> it hit?

This assumption is wrong. You did receive a message from the From:
header address (missing in your paste) and the same originating
net-block in the past.


> X-Spam-Report:
>   Content analysis details:   (6.0 points, 5.0 required)

>   -1.9 AWLAWL: From: address is in the auto white-list

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}