Re: DNSBL Comparison 20091010

2009-10-13 Thread Benny Pedersen

On Tue 13 Oct 2009 16:22:55 CEST, "McDonald, Dan" wrote

> On Tue, 2009-10-13 at 15:42 +0200, Matus UHLAR - fantomas wrote:
> > > On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote
> > >> On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
> > >>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
> > >>> data seems to confirm the good reputation of Spamhaus.
> > >> Er.. Zen is a combination of SBL, XBL, and PBL.  Not just the XBL and PBL.
> >
> > On 11.10.09 03:10, Benny Pedersen wrote:
> > > and also CSS
> >
> > CSS is included in SBL :)
>
> Not as far as SpamAssassin is concerned.  RCVD_IN_SBL only checks for
> 127.0.0.2 in zen, while CSS returns 127.0.0.3, so a new rule has to be
> added to include the CSS data.  My rule is:
>
> header RCVD_IN_CSS  eval:check_rbl('zen-lastexternal', 'zen.spamhaus.org.', '127.0.0.3')
> describe RCVD_IN_CSS  Received via a relay in Spamhaus CSS
> tflags RCVD_IN_CSS  net
> score RCVD_IN_CSS 0 0.509 0 0.905 # n=0 n=2


This rule will make another DNS lookup :/

Use check_rbl_sub to avoid it. I posted a rule here that does that, and
one of the ninjas made the same error :/

http://www.nabble.com/New-spamhaus-list-not-included-td25736766.html

If check_rbl caches its DNS lookups, sorry for my own mistake :)
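
For readers less familiar with how a DNSBL answer turns into a rule hit, here is a minimal Python sketch (not SpamAssassin code). It only builds the query name and maps return codes to sub-lists; it sends no DNS traffic. The two return codes are the ones quoted in this thread (127.0.0.2 for SBL, 127.0.0.3 for CSS); the function names and the example IP are hypothetical.

```python
import ipaddress

# Return codes mentioned in this thread for zen.spamhaus.org.
# Other zen codes exist; they are left out here rather than guessed.
ZEN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.3": "CSS",
}


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def classify(answers):
    """Map the A records of ONE DNSBL reply to the sub-lists that hit."""
    return sorted({ZEN_CODES[a] for a in answers if a in ZEN_CODES})


# 192.0.2.1 is a documentation address, used purely as an example.
query = dnsbl_query_name("192.0.2.1")
hits = classify(["127.0.0.2", "127.0.0.3"])
```

The point relevant to this discussion is that a single reply can carry several return codes, so several rules can be decided from one query.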

--
xpoint



Re: DNSBL Comparison 20091010

2009-10-13 Thread McDonald, Dan
On Tue, 2009-10-13 at 15:42 +0200, Matus UHLAR - fantomas wrote:
> > On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote
> >> On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
> >>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
> >>> data seems to confirm the good reputation of Spamhaus.
> >> Er.. Zen is a combination of SBL, XBL, and PBL.  Not just the XBL and PBL.
> 
> On 11.10.09 03:10, Benny Pedersen wrote:
> > and also CSS
> 
> CSS is included in SBL :)

Not as far as SpamAssassin is concerned.  RCVD_IN_SBL only checks for
127.0.0.2 in zen, while CSS returns 127.0.0.3, so a new rule has to be
added to include the CSS data.  My rule is:

header RCVD_IN_CSS  eval:check_rbl('zen-lastexternal', 'zen.spamhaus.org.', '127.0.0.3')
describe RCVD_IN_CSS  Received via a relay in Spamhaus CSS
tflags RCVD_IN_CSS  net
score RCVD_IN_CSS 0 0.509 0 0.905 # n=0 n=2



-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
www.austinenergy.com




Re: DNSBL Comparison 20091010

2009-10-13 Thread Matus UHLAR - fantomas
> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote
>> On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>> data seems to confirm the good reputation of Spamhaus.
>> Er.. Zen is a combination of SBL, XBL, and PBL.  Not just the XBL and PBL.

On 11.10.09 03:10, Benny Pedersen wrote:
> and also CSS

CSS is included in SBL :)

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Christian Science Programming: "Let God Debug It!".


Re: DNSBL Comparison 20091010

2009-10-11 Thread Marc Perkel



Warren Togami wrote:
The following is an apples-to-apples comparison of DNSBL lastexternal
rules against the October 10th, 2009 weekly_mass_check corpora.
HOSTKARMA and SEM are new.  Hopefully these masscheck results can help 
to identify problems so list quality can improve over time.


http://ruleqa.spamassassin.org/20091010-r823821-n
128161 Spam
185199 Ham

The results below are only as good as the data submitted by nightly 
masscheck volunteers.  Please join us in nightly masschecks to 
increase the sample size of the corpora so we can have greater 
confidence in the nightly statistics.



DNSBL lastexternal by Safety

SPAM%    HAM%    RANK RULE
10.0975% 0.0022% 0.93 RCVD_IN_PSBL
11.4278% 0.0173% 0.91 RCVD_IN_XBL
18.7561% 0.0616% 0.87 RCVD_IN_SEMBLACK
81.8252% 0.1825% 0.83 RCVD_IN_PBL
27.4342% 0.2327% 0.77 RCVD_IN_SORBS_DUL
91.5505% 0.3974% 0.76 RCVD_IN_BRBL_LASTEXT
13.1272% 0.5027% 0.67 RCVD_IN_HOSTKARMA_BL

RANK is heavily influenced by the false positive rate, thus it seems 
to be a rough approximation of safety.  RANK alone says little about 
the effectiveness of a particular rule against spam.  These numbers 
show that Barracuda and PBL are by far the most extensive blacklists, 
but the false positive rates suggest that Barracuda is aggressive at 
the expense of safety.  Given that zen.spamhaus.org is a combination 
of XBL and PBL, this data seems to confirm the good reputation of 
Spamhaus.
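
To put the percentages in the table into absolute terms, a small Python sketch follows. It assumes the SPAM% and HAM% columns are relative to the corpus totals quoted above (128161 spam, 185199 ham), which the post implies but does not state outright. It does not reproduce ruleqa's RANK formula, which is not given here; `ham_share_of_hits` is a hypothetical helper showing only what fraction of a rule's hits fell on ham in this corpus.

```python
SPAM_TOTAL = 128161
HAM_TOTAL = 185199

# (rule, SPAM%, HAM%) copied from the table above.
ROWS = [
    ("RCVD_IN_PSBL", 10.0975, 0.0022),
    ("RCVD_IN_XBL", 11.4278, 0.0173),
    ("RCVD_IN_SEMBLACK", 18.7561, 0.0616),
    ("RCVD_IN_PBL", 81.8252, 0.1825),
    ("RCVD_IN_SORBS_DUL", 27.4342, 0.2327),
    ("RCVD_IN_BRBL_LASTEXT", 91.5505, 0.3974),
    ("RCVD_IN_HOSTKARMA_BL", 13.1272, 0.5027),
]


def estimated_hits(percent, total):
    """Approximate message count behind a percentage of a corpus."""
    return round(percent / 100.0 * total)


def ham_share_of_hits(spam_pct, ham_pct):
    """Fraction of a rule's hits that landed on ham in this corpus."""
    spam_hits = spam_pct / 100.0 * SPAM_TOTAL
    ham_hits = ham_pct / 100.0 * HAM_TOTAL
    return ham_hits / (spam_hits + ham_hits)


for rule, spam_pct, ham_pct in ROWS:
    print(
        f"{rule:22s} spam~{estimated_hits(spam_pct, SPAM_TOTAL):6d} "
        f"ham~{estimated_hits(ham_pct, HAM_TOTAL):4d} "
        f"ham share {ham_share_of_hits(spam_pct, ham_pct):.4%}"
    )
```

Because the ham and spam corpora differ in size and makeup, the ham share is specific to this corpus and should not be read as a real-world false positive probability.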


Overlap analysis shows the majority of XBL and PBL are also listed by 
Barracuda.  Furthermore Barracuda's list seems to have a similar hit % 
as XBL + PBL combined.  Is Barracuda known to aggregate Spamhaus data 
with their own?  If so we might be adding redundant scores in a 
dangerous and undesirable manner.


Adam Katz's sa-update channels contain DNSBL rule overlap adjustments
in an attempt to compensate for what he calls "incestuous" 
blacklists.  I am beginning to think this is a good idea to explore 
for spamassassin upstream if in fact one blacklist is aggregating data 
from another blacklist.


http://ruleqa.spamassassin.org/20091010-r823821-n/
In related news, these results indicate that RCVD_IN_HOSTKARMA_BR and 
RCVD_IN_SEMBACKSCATTER have so few hits that they are likely not worth 
the overhead of the extra DNS query to use in production.  Unless the 
list owners object, I will remove them from the sandbox before next 
Saturday's network masscheck.


===
Spamcop
===
SPAM%    HAM%    RANK RULE
16.8663% 2.5994% 0.56 RCVD_IN_BL_SPAMCOP_NET

I did not include SpamCop in the above chart because it is not the 
same type of lastexternal DNSBL.  I'm confused.  With such a poor 
false positive rate how does it have a high score generated by the GA?


Warren Togami
wtog...@redhat.com



Just a few comments. First - can I get a list of IPs that you consider
false positives? I'd like to take a look at them to see what I'm doing
wrong on the HOSTKARMA list. Also, we are only filtering a few thousand
domains, so in some ways hitting 13% of the spam is good for a
fairly small operation. Our blacklist is mostly spambots, and our list
self-tunes to the spambots that are spamming our customers. So people
we filter for get more hits than people we don't. We actually
block almost 100% of spambot spam. It makes me wonder how the numbers
would change if the spam were collected from domains whose
high-numbered MX records pointed to our tarbaby server.


But I am concerned about the FP count so any info about that would be 
helpful.




Re: DNSBL Comparison 20091010

2009-10-11 Thread Karsten Bräckelmann
Just a few comments and corrections.

On Sat, 2009-10-10 at 19:44 -0400, Warren Togami wrote:
> The following is an apples-to-apples comparison of DNSBL lastexternal

Minor nit: Not entirely correct. Different lists have different listing
policies and criteria. A PBL listing for example does NOT necessarily
indicate that IP ever has sent a single spam.

While all (most) of these might be apples, I strongly prefer green ones
over red. ;)


> Overlap analysis shows the majority of XBL and PBL are also listed by 
> Barracuda.  Furthermore Barracuda's list seems to have a similar hit % 
> as XBL + PBL combined.  Is Barracuda known to aggregate Spamhaus data 
> with their own?

No, they don't. They don't even list PBL style IPs just because of that.
Barracuda BRBL appears to be an independently collected set, as one
easily can find out about:
  http://www.barracudacentral.org/rbl


> In related news, these results indicate that RCVD_IN_HOSTKARMA_BR and 
> RCVD_IN_SEMBACKSCATTER have so few hits that they are likely not worth 
> the overhead of the extra DNS query to use in production.  Unless the 
> list owners object, I will remove them from the sandbox before next 
> Saturday's network masscheck.

Hostkarma BROWN does NOT require a DNS query. It's a check_rbl_sub()
eval rule, and thus comes essentially for free. Any possible Hostkarma
listing is based on the very same, single DNS query.
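
The "comes essentially for free" point can be illustrated with a toy Python model. This is not SpamAssassin's implementation, only a sketch of the idea that sub-rules inspect an answer that was already fetched. `LookupCache`, `fake_resolver`, the query name, and the return codes in the canned reply are all hypothetical.

```python
class LookupCache:
    """Toy model: one DNS query per name, reused by any number of sub-rules."""

    def __init__(self, resolver):
        self._resolver = resolver   # callable: query name -> list of A records
        self._answers = {}          # query name -> cached list of A records
        self.queries_sent = 0

    def answers(self, name):
        """Return the reply for a name, querying only on the first request."""
        if name not in self._answers:
            self.queries_sent += 1
            self._answers[name] = list(self._resolver(name))
        return self._answers[name]

    def sub_rule_hits(self, name, code):
        """A sub-rule only checks the cached reply for its return code."""
        return code in self.answers(name)


def fake_resolver(name):
    # Canned reply standing in for a real DNS query (no network access).
    return ["127.0.0.2", "127.0.0.3"]


cache = LookupCache(fake_resolver)
name = "1.2.0.192.dnsbl.example"
first = cache.sub_rule_hits(name, "127.0.0.2")
second = cache.sub_rule_hits(name, "127.0.0.3")
third = cache.sub_rule_hits(name, "127.0.0.4")
```

Three sub-rules are evaluated here, yet `cache.queries_sent` stays at one, which is why dropping a single sub-rule saves no DNS traffic.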

Backscatter is not spam. ;)


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: DNSBL Comparison 20091010

2009-10-11 Thread Benny Pedersen

On Sun 11 Oct 2009 07:19:47 CEST, Adam Katz wrote

> different return code to indicate the hit anyway so that I can act on it
> anyway.  *Especially* while DNSWLs lack an abuse-reporting mechanism.

SpamAssassin has firsttrusted for DNSBL tests; the same can be used for
DNSWL testing.

That means that if you have no trusted_networks, or only very few, a
DNSWL can't hit when used with firsttrusted.

In the case of dnswl.org, send an email to abuse with the IP or the ID
that you would like changed for sending spam.

And default SA does not have many trusted_networks, so where is the
problem hidden?

Abuse? http://www.dnswl.org/ I have had no problem with abuse.

Do you maybe refer to another whitelist that is IP based?

> I have seen SO much DNSWL'd spam that I've had to migrate to using
> confirmation; like whitelist_from vs whitelist_auth on a DNSWL level.

whitelist_from is a joke (read: a candidate for removal from SA).

whitelist_auth is power.

> In my khop-bl sa-update channel, any DNSWL'd message that doesn't pass
> DKIM or SPF gains a point while any that does loses 2.25 (unless it's
> already been lowered by overlapping DNSWL scores).  ... actually, I'm
> surprised I gave it such a swing given spammers' increasing use of SPF
> and DKIM.

That's why you should never make such stupid meta rules :)

Only whitelist non-spammers; if a sender that passes SPF or DKIM spams,
remove it from the whitelist.

Did you blindly do whitelist_auth *...@hotmail.com ? :)

--
xpoint



Re: DNSBL Comparison 20091010

2009-10-10 Thread Henrik K
On Sun, Oct 11, 2009 at 01:19:47AM -0400, Adam Katz wrote:
> *Especially* while DNSWLs lack an abuse-reporting mechanism.
> 
> I have seen SO much DNSWL'd spam that I've had to migrate to using

Just to be clear, what DNSWLs are you talking about? It's a bit confusing as
the official DNSWL is called "DNSWL". While it doesn't(?) have an automated
"abuse-reporting mechanism", it sure accepts such reports.

Maybe it's just me, but there is currently only one proven DNSWL.



Re: DNSBL Comparison 20091010

2009-10-10 Thread Adam Katz
Warren Togami wrote:
> Overlap analysis shows the majority of XBL and PBL are also listed by
> Barracuda.  Furthermore Barracuda's list seems to have a similar hit
> % as XBL + PBL combined.  Is Barracuda known to aggregate Spamhaus
> data with their own?  If so we might be adding redundant scores in a 
> dangerous and undesirable manner.
> 
> Adam Katz's sa-update channels contain DNSBL rule overlap adjustments
> in an attempt to compensate for what he calls "incestuous"
> blacklists.  I am beginning to think this is a good idea to explore
> for spamassassin upstream if in fact one blacklist is aggregating
> data from another blacklist.

I should say more about my overlap rules ("overlap" being the PC version
of what I called "incestuous" in earlier versions and in comments).

I've noticed that a lot of these blocklists have a lot of overlap on the
same ham.  Some of them syndicate common upstream sources, but more
importantly, they share the same propagation methods.

Spam traps are limited in what they can pick up while still staying
pure; using list subscription + unsubscription, catch-all accounts on
guessable or subtly "advertised" domains, cleaned-up stale email
accounts, feeding addresses to spam bots, and perhaps a few other bags
of tricks.

This fishing for spam will lure the same spammers across the board, thus
the overlap.  This overlap is a problem because some spammers are smart
enough to cycle through relays and hope for one known (rightly or not)
for sending ham, or at least *not* known for sending spam.  Overlap from
DNSBLs can completely kill ham, and I think a multifaceted system like
SpamAssassin should not apply 5+ points (out of 5) to a message solely
from DNSBLs** when there are so many other tools available.  Real spam
will bump into something else.


That brings me to a big pet peeve of mine on DNSBLs:  they 'clean'
themselves of this problem by using DNSWLs ... and spammers know this.
The 'whitelisting' supplied by a DNSWL is in my opinion not appropriate
for a DNSBL to use.  Instead, a DNSBL-dedicated reference is needed,
perhaps even one that is not publicly available.

As to how such a thing would be populated ... that's a great question.
If it's anything that could be publicly accessible, I'd prefer DNSBLs to
either use NOTHING and let their users cross-check or else use a
different return code to indicate the hit anyway so that I can act on it
anyway.  *Especially* while DNSWLs lack an abuse-reporting mechanism.

I have seen SO much DNSWL'd spam that I've had to migrate to using
confirmation; like whitelist_from vs whitelist_auth on a DNSWL level.
In my khop-bl sa-update channel, any DNSWL'd message that doesn't pass
DKIM or SPF gains a point while any that does loses 2.25 (unless it's
already been lowered by overlapping DNSWL scores).  ... actually, I'm
surprised I gave it such a swing given spammers' increasing use of SPF
and DKIM.
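
The confirmation idea described above can be sketched in Python as follows. This is only an illustration of the stated numbers (+1 for an unconfirmed DNSWL hit, -2.25 for a confirmed one), not the actual khop-bl rules; the function name is hypothetical, and the exception for scores already lowered by overlapping DNSWL rules is deliberately left out.

```python
def dnswl_confirmation_adjustment(dnswl_hit, spf_pass, dkim_pass):
    """Score adjustment for a message, given DNSWL and authentication results.

    Assumption: "pass DKIM or SPF" means either one passing is enough.
    """
    if not dnswl_hit:
        return 0.0      # not whitelisted: this adjustment does not apply
    if spf_pass or dkim_pass:
        return -2.25    # whitelisted and authenticated: lower the score
    return 1.0          # whitelisted but unauthenticated: add a point


# A whitelisted relay sending mail that fails both SPF and DKIM:
unconfirmed = dnswl_confirmation_adjustment(True, False, False)
# The same relay sending mail that passes DKIM:
confirmed = dnswl_confirmation_adjustment(True, False, True)
```

The resulting swing between the two cases is 3.25 points, which is the "swing" referred to above.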


** Another pet peeve:  Mail should not be able to be marked as spam from
a single category of detection mechanisms, aside from blacklists and
perhaps a fully trained and moderated learning algorithm.  I'd like to
set a hard cap per mechanism category of something like 3.5, perhaps
4.0 for something dynamically generated by incoming data (e.g. Bayes,
AWL), but SA makes this kind of capping *really* hard to implement.

DNSBL/URIBL/DNSWLs are the only place that this sticks out enough for me
to have remedied.  My IXHASH rule is specifically designed to avoid this
exact problem.  It uses the plugin's defaults of 0.1 per server hit and
makes their union the rule that gets the larger amount of points.  If I
had masscheck results, some servers' scores might go up, but the bulk
would still be applied by the meta rule.  SA 3.4 (or 3.3 if it's not too
late...) should (IMHO) include that sort of mechanism for DNSBLs.  Not
quite a cap, but close enough.
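
A per-category cap of the kind wished for above could look like this in Python. It is a sketch of the idea only; SpamAssassin has no such function as far as this thread says, and the category names, rule scores, and `capped_total` itself are hypothetical. The 3.5 cap is the figure suggested above.

```python
from collections import defaultdict


def capped_total(hits, caps):
    """Sum rule scores, limiting each category's subtotal to its cap.

    hits: iterable of (category, score) pairs for the rules that fired.
    caps: dict mapping category -> maximum points that category may add.
          Categories without an entry are not capped.
    """
    per_category = defaultdict(float)
    for category, score in hits:
        per_category[category] += score

    total = 0.0
    for category, subtotal in per_category.items():
        cap = caps.get(category)
        if cap is not None and subtotal > cap:
            subtotal = cap  # only the upper bound is limited
        total += subtotal
    return total


# Three DNSBL hits that would sum to 6.0 on their own, plus one body rule.
example_hits = [("dnsbl", 2.0), ("dnsbl", 1.5), ("dnsbl", 2.5), ("body", 1.2)]
uncapped = capped_total(example_hits, {})
capped = capped_total(example_hits, {"dnsbl": 3.5})
```

With a required score of 5, the uncapped message is marked as spam from DNSBL hits alone, while the capped one needs further evidence from another category.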


The overlap rules in question are a part of my khop-bl channel, which is
published at http://khopesh.com/Anti-spam#sa-update_channels not too far
above my iXhash meta rule, which now includes the workaround update
discussed here not too long ago.


Re: DNSBL Comparison 20091010

2009-10-10 Thread Warren Togami

On 10/10/2009 09:10 PM, Benny Pedersen wrote:
> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote
>> On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>> data seems to confirm the good reputation of Spamhaus.
>> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.
>
> and also CSS



http://ruleqa.spamassassin.org/20091010-r823821-n
I know, but SBL and CSS had negligible and zero hits so I didn't bother 
mentioning it.


Warren


Re: DNSBL Comparison 20091010

2009-10-10 Thread Benny Pedersen

On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote
> On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>> data seems to confirm the good reputation of Spamhaus.
> Er.. Zen is a combination of SBL, XBL, and PBL.  Not just the XBL and PBL.

and also CSS

--
xpoint



Re: DNSBL Comparison 20091010

2009-10-10 Thread Warren Togami

On 10/10/2009 08:55 PM, João Gouveia wrote:
> Hi Warren,
>
> If you don't mind me asking, how does this kind of comparison take into
> account the dynamic nature of zombie infected machines? For example, an
> IP address may be infected at some point, and be listed in XBL, but
> later the client IP address changes (e.g. new DHCP lease) or simply gets
> "cleaned" and eventually expires from XBL. If I remember correctly,
> these comparisons are made using a spam/ham corpus that doesn't change
> that often. Wouldn't that cause FPs or FNs that in a real time scenario
> would not show up?


Right, these results do not precisely reflect how these blacklists
behave at this very moment.  It is impressive, however, that even though
PSBL and XBL list currently active abusers, their numbers
demonstrate very high safety ratings.


If you look at the ruleqa URL and click on those individual rules you 
can see how well those rules worked for the past week and 2nd week. 
Those counts are closer to current results.


Warren Togami
wtog...@redhat.com


Re: DNSBL Comparison 20091010

2009-10-10 Thread John Rudd
On Sat, Oct 10, 2009 at 16:44, Warren Togami  wrote:
> Given that zen.spamhaus.org is a combination of XBL and PBL, this
> data seems to confirm the good reputation of Spamhaus.

Er.. Zen is a combination of SBL, XBL, and PBL.  Not just the XBL and PBL.