Re: DNSBL Comparison 20091010
On Tue 13 Oct 2009 16:22:55 CEST, "McDonald, Dan" wrote:
> On Tue, 2009-10-13 at 15:42 +0200, Matus UHLAR - fantomas wrote:
>> On 11.10.09 03:10, Benny Pedersen wrote:
>>> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote:
>>>> On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
>>>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>>>> data seems to confirm the good reputation of Spamhaus.
>>>> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.
>>> and also CSS
>>
>> CSS is included in SBL :)
>
> Not as far as SpamAssassin is concerned. RCVD_IN_SBL only checks for
> 127.0.0.2 in zen, while CSS returns 127.0.0.3, so a new rule has to be
> added to include the CSS data. My rule is:
>
> header   RCVD_IN_CSS eval:check_rbl('zen-lastexternal', 'zen.spamhaus.org.', '127.0.0.3')
> describe RCVD_IN_CSS Received via a relay in Spamhaus CSS
> tflags   RCVD_IN_CSS net
> score    RCVD_IN_CSS 0 0.509 0 0.905 # n=0 n=2

This rule will make another DNS lookup :/ Use check_rbl_sub to avoid it. I posted a rule here that does it, and one of the ninjas made the same error :/

http://www.nabble.com/New-spamhaus-list-not-included-td25736766.html

If check_rbl is cached DNS, sorry for my own mistake :)

-- 
xpoint
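To illustrate the check_rbl_sub point above, here is a toy Python model of the idea: one DNS query per lookup set, with sub-rules that only inspect the cached answer. This is a sketch under stated assumptions, not SpamAssassin's actual implementation. The class `RblCache`, its methods, and `fake_resolver` are hypothetical names made up for the example, and whether a repeated check_rbl call on the same set is itself cached is exactly the open question raised in the message.

```python
from typing import Callable, Dict, List


class RblCache:
    """Toy model: one DNS query per set name, reused by every sub-rule."""

    def __init__(self, resolver: Callable[[str], List[str]]) -> None:
        self.resolver = resolver  # hypothetical stand-in for a real DNS lookup
        self.answers: Dict[str, List[str]] = {}
        self.queries = 0  # counts how many lookups were actually sent

    def check_rbl(self, set_name: str, zone: str) -> List[str]:
        # Only the first call for a given set triggers a query.
        if set_name not in self.answers:
            self.queries += 1
            self.answers[set_name] = self.resolver(zone)
        return self.answers[set_name]

    def check_rbl_sub(self, set_name: str, code: str) -> bool:
        # Sub-rules never query; they only look at the cached answer.
        return code in self.answers.get(set_name, [])


def fake_resolver(zone: str) -> List[str]:
    # Pretend the relay is listed in both SBL (.2) and CSS (.3).
    return ["127.0.0.2", "127.0.0.3"]


cache = RblCache(fake_resolver)
cache.check_rbl("zen-lastexternal", "zen.spamhaus.org.")
in_sbl = cache.check_rbl_sub("zen-lastexternal", "127.0.0.2")
in_css = cache.check_rbl_sub("zen-lastexternal", "127.0.0.3")
print(in_sbl, in_css, cache.queries)
```

In this model both sub-rule checks are answered from a single lookup, which is the saving being argued for.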
Re: DNSBL Comparison 20091010
On Tue, 2009-10-13 at 15:42 +0200, Matus UHLAR - fantomas wrote:
> On 11.10.09 03:10, Benny Pedersen wrote:
>> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote:
>>> On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
>>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>>> data seems to confirm the good reputation of Spamhaus.
>>> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.
>> and also CSS
>
> CSS is included in SBL :)

Not as far as SpamAssassin is concerned. RCVD_IN_SBL only checks for 127.0.0.2 in zen, while CSS returns 127.0.0.3, so a new rule has to be added to include the CSS data. My rule is:

header   RCVD_IN_CSS eval:check_rbl('zen-lastexternal', 'zen.spamhaus.org.', '127.0.0.3')
describe RCVD_IN_CSS Received via a relay in Spamhaus CSS
tflags   RCVD_IN_CSS net
score    RCVD_IN_CSS 0 0.509 0 0.905 # n=0 n=2

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
www.austinenergy.com
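The return-code argument above can be sketched in Python as a small classifier. Only the 127.0.0.2 (SBL) and 127.0.0.3 (CSS) codes come from the message; the XBL and PBL patterns are assumptions based on Spamhaus's published return codes, and `ZEN_CODES` and `classify` are hypothetical names used for illustration only.

```python
import re
from typing import List

# Anchored patterns per sublist. SBL and CSS are taken from the thread;
# the XBL and PBL ranges are assumptions, not confirmed by the thread.
ZEN_CODES = {
    "SBL": re.compile(r"^127\.0\.0\.2$"),
    "CSS": re.compile(r"^127\.0\.0\.3$"),
    "XBL": re.compile(r"^127\.0\.0\.[4-7]$"),
    "PBL": re.compile(r"^127\.0\.0\.1[01]$"),
}


def classify(answers: List[str]) -> List[str]:
    """Return the sublists whose pattern matches any A record in the answer."""
    return [
        name
        for name, pattern in ZEN_CODES.items()
        if any(pattern.match(answer) for answer in answers)
    ]


print(classify(["127.0.0.3"]))                # ['CSS']
print(classify(["127.0.0.2", "127.0.0.11"]))  # ['SBL', 'PBL']
```

A rule that matches only 127.0.0.2 returns nothing for a CSS-only answer, which is why a separate rule (or sub-rule) for 127.0.0.3 is needed.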
Re: DNSBL Comparison 20091010
On 11.10.09 03:10, Benny Pedersen wrote:
> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote:
>> On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>> data seems to confirm the good reputation of Spamhaus.
>> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.
> and also CSS

CSS is included in SBL :)

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Christian Science Programming: "Let God Debug It!".
Re: DNSBL Comparison 20091010
Warren Togami wrote:
> The following is an apples to apples comparisons of DNSBL lastexternal
> rules against the October 10th, 2009 weekly_mass_check corpora.
> HOSTKARMA and SEM are new. Hopefully these masscheck results can help
> to identify problems so list quality can improve over time.
>
> http://ruleqa.spamassassin.org/20091010-r823821-n
> 128161 Spam
> 185199 Ham
>
> The results below are only as good as the data submitted by nightly
> masscheck volunteers. Please join us in nightly masschecks to increase
> the sample size of the corpora so we can have greater confidence in
> the nightly statistics.
>
> DNSBL lastexternal by Safety
>
>    SPAM%     HAM%  RANK  RULE
> 10.0975%  0.0022%  0.93  RCVD_IN_PSBL
> 11.4278%  0.0173%  0.91  RCVD_IN_XBL
> 18.7561%  0.0616%  0.87  RCVD_IN_SEMBLACK
> 81.8252%  0.1825%  0.83  RCVD_IN_PBL
> 27.4342%  0.2327%  0.77  RCVD_IN_SORBS_DUL
> 91.5505%  0.3974%  0.76  RCVD_IN_BRBL_LASTEXT
> 13.1272%  0.5027%  0.67  RCVD_IN_HOSTKARMA_BL
>
> RANK is heavily influenced by the false positive rate, thus it seems
> to be a rough approximation of safety. RANK alone says little about
> the effectiveness of a particular rule against spam. These numbers
> show that Barracuda and PBL are by far the most extensive blacklists,
> but the false positive rates suggest that Barracuda is aggressive at
> the expense of safety.
>
> Given that zen.spamhaus.org is a combination of XBL and PBL, this
> data seems to confirm the good reputation of Spamhaus.
>
> Overlap analysis shows the majority of XBL and PBL are also listed by
> Barracuda. Furthermore Barracuda's list seems to have a similar hit %
> as XBL + PBL combined. Is Barracuda known to aggregate Spamhaus data
> with their own? If so we might be adding redundant scores in a
> dangerous and undesirable manner.
>
> Adam Katz sa-update channels contains DNSBL rule overlap adjustments
> in an attempt to compensate for what he calls "incestuous"
> blacklists. I am beginning to think this is a good idea to explore
> for spamassassin upstream if in fact one blacklist is aggregating
> data from another blacklist.
>
> http://ruleqa.spamassassin.org/20091010-r823821-n/
>
> In related news, these results indicate that RCVD_IN_HOSTKARMA_BR and
> RCVD_IN_SEMBACKSCATTER have so few hits that they are likely not worth
> the overhead of the extra DNS query to use in production. Unless the
> list owners object, I will remove them from the sandbox before next
> Saturday's network masscheck.
>
> === Spamcop ===
>
>    SPAM%     HAM%  RANK  RULE
> 16.8663%  2.5994%  0.56  RCVD_IN_BL_SPAMCOP_NET
>
> I did not include SpamCop in the above chart because it is not the
> same type of lastexternal DNSBL. I'm confused. With such a poor false
> positive rate how does it have a high score generated by the GA?
>
> Warren Togami
> wtog...@redhat.com

Just a few comments. First, can I get a list of the IPs that you consider false positives? I'd like to take a look at them to see what I'm doing wrong on the HOSTKARMA list.

Also, we are only filtering a few thousand domains, so in some ways hitting 13% of the spam is good for a fairly small operation. Our blacklist is mostly spambots, and the list self-tunes to the spambots that are spamming our customers. So the people we filter for get more hits than the people we don't. We actually block almost 100% of spambot spam.

It makes me wonder how the numbers would change if the spam were collected from domains whose high-numbered MX records point to our tarbaby server.

But I am concerned about the FP count, so any info about that would be helpful.
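To make the table in the quoted post easier to read, the percentages can be turned into approximate message counts. The Python sketch below assumes the SPAM% and HAM% columns are relative to the corpus totals quoted in the post (128161 spam, 185199 ham). It also computes a spam/(spam+ham) ratio from the percentages as a rough precision measure; that ratio is my own illustration and is NOT the RANK column, whose formula is not given in the thread. The function names are hypothetical.

```python
SPAM_TOTAL = 128161  # corpus sizes quoted in the post
HAM_TOTAL = 185199

# (rule, SPAM%, HAM%) copied from the "DNSBL lastexternal by Safety" table.
RESULTS = [
    ("RCVD_IN_PSBL", 10.0975, 0.0022),
    ("RCVD_IN_XBL", 11.4278, 0.0173),
    ("RCVD_IN_SEMBLACK", 18.7561, 0.0616),
    ("RCVD_IN_PBL", 81.8252, 0.1825),
    ("RCVD_IN_SORBS_DUL", 27.4342, 0.2327),
    ("RCVD_IN_BRBL_LASTEXT", 91.5505, 0.3974),
    ("RCVD_IN_HOSTKARMA_BL", 13.1272, 0.5027),
]


def approx_hits(percent: float, total: int) -> int:
    """Approximate number of messages hit, assuming percent is of `total`."""
    return round(total * percent / 100.0)


def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of the combined hit rate that comes from spam (assumed measure)."""
    return spam_pct / (spam_pct + ham_pct)


for rule, spam_pct, ham_pct in RESULTS:
    print(
        f"{rule:22s} spam~{approx_hits(spam_pct, SPAM_TOTAL):6d} "
        f"ham~{approx_hits(ham_pct, HAM_TOTAL):4d} "
        f"ratio={spam_ratio(spam_pct, ham_pct):.4f}"
    )
```

Seen as counts, a 0.0022% ham rate is only a handful of messages, so a short list of the offending IPs is a reasonable thing to ask for.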
Re: DNSBL Comparison 20091010
Just a few comments and corrections.

On Sat, 2009-10-10 at 19:44 -0400, Warren Togami wrote:
> The following is an apples to apples comparisons of DNSBL lastexternal

Minor nit: Not entirely correct. Different lists have different listing policies and criteria. A PBL listing, for example, does NOT necessarily indicate that the IP has ever sent a single spam.

While all (most) of these might be apples, I strongly prefer green ones over red. ;)

> Overlap analysis shows the majority of XBL and PBL are also listed by
> Barracuda. Furthermore Barracuda's list seems to have a similar hit %
> as XBL + PBL combined. Is Barracuda known to aggregate Spamhaus data
> with their own?

No, they don't. They don't even list PBL style IPs just because of that. Barracuda BRBL appears to be an independently collected set, as one can easily find out:

http://www.barracudacentral.org/rbl

> In related news, these results indicate that RCVD_IN_HOSTKARMA_BR and
> RCVD_IN_SEMBACKSCATTER have so few hits that they are likely not worth
> the overhead of the extra DNS query to use in production. Unless the
> list owners object, I will remove them from the sandbox before next
> Saturday's network masscheck.

Hostkarma BROWN does NOT require an additional DNS query. It's a check_rbl_sub() eval rule, and thus comes essentially for free. Any possible Hostkarma listing is based on the very same, single DNS query.

Backscatter is not spam. ;)

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
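For readers unfamiliar with why a whole family of sub-rules can share "the very same, single DNS query": a DNSBL lookup is one A-record query for the relay's IP with its octets reversed and the list's zone appended, and the returned 127.0.0.x values carry the per-sublist information. A minimal Python sketch of building that query name follows. The function name `dnsbl_query_name` is hypothetical, the zone is the Spamhaus one mentioned elsewhere in the thread, and 192.0.2.99 is a documentation-only address.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name queried for an IPv4 address in a DNSBL zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}."


print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org."))
```

Every sub-rule for a given zone would then inspect the answer to this one name rather than sending its own query.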
Re: DNSBL Comparison 20091010
On Sun 11 Oct 2009 07:19:47 CEST, Adam Katz wrote:
> different return code to indicate the hit anyway so that I can act on
> it anyway. *Especially* while DNSWLs lack an abuse-reporting mechanism.

SpamAssassin has firsttrusted for DNSBLs; the same can go for DNSWL testing. That means that if you have no trusted_networks, or just very few, a DNSWL can't hit if it is used with firsttrusted.

In the case of dnswl.org, send email to abuse with the IP or their ID that you would like changed for sending spam. And default SA does not have many trusted_networks, so where is the problem hidden?

Abuse? http://www.dnswl.org/ I have no problem with abuse. Do you maybe refer to another whitelist that is IP based?

> I have seen SO much DNSWL'd spam that I've had to migrate to using
> confirmation; like whitelist_from vs whitelist_auth on a DNSWL level.

whitelist_from is a joke (read: a candidate for being removed from SA). whitelist_auth is power.

> In my khop-bl sa-update channel, any DNSWL'd message that doesn't pass
> DKIM or SPF gains a point while any that does loses 2.25 (unless it's
> already been lowered by overlapping DNSWL scores). ... actually, I'm
> surprised I gave it such a swing given spammers' increasing use of SPF
> and DKIM.

That's why you should never make such stupid meta rules :) Only whitelist non-spammers; if an SPF or DKIM sender spams, remove it from the whitelist. Did you blindly do whitelist_auth *...@hotmail.com? :)

-- 
xpoint
Re: DNSBL Comparison 20091010
On Sun, Oct 11, 2009 at 01:19:47AM -0400, Adam Katz wrote:
> *Especially* while DNSWLs lack an abuse-reporting mechanism.
>
> I have seen SO much DNSWL'd spam that I've had to migrate to using

Just to be clear, what DNSWLs are you talking about? It's a bit confusing as the official DNSWL is called "DNSWL". While it doesn't(?) have an automated "abuse-reporting mechanism", it sure accepts such reports.

Maybe it's just me, but there is currently only one proven DNSWL.
Re: DNSBL Comparison 20091010
Warren Togami wrote:
> Overlap analysis shows the majority of XBL and PBL are also listed by
> Barracuda. Furthermore Barracuda's list seems to have a similar hit
> % as XBL + PBL combined. Is Barracuda known to aggregate Spamhaus
> data with their own? If so we might be adding redundant scores in a
> dangerous and undesirable manner.
>
> Adam Katz sa-update channels contains DNSBL rule overlap adjustments
> in an attempt to compensate for what he calls "incestuous"
> blacklists. I am beginning to think this is a good idea to explore
> for spamassassin upstream if in fact one blacklist is aggregating
> data from another blacklist.

I should say more about my overlap rules ("overlap" being the PC name for what I called "incestuous" in earlier versions and in comments).

I've noticed that a lot of these blocklists have a lot of overlap on the same ham. Some of them syndicate common upstream sources, but more importantly, they share the same propagation methods. Spam traps are limited in what they can pick up while still staying pure: list subscription + unsubscription, catch-all accounts on guessable or subtly "advertised" domains, cleaned-up stale email accounts, feeding addresses to spam bots, and perhaps a few other bags of tricks. This fishing for spam will lure the same spammers across the board, thus the overlap.

This overlap is a problem because some spammers are smart enough to cycle through relays and hope for one known (rightly or not) for sending ham, or at least *not* known for sending spam. Overlap from DNSBLs can completely kill ham, and I think a multifaceted system like SpamAssassin should not apply 5+ points (out of 5) to a message solely from DNSBLs** when there are so many other tools available. Real spam will bump into something else.

That brings me to a big pet peeve of mine on DNSBLs: they 'clean' themselves of this problem by using DNSWLs ... and spammers know this.

The 'whitelisting' supplied by a DNSWL is in my opinion not appropriate for a DNSBL to use. Instead, a DNSBL-dedicated reference is needed, perhaps even one that is not publicly available. As to how such a thing would be populated ... that's a great question. If it's anything that could be publicly accessible, I'd prefer DNSBLs to either use NOTHING and let their users cross-check, or else use a different return code to indicate the hit so that I can act on it anyway. *Especially* while DNSWLs lack an abuse-reporting mechanism.

I have seen SO much DNSWL'd spam that I've had to migrate to using confirmation; like whitelist_from vs whitelist_auth on a DNSWL level. In my khop-bl sa-update channel, any DNSWL'd message that doesn't pass DKIM or SPF gains a point while any that does loses 2.25 (unless it's already been lowered by overlapping DNSWL scores). ... actually, I'm surprised I gave it such a swing given spammers' increasing use of SPF and DKIM.

** Another pet peeve: Mail should not be able to be marked as spam from a single category of detection mechanisms, aside from blacklists and perhaps a fully trained and moderated learning algorithm. I'd like to set a hard cap on mechanism categories of something like 3.5, perhaps 4.0 for something dynamically generated by incoming data (e.g. Bayes, AWL), but SA makes facilitating this kind of capping *really* hard. DNSBL/URIBL/DNSWLs are the only place that this sticks out enough for me to have remedied.

My IXHASH rule is specifically designed to avoid this exact problem. It uses the plugin's defaults of 0.1 per server hit and makes their union the rule that gets the larger number of points. If I had masscheck results, some servers' scores might go up, but the bulk would still be applied by the meta rule. SA 3.4 (or 3.3 if it's not too late...) should (IMHO) include that sort of mechanism for DNSBLs. Not quite a cap, but close enough.

The overlap rules in question are a part of my khop-bl channel, which is published at http://khopesh.com/Anti-spam#sa-update_channels not too far above my iXhash meta rule, which now includes the workaround update discussed here not too long ago.
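The two scoring ideas in this message, a per-category cap and the DNSWL confirmation swing, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the khop-bl rules themselves and not something SpamAssassin supports natively. The function names, the category labels, and the sample scores are hypothetical; the 3.5 cap, the +1 and the -2.25 come from the message. The "unless it's already been lowered by overlapping DNSWL scores" exception is deliberately left out.

```python
from typing import Dict


def capped_total(category_scores: Dict[str, float], cap: float = 3.5) -> float:
    """Sum per-category scores, limiting each category's contribution to `cap`."""
    return sum(min(score, cap) for score in category_scores.values())


def dnswl_adjustment(listed: bool, passes_spf_or_dkim: bool) -> float:
    """Score change for a DNSWL hit, confirmed or not by SPF/DKIM."""
    if not listed:
        return 0.0
    # A confirmed whitelisting lowers the score; an unconfirmed one raises it.
    return -2.25 if passes_spf_or_dkim else 1.0


# Hypothetical message: several overlapping DNSBLs add up to 5.4 points,
# which alone would cross a 5.0 threshold.
scores = {"dnsbl": 5.4, "bayes": 0.0, "body": 0.8}
print(sum(scores.values()) >= 5.0)   # uncapped: marked as spam
print(capped_total(scores) >= 5.0)   # capped: needs other evidence
print(dnswl_adjustment(True, False), dnswl_adjustment(True, True))
```

With the cap in place, the DNSBL category alone can no longer push the sample message over the threshold; some other category has to contribute.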
Re: DNSBL Comparison 20091010
On 10/10/2009 09:10 PM, Benny Pedersen wrote:
> On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote:
>> On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
>>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>>> data seems to confirm the good reputation of Spamhaus.
>> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.
> and also CSS

http://ruleqa.spamassassin.org/20091010-r823821-n

I know, but SBL and CSS had negligible and zero hits so I didn't bother mentioning it.

Warren
Re: DNSBL Comparison 20091010
On Sun 11 Oct 2009 02:31:58 CEST, John Rudd wrote:
> On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
>> Given that zen.spamhaus.org is a combination of XBL and PBL, this
>> data seems to confirm the good reputation of Spamhaus.
> Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.

and also CSS

-- 
xpoint
Re: DNSBL Comparison 20091010
On 10/10/2009 08:55 PM, João Gouveia wrote:
> Hi Warren,
>
> If you don't mind me asking, how does this kind of comparison take
> into account the dynamic nature of zombie infected machines?
> For example, an IP address may be infected at some point, and be
> listed in XBL, but later the client IP address changes (e.g. new DHCP
> lease) or simply gets "cleaned" and eventually expires from XBL.
> If I remember correctly, these comparisons are made using a spam/ham
> corpus that doesn't change that often. Wouldn't that cause FPs or FNs
> that in a real time scenario would not show up?

Right, these results do not precisely reflect how these blacklists behave right at this very moment. It is impressive, however, that although PSBL and XBL list currently active abusers, their numbers demonstrate very high safety ratings.

If you look at the ruleqa URL and click on those individual rules, you can see how well those rules worked for the past week and the week before. Those counts are closer to current results.

Warren Togami
wtog...@redhat.com
Re: DNSBL Comparison 20091010
On Sat, Oct 10, 2009 at 16:44, Warren Togami wrote:
> Given that zen.spamhaus.org is a combination of XBL and PBL, this
> data seems to confirm the good reputation of Spamhaus.

Er.. Zen is a combination of SBL, XBL, and PBL. Not just the XBL and PBL.