On 2011-09-19 23:37, [email protected] wrote:
On 09/19, Mark wrote:
[email protected] wrote:
The other 38 were notifications from livejournal.com, nothing spam
related, from 2011-08-02 to 2011-08-11. It looks like you just had
livejournal.com listed as a spammer for those 10 days. Those emails
are not hitting this rule now.
livejournal.com has been whitelisted for years, so it's certainly not expected
behaviour.
Any SA dev folks have opinions on this? I'm up for assuming there was
somehow a problem on my end and removing these from my corpora if that's
what you devs think I should do.
Mark, I encourage you to include [email protected] in your
replies.
Perhaps you were using a DNS server that returned bad results. Some
governments (e.g. China) intercept DNS requests and return their own IPs. Some
ISPs think they can do that too for NXDOMAIN results.
It seems unlikely. I'm using a local bind server with two forwarders to my
hosting provider, linode.com, which is very open-source oriented and seems
unlikely to pull something like that. I'm happy to ask them via a support
request whether there was a related incident during this time period, though.
The relevant rule is:
urirhssub URIBL_WS_SURBL multi.surbl.org. A 4
Does that mean it could've matched anything ending in .4, or only
127.0.0.4?
Man page is Mail::SpamAssassin::Plugin::URIDNSBL
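My reading of the URIDNSBL man page is that a bare number like "4" is treated
as a bitmask against the returned A record, not as an exact match on
127.0.0.4. A rough Python sketch of that interpretation (my reading, not SA's
actual code; the function name is made up):

```python
import ipaddress

def bitmask_subtest(returned_addr: str, mask: int) -> bool:
    """Sketch of a bitmask subtest: AND the mask against the returned
    A record, treated as a 32-bit integer. Note: no 127/8 guard."""
    return int(ipaddress.IPv4Address(returned_addr)) & mask != 0

# 127.0.0.4 matches mask 4, as intended ...
print(bitmask_subtest("127.0.0.4", 4))   # True
# ... but so would a bogus answer outside 127/8 with that bit set:
print(bitmask_subtest("10.0.0.4", 4))    # True
```

If that reading is right, any answer with bit 4 set in the last octet would
match, regardless of the rest of the address.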
That should be preventable to a large extent by checking if the return code is
within the 127/8 IP range.
Devs, if urirhssub with a value of "4" does not constrain to 127/8,
we should change the rules to match only, for example, 127.0.0.4.
We don't control external DNS servers, of course, so if one of them returns a
127/8 code for whatever reason (e.g. cache poisoning), it will still cause a
false positive.
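Constraining matches to 127/8 would be a small guard on top of the bitmask
test. A hedged sketch of what such a check could look like (hypothetical
helper, not the plugin's real code):

```python
import ipaddress

DNSBL_NET = ipaddress.ip_network("127.0.0.0/8")

def guarded_subtest(returned_addr: str, mask: int) -> bool:
    """Only accept DNSBL answers inside 127/8; anything else is
    treated as a broken or spoofed response and ignored."""
    addr = ipaddress.ip_address(returned_addr)
    if addr not in DNSBL_NET:
        return False  # e.g. an NXDOMAIN-hijack or poisoned answer
    return int(addr) & mask != 0

print(guarded_subtest("127.0.0.4", 4))  # True
print(guarded_subtest("10.0.0.4", 4))   # False: outside 127/8, discarded
```

As noted above, this only helps against answers outside 127/8; a poisoned
response that happens to land inside 127/8 would still match.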
Indeed.
Another possibility is a DNS client error. That is known to occur with
multithreaded and asynchronous DNS clients. A typical cause is a race
condition while accessing memory, causing a mix-up of query returns.
Seems unlikely, mostly because of the time frame.
Did the livejournal.com hits have specific subdomains?
I just looked for notifications from livejournal that didn't hit this rule
in the same time frame - there were none. Everything I got from
livejournal.com from August 2nd to August 11th hit URIBL_WS_SURBL. And all
included these URLs:
http://news.livejournal.com/
http://www.livejournal.com/manage/subscriptions/
Other URLs were generally on a subdomain, <user>.livejournal.com.
Also, I would expect no query to SURBL at all for a domain that is on SA's
internal whitelist of frequently queried domains. livejournal.com should be on
that list. Can you see if there were any changes/updates to SA that could have
caused this?
The rules currently include:
25_uribl.cf:uridnsbl_skip_domain juno.com kernel.org livejournal.com lycos.com
Certainly looks to me like that shouldn't allow livejournal.com to be
looked up against SURBL.
Closest backup of those config files I have is 2011-08-23, and that file
has an md5 checksum identical to my current 25_uribl.cf. Same as the
backup from 2011-07-01:
# md5sum panic-2011-07-01/var/lib/spamassassin/3.004000/updates_spamassassin_org/25_uribl.cf
64a27859c0a7cdafbd856dce3461c2f3  panic-2011-07-01/var/lib/spamassassin/3.004000/updates_spamassassin_org/25_uribl.cf
$ md5sum /var/lib/spamassassin/3.004000/updates_spamassassin_org/25_uribl.cf
64a27859c0a7cdafbd856dce3461c2f3  /var/lib/spamassassin/3.004000/updates_spamassassin_org/25_uribl.cf
So it shouldn't be possible for livejournal.com to hit URIBL_WS_SURBL.
I've removed the examples from my corpora. I'd still like to know how it
happened. Here's the simplest example I can find:
http://www.chaosreigns.com/sa/ws_surbl.txt
Only URLs that could hit URIBL_WS_SURBL are www.livejournal.com and
news.livejournal.com, right? Yep.
spamassassin -D 2>&1 | grep multi.surbl | grep starting | less
Sep 19 17:22:39.564 [9037] dbg: async: starting: URI-DNSBL,
DNSBL:multi.surbl.org.:news.livejournal.com (timeout 15.0s, min 3.0s)
Sep 19 17:22:39.569 [9037] dbg: async: starting: URI-DNSBL,
DNSBL:multi.surbl.org.:www.livejournal.com (timeout 15.0s, min 3.0s)
That's current trunk output, so there's a bug causing uridnsbl_skip_domain
to not work? Opened bug:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6662
Even without uridnsbl_skip_domain I still can't explain why this rule hit,
and that still bothers me.
From what I'm seeing, livejournal.com is in 20_aux_tlds.cf:
util_rb_2tld livejournal.com
and the uridnsbl_skip_domain rule applies to the parent domain, not to its
subdomains.
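If that's the mechanism, the failure mode could be sketched like this (a
deliberate simplification of what I assume the plugin does, not its real
code; both sets come from the config lines quoted earlier in the thread):

```python
TWO_LEVEL_TLDS = {"livejournal.com"}   # from util_rb_2tld in 20_aux_tlds.cf
SKIP_DOMAINS = {"livejournal.com"}     # from uridnsbl_skip_domain in 25_uribl.cf

def trim_domain(hostname: str) -> str:
    """Trim a hostname to its registrar boundary. Because
    livejournal.com is registered as a two-level TLD, one extra label
    is kept: news.livejournal.com stays news.livejournal.com."""
    parts = hostname.split(".")
    keep = 3 if ".".join(parts[-2:]) in TWO_LEVEL_TLDS else 2
    return ".".join(parts[-keep:])

def should_skip(hostname: str) -> bool:
    # The skip list is checked against the trimmed domain, so the
    # subdomain form never matches the parent entry.
    return trim_domain(hostname) in SKIP_DOMAINS

print(should_skip("livejournal.com"))       # True: parent is skipped
print(should_skip("news.livejournal.com"))  # False: still queried
```

That would explain why news.livejournal.com and www.livejournal.com were
looked up against multi.surbl.org despite the skip entry.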
You are trusting a third-party DNS server (your forwarder) which *could* be
manipulating your queries. If you have a local resolver, why add the extra
query hop? Or am I missing something?