I’ve contributed fixes to Apache itself since 1997 (though not with any
regularity), but can’t remember if I’ve ever had to furnish a CLA or not.
Sure, opening a bug is fine.
As to your last questions: for someone who doesn’t need the complexity of a
DNSBL, doesn’t want its wide scope, doesn’t want to have to configure it, or
perhaps just wants a significantly more precise tool to solve a very limited
problem, local blacklisting lets you do this.
As an example, we were recently hit by a volley of SPAM from a variety of mail
relays, but they all had something in common: all of them contained HTML with
URLs pointing to websites hosted by “Solar VPS”, in particular on the subnet
65.181.64.0/18 (in some cases, the web hosts had additional A records on the
subnet 192.99.0.0/16).
It took a couple of hours to get URIDNSBL configured, scored appropriately, and
working… and verifying that the ill-behaved hosts had corresponding entries in
multi.uribl.com without prior understanding of the record encoding also took
some time (since the encoding overloads DNS RRs beyond their intended use,
it’s less than intuitive).
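For reference, the rule I ended up with looked roughly like this (the rule
name and score here are illustrative; the trailing bitmask picks which URIBL
sublist to match, and if I’m reading their published encoding right, 2 is the
“black” list):
urirhssub L_URIBL_BLACK multi.uribl.com. A 2
body L_URIBL_BLACK eval:check_uridnsbl('L_URIBL_BLACK')
describe L_URIBL_BLACK Contains a URL listed on the URIBL "black" sublist
score L_URIBL_BLACK 3.0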
When it was all over, it occurred to me that a trivial configuration like:
uri_block_cidr L_BLOCK_CIDR 65.181.64.0/18 192.99.0.0/16
body L_BLOCK_CIDR eval:check_uri_local_bl("L_BLOCK_CIDR")
describe L_BLOCK_CIDR Block URIs pointing to bad CIDRs
score L_BLOCK_CIDR 5.20
would be a much more pinpoint fix for my issue than the overly generalized
approach of using multi.uribl.com. And I didn’t want to score every host
listed in that DNSBL, just those in particular subnets.
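Under the hood, the CIDR test doesn’t need to be much more than the following
sketch (simplified, with made-up sub and variable names; the real plugin first
resolves each URI’s host itself, as noted in my original message below), built
on the Net::CIDR::Lite dependency:
# Build one Net::CIDR::Lite set from the uri_block_cidr arguments, then
# test each address a URI's host resolves to against it.
use strict;
use warnings;
use Net::CIDR::Lite;
my $cidr_set = Net::CIDR::Lite->new;
$cidr_set->add('65.181.64.0/18');
$cidr_set->add('192.99.0.0/16');
$cidr_set->prep_find;                     # sort/merge the ranges for fast lookups
sub uri_addr_is_blocked {
    my ($ip) = @_;                        # one A record for the URI's host
    return $cidr_set->find($ip) ? 1 : 0;  # true if it falls in a blocked range
}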
After that, it occurred to me that I had never seen a legitimate email with a
URL pointing to Vietnam or Nigeria in my life, and it would be nice to restrict
those as well. So the plugin later evolved to:
uri_block_cc L_BLOCK_CC cn vn ro bg ru ng eg
body L_BLOCK_CC eval:check_uri_local_bl("L_BLOCK_CC")
describe L_BLOCK_CC Block URIs pointing to countries with no CERT or anti-SPAM laws
score L_BLOCK_CC 5.65
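The country test is the same shape, just keyed off Geo::IP (again a sketch
with made-up names, assuming MaxMind’s country database at its conventional
path):
# Map each address a URI's host resolves to back to a country code and
# compare it against the uri_block_cc list.
use strict;
use warnings;
use Geo::IP;
my $geoip = Geo::IP->open('/usr/share/GeoIP/GeoIP.dat', GEOIP_MEMORY_CACHE);
my %blocked_cc = map { lc $_ => 1 } qw(cn vn ro bg ru ng eg);
sub uri_country_is_blocked {
    my ($ip) = @_;                               # one A record for the URI's host
    my $cc = $geoip->country_code_by_addr($ip);  # e.g. 'VN', or undef if unknown
    return defined $cc && $blocked_cc{lc $cc};
}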
In the case of the 65.181.0.0/16 SPAM which provided this call to action, here
are some subject lines you might recognize:
News alert: you could apply for a CNA education program
Wireless Internet plans online
You've Been Accepted into the Who's Who
Don't overpay for a phone. Try a free* one today
Is your home missing something? How about custom blinds?
Could you study at a CNA education program?
cable service is a possibility
etc. All within a 6-hour span.
Looking at some recent traffic on the SpamAssassin users mailing list, it
seemed that other people had hit on a similar idea at about the same time:
providing surgical blacklisting locally.
At this point, I’m thinking of adding whitelisting support to the country,
ISP, and CIDR blacklists. For example, we’ve had trouble getting ServerBeach
to be proactive about SPAM, or even to acknowledge complaints in a timely
fashion; that said, we get legitimate traffic with URLs pointing to a Fedora
Project resource hosted on one of their networks. So we couldn’t blacklist
that entire ISP without “punching a hole” for Fedora build reports.
The whitelisting would take individual IP addresses and/or host names as they
appear in the URLs.
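Syntax-wise I haven’t settled on anything yet, but I’m picturing something
along these lines (directive names are guesses: uri_block_isp stands in for
however the ISP test ends up being spelled, and uri_block_exempt, the host
name, and the address are placeholders for the whitelisting, not anything the
plugin does today):
uri_block_isp L_BLOCK_ISP "ServerBeach"
uri_block_exempt L_BLOCK_ISP fedora-build.example.org 192.0.2.10
body L_BLOCK_ISP eval:check_uri_local_bl("L_BLOCK_ISP")
describe L_BLOCK_ISP Block URIs pointing to ServerBeach, minus the exempted hosts
score L_BLOCK_ISP 5.00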
Hope that answers your questions.
On Jun 17, 2014, at 9:24 AM, Kevin A. McGrail <[email protected]> wrote:
> Philip,
>
> Do you have a CLA with the ASF? From checking, I don't believe so. Can you
> please take a look at http://wiki.apache.org/spamassassin/AboutClas
>
> What might help you is that since this is a plugin, we could open a bug, add
> it to trunk, etc. for people to more readily test it. It wouldn't be enabled
> by default but should allow more people to readily implement it and provide
> feedback.
>
> However, for me, I know I am curious if you could do a bit more description on
> why this is good to implement, what type of spam you use it to block, etc. in
> the pm?
>
> Regards,
> KAM
>
> On 6/15/2014 10:47 PM, Philip Prindeville wrote:
>> Here’s a first attempt at a module. I based it on Plugin::URIDetail.
>>
>> It depends on Net::CIDR::Lite and Geo::IP. If it detects a valid (though
>> not necessarily current) ISP database, it will publish a handler for that.
>> Same with the IP-Lite (or licensed IP) database from MaxMind.
>>
>> We’ve been using the MaxMind database for a couple of years on a commercial
>> project with good success.
>>
>> Currently the filtering is done by country code, ISP name, and explicit CIDR
>> blocks.
>>
>> The last test is the least costly, but also the most fine grained… you can
>> configure rules to run in whichever order suits your needs best.
>>
>> I personally sort by country (cn ru bg vn ro ng ir) and then by ISP (won’t
>> name them here, but one of them is Over tHere in France), and lastly by CIDR
>> block.
>>
>> The only real wart on these plugins is that they all index their databases
>> by IP address, and do their own (implicit or explicit) name-to-IP mapping.
>> Obviously, this is both blocking and repetitive.
>>
>> Not sure why PerMsgStatus.pm can’t do the asynchronous name lookups when
>> get_uri_detail_list() runs so we have that handy for each of the plugins.
>> If I had the mappings already available, I’d definitely use that.
>>
>> That is, instead of having:
>>
>> hosts => {
>> ‘nqtel.com’ => ‘nqtel.com’
>> }
>>
>> why not instead have:
>>
>> hosts => {
>> ‘nqtel.com’ => [ ‘107.158.259.74’ ]
>> }
>>
>> or even both, e.g. [ ‘nqtel.com’, ‘107.158.259.74’ ] (i.e. the domain at
>> index 0 followed by the list of A records).
>>
>> One other shortcoming I noticed was the somewhat limited list of error
>> returns such as MISSING_REQUIRED_VALUE, INVALID_VALUE,
>> INVALID_HEADER_FIELD_NAME… what about MISSING_DEPENDENCY or MISSING_RESOURCE?
>>
>> What if we want to filter on Geo::IP’s ISP database, but the database isn’t
>> present?
>>
>> I don’t do a lot of volume (maybe 10 messages per second peak), so doing
>> blocking lookups isn’t a problem. But obviously this might be an issue for
>> some high volume sites.
>>
>> Feedback is welcome.
>>
>> -Philip
>