Here’s a first attempt at a module.  I based it on Plugin::URIDetail.

It depends on Net::CIDR::Lite and Geo::IP.  If it detects a valid (though not 
necessarily current) ISP database, it will publish a handler for that. Same 
with the IP-Lite (or licensed IP) database from MaxMind.

We’ve been using the MaxMind database for a couple of years on a commercial 
project with good success.

Currently the filtering is done by country code, ISP name, and explicit CIDR 
blocks.

The last test is the least costly but also the most fine-grained. You can 
configure the rules to run in whichever order suits your needs best.
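
To show why the CIDR test is cheap once an address is in hand, here is a minimal sketch of a netblock membership check in plain Perl. The plugin itself delegates this to Net::CIDR::Lite; the helper names ip_to_int and ip_in_cidr are made up for illustration, and only IPv4 in "a.b.c.d/len" form is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: dotted-quad string to 32-bit integer,
# or undef if the string is not a valid IPv4 address.
sub ip_to_int {
  my ($ip) = @_;
  return unless defined $ip
    && $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
  my @o = ($1, $2, $3, $4);
  return if grep { $_ > 255 } @o;
  return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

# Hypothetical helper: 1 if $ip lies inside $cidr ("a.b.c.d/len"), else 0.
sub ip_in_cidr {
  my ($ip, $cidr) = @_;
  my ($net, $len) = split m{/}, $cidr, 2;
  $len = 32 unless defined $len;          # bare address means a /32
  return 0 unless $len =~ /^\d{1,2}$/ && $len <= 32;

  my $ip_int  = ip_to_int($ip);
  my $net_int = ip_to_int($net);
  return 0 unless defined $ip_int && defined $net_int;

  # Build the netmask; /0 is special-cased to avoid a 32-bit shift.
  my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
  return (($ip_int & $mask) == ($net_int & $mask)) ? 1 : 0;
}
```

The comparison is a couple of integer operations per block, so the cost of the rule is dominated by the name lookup rather than by the match itself.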

I personally sort by country (cn ru bg vn ro ng ir) and then by ISP (won’t name 
them here, but one of them is Over tHere in France), and lastly by CIDR block.

The only real wart on these plugins is that they all index their databases by 
IP address, and do their own (implicit or explicit) name or IP mapping.  
Obviously, this is both blocking and repetitive.

I’m not sure why PerMsgStatus.pm can’t do the asynchronous name lookups when 
get_uri_detail_list() runs, so that the results are handy for each of the 
plugins.  If the mappings were already available, I’d definitely use them.

That is, instead of having:

hosts => {
   'nqtel.com' => 'nqtel.com'
}

why not instead have:

hosts => {
   'nqtel.com' => [ '107.158.259.74' ]
}

or even both, e.g. [ 'nqtel.com', '107.158.259.74' ] (i.e. the domain at index 
0 followed by the list of A records).
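
As a rough sketch of that combined layout, the helper below takes the existing host-to-domain hash plus a resolver callback and returns the expanded form. The name expand_hosts, the callback interface, and the stub resolver are all assumptions made for illustration; nothing like this exists in PerMsgStatus.pm today, and the address is from the 192.0.2.0/24 documentation range.

```perl
use strict;
use warnings;

# Hypothetical helper: turn { host => domain } into
# { host => [ domain, A records... ] } using the supplied resolver.
sub expand_hosts {
  my ($hosts, $resolve) = @_;
  my %expanded;
  for my $host (keys %$hosts) {
    my @addrs = $resolve->($host);      # may be empty if the name fails
    $expanded{$host} = [ $hosts->{$host}, @addrs ];
  }
  return \%expanded;
}

# Stub resolver standing in for real (ideally asynchronous) DNS lookups.
my %fake_dns = ('example.com' => [ '192.0.2.10' ]);
my $resolver = sub { @{ $fake_dns{ $_[0] } || [] } };

my $expanded = expand_hosts({ 'example.com' => 'example.com' }, $resolver);
```

Keeping the domain at index 0 means code that only wants the domain reads element 0, while address-based plugins take the remaining elements.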

One other shortcoming I noticed was the somewhat limited list of error returns 
such as MISSING_REQUIRED_VALUE, INVALID_VALUE, INVALID_HEADER_FIELD_NAME… what 
about MISSING_DEPENDENCY or MISSING_RESOURCE?

What if we want to filter on Geo::IP’s ISP database, but the database isn’t 
present?
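
One way such a check could look is sketched below. The status names are hypothetical (Mail::SpamAssassin::Conf does not define MISSING_DEPENDENCY or MISSING_RESOURCE), and check_prereqs is an illustrative helper, not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical status codes, invented for this sketch.
use constant {
  STATUS_OK          => 0,
  MISSING_DEPENDENCY => 1,
  MISSING_RESOURCE   => 2,
};

# Hypothetical helper: report whether a module can be loaded and
# whether a database file is readable, without dying on failure.
sub check_prereqs {
  my ($module, $db_path) = @_;

  # Convert Some::Module to Some/Module.pm so require takes a path.
  (my $file = "$module.pm") =~ s{::}{/}g;
  return MISSING_DEPENDENCY unless eval { require $file; 1 };

  return MISSING_RESOURCE unless defined $db_path && -r $db_path;
  return STATUS_OK;
}
```

A config handler could then return a distinct error for "module not installed" versus "database file absent" instead of folding both into INVALID_VALUE.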

I don’t do a lot of volume (maybe 10 messages per second peak), so doing 
blocking lookups isn’t a problem.  But obviously this might be an issue for 
some high volume sites.
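
Short of going fully asynchronous, a small cache would at least stop the same host from being resolved once per rule. This is only a sketch: make_cached_resolver is a made-up name, and the resolver it wraps is assumed to be any code reference that returns a list of addresses for a host.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a resolver that remembers earlier answers,
# so each host triggers at most one (possibly blocking) lookup.
sub make_cached_resolver {
  my ($resolve) = @_;
  my %cache;
  return sub {
    my ($host) = @_;
    $cache{$host} = [ $resolve->($host) ] unless exists $cache{$host};
    return @{ $cache{$host} };
  };
}

# Stand-in for a slow lookup; counts how often it is really called.
my $lookups = 0;
my $slow    = sub { $lookups++; return ('192.0.2.10') };

my $cached = make_cached_resolver($slow);
$cached->('example.com') for 1 .. 3;
```

Negative answers are cached as well (as an empty list), so a host that fails to resolve is not retried by every rule.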

Feedback is welcome.

-Philip

# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at:
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
# TODO: where are the tests?

=head1 NAME

URILocalBL - blacklist URIs using local information (address lists and country codes)

=head1 SYNOPSIS

This plugin creates some new rule test types, such as "uri_block_cc",
"uri_block_cidr", and "uri_block_isp".  These rules apply to all URIs
found in the message.

  loadplugin    Mail::SpamAssassin::Plugin::URILocalBL

=head1 RULE DEFINITIONS AND PRIVILEGED SETTINGS

The format for defining a rule is as follows:

  uri_block_cc SYMBOLIC_TEST_NAME cc1 cc2 cc3 cc4

or:

  uri_block_cidr SYMBOLIC_TEST_NAME a.a.a.a b.b.b.b/cc d.d.d.d-e.e.e.e

or:

  uri_block_isp SYMBOLIC_TEST_NAME "DataRancid" "McCarrier" "Phishers-r-Us"

Example rule for matching a URI in China:

  uri_block_cc TEST1 cn

This would block the URL http://www.baidu.com/index.htm.  Similarly, to
match a Spam-haven netblock:

  uri_block_cidr TEST2 65.181.64.0/18

would match a netblock where several phishing sites were recently hosted.

And to block all CIDR blocks registered to an ISP, one might use:

  uri_block_isp TEST3 "ColoCrossing"

if one didn't trust URLs pointing to that organization's clients.

=cut

package Mail::SpamAssassin::Plugin::URILocalBL;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);

use Geo::IP;
use Net::CIDR::Lite;
use Socket;

use strict;
use warnings;
use bytes;
use re 'taint';

use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);

# constructor
sub new {
  my $class = shift;
  my $mailsaobject = shift;

  # some boilerplate...
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

  # how to handle failure to get the database handle?
  # and we don't really have a valid return value...
  # can we defer getting this handle until we actually see
  # a uri_block_cc rule?

  # this code burps an ugly message if it fails, but that's redirected elsewhere
  $self->{geoip} = Geo::IP->new(GEOIP_MEMORY_CACHE | GEOIP_CHECK_CACHE);
  $self->{geoisp} = Geo::IP->open_type(GEOIP_ISP_EDITION, GEOIP_MEMORY_CACHE | GEOIP_CHECK_CACHE);

  $self->register_eval_rule("check_uri_local_bl");

  $self->set_config($mailsaobject->{conf});

  return $self;
}

sub set_config {
  my ($self, $conf) = @_;
  my @cmds;

  my $pluginobj = $self;        # allow use inside the closure below

  push (@cmds, {
    setting => 'uri_block_cc',
    is_priv => 1,
    code => sub {
      my ($self, $key, $value, $line) = @_;

      if ($value !~ /^(\S+)\s+(.+)$/) {
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }
      my $name = $1;
      my $def = $2;
      my $added_criteria = 0;

      $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{countries} = {};

      # this should match all country codes including satellite providers
      while ($def =~ m/^\s*([a-z][a-z0-9])(\s+(.*)|)$/) {
	my $cc = $1;
	my $rest = $2;

	#dbg("config: uri_block_cc adding %s to %s\n", $cc, $name);
        $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{countries}->{uc($cc)} = 1;
	$added_criteria = 1;

        $def = $rest;
      }

      if ($added_criteria == 0) {
        warn "config: no arguments";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      } elsif ($def ne '') {
        warn "config: failed to add invalid rule $name";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }

      dbg("config: uri_block_cc added %s\n", $name);

      $conf->{parser}->add_test($name, 'check_uri_local_bl()', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
    }
  }) if (defined $self->{geoip});

  push (@cmds, {
    setting => 'uri_block_isp',
    is_priv => 1,
    code => sub {
      my ($self, $key, $value, $line) = @_;

      if ($value !~ /^(\S+)\s+(.+)$/) {
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }
      my $name = $1;
      my $def = $2;
      my $added_criteria = 0;

      $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{isps} = {};

      # gather up quoted strings
      while ($def =~ m/^\s*"([^"]*)"(\s+(.*)|)$/) {
	my $isp = $1;
	my $rest = $2;

	#dbg("config: uri_block_isp adding %s to %s\n", $isp, $name);
        $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{isps}->{$isp} = 1;
	$added_criteria = 1;

        $def = $rest;
      }

      if ($added_criteria == 0) {
        warn "config: no arguments";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      } elsif ($def ne '') {
        warn "config: failed to add invalid rule $name";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }

      #dbg("config: uri_block_isp added %s\n", $name);

      $conf->{parser}->add_test($name, 'check_uri_local_bl()', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
    }
  }) if (defined $self->{geoisp});

  push (@cmds, {
    setting => 'uri_block_cidr',
    is_priv => 1,
    code => sub {
      my ($self, $key, $value, $line) = @_;

      if ($value !~ /^(\S+)\s+(.+)$/) {
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }
      my $name = $1;
      my $def = $2;
      my $added_criteria = 0;

      $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{cidr} = Net::CIDR::Lite->new;

      # match individual IP's, subnets, and ranges
      while ($def =~ m/^\s*(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\/\d{1,2}|-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?)(\s+(.*)|)$/) {
	my $addr = $1;
	my $rest = $3;

	#dbg("config: uri_block_cidr adding %s to %s\n", $addr, $name);

        eval { $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{cidr}->add_any($addr) };
        last if ($@);

	$added_criteria = 1;

        $def = $rest;
      }

      if ($added_criteria == 0) {
        warn "config: no arguments";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      } elsif ($def ne '') {
        warn "config: failed to add invalid rule $name";
	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }

      # optimize the ranges
      $conf->{parser}->{conf}->{uri_local_bl}->{$name}->{cidr}->clean();

      dbg("config: uri_block_cidr added %s\n", $name);
      $conf->{parser}->add_test($name, 'check_uri_local_bl()', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
    }
  });
  
  $conf->{parser}->register_commands(\@cmds);
}  

sub check_uri_local_bl {
  my ($self, $permsg) = @_;

  my %uri_detail = %{ $permsg->get_uri_detail_list() };
  my $test = $permsg->{current_rule_name}; 
  my $rule = $permsg->{conf}->{uri_local_bl}->{$test};

  #dbg("check: uri_local_bl rule %s\n", $test);

  while (my ($raw, $info) = each %uri_detail) {

    next unless $info->{hosts};

    # dereference explicitly; "keys $hashref" is not valid in current Perl
    for my $host (keys %{ $info->{hosts} }) {

      if (exists $rule->{countries}) {
        #dbg("check: uri_local_bl countries %s\n", join(' ', sort keys %{ $rule->{countries} }));

        # this method does the name to address lookup for us;
        # we should probably do this ourselves asynchronously
        # instead.
        my $cc = $self->{geoip}->country_code_by_name($host);

        #dbg("check: uri_local_bl host %s maps to %s\n", $host, (defined $cc ? $cc : "(undef)"));

        # handle there being no associated country
        next unless defined $cc;

        # not in blacklist
        next unless (exists $rule->{countries}->{$cc});

        #dbg("check: uri_block_cc host %s matched\n", $host);

        if (would_log('dbg', 'rules') > 1) {
          dbg("check: uri_block_cc criteria for $test met");
        }
    
        $permsg->got_hit($test);

        # reset hash
        keys %uri_detail;

        return 0;
      }

      if (exists $rule->{isps}) {
        dbg("check: uri_local_bl isps %s\n", join(' ', sort keys %{ $rule->{isps} }));

        # this method does the name to address lookup for us;
        # we should probably do this ourselves asynchronously
        # instead.
        my $isp = $self->{geoisp}->isp_by_name($host);

        dbg("check: uri_local_bl isp %s maps to %s\n", $host, (defined $isp ? $isp : "(undef)"));

        # handle there being no associated ISP
        next unless defined $isp;

        # not in blacklist
        next unless (exists $rule->{isps}->{$isp});

        #dbg("check: uri_block_isp host %s matched\n", $host);

        if (would_log('dbg', 'rules') > 1) {
          dbg("check: uri_block_isp criteria for $test met");
        }
    
        $permsg->got_hit($test);

        # reset hash
        keys %uri_detail;

        return 0;
      }

      if (exists $rule->{cidr}) {
        #dbg("check: uri_block_cidr list %s\n", join(' ', $rule->{cidr}->list_range()));

        # again, this would be best cached from prior lookups
        my @addrs = gethostbyname($host);

        # convert to string values address list
        # (elements 0..3 are name, aliases, addrtype, length; the packed
        # addresses start at index 4)
        @addrs = map { inet_ntoa($_) } @addrs[4..$#addrs];

        for my $ip (@addrs) {
          next unless ($rule->{cidr}->find($ip));

          #dbg("check: uri_block_cidr host %s matched\n", $host);

          if (would_log('dbg', 'rules') > 1) {
            dbg("check: uri_block_cidr criteria for $test met");
          }

          $permsg->got_hit($test);

          # reset hash
          keys %uri_detail;

          return 0;
        }
      }
    }
  }

  #dbg("check: uri_local_bl no match\n");

  return 0;
}

1;


