in trying to get a spamd server to eat a boatload of RBLs,
  i've come across what i believe is a situation in which
  it would be desirable for spamd-setup to not perform
  the supernet/sort/nonoverlap functions in collapse_blacklist().

  this test host is freebsd 6.2-RC2 running amd64 on a dual cpu dual
  core dell 2850 with 12GB RAM ( freebsd, as i had no success
  getting openbsd to run on it with i386+PAE or amd64 and see
  more than 4GB, despite the patch i found on tech@ ).
  if this makes it through testing, the production host will
  likely be a 9th gen dell box with some more RAM, anyway...

  here's the "wc -l" of the RBLs i'm trying to get in there:

1387159 /var/rbldns/sorbs/safe.dnsbl.sorbs.net.sorted
4975141 /var/rbldns/cbl/cbl.sorted
10379650 /var/rbldns/dsbl/dsbl.sorted
3191173 /var/rbldns/spamhaus/rsync/spamhaus.sorted
235 /var/rbldns/internal/customers-block.sorted
1618 /var/rbldns/internal/outside-block.sorted
19934976 total

  where the 'sorted' suffix denotes that i have
  run 'sort -nut. -k{1\,1,2\,2,3\,3,4\,4}' on each file, in
  addition to pumping it through Net::CIDR::Lite to
  have it perform the supernetting.
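
  for readers without perl handy, the same kind of preprocessing
  can be sketched in python.  this is a stand-in, not the pipeline
  used above: it assumes python's ipaddress module in place of
  sort(1) and Net::CIDR::Lite, and `collapse_lines` is a
  hypothetical helper name.

```python
import ipaddress


def collapse_lines(lines):
    """Parse one IPv4 address or CIDR block per line, merge overlapping
    and adjacent blocks, and return the result as sorted CIDR strings."""
    nets = []
    for line in lines:
        # drop comments and blank lines (an assumption about the input format)
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # strict=False tolerates entries with host bits set, e.g. 1.2.3.1/30
        nets.append(ipaddress.ip_network(entry, strict=False))
    # collapse_addresses removes duplicates, drops blocks contained in
    # larger ones, and merges adjacent blocks into supernets
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


sample = ["1.2.3.1", "1.2.3.0/30", "10.0.0.0/25", "10.0.0.128/25", "1.2.3.1"]
print(collapse_lines(sample))  # ['1.2.3.0/30', '10.0.0.0/24']
```

  note that str() on a single host prints it as a /32, whereas
  pfctl prints bare addresses.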

  spamd.conf was then asked to use each of them as a
  separate blacklist, and then spamd-setup was run
  on an empty <spamd>.

  the runtime, as reported by the time builtin
  of "@(#)PD KSH v5.2.14.2 99/07/13.2", was 10531 seconds
  (about 2 hours 55 minutes).

  i then took the contents of <spamd> and dumped to a file:

# pfctl -t spamd -Ts | wc -l
16175328
# pfctl -t spamd -Ts > rbls.ouch

  and reconfigured spamd.conf to use only that 'ouch' rbl and
  no others.

  ran spamd-setup again.  this time:

# time /usr/local/sbin/spamd-setup            
14000.20s real 8515.07s user 5473.58s system
# grep -v '#' /usr/local/etc/spamd.conf
all:ouch:
ouch:black:msg="ouch":file=/var/rbldns/rbls.ouch
# ls -l /var/rbldns/rbls.ouch
-rw-r--r-- 1 root wheel 282116850 Feb 13 14:20 /var/rbldns/rbls.ouch
# wc -l /var/rbldns/rbls.ouch
16175328 /var/rbldns/rbls.ouch
# pfctl -t spamd -Ts | wc -l
5550409

  i was initially confused about the discrepancy between
  5550409 and 16175328 until i ran the following small test
  that showed the difference:

---
# cat file1
1.2.3.0/30
# cat file2
1.2.3.1/32

# grep -ve '#' -e '^$' /usr/local/etc/spamd.conf
all:test:test2:
test:black:msg="test":file=/var/rbldns/file1
test2:black:msg="test":file=/var/rbldns/file2

# /usr/local/sbin/spamd-setup
# pfctl -t spamd -Ts
1.2.3.0/30
1.2.3.1
---

  so it seems that spamd-setup's collapse_blacklist()
  supernets/sorts *per* blacklist, such that if a blacklist
  B enumerates an IP block that is within a larger block
  in blacklist A, both entries are kicked to pf (and perhaps
  spamd(8)?).  that would explain why six separate blacklists
  produced 16175328 table entries while the same data loaded
  as a single blacklist collapsed to 5550409.  this could
  totally be the expected behaviour, and may well be obvious
  to anyone reading the source, but this trait isn't what
  i'm particularly concerned with.
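
  the per-blacklist behaviour observed above can be modelled in
  a few lines of python.  this only illustrates the observed
  result and is not spamd-setup's actual code:
  ipaddress.collapse_addresses stands in for
  collapse_blacklist(), and `collapse` is a made-up helper.

```python
import ipaddress


def collapse(entries):
    """Merge overlapping/adjacent blocks within ONE list of CIDR strings."""
    nets = [ipaddress.ip_network(e, strict=False) for e in entries]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


list_a = ["1.2.3.0/30"]  # contents of file1
list_b = ["1.2.3.1/32"]  # contents of file2

# collapsing each blacklist on its own keeps the redundant /32,
# because neither list overlaps with itself
per_list = collapse(list_a) + collapse(list_b)

# collapsing the union sees that the /32 sits inside the /30
merged = collapse(list_a + list_b)

print(per_list)  # ['1.2.3.0/30', '1.2.3.1/32']
print(merged)    # ['1.2.3.0/30']
```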

  what concerns me is this: if i take that resultant
  5,550,409 entry <spamd> table, which is the culmination of
  all the possible supernetting, sorting and de-overlapping
  of the component RBLs, save it to a file, and then run
  spamd-setup(8) on *that* file, spamd-setup(8) should have
  the least amount of work (i'm guessing) possible to do in
  preparation for pushing that data to pf(4).  it still
  takes a hefty amount of time.

  (note: it is inconsequential to me that i am losing the
  granularity of being able to send a 'msg' to any particular
  connecting IP %A about which RBL that IP was found on, as
  we'll have all spamd.conf 'msg's configured to point to an
  external lookup page that will show all the matching RBLs
  an IP was found on.)

---
# pfctl -t spamd -Ts > rbls.ouch2

# wc -l rbls.ouch2
 5550409 rbls.ouch2

# grep -ve '#' -e '^$' /usr/local/etc/spamd.conf
all:ouch2:
ouch2:black:msg="ouch2":file=/var/rbldns/rbls.ouch2

# time /usr/local/sbin/spamd-setup
1670.45s real 1056.12s user 610.64s system

  as opposed to:

# time pfctl -t spamd -Tr -f rbls.ouch2
5550409 addresses added.
2 addresses deleted.
23.66s real 10.68s user 12.97s system
# time pfctl -t spamd -Tr -f rbls.ouch2
no changes.
17.27s real 10.66s user 6.60s system
# time pfctl -t spamd -Tr -f rbls.ouch2
no changes.
17.28s real 10.71s user 6.56s system
---

  i'm only guessing, but if spamd-setup could be told
  to skip the supernetting/sorting/uniqing of the source
  data (it would then be my responsibility to ensure that
  the data is "as it should be", i.e. just like it would
  be after spamd-setup had finished collapsing it anyway),
  i'd be surprised if the entire spamd-setup process of
  shipping the data to spamd(8) and then pfctl'ing it into
  <spamd> took longer than two minutes - perhaps less.
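
  the precondition such a skip-the-collapse mode would rely on
  is cheap to check in a single pass.  a minimal python sketch,
  assuming one IPv4 address or CIDR block per entry;
  `is_sanitized` is a hypothetical name and nothing here is
  part of spamd-setup.

```python
import ipaddress


def is_sanitized(entries):
    """Return True if the entries are in ascending order and no block
    overlaps or duplicates an earlier one (one linear pass)."""
    prev_last = -1  # highest address covered so far, as an integer
    for e in entries:
        # strict=True raises ValueError if host bits are set (e.g. 1.2.3.1/30)
        net = ipaddress.ip_network(e, strict=True)
        first = int(net.network_address)
        if first <= prev_last:
            # out of order, duplicate, or inside/overlapping a previous block
            return False
        prev_last = int(net.broadcast_address)
    return True


print(is_sanitized(["1.2.3.0/30", "1.2.3.4"]))  # True
print(is_sanitized(["1.2.3.0/30", "1.2.3.1"]))  # False
```

  this only checks ordering and overlap; it does not verify that
  adjacent blocks have been merged into the largest possible
  supernets.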

  under two minutes would be entirely usable in a production
  environment given that we're talking about 5.5 million
  CIDR blocks, but nearly 28 minutes is a bit suboptimal.

  would it be trivial to skip the expensive parts of
  collapse_blacklist() so that it essentially assumes
  that all the input data is already "sanitized" and just
  goes about populating spamd and pf?

-- 

  jared
