This is a tough question to answer. spamdyke loads and parses each IP
blacklist file once, every time it starts. Each line is only read
once. So in terms of computational expense, the amount of work will
grow linearly, proportional to the number of lines in the file(s). The
performance is O(n), in other words.
However, in absolute terms, so much depends on your specific hardware,
operating system and traffic load that it's almost impossible to predict
when/if you'll start having problems. Even with thousands of IP
addresses, the file size won't be very large, so the operating system
and disk hardware should cache the data (unless your server is very busy
doing other things). Normally it isn't necessary to use thousands of
lines to blacklist thousands of IPs -- if you added them manually, you
would use ranges and subnet masks to reduce the amount of work. From a
script, that isn't really possible.
You could always test the process. Create a few files with 65536 lines
of bogus IP addresses and add them to your spamdyke configuration.
Watch your server load and I/O load. If they go up, you'll know you're
going to eventually have problems. If there's no change, you're fine.
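
One way to build such a test file is sketched below. It is only a sketch: the function name make_bogus_blacklist and the choice of the private 10.0.0.0/8 range (so the bogus entries should never match a real public sender) are assumptions for illustration, not anything spamdyke requires.

```shell
#!/bin/sh
# Write COUNT bogus IPv4 addresses, one per line, to FILE.
# START is an optional offset so several files can hold different addresses.
# Assumption: 10.0.0.0/8 is used because it is private address space.
make_bogus_blacklist() {
    file=$1
    count=${2:-65536}
    start=${3:-0}
    awk -v n="$count" -v s="$start" 'BEGIN {
        # Map the counter i onto the last three octets of 10.x.y.z.
        for (i = s + 0; i < s + n; i++)
            printf "10.%d.%d.%d\n", int(i / 65536) % 256, int(i / 256) % 256, i % 256
    }' > "$file"
}
```

For example, `make_bogus_blacklist /tmp/bogus1` followed by `make_bogus_blacklist /tmp/bogus2 65536 65536` would produce two full-size files with no addresses in common, each of which could then be named in its own ip-blacklist-file entry.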
Overall, however, several thoughts occur to me about what you're doing.
First, be careful not to let the files grow beyond 65536 lines.
spamdyke enforces that limit (arbitrarily) to guard against someone
accidentally specifying the wrong file in their configuration file. If
you need more lines, split the content into two files and use two
ip-blacklist-file entries in your spamdyke configuration.
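
If the list does outgrow one file, the split could be automated along these lines. This is a sketch under assumptions: split_blacklist is a hypothetical helper, and the piece names (PREFIX.aa, PREFIX.ab, ...) are simply what the standard `split` utility produces.

```shell
#!/bin/sh
# Split SRC into pieces of at most MAX lines each, named PREFIX.aa, PREFIX.ab, ...
# then print one ip-blacklist-file line per piece for the spamdyke configuration.
# Note: the glob also matches older PREFIX.* files, so clear those out first.
split_blacklist() {
    src=$1
    prefix=$2
    max=${3:-65536}
    split -l "$max" "$src" "$prefix." || return 1
    for part in "$prefix".*; do
        printf 'ip-blacklist-file=%s\n' "$part"
    done
}
```

The printed lines can be pasted into (or generated into) the spamdyke configuration file, one entry per piece.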
Second, you're going to have a problem with stale data. RBL operators
clean their lists of bad entries when the IP address stops spamming (or
they're supposed to). However, if you're keeping a big list of
addresses forever, you'll never be able to clean them (unless you store
time/date information somewhere else, like a database). You'll just be
rejecting connections without knowing how long you've been doing it.
Third, I'm not sure you're actually gaining very much by doing this.
Your approach suggests that you expect to see repeated connections from
the same IP addresses for a long time (long enough for the script to run
and make an entry in the file). If an IP connects only a few times and
never again, your approach will actually cause slower performance, not
faster. (A pattern of continually finding new IP addresses is more
realistic, given your statement of adding new entries constantly.) Even
if an IP does connect many times for a long time, your nameserver should
cache the RBL results, making lookups very fast (and very cheap if the
nameserver is running on the mail server). I could be wrong about all
that -- some statistical analysis of your log files would be very
enlightening, as it would show the number of IPs being blocked by RBLs
(new IPs) versus the number of IPs blocked by the blacklist (old IPs).
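
A rough version of that comparison might look like the sketch below. It rests on assumptions that should be checked against your own logs: that RBL rejections are logged with the token DENIED_RBL_MATCH and blacklist rejections with DENIED_BLACKLIST_IP, and that the client address is the word following "origin_ip:" on each line. The function name count_unique_ips and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Print the number of distinct client IPs on log lines containing TOKEN.
# ASSUMPTION: the client address is the word right after "origin_ip:";
# adjust the field name if your log lines are laid out differently.
count_unique_ips() {
    token=$1
    log=$2
    awk -v tok="$token" '
        index($0, tok) {
            for (i = 1; i < NF; i++)
                if ($i == "origin_ip:") seen[$(i + 1)] = 1
        }
        END { n = 0; for (ip in seen) n++; print n }
    ' "$log"
}
```

Running `count_unique_ips DENIED_RBL_MATCH /var/log/maillog` and `count_unique_ips DENIED_BLACKLIST_IP /var/log/maillog` over the same period would give the two numbers to compare.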
If you really want to continue with this idea, there is another approach
that might work better. It would answer your concerns about file size
and it would allow you to remove entries from the blacklist after some
amount of time. Instead of using an IP blacklist file, use a
configuration directory. Each blocked IP address would have its own
file, which spamdyke would find with only one function call (much
faster). Also, each file would carry a timestamp, so your script could
easily find and remove files that were older than a certain threshold.
To do this, create a folder (e.g. /etc/spamdyke/config.d) and add the
following line to your spamdyke configuration file:

    config-dir=/etc/spamdyke/config.d

When your script runs, parse the IP address ww.xx.yy.zz into its four
octets. Use the first three to create a folder:

    mkdir -p /etc/spamdyke/config.d/_ip_/ww/xx/yy

Then put the following line into a file named using all four octets
(e.g. /etc/spamdyke/config.d/_ip_/ww/xx/yy/zz):

    filter-level=reject-all

If the file already exists when your script runs, just touch it to
update the timestamp so it won't be deleted.
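
Put together, the steps above could be scripted roughly as follows. Treat it as a sketch: the function names block_ip and expire_blocks, the CONFIG_DIR variable and the 30-day default are all assumptions for illustration; only the directory layout and the filter-level line come from the description above.

```shell
#!/bin/sh
# Base of the per-IP tree; should match the config-dir setting.
CONFIG_DIR=${CONFIG_DIR:-/etc/spamdyke/config.d}

# Create (or refresh) the per-IP file for the address ww.xx.yy.zz.
block_ip() {
    ip=$1
    # Reject anything that is not digits and single dots.
    case $ip in
        ''|*[!0-9.]*|.*|*.|*..*) echo "not an IPv4 address: $ip" >&2; return 1 ;;
    esac
    # Split the address into its four octets.
    old_ifs=$IFS; IFS=.
    set -- $ip
    IFS=$old_ifs
    [ $# -eq 4 ] || { echo "not an IPv4 address: $ip" >&2; return 1; }
    for octet in "$@"; do
        [ "$octet" -le 255 ] || { echo "octet out of range: $ip" >&2; return 1; }
    done
    dir=$CONFIG_DIR/_ip_/$1/$2/$3
    file=$dir/$4
    mkdir -p "$dir" || return 1
    if [ -e "$file" ]; then
        touch "$file"    # already blocked: just refresh the timestamp
    else
        printf 'filter-level=reject-all\n' > "$file"
    fi
}

# Remove per-IP files not touched for more than DAYS days (default 30).
expire_blocks() {
    days=${1:-30}
    find "$CONFIG_DIR/_ip_" -type f -mtime +"$days" -exec rm -f {} +
}
```

The script that finds new addresses would call `block_ip` for each one, and a daily cron job could call `expire_blocks` to drop entries that have not been seen recently.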
Good luck!
-- Sam Clippinger
TazaTek wrote:
Eric Shubert wrote:
I gather that you're trying to reduce the load on the network by
essentially using the blacklist_ip file as a sort of RBL facility.
Is RBL processing actually creating that much network traffic, or is
this just a guess?
Do you have a caching nameserver installed on your server? You should,
as that will drastically reduce network traffic.
How many / which RBLs are you presently using? You shouldn't need more
than a few. Also, if you've specified an unresponsive or slow RBL, that
can hinder your performance quite a bit.
I had some problems with my nameserver last week ... long story... so I
was looking at my RBL list, and thought that I could reduce the overhead
of the RBL lookups by adding the IP to my blacklist_ip file... I've got
the makings of a