Re: [spamdyke-users] Blacklist Performance question

2008-11-02 Thread Eric Shubert
TazaTek wrote:
 I have about 1000 IPs in my blacklist_ip file ... and have been adding 
 more every week.
 
 At what point does the number of IPs become a performance penalty?  I 
 was trying to reduce the load on the network by taking analyzed RBL 
 matches and placing them in the blacklist file,
 
 but if there is a penalty for adding too many IPs then maybe I 
 don't want to do this.
 
 Any feedback on when too many is really too many? 
 
 BTW - I'm on a VPS with 2 GB mem and quad-core CPUs with minimal traffic 
 on the machine, so plenty of horsepower, but I'd like to leave it all 
 in reserve for my next Digg/Slashdot opportunity :)
 
 Thanks
 
 Matt
 
 --

I gather that you're trying to reduce the load on the network by 
essentially using the blacklist_ip file as a sort of RBL facility.
Is RBL processing actually creating that much network traffic, or is 
this just a guess?

Do you have a caching nameserver installed on your server? You should, 
as that will drastically reduce network traffic.

How many / which RBLs are you presently using? You shouldn't need more 
than a few. Also, if you've specified an unresponsive or slow RBL, that 
can hinder your performance quite a bit.

-- 
-Eric 'shubes'

___
spamdyke-users mailing list
spamdyke-users@spamdyke.org
http://www.spamdyke.org/mailman/listinfo/spamdyke-users


Re: [spamdyke-users] Blacklist Performance question

2008-11-02 Thread Sam Clippinger
This is a tough question to answer.  spamdyke loads and parses each IP 
blacklist file once, every time it starts.  Each line is only read 
once.  So in terms of computational expense, the amount of work will 
grow linearly, proportional to the number of lines in the file(s).  The 
performance is O(n), in other words.

However, in absolute terms, so much depends on your specific hardware, 
operating system and traffic load that it's almost impossible to predict 
when/if you'll start having problems.  Even with thousands of IP 
addresses, the file size won't be very large, so the operating system 
and disk hardware should cache the data (unless your server is very busy 
doing other things).  Normally it isn't necessary to use thousands of 
lines to blacklist thousands of IPs -- if you added them manually, you 
would use ranges and subnet masks to reduce the amount of work.  From a 
script, that isn't really possible.

You could always test the process.  Create a few files with 65536 lines 
of bogus IP addresses and add them to your spamdyke configuration.  
Watch your server load and I/O load.  If it goes up, you'll know you're 
going to eventually have problems.  If there's no change, you're fine.
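
A minimal sketch of generating such test files, in Python. The function names and the choice of the reserved 240.0.0.0/4 block (which should never show up as a real sender) are assumptions for illustration, not anything spamdyke requires; the only detail taken from this thread is one IP address per line and 65536 lines per file.

```python
import ipaddress
from pathlib import Path

MAX_LINES = 65536  # per-file line limit mentioned in this thread


def bogus_ips(base="240.0.0.0", count=MAX_LINES):
    """Yield `count` consecutive IPv4 addresses starting at `base`."""
    start = int(ipaddress.IPv4Address(base))
    for offset in range(count):
        yield str(ipaddress.IPv4Address(start + offset))


def write_test_file(path, base="240.0.0.0", count=MAX_LINES):
    """Write one address per line to `path`; return the number of lines written."""
    lines = list(bogus_ips(base, count))
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Calling write_test_file("blacklist_test_0", base="240.0.0.0") and then write_test_file("blacklist_test_1", base="240.1.0.0") (hypothetical file names) would give two non-overlapping files to list in the configuration while watching the load.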

Overall, however, several thoughts occur to me about what you're doing.  
First, be careful not to let the files grow beyond 65536 lines.  
spamdyke enforces that limit (arbitrarily) to guard against someone 
accidentally specifying the wrong file in their configuration file.  If 
you need more lines, split the content into two files and use two 
ip-blacklist-file entries in your spamdyke configuration.
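
The splitting step could be scripted along these lines. This is only a sketch: the output naming scheme (blacklist_ip_00, blacklist_ip_01, ...) and the function names are made up for illustration, and the 65536 figure is simply the limit described above.

```python
from pathlib import Path

MAX_LINES = 65536  # per-file limit described above


def chunk(lines, limit=MAX_LINES):
    """Split a list of lines into consecutive pieces of at most `limit` lines."""
    return [lines[i:i + limit] for i in range(0, len(lines), limit)]


def split_blacklist(src, dest_dir, limit=MAX_LINES):
    """Write the non-empty lines of `src` into numbered files under `dest_dir`.

    Returns the list of files written, in order.
    """
    lines = [ln.strip() for ln in Path(src).read_text().splitlines() if ln.strip()]
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, part in enumerate(chunk(lines, limit)):
        out = dest / f"blacklist_ip_{n:02d}"  # hypothetical naming scheme
        out.write_text("\n".join(part) + "\n")
        paths.append(out)
    return paths
```

Each returned path would then get its own ip-blacklist-file line in the spamdyke configuration.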

Second, you're going to have a problem with stale data.  RBL operators 
clean their lists of bad entries when the IP address stops spamming (or 
they're supposed to).  However, if you're keeping a big list of 
addresses forever, you'll never be able to clean them (unless you store 
time/date information somewhere else, like a database).  You'll just be 
rejecting connections without knowing how long you've been doing it.

Third, I'm not sure you're actually gaining very much by doing this.  
Your approach suggests that you expect to see repeated connections from 
the same IP addresses for a long time (long enough for the script to run 
and make an entry in the file).  If an IP connects only a few times and 
never again, your approach will actually cause slower performance, not 
faster.  (A pattern of continually finding new IP addresses is more 
realistic, given your statement of adding new entries constantly.)  Even 
if an IP does connect many times for a long time, your nameserver should 
cache the RBL results, making lookups very fast (and very cheap if the 
nameserver is running on the mail server).  I could be wrong about all 
that -- some statistical analysis of your log files would be very 
enlightening, as they would show the number of IPs being blocked by RBLs 
(new IPs) versus the number of IPs blocked by the blacklist (old IPs).
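
A rough sketch of that kind of log analysis follows. It assumes log lines of the form "spamdyke[PID]: REASON ... origin_ip: a.b.c.d ..." with reason codes such as DENIED_RBL_MATCH and DENIED_BLACKLIST_IP; check those assumptions against your own logs before trusting the numbers. The function name summarize is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape: "... spamdyke[1234]: DENIED_RBL_MATCH ... origin_ip: 1.2.3.4 ..."
LINE_RE = re.compile(
    r"spamdyke\[\d+\]: (?P<reason>[A-Z_]+) .*?origin_ip: (?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def summarize(lines):
    """Return {reason: (total_hits, distinct_ips)} for every matching log line."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            hits[match["reason"]] += 1
            ips[match["reason"]].add(match["ip"])
    return {reason: (hits[reason], len(ips[reason])) for reason in hits}
```

If the RBL rejections involve many distinct addresses with only a hit or two each, while the blacklist rejections are few, that would support the suspicion that the local blacklist is not saving much.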

If you really want to continue with this idea, there is another approach 
that might work better.  It would answer your concerns about file size 
and it would allow you to remove entries from the blacklist after some 
amount of time.  Instead of using a blacklist IP file, use a 
configuration directory.  Each blocked IP address would have its own 
file, which spamdyke would find with only one function call (much 
faster).  Also, each file would carry a timestamp, so your script could 
easily find and remove files that were older than a certain threshold.  
To do this, create a folder (e.g. /etc/spamdyke/config.d) and add the 
following line to your spamdyke configuration file:
    config-dir=/etc/spamdyke/config.d
When your script runs, parse the IP address ww.xx.yy.zz into its four 
octets.  Use the first three to create a folder:
    mkdir -p /etc/spamdyke/config.d/_ip_/ww/xx/yy
Then put the following line into a file named using all four octets 
(e.g. /etc/spamdyke/config.d/_ip_/ww/xx/yy/zz):
    filter-level=reject-all
If the file already exists when your script runs, just touch it to 
update the timestamp so it won't be deleted.
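
The steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names (ip_path, block_ip, expire) and the age threshold are hypothetical, and only the directory layout and the filter-level=reject-all line come from the description above.

```python
import time
from pathlib import Path


def ip_path(config_dir, ip):
    """Map ww.xx.yy.zz to <config_dir>/_ip_/ww/xx/yy/zz."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    # Normalize each octet (e.g. "007" -> "7") before using it as a path component.
    return Path(config_dir, "_ip_", *(str(int(o)) for o in octets))


def block_ip(config_dir, ip):
    """Create the per-IP file, or touch it if it already exists."""
    path = ip_path(config_dir, ip)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        path.touch()  # refresh the timestamp so the entry is not expired
    else:
        path.write_text("filter-level=reject-all\n")
    return path


def expire(config_dir, max_age_days, now=None):
    """Delete per-IP files not modified within `max_age_days`; return them."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for f in sorted(Path(config_dir, "_ip_").glob("*/*/*/*")):
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```

A cron job might call block_ip("/etc/spamdyke/config.d", ip) for each address found in the logs and then expire("/etc/spamdyke/config.d", 30) to drop entries that have not been seen for a month. Empty directories are left behind; cleaning them up is omitted here.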

Good luck!

-- Sam Clippinger

TazaTek wrote:
 Eric Shubert wrote:
   
 I gather that you're trying to reduce the load on the network by 
 essentially using the blacklist_ip file as a sort of RBL facility.
 Is RBL processing actually creating that much network traffic, or is 
 this just a guess?

 Do you have a caching nameserver installed on your server? You should, 
 as that will drastically reduce network traffic.

 How many / which RBLs are you presently using? You shouldn't need more 
 than a few. Also, if you've specified an unresponsive or slow RBL, that 
 can hinder your performance quite a bit.
 
 I had some problems with my nameserver last week ... long story...  so I 
 was looking at my RBL list, and thought that I could reduce the overhead 
 of the RBL lookups by adding the IP to my blacklist_ip file... I've got 
 the makings of a 

[spamdyke-users] Blacklist Performance question

2008-11-01 Thread TazaTek
I have about 1000 IPs in my blacklist_ip file ... and have been adding 
more every week.


At what point does the number of IPs become a performance penalty?  I 
was trying to reduce the load on the network by taking analyzed RBL 
matches and placing them in the blacklist file,


but if there is a penalty for adding too many IPs then maybe I 
don't want to do this.


Any feedback on when too many is really too many? 

BTW - I'm on a VPS with 2 GB mem and quad-core CPUs with minimal traffic 
on the machine, so plenty of horsepower, but I'd like to leave it all 
in reserve for my next Digg/Slashdot opportunity :)


Thanks

Matt

--
Kettlewell Enterprises, Inc
www.kettlewell.net

find me on Twitter - http://twitter.com/kettlewell

