Hi,

I *HIGHLY* recommend mod_throttle for Apache.  It is very
configurable.  You can get the software at
http://www.snert.com/Software/mod_throttle/index.shtml .

The best thing about it is the ability to throttle based
on bandwidth and client IP.  We had problems with robots
as well as malicious end users who would flood our
server with requests.

mod_throttle allows you to set up rules to prevent one
IP address from making more than x requests for the
same document in y time period.  Our mod_perl servers,
for example, track the last 50 client IPs.  If one of
those clients goes above 50 requests, it is blocked
out.  The last client to request a document is put
at the top of the list, so even very active legit users
tend to fall off the bottom eventually, but things like
robots, which never stop requesting, stay blocked.
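
To give a feel for the bookkeeping involved, here's a rough sketch of
that "last 50 IPs" idea written as a mod_perl 1.x access handler.  To
be clear, this is NOT mod_throttle's code -- mod_throttle keeps its
counters in shared memory across all the Apache children, while this
toy version is per-child only -- and the package name and the 50/50
thresholds are just placeholders.

  package My::ThrottleSketch;
  use strict;
  use Apache::Constants qw(OK FORBIDDEN);

  my $MAX_TRACKED = 50;   # how many recent client IPs to remember
  my $MAX_HITS    = 50;   # block an IP once it exceeds this many requests
  my @recent;             # most recent IP first, oldest last
  my %hits;               # IP => request count while it stays on the list

  sub handler {
      my $r  = shift;
      my $ip = $r->connection->remote_ip;

      # The requesting IP goes to (or moves to) the top of the list.
      @recent = ($ip, grep { $_ ne $ip } @recent);
      $hits{$ip}++;

      # Quiet clients fall off the bottom and are forgotten;
      # a robot that never stops requesting stays on the list.
      while (@recent > $MAX_TRACKED) {
          delete $hits{ pop @recent };
      }

      return FORBIDDEN if $hits{$ip} > $MAX_HITS;
      return OK;
  }
  1;

You'd wire something like that up with "PerlAccessHandler
My::ThrottleSketch" in httpd.conf, but for real use I'd just install
mod_throttle and let it do this properly in shared memory.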

I highly recommend you look into it.  We had been writing some
custom functions to block this kind of thing ourselves,
but the Apache module makes it so much nicer.

Jeremy

-----Original Message-----
From: Bill Moseley [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 18, 2002 10:56 PM
To: [EMAIL PROTECTED]
Subject: Throttling, once again


Hi,

Wasn't there just a thread on throttling a few weeks ago?

I had a machine hit hard yesterday with a spider that ignored robots.txt.  

Load average was over 90 on a dual-CPU Enterprise 3500 running Solaris 2.6.
It's a mod_perl server, but it also handles a few CGI scripts, and the
spider was hitting one of those CGI scripts over and over.  They were valid
requests, but they were coming in faster than they could be served.

Under normal usage the CGI scripts are only accessed a few times a day, so
it's not much of a problem to have them served by mod_perl.  And under normal
peak loads RAM is not a problem.

The machine also has a bandwidth limitation (a packet shaper is used to share
the bandwidth), which, combined with the spider, didn't help things.  Luckily
there's 4GB of RAM, so even at a load average of 90 it wasn't really swapping
much.  (Well, not when I caught it, anyway.)  This spider was using the same
IP for all requests.

Anyway, I remember Randal's Stonehenge::Throttle being discussed not too long
ago.  That seems to address this kind of problem.  Is there anything else
to look into?  Since the front end is mod_perl, that means I can use a
mod_perl throttling solution too, which is cool.

I realize there are some fundamental hardware issues to solve, but if I can
just keep the spiders from flooding the machine, then the machine gets by OK.

Also, does anyone have suggestions for testing once throttling is in place?
I don't want to start cutting off the good customers, but I do want to get
an idea of how it acts under load.  ab to the rescue, I suppose.
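
For a rough simulation of one aggressive client (the hostname and path
here are made up, obviously), something like this fires 500 requests at
a single URL, 20 at a time, all from one IP:

  ab -n 500 -c 20 http://your.server/cgi-bin/slow-script.cgi

where -n is the total number of requests and -c is the concurrency.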

Thanks much,


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]
