Hi, I *HIGHLY* recommend mod_throttle for Apache. It is very configurable. You can get the software at http://www.snert.com/Software/mod_throttle/index.shtml .
The best thing about it is the ability to throttle based on bandwidth and client IP. We had problems with robots as well as malicious end users who would flood our server with requests. mod_throttle lets you set up rules that prevent one IP address from making more than x requests for the same document in y time period.

Our mod_perl servers, for example, track the last 50 client IPs. If one of those clients goes above 50 requests, it is blocked. The last client to request a document is put at the top of the list, so even very active legitimate users tend to fall off the bottom, while things like robots stay blocked. I highly recommend you look into it. We had been writing custom functions to block this kind of thing, but the Apache module makes it much nicer.

Jeremy

-----Original Message-----
From: Bill Moseley [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 18, 2002 10:56 PM
To: [EMAIL PROTECTED]
Subject: Throttling, once again

Hi,

Wasn't there just a thread on throttling a few weeks ago?

I had a machine hit hard yesterday by a spider that ignored robots.txt. The load average was over 90 on a dual-CPU Enterprise 3500 running Solaris 2.6. It's a mod_perl server, but it also handles a few CGI scripts, and the spider was hitting one of those CGI scripts over and over. The requests were valid, but they were coming in faster than they were going out. Under normal usage the CGI scripts are only accessed a few times a day, so it's not much of a problem to have them served by mod_perl, and under normal peak loads RAM is not a problem.

The machine also has bandwidth limitations (a packet shaper is used to share the bandwidth), which, combined with the spider, didn't help things. Luckily there's 4GB of RAM, so even at a load average of 90 it wasn't really swapping much (well, not when I caught it, anyway). This spider was using the same IP for all requests.

Anyway, I remember Randal's Stonehenge::Throttle being discussed not too long ago. That seems to address this kind of problem.
Is there anything else to look into? Since the front end is mod_perl, that means I can use a mod_perl throttling solution, too, which is cool. I realize there are some fundamental hardware issues to solve, but if I can just keep the spiders from flooding the machine, then the machine gets by OK.

Also, does anyone have suggestions for testing once throttling is in place? I don't want to start cutting off the good customers, but I do want to get an idea of how it acts under load. ab to the rescue, I suppose.

Thanks much,

-- 
Bill Moseley
mailto:[EMAIL PROTECTED]
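[Editorial note: for illustration, the last-50-IPs scheme Jeremy describes at the top of this thread could be sketched roughly as below. This is a minimal Python sketch of the idea only, not mod_throttle's actual implementation; the class name and the 50-client / 50-request limits are placeholders taken from his description.]

```python
from collections import OrderedDict

class IPThrottle:
    """Track the most recently seen client IPs and block any that
    exceed a request limit. The most recent client moves to the top
    of the list, quiet legitimate users eventually fall off the
    bottom (resetting their count), and chatty clients such as
    robots stay at the top and stay blocked."""

    def __init__(self, max_clients=50, max_requests=50):
        self.max_clients = max_clients    # how many recent IPs to remember
        self.max_requests = max_requests  # requests allowed per remembered IP
        self.clients = OrderedDict()      # ip -> request count, most recent last

    def allow(self, ip):
        """Record one request from `ip`; return False if it should be blocked."""
        count = self.clients.pop(ip, 0) + 1
        self.clients[ip] = count  # re-insert at the "top" (most recent)
        if len(self.clients) > self.max_clients:
            # Drop the least recently seen client off the bottom of the list.
            self.clients.popitem(last=False)
        return count <= self.max_requests
```

A robot hammering one URL keeps its entry at the top of the list, so its count only grows; a real user who stops clicking is displaced by newer traffic and gets a fresh count on return.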