Guys, we also have a problem with evil clients. It's not always spiders... in fact, more often than not it's some smart-ass with a customised Perl script designed to screen-scrape all our data (usually to harvest email addresses for spam).
Our solution, which works pretty well, is to have a LogHandler that checks the IP address of each incoming request and stores some information in the DB about that client: when it was last seen, how many requests it's made in the past n seconds, etc. It means a DB hit on every request, but it's pretty light, all things considered.

We then have an external process that wakes up every minute or so and checks the DB for badly-behaved clients. If it finds any, we get email and the IP is written into a file that is read by mod_rewrite, which sends bad clients to, well, wherever... http://www.microsoft.com is a good one :-) It works great.

Of course, mod_throttle sounds pretty cool and maybe I'll test it out on our servers. There are definitely more ways to do this...

Which reminds me: you HAVE to make sure that your Apache children are size-limited and that you have a MaxClients setting where MaxClients * SizeLimit < free memory (for example, with 20 MB children and 512 MB free, MaxClients should be at most about 25). If you don't, and you get slammed by one of these wankers, your server will swap, and then you'll lose all the benefits of shared memory that Apache and mod_perl offer us. Check out the thread that was all over the list about a month ago for more information. Basically, avoid swapping at ALL costs.

Kyle Dawkins
Central Park Software

On Friday 19 April 2002 08:55, Marc Slagle wrote:
> We never tried mod_throttle, it might be the best solution.  Also, one
> thing to keep in mind is that some search engines will come from multiple
> IP addresses/user-agents at once, making them more difficult to stop.
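For anyone who wants to roll their own, the LogHandler idea above looks roughly like this. This is just a sketch, not our actual code: the package name, DSN, and table are made up, and it assumes mod_perl 1.x under Apache 1.3 with Apache::DBI caching the connection (and a MySQL version that supports ON DUPLICATE KEY UPDATE).

```perl
# My/ClientLog.pm -- hypothetical per-request client logger.
package My::ClientLog;
use strict;
use Apache::Constants qw(OK);
use DBI;

sub handler {
    my $r  = shift;
    my $ip = $r->connection->remote_ip;

    # One light DB hit per request: record the client and bump its
    # hit count; the watchdog process reads this table later.
    my $dbh = DBI->connect('dbi:mysql:weblog', 'loguser', 'logpass',
                           { RaiseError => 1 });
    $dbh->do(q{
        INSERT INTO clients (ip, last_seen, hits)
        VALUES (?, NOW(), 1)
        ON DUPLICATE KEY UPDATE last_seen = NOW(), hits = hits + 1
    }, undef, $ip);

    return OK;
}
1;
```

Wired up in httpd.conf with something like `PerlLogHandler My::ClientLog`. Running it as a LogHandler (rather than, say, an access handler) keeps the DB work out of the critical path before the response is sent.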
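And the mod_rewrite end of it can be done with a RewriteMap over the file the watchdog writes. A hypothetical httpd.conf fragment (file path and map name invented; RewriteMap has to live in the server config, not .htaccess):

```apache
# badclients.txt contains one "1.2.3.4 deny" line per banned IP.
RewriteEngine On
RewriteMap  badclients  txt:/etc/httpd/badclients.txt
RewriteCond ${badclients:%{REMOTE_ADDR}|ok}  =deny
RewriteRule .*  http://www.microsoft.com/  [R,L]
```

The `|ok` default means IPs not in the file fall through and get served normally.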