How about adding an MD5 watermark to the cookie? Well, it is becoming complicated...
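[For what it's worth, the watermark idea amounts to signing the cookie payload with a server-side secret so a client can't forge the throttle counters. A minimal sketch in Python (the thread's context is mod_perl, so this is only an illustration; all names are hypothetical, and MD5 appears only because that's what was proposed, a stronger hash would be the modern choice):

```python
import hashlib
import hmac

SECRET = b"server-side secret"  # hypothetical key; never sent to the client

def make_cookie(last_access, count):
    """Pack throttle state plus an MD5-HMAC watermark into one cookie value."""
    payload = "%d:%d" % (last_access, count)
    mac = hmac.new(SECRET, payload.encode(), hashlib.md5).hexdigest()
    return payload + ":" + mac

def read_cookie(value):
    """Return (last_access, count) if the watermark checks out, else None."""
    payload, _, mac = value.rpartition(":")
    expected = hmac.new(SECRET, payload.encode(), hashlib.md5).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return None  # client tampered with (or forged) the cookie
    last_access, count = payload.split(":")
    return int(last_access), int(count)
```

Note this only stops forgery; as Kyle points out below, it does nothing for clients that simply refuse cookies.]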
Peter Bi

----- Original Message -----
From: "kyle dawkins" <[EMAIL PROTECTED]>
To: "Peter Bi" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 8:29 AM
Subject: Re: Throttling, once again

> Peter
>
> Storing the last access time, etc. in a cookie won't work for a perl script
> that's abusing your site, or pretty much any spider, or even for anyone
> browsing without cookies, for that matter.
>
> The hit on the DB is so short and sweet, and happens after the response has
> been sent to the user, so they don't notice any delay and the apache child
> takes all of five hundredths of a second more to clean up.
>
> Kyle Dawkins
> Central Park Software
>
> On Friday 19 April 2002 11:18, Peter Bi wrote:
> > If merely the last access time and number of requests within a given time
> > interval are needed, I think the fastest way is to record them in a
> > cookie, and check them via an access control. Unfortunately, access
> > control is called before the content handler, so the idea can't be used
> > for CPU or bandwidth throttles. In the latter cases, one has to call
> > DB/file/memory for history.
> >
> > Peter Bi
> >
> > ----- Original Message -----
> > From: "kyle dawkins" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, April 19, 2002 8:02 AM
> > Subject: Re: Throttling, once again
> >
> > > Guys
> > >
> > > We also have a problem with evil clients. It's not always spiders...
> > > in fact more often than not it's some smart-ass with a customised perl
> > > script designed to screen-scrape all our data (usually to get email
> > > addresses for spam purposes).
> > >
> > > Our solution, which works pretty well, is to have a LogHandler that
> > > checks the IP address of an incoming request and stores some
> > > information in the DB about that client: when it was last seen, how
> > > many requests it's made in the past n seconds, etc.
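[The LogHandler approach described above boils down to one cheap write per request plus a windowed count per IP. A rough Python sketch of that bookkeeping, using an in-memory SQLite table as a stand-in for their DB (the thread's actual code is mod_perl; table and function names here are made up):

```python
import sqlite3
import time

# In-memory stand-in for the site's real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, seen INTEGER)")

def log_hit(ip, now=None):
    """What the LogHandler does: one cheap INSERT per request,
    after the response has already gone out to the client."""
    if now is None:
        now = int(time.time())
    db.execute("INSERT INTO hits VALUES (?, ?)", (ip, now))
    db.commit()

def hits_in_window(ip, window, now=None):
    """How many requests this IP has made in the past `window` seconds."""
    if now is None:
        now = int(time.time())
    row = db.execute(
        "SELECT COUNT(*) FROM hits WHERE ip = ? AND seen > ?",
        (ip, now - window),
    ).fetchone()
    return row[0]
```

Because the logging runs in the cleanup/log phase, the client never waits on it, which is why the per-request DB hit is tolerable.]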
> > > It means a DB hit on every request, but it's pretty light, all things
> > > considered.
> > >
> > > We then have an external process that wakes up every minute or so and
> > > checks the DB for badly-behaved clients. If it finds such clients, we
> > > get email and the IP is written into a file that is read by
> > > mod_rewrite, which sends bad clients to, well, wherever...
> > > http://www.microsoft.com is a good one :-)
> > >
> > > It works great. Of course, mod_throttle sounds pretty cool and maybe
> > > I'll test it out on our servers. There are definitely more ways to do
> > > this...
> > >
> > > Which reminds me, you HAVE to make sure that your apache children are
> > > size-limited and you have a MaxClients setting where MaxClients *
> > > SizeLimit < Free Memory. If you don't, and you get slammed by one of
> > > these wankers, your server will swap, and then you'll lose all the
> > > benefits of shared memory that apache and mod_perl offer us. Check out
> > > the thread that was all over the list about a month ago for more
> > > information. Basically, avoid swapping at ALL costs.
> > >
> > > Kyle Dawkins
> > > Central Park Software
> > >
> > > On Friday 19 April 2002 08:55, Marc Slagle wrote:
> > > > We never tried mod_throttle; it might be the best solution. Also,
> > > > one thing to keep in mind is that some search engines will come from
> > > > multiple IP addresses/user-agents at once, making them more
> > > > difficult to stop.
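[Two pieces of the setup above lend themselves to a quick sketch: the periodic sweep that flags badly-behaved IPs for the mod_rewrite ban file, and the MaxClients * SizeLimit < free-memory rule of thumb. In Python rather than the thread's Perl; the threshold, names, and the plain-text RewriteMap line format are assumptions for illustration:

```python
def sweep(counts, limit):
    """The external process's check: `counts` maps IP -> requests seen in
    the window; anything over `limit` is flagged as a bad client."""
    return sorted(ip for ip, n in counts.items() if n > limit)

def rewrite_map_lines(banned):
    """Emit lines for a txt RewriteMap file, one 'ip value' pair per line,
    so a RewriteCond can test the map and redirect bad clients elsewhere."""
    return ["%s bad" % ip for ip in banned]

def max_clients(free_mem_mb, size_limit_mb):
    """Kyle's sizing rule: keep MaxClients small enough that
    MaxClients * SizeLimit stays below free memory, so apache never swaps."""
    return free_mem_mb // size_limit_mb
```

For example, with 512 MB free and children capped at 20 MB apiece, the rule gives a MaxClients of at most 25; setting it higher risks exactly the swap death spiral described above.]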