Peter

Storing the last access time, etc. in a cookie won't work for a Perl script 
that's abusing your site, or pretty much any spider, or even for anyone 
browsing without cookies, for that matter: none of them will send the cookie 
back.

The hit on the DB is short and sweet, and it happens after the response has 
been sent to the user, so they don't notice any delay; the Apache child just 
takes all of five hundredths of a second more to clean up.

Kyle Dawkins
Central Park Software

On Friday 19 April 2002 11:18, Peter Bi wrote:
> If merely the last access time and number of requests within a given time
> interval are needed, I think the fastest way is to record them in a cookie,
> and check them via an access control. Unfortunately, access control is
> called before the content handler, so the idea can't be used for CPU or
> bandwidth throttles. In the latter cases, one has to call DB/file/memory for
> history.
>
> Peter Bi
>
>
> ----- Original Message -----
> From: "kyle dawkins" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 8:02 AM
> Subject: Re: Throttling, once again
>
> > Guys
> >
> > We also have a problem with evil clients. It's not always spiders... in
> > fact more often than not it's some smart-ass with a customised perl script
> > designed to screen-scrape all our data (usually to get email addresses
> > for spam purposes).
> >
> > Our solution, which works pretty well, is to have a LogHandler that
> > checks the IP address of an incoming request and stores some information
> > in the DB about that client: when it was last seen, how many requests it's
> > made in the past n seconds, etc.  It means a DB hit on every request but
> > it's pretty light, all things considered.
> >
> > We then have an external process that wakes up every minute or so and
> > checks the DB for badly-behaved clients.  If it finds such clients, we get
> > email and the IP is written into a file that is read by mod_rewrite, which
> > sends bad clients to, well, wherever... http://www.microsoft.com is a good
> > one :-)
> >
> > It works great.  Of course, mod_throttle sounds pretty cool and maybe
> > I'll test it out on our servers.  There are definitely more ways to do
> > this...
> >
> > Which reminds me, you HAVE to make sure that your apache children are
> > size-limited and you have a MaxClients setting where MaxClients *
> > SizeLimit < Free Memory.  If you don't, and you get slammed by one of
> > these wankers, your server will swap and then you'll lose all the benefits
> > of shared memory that apache and mod_perl offer us.  Check the thread out
> > that was all over the list about a month ago for more information.
> > Basically, avoid swapping at ALL costs.
> >
> >
> > Kyle Dawkins
> > Central Park Software
> >
> > On Friday 19 April 2002 08:55, Marc Slagle wrote:
> > > We never tried mod_throttle, it might be the best solution.  Also, one
> > > thing to keep in mind is that some search engines will come from
> > > multiple IP addresses/user-agents at once, making them more difficult
> > > to stop.
