If only the last access time and the number of requests within a given time
interval are needed, I think the fastest way is to record them in a cookie
and check them in an access-control handler. Unfortunately, the
access-control phase runs before the content handler, so the idea can't be
used for CPU or bandwidth throttles: how much CPU time or how many bytes a
request costs isn't known until the content handler has run. In the latter
cases, one has to consult a DB, a file, or shared memory for the history.
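
A rough sketch of the cookie idea, in Python only to keep it short and
language-neutral (it is not a mod_perl handler). The names SECRET, WINDOW,
MAX_REQUESTS and check_and_update are made up for illustration, and the HMAC
signature is my own assumption, added so that a client can't simply edit the
counter:

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical server-side signing key
WINDOW = 60             # hypothetical interval length, in seconds
MAX_REQUESTS = 30       # hypothetical request limit per interval


def _sign(payload):
    """Hex HMAC-SHA256 of the cookie payload."""
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def check_and_update(cookie, now):
    """Return (allowed, new_cookie) for one request.

    cookie is the value the client sent ("start:count:signature") or None;
    now is the current time in whole seconds.
    """
    start, count = now, 0          # default: begin a fresh interval
    if cookie:
        try:
            payload, sig = cookie.rsplit(":", 1)
            if hmac.compare_digest(sig.encode(), _sign(payload).encode()):
                s, c = payload.split(":")
                s, c = int(s), int(c)
                if now - s < WINDOW:   # still inside the same interval
                    start, count = s, c
        except ValueError:
            pass                   # malformed cookie: treat as a new client
    count += 1
    payload = f"{start}:{count}"
    return count <= MAX_REQUESTS, f"{payload}:{_sign(payload)}"
```

The access-control step would call check_and_update() with the incoming
cookie, refuse the request when it returns False, and send the new cookie
value back in either case. A client that throws its cookies away starts from
zero each time, so this only slows down clients that keep them.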

Peter Bi


----- Original Message -----
From: "kyle dawkins" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 8:02 AM
Subject: Re: Throttling, once again


> Guys
>
> We also have a problem with evil clients. It's not always spiders... in
> fact more often than not it's some smart-ass with a customised perl
> script designed to screen-scrape all our data (usually to get email
> addresses for spam purposes).
>
> Our solution, which works pretty well, is to have a LogHandler that
> checks the IP address of an incoming request and stores some information
> in the DB about that client; when it was last seen, how many requests
> it's made in the past n seconds, etc.  It means a DB hit on every request
> but it's pretty light, all things considered.
>
> We then have an external process that wakes up every minute or so and
> checks the DB for badly-behaved clients.  If it finds such clients, we
> get email and the IP is written into a file that is read by mod_rewrite,
> which sends bad clients to, well, wherever... http://www.microsoft.com is
> a good one :-)
>
> It works great.  Of course, mod_throttle sounds pretty cool and maybe I'll
> test it out on our servers.  There are definitely more ways to do this...
>
> Which reminds me, you HAVE to make sure that your apache children are
> size-limited and you have a MaxClients setting where MaxClients *
> SizeLimit < Free Memory.  If you don't, and you get slammed by one of
> these wankers, your server will swap and then you'll lose all the
> benefits of shared memory that apache and mod_perl offer us.  Check the
> thread out that was all over the list about a month ago for more
> information.  Basically, avoid swapping at ALL costs.
>
>
> Kyle Dawkins
> Central Park Software
>
> On Friday 19 April 2002 08:55, Marc Slagle wrote:
> > We never tried mod_throttle, it might be the best solution.  Also, one
> > thing to keep in mind is that some search engines will come from
> > multiple IP addresses/user-agents at once, making them more difficult
> > to stop.
>
>
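
For comparison, the log-and-sweep scheme described in the quoted message
might look roughly like the sketch below. It is a guess at the general
shape, not Kyle's actual code: SQLite stands in for the DB, the table
layout, thresholds and function names are invented, and it assumes that the
"file that is read by mod_rewrite" is a plain-text "key value" map such as a
txt: RewriteMap.

```python
import sqlite3

WINDOW = 60          # hypothetical: the "past n seconds"
MAX_REQUESTS = 100   # hypothetical threshold for a badly-behaved client


def open_db(path=":memory:"):
    """Open the hit log (in memory by default, for demonstration)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts REAL)")
    db.execute("CREATE INDEX IF NOT EXISTS hits_ts ON hits (ts)")
    return db


def log_request(db, ip, now):
    """What the log handler would do: one light insert per request."""
    db.execute("INSERT INTO hits (ip, ts) VALUES (?, ?)", (ip, now))
    db.commit()


def find_bad_clients(db, now):
    """What the external process would do each time it wakes up."""
    # Forget hits older than the window, then count what is left per IP.
    db.execute("DELETE FROM hits WHERE ts <= ?", (now - WINDOW,))
    rows = db.execute(
        "SELECT ip FROM hits GROUP BY ip HAVING COUNT(*) > ? ORDER BY ip",
        (MAX_REQUESTS,),
    ).fetchall()
    db.commit()
    return [ip for (ip,) in rows]


def write_blocklist(path, ips):
    """Write one 'ip deny' line per offender for mod_rewrite to look up."""
    with open(path, "w") as fh:
        for ip in ips:
            fh.write(f"{ip} deny\n")
```

Because the sweep runs outside the request cycle, the per-request cost stays
at a single insert; the price is that an abusive client gets up to one sweep
interval of free requests before it is blocked.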
