How about adding an MD5 watermark to the cookie? Well, it is getting
complicated...
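
A minimal sketch of the idea, assuming a server-side secret (the
cookie layout and names here are made up):

    use Digest::MD5 qw(md5_hex);

    my $secret = 'server-side-secret';   # never sent to the client

    # cookie value: "last_access:count:watermark"
    sub make_cookie_value {
        my ($last_access, $count) = @_;
        my $data = "$last_access:$count";
        return "$data:" . md5_hex($secret . $data);
    }

    sub check_cookie_value {
        my ($value) = @_;
        my ($last_access, $count, $mark) = split /:/, $value || '';
        return unless defined $mark;
        return unless md5_hex($secret . "$last_access:$count") eq $mark;
        return ($last_access, $count);   # untampered
    }

That stops clients from forging the counters, but a client can still
just drop the cookie, which is the problem Kyle points out below.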

Peter Bi

----- Original Message -----
From: "kyle dawkins" <[EMAIL PROTECTED]>
To: "Peter Bi" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 8:29 AM
Subject: Re: Throttling, once again


> Peter
>
> Storing the last access time, etc. in a cookie won't work for a perl
> script that's abusing your site, for pretty much any spider, or even for
> anyone browsing without cookies, for that matter.
>
> The hit on the DB is short and sweet, and it happens after the response
> has been sent to the user, so they don't notice any delay; the apache
> child takes all of five hundredths of a second more to clean up.
>
> Kyle Dawkins
> Central Park Software
>
> On Friday 19 April 2002 11:18, Peter Bi wrote:
> > If merely the last access time and the number of requests within a given
> > time interval are needed, I think the fastest way is to record them in a
> > cookie and check them via an access control handler. Unfortunately, the
> > access control phase is called before the content handler, so the idea
> > can't be used for CPU or bandwidth throttles. In the latter cases, one
> > has to hit a DB/file/memory store for the history.
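
A sketch of that access-control idea as a mod_perl 1.x handler (the
cookie name, window, and limit are invented, and the value is parsed
naively here, without the watermark proposed above):

    package My::ThrottleAccess;
    use strict;
    use Apache::Constants qw(OK FORBIDDEN);

    my $window   = 60;     # seconds
    my $max_hits = 120;    # requests allowed per window

    sub handler {
        my $r = shift;

        my ($val) = ($r->header_in('Cookie') || '') =~ /throttle=([^;]+)/;
        my ($start, $count) = split /:/, ($val || '');

        my $now = time;
        ($start, $count) = ($now, 0)
            if !$start or $now - $start > $window;   # open a new window
        $count++;

        # err_headers_out so the cookie goes out on FORBIDDEN too
        $r->err_headers_out->add('Set-Cookie' => "throttle=$start:$count");
        return $count > $max_hits ? FORBIDDEN : OK;
    }
    1;

    # httpd.conf:  PerlAccessHandler My::ThrottleAccess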
> >
> > Peter Bi
> >
> >
> > ----- Original Message -----
> > From: "kyle dawkins" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, April 19, 2002 8:02 AM
> > Subject: Re: Throttling, once again
> >
> > > Guys
> > >
> > > We also have a problem with evil clients. It's not always spiders... in
> > > fact, more often than not it's some smart-ass with a customised perl
> > > script designed to screen-scrape all our data (usually to get email
> > > addresses for spam purposes).
> > >
> > > Our solution, which works pretty well, is to have a LogHandler that
> > > checks the IP address of an incoming request and stores some
> > > information in the DB about that client: when it was last seen, how
> > > many requests it's made in the past n seconds, etc.  It means a DB hit
> > > on every request, but it's pretty light, all things considered.
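
A sketch of that kind of LogHandler under mod_perl 1.x (the DSN and
table are invented; one row per hit keeps the handler trivial and
leaves the counting to the sweeper):

    package My::ThrottleLog;
    use strict;
    use Apache::Constants qw(OK DECLINED);
    use DBI;

    sub handler {
        my $r  = shift;
        my $ip = $r->connection->remote_ip;

        # with Apache::DBI preloaded this is a cached per-child
        # handle, not a fresh connection on every request
        my $dbh = DBI->connect('dbi:mysql:throttle', 'user', 'pass')
            or return DECLINED;

        # one row per request; aggregation happens in the sweeper
        $dbh->do('INSERT INTO hits (ip, ts) VALUES (?, ?)',
                 undef, $ip, time);
        return OK;
    }
    1;

    # httpd.conf:  PerlLogHandler My::ThrottleLog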
> > >
> > > We then have an external process that wakes up every minute or so and
> > > checks the DB for badly-behaved clients.  If it finds such clients, we
> > > get email and the IP is written into a file that is read by
> > > mod_rewrite, which sends bad clients to, well, wherever...
> > > http://www.microsoft.com is a good one :-)
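
The sweeper side might look roughly like this (thresholds, paths, and
the mod_rewrite wiring are guesses at the setup described; the email
notification is left out):

    #!/usr/bin/perl
    # run from cron every minute or so
    use strict;
    use DBI;

    my $window    = 60;      # look-back, in seconds
    my $max_hits  = 300;     # more than this per window = banned
    my $blockfile = '/usr/local/apache/conf/banned.map';

    my $dbh = DBI->connect('dbi:mysql:throttle', 'user', 'pass',
                           { RaiseError => 1 });
    my $bad = $dbh->selectcol_arrayref(
        'SELECT ip FROM hits WHERE ts > ? GROUP BY ip HAVING COUNT(*) > ?',
        undef, time - $window, $max_hits);

    open my $fh, '>', $blockfile or die "can't write $blockfile: $!";
    print $fh "$_ banned\n" for @$bad;   # RewriteMap "key value" lines
    close $fh;

    # httpd.conf:
    #   RewriteEngine On
    #   RewriteMap  banned txt:/usr/local/apache/conf/banned.map
    #   RewriteCond ${banned:%{REMOTE_ADDR}|ok} =banned
    #   RewriteRule .* http://www.microsoft.com/ [R,L]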
> > >
> > > It works great.  Of course, mod_throttle sounds pretty cool and maybe
> > > I'll test it out on our servers.  There are definitely more ways to do
> > > this...
> > >
> > > Which reminds me: you HAVE to make sure that your apache children are
> > > size-limited and that you have a MaxClients setting where
> > > MaxClients * SizeLimit < Free Memory.  If you don't, and you get
> > > slammed by one of these wankers, your server will swap, and then
> > > you'll lose all the benefits of the shared memory that apache and
> > > mod_perl offer us.  Check out the thread that was all over the list
> > > about a month ago for more information.  Basically, avoid swapping at
> > > ALL costs.
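
Under mod_perl 1.x, Apache::SizeLimit is one way to enforce the
per-child cap; the numbers below are only illustrative:

    # startup.pl
    use Apache::SizeLimit;
    $Apache::SizeLimit::MAX_PROCESS_SIZE = 30_000;   # KB, ~30MB/child

    # httpd.conf:
    #   MaxClients       50    # 50 children * ~30MB each; keep the
    #                          # product below free RAM
    #   PerlFixupHandler Apache::SizeLimit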
> > >
> > >
> > > Kyle Dawkins
> > > Central Park Software
> > >
> > > On Friday 19 April 2002 08:55, Marc Slagle wrote:
> > > > We never tried mod_throttle; it might be the best solution.  Also,
> > > > one thing to keep in mind is that some search engines will come from
> > > > multiple IP addresses/user-agents at once, making them more
> > > > difficult to stop.
