When this happened to our client's servers we ended up trying some of
the mod_perl based solutions.  We tried some of the modules that used
shared memory, but the traffic on our site quickly filled the shared
memory and made the module unusable.  After that we tried blocking the
agents altogether, and there is example code in the Eagle book
(Apache::BlockAgent) that worked pretty well.
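
If it helps, the general shape of that kind of handler is sketched
below (mod_perl 1.x style).  This is not the actual Apache::BlockAgent
code from the Eagle book; the package name and agent patterns are made
up for illustration.

    # My/BlockSpiders.pm -- hypothetical PerlAccessHandler that refuses
    # requests from known-bad User-Agents.  Patterns are examples only.
    package My::BlockSpiders;
    use strict;
    use Apache::Constants qw(OK FORBIDDEN);

    my @bad_agents = (qr/EmailSiphon/i, qr/WebStripper/i, qr/GreedyBot/i);

    sub handler {
        my $r  = shift;
        my $ua = $r->header_in('User-Agent') || '';
        for my $pat (@bad_agents) {
            if ($ua =~ $pat) {
                $r->log_reason("blocked agent: $ua", $r->uri);
                return FORBIDDEN;
            }
        }
        return OK;
    }
    1;

You'd wire it up with something like "PerlAccessHandler
My::BlockSpiders" inside the <Location> you want to protect.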

You might be able to place some of that code in your CGI, denying the
search engines' agents/IPs access to it while allowing real users in.
That way the search engines can still get the static pages.
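
Something along these lines near the top of the script would do it; the
agent patterns and the IP below are placeholders, not a real blacklist:

    # Refuse known spider agents/IPs before doing any real work.
    my $ua = $ENV{HTTP_USER_AGENT} || '';
    my $ip = $ENV{REMOTE_ADDR}     || '';
    if ($ua =~ /googlebot|slurp|spider|crawler/i or $ip eq '192.0.2.50') {
        print "Status: 403 Forbidden\r\n",
              "Content-Type: text/plain\r\n\r\n",
              "This script is not available to robots.\n";
        exit;
    }

Since only the CGI refuses them, the static pages stay crawlable.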

We never tried mod_throttle; it might be the best solution.  Also, one
thing to keep in mind is that some search engines will come from
multiple IP addresses/user-agents at once, which makes them more
difficult to stop.
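
Even without mod_throttle, a very rough per-IP throttle is not much
code.  The sketch below (mod_perl 1.x, no locking, limits picked out of
the air) counts hits per IP in a DBM file and starts refusing once a
client goes over the limit in the current time window; as noted above,
it won't catch a spider that rotates IPs and user-agents.

    # My/Throttle.pm -- rough per-IP throttle sketch.  Not mod_throttle
    # or Stonehenge::Throttle, just the basic idea; counts are only
    # approximate because nothing locks the DBM between read and write.
    package My::Throttle;
    use strict;
    use Apache::Constants qw(OK FORBIDDEN);
    use Fcntl qw(O_RDWR O_CREAT);
    use SDBM_File;

    my $window = 60;   # seconds per counting window (assumed value)
    my $max    = 30;   # max requests per IP per window (assumed value)

    sub handler {
        my $r  = shift;
        my $ip = $r->connection->remote_ip;

        tie my %hits, 'SDBM_File', '/tmp/throttle', O_RDWR|O_CREAT, 0644
            or return OK;                   # fail open if the DBM breaks

        my $slot = int(time / $window);     # current time bucket
        my ($old_slot, $count) = split /:/, ($hits{$ip} || "$slot:0");
        $count = 0 if $old_slot != $slot;   # new window, start over
        $hits{$ip} = "$slot:" . ++$count;
        untie %hits;

        if ($count > $max) {
            $r->log_reason("throttled $ip: $count hits/window", $r->uri);
            return FORBIDDEN;
        }
        return OK;
    }
    1;

Set as a PerlAccessHandler the same way as above.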

> Hi,
>
> Wasn't there just a thread on throttling a few weeks ago?
>
> I had a machine hit hard yesterday with a spider that ignored robots.txt.
>
> Load average was over 90 on a dual CPU Enterprise 3500 running Solaris
> 2.6.  It's a mod_perl server, but has a few CGI scripts that it
> handles, and the spider was hitting one of the CGI scripts over and
> over.  They were valid requests, but coming in faster than they were
> going out.
>
> Under normal usage the CGI scripts are only accessed a few times a day,
> so it's not much of a problem to have them served by mod_perl.  And
> under normal peak loads RAM is not a problem.
>
> The machine also has a bandwidth limitation (a packet shaper is used to
> share the bandwidth).  That, combined with the spider, didn't help
> things.  Luckily there's 4GB, so even at a load average of 90 it wasn't
> really swapping much.  (Well, not when I caught it, anyway.)  This
> spider was using the same IP for all requests.
>
> Anyway, I remember Randal's Stonehenge::Throttle discussed not too long
> ago.  That seems to address this kind of problem.  Is there anything
> else to look into?  Since the front-end is mod_perl, it means I can use
> a mod_perl throttling solution, too, which is cool.
>
> I realize there are some fundamental hardware issues to solve, but if
> I can just keep the spiders from flooding the machine then the machine
> is getting by ok.
>
> Also, does anyone have suggestions for testing once throttling is in
> place?  I don't want to start cutting off the good customers, but I do
> want to get an idea how it acts under load.  ab to the rescue, I
> suppose.
>
> Thanks much,
>
>
> --
> Bill Moseley
> mailto:[EMAIL PROTECTED]
>

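On the testing question in the quoted mail: ab does work for this.
Something like the following (hostname, path, and agent string are
placeholders) lets you hammer the throttled URL, first as an ordinary
client and then while claiming to be one of the blocked agents:

    ab -n 500 -c 20 http://www.example.com/cgi-bin/script.cgi
    ab -n 500 -c 20 -H 'User-Agent: GreedyBot/1.0' \
       http://www.example.com/cgi-bin/script.cgi

Watching the error_log while that runs should show whether the 403s
kick in where you expect.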