On Nov 16, 2009, at 2:43 PM, John Lauro wrote:

> Oops, my bad...  It's actually tc and not iptables.  Google "tc qdisc"
> for some info.
> 
> You could allow your local IPs to go unrestricted, and throttle all
> other IPs to 512 kb/sec, for example.
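
For reference, a minimal tc sketch of that idea, assuming eth0 is the
outbound interface and 192.168.0.0/24 is the local network (both, and the
100mbit rate, are guesses for my setup):

    # HTB root qdisc; unmatched traffic falls into class 1:20
    tc qdisc add dev eth0 root handle 1: htb default 20
    # fast class for local traffic, 512 kbit class for everyone else
    tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
    tc class add dev eth0 parent 1: classid 1:20 htb rate 512kbit ceil 512kbit
    # steer traffic headed for the local subnet into the fast class
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip dst 192.168.0.0/24 flowid 1:10

Note this only shapes the bytes we send back out, which is part of the
problem: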

Hmmm... The problem isn't the data rate, it's the work associated with each
incoming request. As soon as a 500-byte request hits, the web server has to do
a lot of work.

> What software is the wiki running on?  I assume it's not running under
> Apache or there would be some ways to tune Apache.  As others have
> mentioned, telling the crawlers to behave themselves or to totally
> ignore the wiki with a robots file is probably best.
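
For the record, a robots file along those lines would be something like
this (Crawl-delay is non-standard and only some crawlers honor it, and
the /wiki/ path is just a guess at my layout):

    User-agent: *
    # ask polite crawlers to pause between requests
    Crawl-delay: 10
    # or have them skip the wiki entirely:
    # Disallow: /wiki/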

Well, the web server is Apache, but surprisingly Apache doesn't offer a way to
tune for this particular case. Suppose normal request traffic looks like this
(each A is a user request):

Time ->

A  A   AA  A    A   AAA  A    AA A

With the bots (B) this becomes:

ABBBBBBBBBB A BBBBA BBA BBBBBA AABBBBBB

So you can see that normal users are simply swamped out of "slots". The web
server can render about 9 pages at the same time without impact, but each page
takes a second or more to render. At first I set MaxClients to 9, which keeps
the web server from swapping itself to death, but if the bots have 8 requests
queued up, and then another 8, and another 8, regular users have no chance of
decent interactivity...
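
For reference, that cap looks something like this in the prefork MPM
section of httpd.conf (MaxClients 9 is my actual setting; the other
numbers are just placeholders):

    <IfModule prefork.c>
        StartServers      3
        MinSpareServers   3
        MaxSpareServers   5
        ServerLimit       9
        MaxClients        9
    </IfModule>

That keeps memory under control, but the accept queue is still
first-come, first-served, so the bots can hold all 9 slots anyway.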

This may be a corner case due to the slow page rendering, because I'm having a
hard time finding a way to throttle the bots. I suppose that normally you'd
just add servers...

Wout.
