Make sure you set KeepAlive to Off in Apache. That keeps a client from queuing more than one request at a time unless it opens multiple connections. You can also have haproxy do this for you with option httpclose, even if KeepAlive is enabled in Apache.
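Roughly, the two settings look like the sketch below. The file names and the placement under a defaults section are assumptions about a typical setup, so adapt them to your own config:

```
# Apache, e.g. in httpd.conf (file location is an assumption)
KeepAlive Off

# haproxy, e.g. in haproxy.cfg (putting this under "defaults" is an assumption;
# it can also go in the relevant listen/frontend/backend section)
defaults
    mode http
    option httpclose
```

Either one should be enough on its own; the haproxy option is handy if you would rather not touch the Apache config.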
You could then use --hitcount with iptables rules to limit the number of new connections per second from each IP address...

> -----Original Message-----
> From: Wout Mertens [mailto:wout.mert...@gmail.com]
> Sent: Monday, November 16, 2009 9:19 AM
> To: John Lauro
> Cc: haproxy@formilux.org
> Subject: Re: Preventing bots from starving other users?
>
> On Nov 16, 2009, at 2:43 PM, John Lauro wrote:
>
> > Oopps, my bad... It's actually tc and not iptables. Google tc qdisc
> > for some info.
> >
> > You could allow your local ips go unrestricted, and throttle all other IPs
> > to 512kb/sec for example.
>
> Hmmm... The problem isn't the data rate, it's the work associated with
> incoming requests. As soon as a 500 byte request hits, the web server
> has to do a lot of work.
>
> > What software is the running on? I assume it's not running under apache or
> > there would be some ways to tune apache. As other have mentioned, telling
> > the crawlers to behave themselves or totally ignore the wiki with a robots
> > file is probably best.
>
> Well the web server is Apache, but surprisingly Apache doesn't allow
> for tuning this particular case. Suppose normal request traffic looks
> like (A are users)
>
> Time ->
>
> A A AA A A AAA A AA A
>
> With the bot this becomes
>
> ABBBBBBBBBB A BBBBA BBA BBBBBA AABBBBBB
>
> So you can see that normal users are just swamped out of "slots". The
> webserver can render about 9 pages at the same time without impact, but
> it takes a second or more to render. At first I set MaxClients to 9,
> which makes it so the web server doesn't swap to death, but if the bots
> have 8 requests queued up, and then another 8, and another 8, regular
> users have no chance of decent interactivity...
>
> This may be a corner case due to slow serving, because I'm having a
> hard time finding a way to throttle the bots. I suppose that normally
> you'd just add servers...
>
> Wout.