On Fri, May 3, 2013 at 10:54 AM, André Warnier <a...@ice-sa.com> wrote:
> So here is a challenge for the Apache devs : describe how a bot-writer could
> update his software to avoid the consequences of the scheme that I am
> advocating, without consequences on the effectivity of their URL-scanning.

This has been explained several times. The bot makes its requests
asynchronously, with a short select() timeout. If one of its current
requests has not been answered yet because of the artificial delay, it
simply issues an additional request, not necessarily to the same
server.
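
To make that concrete, here is a rough sketch of the pattern in
Python, driving non-blocking sockets from a single select()-style loop
via the selectors module. The target addresses, the probed path and
the constants are made up for illustration, and error handling is
omitted:

import selectors
import socket

# Hypothetical scan list: documentation addresses and a guessed admin path.
TARGETS = [("192.0.2.%d" % i, "/phpmyadmin/") for i in range(1, 101)]
MAX_INFLIGHT = 50     # sized to cover the artificial delay, see below
TIMEOUT = 0.05        # the short select() timeout

sel = selectors.DefaultSelector()
targets = iter(TARGETS)
inflight = 0

def start_one():
    """Open a non-blocking connection to the next target, if any remain."""
    global inflight
    try:
        host, path = next(targets)
    except StopIteration:
        return False
    s = socket.socket()
    s.setblocking(False)
    s.connect_ex((host, 80))   # returns immediately; readiness shows up as writable
    req = ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode()
    sel.register(s, selectors.EVENT_WRITE, (host, req))
    inflight += 1
    return True

while inflight or start_one():
    # Keep topping up the pool: a server sitting on a delayed response
    # only costs one idle socket, not any scanning time.
    while inflight < MAX_INFLIGHT and start_one():
        pass
    for key, _ in sel.select(TIMEOUT):
        s, (host, req) = key.fileobj, key.data
        if key.events & selectors.EVENT_WRITE:
            # Connect finished; send the request and wait for the reply.
            s.send(req)
            sel.modify(s, selectors.EVENT_READ, (host, req))
        else:
            # First chunk of the response; the status line is all a scanner needs.
            print(host, s.recv(4096).split(b"\r\n", 1)[0])
            sel.unregister(s)
            s.close()
            inflight -= 1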

The fact that a single response takes longer to arrive is not
relevant: overall, the bot can process roughly as many requests in the
same period as it would without the delay. The amount of concurrency
required grows with the artificial delay and the network RTT.
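
To put made-up numbers on that: if a normal probe completes in about
100 ms and the server adds a 5-second artificial delay, then roughly
(5 + 0.1) / 0.1, i.e. about 50 requests in flight at once, keeps the
bot completing probes at the same rate as a sequential scan with no
delay.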

There is a little overhead due to the extra concurrency, but not
much: you are not processing any more requests in a given time period,
nor generating more network traffic than without concurrency. The only
real cost is the larger number of simultaneous network connections,
most of which sit idle waiting for the artificial delay to expire.

I would not be surprised if bots already behave like this, since it
is a useful way of increasing the scanning rate when the targeted
servers are already slow to respond or the network RTT is high.

Tom
