Tom Evans wrote:
On Fri, May 3, 2013 at 10:54 AM, André Warnier <a...@ice-sa.com> wrote:
So here is a challenge for the Apache devs: describe how a bot-writer could
update his software to avoid the consequences of the scheme that I am
advocating, without reducing the effectiveness of his URL-scanning.
This has been explained several times. The bot makes requests
asynchronously with a short select() timeout. If it doesn't have a
response from one of its current requests due to artificial delays, it
makes an additional request, not necessarily to the same server.
The fact that a single response takes longer to arrive is not
relevant; overall, the bot can process roughly as many requests in the
same period as without a delay. The amount of concurrency required
would be roughly proportional to the sum of the artificial delay and the
network RTT.
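One way to put numbers on that proportionality is Little's law (my framing, not something stated in the thread): the number of requests a bot must keep in flight equals its target request rate multiplied by how long each request stays open, i.e. the RTT plus any artificial delay. The rate and RTT below are hypothetical figures chosen only for illustration.

```python
def required_concurrency(rate_per_s: float, rtt_s: float, delay_s: float) -> float:
    """Little's law: requests in flight = request rate x time each request stays open."""
    return rate_per_s * (rtt_s + delay_s)

# Hypothetical figures: a bot aiming for 100 requests/s over a 50 ms RTT.
no_delay = required_concurrency(100, 0.05, 0.0)
with_delay = required_concurrency(100, 0.05, 1.0)
print(no_delay, with_delay)  # about 5 and 105 connections in flight
```

In this example, keeping the same throughput against a 1 s delay takes roughly 21 times as many open connections, most of them idle, which is the cost described in the next paragraph.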
There is a little overhead due to the extra concurrency, but not much:
you are not processing any more requests in a given time period, nor
using more network traffic than without concurrency. The only real
cost is more simultaneous network connections, most of which are idle,
waiting for the artificial delay to expire.
I would not be surprised if bots already behave like this, as it is a
useful way of increasing scanning rate if you have servers that are
slow to respond already, or have high network RTT.
OK, maybe I am understanding this wrongly, but I am open to being proven wrong.
Suppose a bot is scanning 10000 IPs, 100 at a time concurrently (*), for 20
potentially vulnerable URLs per server. That is thus 200,000 HTTP requests to make.
And let's suppose that the bot cannot tell, from the delay experienced when receiving
any particular response, whether this is a server that is artificially delaying
responses, or whether this is a normal delay due to whatever condition (**).
And let's also suppose that, of the total of 200,000 requests, only 1% (2000) will be
"hits" (where the URL responds with something other than a 404). That leaves 99% of
requests (198,000) responding with a 404.
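The request counts in this scenario can be checked directly; the 1% hit rate is the assumption stated just above, not a measured figure.

```python
servers = 10_000
urls_per_server = 20
hit_rate = 0.01  # assumed fraction of non-404 responses

total_requests = servers * urls_per_server
hits = round(total_requests * hit_rate)
misses = total_requests - hits
print(total_requests, hits, misses)  # 200000 2000 198000
```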
And let's suppose that the bot is extra-smart, and always keeps its "pool" of 100 outgoing
connections busy, in the sense that as soon as a response is received on one connection,
that connection is closed and immediately re-opened for another HTTP request.
If no webserver implements the scheme, we assume 10 ms per 404 response.
So the bot launches the first batch of 100 requests (taking 10 ms to do so), then goes
back to check its first connection and finds a response. If the response is not a 404,
it's a "hit" and the server gets added to the table of vulnerable IPs
(and, to gain some extra time, any remaining URLs still to be scanned on that server
could now be canceled, although this could be disputed).
If the response is a 404, it's a "miss". But it doesn't mean that there are no other
vulnerable URLs on that server, so it still needs to scan the others.
All in all, if the bot can keep issuing requests and processing responses at the rate of
100 per 10 ms on average, it will take it a total of 200,000 / 100 * 10 ms = 20,000 ms
(20 seconds) to perform the scan of the 200,000 URLs, and it will have collected 2000 hits
after doing so.
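The no-delay estimate as a small formula, under the stated assumption that all 100 connections each turn around one response every 10 ms:

```python
def scan_time_ms(total_requests: int, pool_size: int, ms_per_response: float) -> float:
    # Each "round" completes pool_size requests in ms_per_response milliseconds.
    return total_requests / pool_size * ms_per_response

print(scan_time_ms(200_000, 100, 10))  # 20000.0
```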
Now let's suppose that out of these 10000 servers, 10% of them implement the scheme, and
delay their 404 responses by an average of 1000 ms.
So now the bot launches the first 100 requests in 10 ms, then goes back to check the
status of the first one. With a probability of 0.1, this could be one of the delayed ones.
In that case, no response will be there yet, and the bot skips to the next
connection.
At the end of this pass, the bot will thus have received 90 responses (10 are still
delayed) and re-issued 90 new requests. On the next pass, the same 10 delayed
responses would still be pending (on average), and among the 90 new ones, 9 would also be
delayed. So now it can only issue 81 new requests, and when it comes back to check, about
10 + 9 + 8 = 27 will be delayed.
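The pass-by-pass pile-up follows a simple recurrence: on each pass, the connections that got an answer are reused, and about 10% of the new requests land on a delaying server. A minimal sketch, assuming (as the text does) that no delayed response arrives within the passes considered; with 10 ms per pass and a 1000 ms delay, that holds for roughly the first 100 passes.

```python
def delayed_after_passes(pool: int, p_delayed: float, passes: int) -> list[float]:
    """Expected number of pool connections stuck on a delayed response after each pass.

    Assumes no delayed response completes during the passes considered.
    """
    stuck = 0.0
    history = []
    for _ in range(passes):
        free = pool - stuck        # connections that got a response and were reused
        stuck += free * p_delayed  # a fraction of the new requests hit a delaying server
        history.append(stuck)
    return history

print([round(x, 1) for x in delayed_after_passes(100, 0.1, 3)])  # [10.0, 19.0, 27.1]
```

Equivalently, after n passes about 100 * (1 - 0.9^n) connections are stuck, which first exceeds 90 at the 22nd pass.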
Basically, after a few dozen cycles like this, nearly all of its 100 pool connections will
be waiting for a response, and it will have no choice but to either wait, or start killing
the connections that have been waiting for more than a certain amount of time.
Or it could increase its number of connections and become more conspicuous (***).
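If it chooses to wait, then the total waiting time, summed over all requests, will have
increased by 200,000 * 10% * 1000 ms = 20,000,000 ms. Even spread over its 100
connections, that still adds roughly 200,000 ms of wall-clock time to the scan of the
10000 IPs.
If it chooses not to wait, then it will never know whether this URL was vulnerable
or not.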
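A small discrete-event simulation can check the pile-up argument under simplified assumptions (mine, not the author's): every request takes 10 ms, every 10th request instead hits a delaying server and takes an extra 1000 ms (approximating "10% of servers" as "10% of requests"), and a connection is reused as soon as its response arrives. The 20,000,000 ms figure is the extra waiting summed over all requests; with 100 connections waiting in parallel, the wall-clock cost is that sum spread across the pool. The function name and parameters are illustrative only.

```python
import heapq

def simulate_scan(total: int, pool: int, base_ms: int, extra_ms: int, delayed_every: int) -> int:
    """Wall-clock ms to finish `total` requests over `pool` concurrent connections.

    Illustrative assumption: request i goes to a delaying server when
    i % delayed_every == 0, and then takes base_ms + extra_ms instead of base_ms.
    """
    free_at = [0] * pool  # time at which each connection can send its next request
    heapq.heapify(free_at)
    finish = 0
    for i in range(total):
        start = heapq.heappop(free_at)  # reuse the connection that frees up first
        duration = base_ms + (extra_ms if i % delayed_every == 0 else 0)
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

baseline = simulate_scan(200_000, 100, 10, 0, 10)
delayed = simulate_scan(200_000, 100, 10, 1000, 10)
print(baseline, delayed)
```

Under these assumptions the baseline finishes in 20,000 ms and the delayed run in roughly 220,000 ms, about eleven times longer, with about 91% of all connection-time spent on the artificial delay. The pool is not blocked forever, since delayed responses do eventually arrive, but it is clogged most of the time.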
Is there a flaw in this reasoning ?
If not, then the avoidance-scheme based on becoming more parallel would be quite
ineffective, no ?
(*) I pick 100 at a time, imagining that as the number of established outgoing connections
increases, a bot becomes more and more visible on the host it is running on. So I imagine
that there is a reasonable limit to how many of them it can open at a time.
(**) This is because the server varies the individual 404 delay randomly between two
reasonable values (e.g. 100 ms and 2000 ms), and delays in that range can happen on any
normal server.
(***) I would say that a bot which would be opening 100 outgoing connections in parallel
on average would already be *very* conspicuous.
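Footnote (**) does not specify how the random delay is drawn. As a minimal sketch, assuming a uniform distribution between 100 ms and 2000 ms, a server-side helper might pick each 404 delay like this. The helper name is hypothetical, and this is plain Python for illustration, not an Apache module or configuration.

```python
import random

def choose_404_delay_ms(rng: random.Random, low_ms: float = 100.0, high_ms: float = 2000.0) -> float:
    """Hypothetical helper: pick one 404 delay uniformly between low_ms and high_ms."""
    return rng.uniform(low_ms, high_ms)

rng = random.Random(2013)  # fixed seed so the run is reproducible
samples = [choose_404_delay_ms(rng) for _ in range(10_000)]
mean_ms = sum(samples) / len(samples)
print(round(min(samples)), round(max(samples)), round(mean_ms))
```

A uniform draw over that range averages about 1050 ms, close to the 1000 ms average delay assumed in the scenario, while any single delay stays within what a slow but legitimate server could produce.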