Hey André,

I do not think your protection mechanism is very good (for the reasons mentioned before), but you can easily try it out for yourself with 2-3 ModSecurity rules and the "pause" directive.
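For concreteness, a minimal sketch of such a rule (assuming ModSecurity 2.7 or later loaded into Apache httpd; the rule id is arbitrary, and the fixed 1000 ms stands in for the variable delay proposed below — details may vary between ModSecurity versions):

    # Sketch: delay every 404 response by one second.
    # Requires mod_security2. "pause" takes milliseconds; the
    # transaction then continues normally. Note that the sleeping
    # request keeps an httpd worker occupied for the whole delay.
    SecRuleEngine On
    SecRule RESPONSE_STATUS "@streq 404" \
        "id:1000404,phase:3,pass,nolog,pause:1000"

Phase 3 runs once the response headers (and thus the status code) are known, but before the response goes out to the client.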
Regs,

Christian

On Tue, Apr 30, 2013 at 12:03:28PM +0200, André Warnier wrote:

> Dear Apache developers,
>
> This is a suggestion relative to the code of the Apache httpd webserver, and a possible new default option in the standard distribution of Apache httpd. It also touches on WWW security, which is why I felt that it belongs on this list rather than on the general users' list. Please correct me if I am mistaken.
>
> According to Netcraft, there are currently some 600 million webservers on the WWW, with more than 60% of those identified as "Apache". I currently administer about 25 of these Apache httpd/Tomcat webservers, not remarkable in any way (business applications for medium-sized companies). In the logs of these servers, every day, there are episodes like the following:
>
> 209.212.145.91 - - [03/Apr/2013:00:52:32 +0200] "GET /muieblackcat HTTP/1.1" 404 362 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/index.php HTTP/1.1" 404 365 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/pma/index.php HTTP/1.1" 404 369 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/phpmyadmin/index.php HTTP/1.1" 404 376 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:37 +0200] "GET //db/index.php HTTP/1.1" 404 362 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:37 +0200] "GET //dbadmin/index.php HTTP/1.1" 404 367 "-" "-"
> ... etc.
>
> Such lines are the telltale trace of a "URL-scanning bot", or of the "URL-scanning" part of a bot, and I am sure that you are all familiar with them. Obviously, these bots are trying to find webservers which exhibit poorly-designed or poorly-configured applications, with the aim of identifying hosts which can be subjected to various kinds of attacks, for various purposes. As far as I can tell from my own unremarkable servers, I would surmise that many or most webservers facing the Internet are subjected to this type of scan every day.
>
> Hopefully, most webservers are not really vulnerable to this type of scan. But the fact is that *these scans are happening, every day, on millions of webservers*. And they are at least a nuisance, and at worst a serious security problem when, as a result of poorly configured webservers or applications, they lead to break-ins and compromised systems.
>
> It is basically a numbers game, like malicious email: it costs very little to do this, and if even a tiny proportion of webservers exhibit one of these vulnerabilities, then because of the numbers involved, it is worth doing. If there are 600 million webservers, 50% of them are scanned every day, and 0.01% of these webservers are vulnerable because of one of these URLs, then every day 30,000 (600,000,000 x 0.5 x 0.0001) vulnerable servers will be identified.
>
> About the "cost" aspect: from the data in my own logs, such bots seem to be scanning about 20-30 URLs per pass, at a rate of about 3-4 URLs per second. Since it takes my Apache httpd servers approximately 10 ms on average to respond (with a 404 Not Found) to one of these requests, and they only request 1 URL per 250 ms, I would imagine that these bots have some built-in rate-limiting mechanism, to avoid being "caught" by various webserver-protection tools. Maybe they are also smart, and scan several servers in parallel, so as to limit the rate at which they "burden" any server in particular. (In this rough calculation, I am ignoring network latency for now.)
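Spelled out, the per-URL timing implied by these figures (taking the 4 URLs per second end of the quoted range; the breakdown is mine, not André's):

    request interval (bot's rate limit) :  250 ms per URL
    server time to produce the 404      :  ~10 ms
    bot idle time per URL               : ~240 ms

That is, at present the bot's own pacing, not the server's response time, dominates how long a scan of any single server takes.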
> So if we imagine a smart bot which is scanning 10 servers in parallel, issuing 4 requests per second to each of them, for a total of 20 URLs per server, and we assume that all these requests result in 404 responses with an average response time of 10 ms, then it "costs" this bot only about 2 seconds of response-waiting time (200 requests x 10 ms) to complete the scan of 10 servers. If there are 300 million servers to scan, then the total cost for scanning all the servers, by any number of such bots working cooperatively, is an aggregated 60 million seconds. And if one such "botnet" has 10,000 bots, that boils down to only 6,000 seconds per bot.
>
> It is scary that 50% of all Internet webservers can be scanned for vulnerabilities in less than 2 hours, and that such a scan may result in "harvesting" tens of thousands of hosts (30,000 by the estimate above), candidates for takeover.
>
> Now, how about making it so that, without any special configuration or add-on software or skills on the part of webserver administrators, it would cost these same bots *about 100 times as long (several days)* to do their scan?
>
> The only cost would be a relatively small change to the Apache webservers, which is what my suggestion consists of: adding a variable delay (say between 100 ms and 2000 ms) to any 404 response.
>
> The suggestion is based on the observation that there is a dichotomy between this kind of access by bots and the kind of access made by legitimate HTTP users/clients: legitimate users/clients (including the "good bots") mostly access links "which work", so they rarely get "404 Not Found" responses. Malicious URL-scanning bots, on the other hand, by the very nature of what they are scanning for, get many "404 Not Found" responses.
>
> As a general idea, then: anything which increases the delay to obtain a 404 response should impact these bots much more than it impacts legitimate users/clients.
>
> How much?
>
> Let us imagine for a moment that this suggestion is implemented in the Apache webservers and is enabled in the default configuration. And let's imagine that after a while, 20% of the Apache webservers deployed on the Internet have this feature enabled, and are now delaying any 404 response by an average of 1000 ms. Let's re-use the numbers above and redo the calculation. The same "botnet" of 10,000 bots is thus still scanning 300 million webservers, each bot scanning 10 servers at a time, for 20 URLs per server. Previously, this took about 6,000 seconds per bot. Now, however, instead of an average delay of 10 ms to obtain a 404 response, in 20% of the cases (60 million webservers) the bots will experience an average 1000 ms additional delay per URL scanned. Since each bot scans 10 servers in parallel and the delays overlap, this adds (60,000,000 / 10 x 20 URLs x 1000 ms) 120,000,000 seconds to the scan. Divided by 10,000 bots, this is 12,000 additional seconds per bot (roughly 3 1/3 hours).
>
> So with a small change to the code, no add-ons, no special configuration skills on the part of the webserver administrator, no firewalls, no filtering, no need for updates to any list of URLs or bot characteristics, little inconvenience to legitimate users/clients, and only a very partial adoption over time, it seems that this scheme could more than double the cost for bots to acquire the same number of targets. Or, seen another way, it could more than halve the number of webservers being scanned every day.
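To make the two calculations easier to compare, here is a compact worked restatement under the stated assumptions (10 servers per bot in parallel, 20 URLs per server, 10,000 cooperating bots, and only response-waiting time counted as "cost"):

    Baseline, 404 answered in ~10 ms:
      300,000,000 servers x 20 URLs x 10 ms  =  60,000,000 s aggregate
      60,000,000 s / 10,000 bots             =   6,000 s per bot (~1 h 40 min)

    With 20% of servers (60,000,000) delaying 404s by ~1000 ms, the
    delays of the 10 servers scanned in parallel overlap, so:
      60,000,000 / 10 x 20 URLs x 1000 ms    = 120,000,000 s aggregate
      120,000,000 s / 10,000 bots            =  12,000 s extra per bot (~3 h 20 min)

So each bot would go from roughly 6,000 s to roughly 18,000 s for the same harvest, i.e. about triple the cost.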
> I know that this is a hard sell. The basic idea sounds a bit too simple to be effective. It will not kill the bots, and it will not stop the bots from scanning Internet servers in the other ways that they use. It does not miraculously protect any single server against such scans, and the benefit of any one server implementing this is diluted over all webservers on the Internet.
>
> But it is also not meant as an absolute weapon. It is targeted specifically at a particular type of scan, done by a particular type of bot, for a particular purpose, and it is just a scheme to make this more expensive for them. It may or may not discourage these bots from continuing with this type of scan (if it does, that would be a very big result). But at the same time, compared to any other kind of tool that can be used against these scans, this one seems really cheap to implement, it does not seem to be easy to circumvent, and it seems to have at least the potential of bringing big benefits to the WWW at large.
>
> If there are reasonable objections to it, I am quite prepared to accept that and drop it. I have already floated the idea in a couple of other places, and gotten what could be described as "tepid" responses. But it seems to me that most of the negative-leaning responses I have received so far were of the a-priori "it will never work" kind, rather than real objections based on real facts.
>
> So my hope here is that someone has the patience to read through this, and would have the additional patience to examine the idea "professionally".

-- 
Christian Folini - <christian.fol...@netnea.com>