Hey André,

I do not think your protection mechanism is very good (for the reasons mentioned before), but you can easily try it out for yourself with 2-3 ModSecurity rules and the "pause" directive.
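For concreteness, a minimal sketch of such a rule (assuming ModSecurity 2.7 or later loaded into Apache httpd; the rule id is arbitrary, and the fixed 1000 ms stands in for the variable delay proposed below — details may vary between ModSecurity versions):

    # Sketch: delay every 404 response by one second.
    # Requires mod_security2. "pause" takes milliseconds; the
    # transaction then continues normally. Note that the sleeping
    # request keeps an httpd worker occupied for the whole delay.
    SecRuleEngine On
    SecRule RESPONSE_STATUS "@streq 404" \
        "id:1000404,phase:3,pass,nolog,pause:1000"

Phase 3 runs once the response headers (and thus the status code) are known, but before the response goes out to the client.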
Regs,

Christian

On Tue, Apr 30, 2013 at 12:03:28PM +0200, André Warnier wrote:

> Dear Apache developers,
>
> This is a suggestion relative to the code of the Apache httpd webserver, and a possible new default option in the standard distribution of Apache httpd. It also touches on WWW security, which is why I felt that it belongs on this list rather than on the general users' list. Please correct me if I am mistaken.
>
> According to Netcraft, there are currently some 600 million webservers on the WWW, with more than 60% of those identified as "Apache". I currently administer about 25 of these Apache httpd/Tomcat webservers, not remarkable in any way (business applications for medium-sized companies). In the logs of these servers, every day, there are episodes like the following:
>
> 209.212.145.91 - - [03/Apr/2013:00:52:32 +0200] "GET /muieblackcat HTTP/1.1" 404 362 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/index.php HTTP/1.1" 404 365 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/pma/index.php HTTP/1.1" 404 369 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:36 +0200] "GET //admin/phpmyadmin/index.php HTTP/1.1" 404 376 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:37 +0200] "GET //db/index.php HTTP/1.1" 404 362 "-" "-"
> 209.212.145.91 - - [03/Apr/2013:00:52:37 +0200] "GET //dbadmin/index.php HTTP/1.1" 404 367 "-" "-"
> ... etc.
>
> Such lines are the telltale trace of a "URL-scanning bot", or of the "URL-scanning" part of a bot, and I am sure that you are all familiar with them. Obviously, these bots are trying to find webservers which exhibit poorly-designed or poorly-configured applications, with the aim of identifying hosts which can be subjected to various kinds of attacks, for various purposes. As far as I can tell from my own unremarkable servers, I would surmise that many or most webservers facing the Internet are subjected to this type of scan every day.
>
> Hopefully, most webservers are not really vulnerable to this type of scan. But the fact is that *these scans are happening, every day, on millions of webservers*. And they are at least a nuisance, and at worst a serious security problem when, as a result of poorly configured webservers or applications, they lead to break-ins and compromised systems.
>
> It is basically a numbers game, like malicious email: it costs very little to do this, and if even a tiny proportion of webservers exhibit one of these vulnerabilities, then because of the numbers involved, it is worth doing. If there are 600 million webservers, 50% of them are scanned every day, and 0.01% of these webservers are vulnerable because of one of these URLs, then every day 30,000 (600,000,000 x 0.5 x 0.0001) vulnerable servers will be identified.
>
> About the "cost" aspect: from the data in my own logs, such bots seem to be scanning about 20-30 URLs per pass, at a rate of about 3-4 URLs per second. Since it takes my Apache httpd servers approximately 10 ms on average to respond (with a 404 Not Found) to one of these requests, and they only request 1 URL per 250 ms, I would imagine that these bots have some built-in rate-limiting mechanism, to avoid being "caught" by various webserver-protection tools. Maybe they are also smart, and scan several servers in parallel, so as to limit the rate at which they "burden" any server in particular. (In this rough calculation, I am ignoring network latency for now.)
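Spelled out, the per-URL timing implied by these figures (taking the 4 URLs per second end of the quoted range; the breakdown is mine, not André's):

    request interval (bot's rate limit) :  250 ms per URL
    server time to produce the 404      :  ~10 ms
    bot idle time per URL               : ~240 ms

That is, at present the bot's own pacing, not the server's response time, dominates how long a scan of any single server takes.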
> So if we imagine a smart bot which is scanning 10 servers in parallel, issuing 4 requests per second to each of them, for a total of 20 URLs per server, and we assume that all these requests result in 404 responses with an average response time of 10 ms, then it "costs" this bot only about 2 seconds of response-waiting time (200 requests x 10 ms) to complete the scan of 10 servers. If there are 300 million servers to scan, then the total cost for scanning all the servers, by any number of such bots working cooperatively, is an aggregated 60 million seconds. And if one such "botnet" has 10,000 bots, that boils down to only 6,000 seconds per bot.
>
> It is scary that 50% of all Internet webservers can be scanned for vulnerabilities in less than 2 hours, and that such a scan may result in "harvesting" tens of thousands of hosts (30,000 by the estimate above), candidates for takeover.
>
> Now, how about making it so that, without any special configuration or add-on software or skills on the part of webserver administrators, it would cost these same bots *about 100 times as long (several days)* to do their scan?
>
> The only cost would be a relatively small change to the Apache webservers, which is what my suggestion consists of: adding a variable delay (say between 100 ms and 2000 ms) to any 404 response.
>
> The suggestion is based on the observation that there is a dichotomy between this kind of access by bots and the kind of access made by legitimate HTTP users/clients: legitimate users/clients (including the "good bots") mostly access links "which work", so they rarely get "404 Not Found" responses. Malicious URL-scanning bots, on the other hand, by the very nature of what they are scanning for, get many "404 Not Found" responses.
>
> As a general idea, then: anything which increases the delay to obtain a 404 response should impact these bots much more than it impacts legitimate users/clients.
>
> How much?
>
> Let us imagine for a moment that this suggestion is implemented in the Apache webservers and is enabled in the default configuration. And let's imagine that after a while, 20% of the Apache webservers deployed on the Internet have this feature enabled, and are now delaying any 404 response by an average of 1000 ms. Let's re-use the numbers above and redo the calculation. The same "botnet" of 10,000 bots is thus still scanning 300 million webservers, each bot scanning 10 servers at a time, for 20 URLs per server. Previously, this took about 6,000 seconds per bot. Now, however, instead of an average delay of 10 ms to obtain a 404 response, in 20% of the cases (60 million webservers) the bots will experience an average 1000 ms additional delay per URL scanned. Since each bot scans 10 servers in parallel and the delays overlap, this adds (60,000,000 / 10 x 20 URLs x 1000 ms) 120,000,000 seconds to the scan. Divided by 10,000 bots, this is 12,000 additional seconds per bot (roughly 3 1/3 hours).
>
> So with a small change to the code, no add-ons, no special configuration skills on the part of the webserver administrator, no firewalls, no filtering, no need for updates to any list of URLs or bot characteristics, little inconvenience to legitimate users/clients, and only a very partial adoption over time, it seems that this scheme could more than double the cost for bots to acquire the same number of targets. Or, seen another way, it could more than halve the number of webservers being scanned every day.
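To make the two calculations easier to compare, here is a compact worked restatement under the stated assumptions (10 servers per bot in parallel, 20 URLs per server, 10,000 cooperating bots, and only response-waiting time counted as "cost"):

    Baseline, 404 answered in ~10 ms:
      300,000,000 servers x 20 URLs x 10 ms  =  60,000,000 s aggregate
      60,000,000 s / 10,000 bots             =   6,000 s per bot (~1 h 40 min)

    With 20% of servers (60,000,000) delaying 404s by ~1000 ms, the
    delays of the 10 servers scanned in parallel overlap, so:
      60,000,000 / 10 x 20 URLs x 1000 ms    = 120,000,000 s aggregate
      120,000,000 s / 10,000 bots            =  12,000 s extra per bot (~3 h 20 min)

So each bot would go from roughly 6,000 s to roughly 18,000 s for the same harvest, i.e. about triple the cost.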
> I know that this is a hard sell. The basic idea sounds a bit too simple to be effective. It will not kill the bots, and it will not stop the bots from scanning Internet servers in the other ways that they use. It does not miraculously protect any single server against such scans, and the benefit of any one server implementing this is diluted over all webservers on the Internet.
>
> But it is also not meant as an absolute weapon. It is targeted specifically at a particular type of scan, done by a particular type of bot, for a particular purpose, and it is just a scheme to make this more expensive for them. It may or may not discourage these bots from continuing with this type of scan (if it does, that would be a very big result). But at the same time, compared to any other kind of tool that can be used against these scans, this one seems really cheap to implement, it does not seem to be easy to circumvent, and it seems to have at least the potential of bringing big benefits to the WWW at large.
>
> If there are reasonable objections to it, I am quite prepared to accept that and drop it. I have already floated the idea in a couple of other places, and gotten what could be described as "tepid" responses. But it seems to me that most of the negative-leaning responses I have received so far were of the a-priori "it will never work" kind, rather than real objections based on real facts.
>
> So my hope here is that someone has the patience to read through this, and would have the additional patience to examine the idea "professionally".

-- 
Christian Folini - <christian.fol...@netnea.com>