Package: apache2-data Version: 2.4.20-1 Severity: normal Tags: patch Dear maintainer,
Apache2 default site page includes links to manpages.debian.org. This is not a very good idea since many sites are left unconfigured by default and there are many (badly programmed) robots roaming the Internet and indexing sites. Last Monday 11th, DSA had to disable the 'manpages.debian.org' vhost service in glinka.debian.org because it was consuming continuously a large amount of CPU and affecting other services. Upon investigation, we have found that the service is being queried constantly for the following pages: (a2ensite, a2dissite, a2enmod, a2dismod, and a2ensite). The number of daily queries have ranged from 6000 to 11000 thousand and, starting May 8th, this has spiked to 93.000 to 141.000 daily queries! (you can see the details in the attached text file) These queries are distributed, in a single day we have identified at least 590 distinct hosts making them based on at least 309 misconfigured web servers. The culprit seems to be some strange script (programmed in GO, since the user agent is 'Go-http-client/1.1') which looks for websites and traverses them. When they hits sites like http://teplosnab24.ru/ they start traversing all URLs, including external connections. We have enhanced the service configuration used so that we can withstand the excess of (useless) queries for these manpages (as described in [1]). The issue does not exactly lie on the apache2-data current page, as these are scripts that are going awry, but this page is the "detonator" that has translated this problem into a service problem. Both DSA and I believe that the Apache2 default configuration should avoid this misbehaviour by not including links to external sites. Please find attached a patch that removes those links from the index.html page which is added by default to all Apache sites installed in Debian. Alternatively, if you consider the manual pages to be useful, I would suggest they are included (in HTML format) as part of the Apache2-data package itself instead of linking to the external manpages.debian.org service. This change will at least prevent our service from getting hammered by these misconfigured robots. Thanks for your help, Javier Fernandez-Sanguino [1] https://lists.debian.org/debian-doc/2016/04/msg00055.html -- System Information: Debian Release: stretch/sid APT prefers unstable APT policy: (500, 'unstable') Architecture: i386 (i686) Kernel: Linux 4.4.0-1-686-pae (SMP w/4 CPU cores) Locale: LANG=es_ES.utf8, LC_CTYPE=es_ES.utf8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system)
--- index.html.orig 2016-04-17 16:41:46.000000000 +0200 +++ index.html 2016-04-17 16:42:41.000000000 +0200 @@ -293,17 +293,17 @@ *-available/ counterparts. These should be managed by using our helpers <tt> - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enmod">a2enmod</a>, - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dismod">a2dismod</a>, + a2enmod, + a2dismod, </tt> <tt> - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2ensite">a2ensite</a>, - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2dissite">a2dissite</a>, + a2ensite, + a2dissite, </tt> and <tt> - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2enconf">a2enconf</a>, - <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=a2disconf">a2disconf</a> + a2enconf, + a2disconf </tt>. See their respective man pages for detailed information. </li>
Logs of queries to the manpages.debian.org service associated with Apache2 manual pages. The list below marks the number of access and the HTTP answer codes returned. This information has been extracted by running, in glinka's /var/log/apache2, the following code: ~ (for i in manpages.debian.org-access.log-*gz ; do echo -n "$i: "; zgrep "query=a2" $i | wc -l; zcat $i |grep "query=a2" | awk '{print $12" "$9}' | sort | uniq -c | sort -nr | head -5 ; done ) 2>&1 --------------------------------------------------------------------------------------------------------------------- manpages.debian.org-access.log-20160331.gz: 9467 9464 "-" 200 3 "-" 304 manpages.debian.org-access.log-20160401.gz: 9582 9578 "-" 200 3 "-" 304 1 "-" 206 manpages.debian.org-access.log-20160402.gz: 11784 11783 "-" 200 1 "-" 304 manpages.debian.org-access.log-20160403.gz: 15585 15582 "-" 200 2 "-" 304 1 "-" 206 manpages.debian.org-access.log-20160404.gz: 6705 6704 "-" 200 1 "-" 304 manpages.debian.org-access.log-20160405.gz: 8657 8652 "-" 200 5 "-" 304 manpages.debian.org-access.log-20160406.gz: 9979 9971 "-" 200 8 "-" 304 manpages.debian.org-access.log-20160407.gz: 8334 8330 "-" 200 3 "-" 304 1 mini.com/" 200 manpages.debian.org-access.log-20160408.gz: 93729 93617 "-" 200 90 "-" 500 16 "-" 504 2 "-" 304 2 "-" 206 manpages.debian.org-access.log-20160409.gz: 141661 141660 "-" 200 1 "-" 304 manpages.debian.org-access.log-20160410.gz: 140425 140423 "-" 200 2 "-" 304 manpages.debian.org-access.log-20160411.gz: 140878 138953 "-" 200 1840 "-" 504 82 "-" 500 3 "-" 304 manpages.debian.org-access.log-20160412.gz: 73254 73157 "-" 200 68 "-" 504 27 "-" 500 2 "-" 304 manpages.debian.org-access.log-20160416.gz: 100905 53898 "-" 301 42356 "Go-http-client/1.1" 200 2811 "-" 200 1154 "Mozilla/5.0 200 204 "Mozilla/4.0 200