On Mon, May 16, 2011 at 6:42 AM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>> httpd being in the top-10 always, fiddling with bugzilla URLs?
>>> (Note, I don't have access to gcc.gnu.org; I'm relaying info from multiple
>>> instances of discussion on #gcc and richi poking on it. That said, it
>>> still might not be web crawlers, that's right, but I'll happily accept
>>> _any_ load improvement on gcc.gnu.org, however unfounded the reports might seem.)
I think that simply blocking buglist.cgi has dropped bugzilla off the
immediate radar.  It also seems to have lowered the load, although I'm
not sure if we are still keeping historical data.

> For example, I also see
>
> 66.249.71.59 - - [16/May/2011:13:37:58 +0000] "GET
> /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> "Mozilla/5.0 (compatible; Googlebot/2.1;
> +http://www.google.com/bot.html)" (35%) 2060117us
>
> and viewvc is certainly even worse (from an I/O perspective).  I thought
> we blocked all bot traffic from the viewvc stuff ...

This is only happening at the top level.  I committed this patch to fix it.

Ian
Index: robots.txt
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/robots.txt,v
retrieving revision 1.10
diff -u -r1.10 robots.txt
--- robots.txt	13 May 2011 17:09:11 -0000	1.10
+++ robots.txt	17 May 2011 05:19:11 -0000
@@ -2,8 +2,8 @@
 # for information about the file format.
 # Contact g...@gcc.gnu.org for questions.
 
-User-Agent: *
-Disallow: /viewcvs/
+User-agent: *
+Disallow: /viewcvs
 Disallow: /cgi-bin/
 Disallow: /bugzilla/buglist.cgi
 Crawl-Delay: 60
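The trailing-slash distinction the patch relies on can be checked with Python's
stdlib urllib.robotparser (a sketch, not part of the original thread; the test
URL is the Googlebot request from the access-log line quoted above). A rule is
a plain prefix match on the request path, so "Disallow: /viewcvs/" only covers
URLs *under* that directory, while "Disallow: /viewcvs" also covers the
top-level /viewcvs?view=... query URLs:

```python
import urllib.robotparser

# The exact URL Googlebot was fetching in the log excerpt above.
GOOGLEBOT_URL = "http://gcc.gnu.org/viewcvs?view=revision&revision=169814"

def can_fetch(disallow_path):
    """Return whether a robots.txt with the given Disallow rule permits the URL."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: %s" % disallow_path,
    ])
    return rp.can_fetch("Googlebot", GOOGLEBOT_URL)

# Old rule: only matches paths under the /viewcvs/ directory, so the
# top-level query URL slips through and crawlers may fetch it.
print(can_fetch("/viewcvs/"))   # True  -> still crawlable
# New rule: bare prefix match, so /viewcvs?view=... is blocked too.
print(can_fetch("/viewcvs"))    # False -> blocked
```

Note that the prefix rule also blocks everything under /viewcvs/, so the new
line subsumes the old one rather than replacing its effect.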