On Mon, May 16, 2011 at 6:42 AM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>> httpd being in the top-10 always, fiddling with bugzilla URLs?
>>> (Note, I don't have access to gcc.gnu.org; I'm relaying info from multiple
>>> instances of discussion on #gcc and richi poking on it. That said, it
>>> still might not be web crawlers, that's right, but I'll happily accept
>>> _any_ load improvement on gcc.gnu.org, however unfounded the reports might seem.)
I think that simply blocking buglist.cgi has dropped bugzilla off the
immediate radar.  It also seems to have lowered the load, although I'm
not sure if we are still keeping historical data.

> For example, I also see
>
> 66.249.71.59 - - [16/May/2011:13:37:58 +0000] "GET
> /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> "Mozilla/5.0 (compatible; Googlebot/2.1;
> +http://www.google.com/bot.html)" (35%) 2060117us
>
> and viewvc is certainly even worse (from an I/O perspective).  I thought
> we blocked all bot traffic from the viewvc stuff ...

This is only happening at the top level.  I committed this patch to fix it.

Ian
Index: robots.txt
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/robots.txt,v
retrieving revision 1.10
diff -u -r1.10 robots.txt
--- robots.txt	13 May 2011 17:09:11 -0000	1.10
+++ robots.txt	17 May 2011 05:19:11 -0000
@@ -2,8 +2,8 @@
 # for information about the file format.
 # Contact g...@gcc.gnu.org for questions.
 
-User-Agent: *
-Disallow: /viewcvs/
+User-agent: *
+Disallow: /viewcvs
 Disallow: /cgi-bin/
 Disallow: /bugzilla/buglist.cgi
 Crawl-Delay: 60
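The trailing-slash distinction the patch relies on can be checked with Python's
stdlib urllib.robotparser (a sketch, not part of the original thread; the test
URL is the Googlebot request from the access-log line quoted above). A rule is
a plain prefix match on the request path, so "Disallow: /viewcvs/" only covers
URLs *under* that directory, while "Disallow: /viewcvs" also covers the
top-level /viewcvs?view=... query URLs:

```python
import urllib.robotparser

# The exact URL Googlebot was fetching in the log excerpt above.
GOOGLEBOT_URL = "http://gcc.gnu.org/viewcvs?view=revision&revision=169814"

def can_fetch(disallow_path):
    """Return whether a robots.txt with the given Disallow rule permits the URL."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: %s" % disallow_path,
    ])
    return rp.can_fetch("Googlebot", GOOGLEBOT_URL)

# Old rule: only matches paths under the /viewcvs/ directory, so the
# top-level query URL slips through and crawlers may fetch it.
print(can_fetch("/viewcvs/"))   # True  -> still crawlable
# New rule: bare prefix match, so /viewcvs?view=... is blocked too.
print(can_fetch("/viewcvs"))    # False -> blocked
```

Note that the prefix rule also blocks everything under /viewcvs/, so the new
line subsumes the old one rather than replacing its effect.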