Google's crawler obeys /robots.txt; that's where I'd start.
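For a MediaWiki-style wiki, the history/diff/old-revision views are usually what balloons the crawl, and those all go through the dynamic entry point. A minimal /robots.txt sketch along these lines may help (this assumes the wiki's script path is /index.php at the web root; adjust to your install):

```
# Keep crawlers out of dynamic views (history, diffs, old revisions)
# while still letting them index the plain article pages.
User-agent: *
Disallow: /index.php?
Disallow: /Special:

# Crawl-delay is honored by some crawlers, but NOT by Googlebot;
# Google's crawl rate has to be lowered in Webmaster Tools instead.
Crawl-delay: 10
```

Note the caveat in the comment: if the load really is Googlebot, robots.txt can only stop it from fetching the expensive URLs; throttling how fast it fetches the allowed ones is done through Google Webmaster Tools, not robots.txt.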

On Mon, Jan 17, 2011 at 1:20 AM, William L. Thomson Jr.
<[email protected]> wrote:
> Really not sure what is up with Google or what in the wiki the crawler
> finds so interesting. Not sure if I need to disable history or other
> things that increase the size of the wiki, since at this time it's
> rather small. But Google will hit the wiki for hours and hours, and I
> think it might even span days. I haven't monitored it for that long, so
> I'm not sure, but it's enough that I have noticed it.
>
> Normally I would not care, but the only load being placed on the wiki,
> for the most part, is coming from Google, resulting in a consistent CPU
> usage of ~20-40%. Not sure if I need to just block Google IPs or do
> something drastic like that.
>
> It's really annoying: I can stop the web server and all connections from
> Google drop. As soon as I start it back up, Google requests start
> coming in right away. It just makes no sense why Google should spend so
> much time crawling a site with so little content, short of the history
> and change logs.
>
> Unless others have a problem with it, I am about to block Google from
> crawling/indexing the wiki. Or let me know if anyone has suggestions, or
> experience with Google causing unnecessary load on wikis.
>
> --
> William L. Thomson Jr.
> Systems Administrator
> Jacksonville Linux Users Group
>
>
> ---------------------------------------------------------------------
> Archive      http://marc.info/?l=jaxlug-list&r=1&w=2
> RSS Feed     http://www.mail-archive.com/[email protected]/maillist.xml
> Unsubscribe  [email protected]
>
>

