On Mon, 2011-01-17 at 03:34 -0500, Tom Allen wrote:
> Don't block Google IPs. Just use robots.txt on the google bot, it
> will respect it. You could even use that to block some pages if you
> want.

The problem comes down to time. Blocking IPs is quick and easy. Per the provided examples, creating a robots.txt file for the wiki is not so easy. It also really seems like one should already ship with MediaWiki, since this has to be a common problem. Not to mention Google's crawler should have been adapted for wikis by now, certainly for MediaWiki, which is used all over the place. Kinda crazy that everyone has to go out and do the work on their own.

If it were a small, standard robots.txt file like the ones I'm accustomed to working with, no problem. But instead I have to dig through logs, monitor what pages and queries Google is sending over, and then come up with robots.txt rules to prevent that, with trial and error across all the various pages and ways to hit the wiki. Seems like a massive waste of time, IMHO.

--
William L. Thomson Jr.
Systems Administrator
Jacksonville Linux Users Group
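
P.S. For what it's worth, a rough sketch of what those rules might end up looking like, assuming a stock MediaWiki install where everything goes through /index.php at the web root. The action=, oldid=, diff=, and title=Special: patterns below are just the usual MediaWiki query strings; what Google is actually requesting here may differ, so this would still need checking against the logs:

    # Sketch: keep Googlebot off the dynamic wiki views (edits,
    # history, diffs, special pages) while leaving normal article
    # views crawlable.
    # Note: the * wildcard is a Googlebot extension, not part of the
    # original robots.txt standard, so other crawlers may ignore it.
    User-agent: Googlebot
    Disallow: /index.php?*action=
    Disallow: /index.php?*oldid=
    Disallow: /index.php?*diff=
    Disallow: /index.php?title=Special:

Google's Webmaster Tools also has a robots.txt tester that can check sample URLs from the logs against a draft like this before it goes live, which would at least cut down on the trial and error.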

