On Mon, 2011-01-17 at 03:34 -0500, Tom Allen wrote:
> Don't block Google IPs. Just use robots.txt on the google bot, it
> will respect it. You could even use that to block some pages if you
> want.

The problem comes down to time. Blocking IPs is quick and easy. Per the provided examples, creating a robots.txt file for the wiki is not so easy. It also really seems like one should already ship with MediaWiki, since this has to be a common problem. Not to mention Google's crawler should have been adapted for wikis by now, certainly for MediaWiki, which is used all over the place. Kinda crazy that everyone has to go out and do the work on their own.

If it were a small, standard robots.txt file like the ones I'm accustomed to working with, no problem. But instead I have to dig through logs, monitor what pages and queries Google is sending over, and then come up with robots.txt rules to prevent that, with trial and error across all the various pages and ways to hit the wiki. Seems like a massive waste of time, IMHO.

--
William L. Thomson Jr.
Systems Administrator
Jacksonville Linux Users Group
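
P.S. For what it's worth, a rough sketch of what those rules might end up looking like, assuming a stock MediaWiki install where everything goes through /index.php at the web root. The action=, oldid=, diff=, and title=Special: patterns below are just the usual MediaWiki query strings; what Google is actually requesting here may differ, so this would still need checking against the logs:

    # Sketch: keep Googlebot off the dynamic wiki views (edits,
    # history, diffs, special pages) while leaving normal article
    # views crawlable.
    # Note: the * wildcard is a Googlebot extension, not part of the
    # original robots.txt standard, so other crawlers may ignore it.
    User-agent: Googlebot
    Disallow: /index.php?*action=
    Disallow: /index.php?*oldid=
    Disallow: /index.php?*diff=
    Disallow: /index.php?title=Special:

Google's Webmaster Tools also has a robots.txt tester that can check sample URLs from the logs against a draft like this before it goes live, which would at least cut down on the trial and error.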

