On Mon, 2011-01-17 at 07:17 -0500, William L. Thomson Jr. wrote:
> On Mon, 2011-01-17 at 03:34 -0500, Tom Allen wrote:
> > Don't block Google IPs. Just use robots.txt on the Google bot; it
> > will respect it. You could even use that to block some pages if you
> > want.
>
> The problem comes down to time. Blocking IPs is quick and easy. Per the
> provided examples, creating a robots.txt file for the wiki is not so
> easy. It also really seems like one should already ship with MediaWiki,
> since this has to be a common problem. Not to mention Google's crawler
> should have been adapted for wikis by now, especially MediaWiki, which
> is used all over the place.
>
> It is kind of crazy that everyone has to go out and do this work on
> their own. If it were a small, standard robots.txt file like the ones I
> am accustomed to working with, no problem. But instead I have to dig
> through logs, monitor what pages and queries Google is sending over,
> and then come up with rules in a robots.txt file to prevent that, along
> with trial and error across all the various pages and ways to hit the
> wiki. Seems like a massive waste of time IMHO.
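For what it's worth, the pattern the MediaWiki manual seems to suggest
looks roughly like the following. This is only a sketch, not tested
here, and it assumes the wiki is set up with short URLs so that article
pages are served from /wiki/ while the index.php entry point (edit,
history, diff, search, and so on) lives under /w/, which may or may not
match how our wiki is actually laid out:

    # Rough sketch only -- adjust paths to match the actual wiki setup.
    # Assumes readable article pages live under /wiki/ and all the
    # dynamic index.php views (edit, history, diff, search) under /w/.
    User-agent: *
    Disallow: /w/

If the wiki instead serves everything straight from /index.php?title=...,
a blanket rule like that would also hide the articles themselves, which
is exactly the kind of trial and error mentioned above.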
For now I went with the file below; we will see if it stops Google
anytime soon. From there I can work on creating a better robots.txt
file for the wiki. Contributions are welcome, as this is not something
I care to spend much if any time on, thanks! :)

http://www.jaxlug.org/robots.txt

Though blocking IPs is not totally out of the question either.

http://www.mediawiki.org/wiki/Manual:Robots.txt#Problems

--
William L. Thomson Jr.
Systems Administrator
Jacksonville Linux Users Group

