On Jan 18, 2011, at 9:31 PM, William L. Thomson Jr. wrote:

snip

> Keep in mind I only wanted to stop the unwanted load from Google's
> crawlers. The robots.txt file now affects every crawler and every
> search engine, which is really not ideal for a variety of reasons.
>
> At some point I and/or the group will need to revisit the robots.txt
> file and dial it in so we can allow some stuff out to search engines.
> It just sucks that since Google's crawler is not wiki friendly, other
> crawlers must suffer and pay the price as well.
>
> The blocking-IP approach would have been specific to Google only. Then
> we would not have to worry about dealing with allowing certain
> pages/areas of the wiki, denying others, etc. All because of Google,
> but we can't easily come up with a Google-specific solution using the
> robots.txt file.

snip
Actually, you can have the robots.txt affect only Google: in the User-agent field, put "googlebot" instead of "*". Robots.txt, used effectively, takes a little more time to set up, but it is much more effective than a blanket block.

Kevin

----
Kevin Johnson
Security Consultant
Secure Ideas
http://www.secureideas.net
office - 904-639-6709
cell - 904-403-8024
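For reference, a minimal robots.txt along the lines Kevin describes; the paths here are illustrative, so adjust the Disallow rules to match whatever wiki URLs are actually generating the crawl load:

    # Googlebot matches this group and ignores the "*" group below
    User-agent: googlebot
    Disallow: /

    # All other crawlers remain unrestricted (an empty Disallow allows everything)
    User-agent: *
    Disallow:

With this in place, only Google's crawler is shut out while other search engines can keep indexing the wiki; a narrower Disallow (for example, targeting just the expensive wiki action URLs) would reduce Google's load without delisting the site entirely.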

