On Jan 18, 2011, at 9:31 PM, William L. Thomson Jr. wrote:

snip

> Keep in mind I only wanted to stop the unwanted load from Google's
> crawlers. The robots.txt file now affects every crawler and every
> search engine, which is really not ideal for a variety of reasons.
>
> At some point I and/or the group will need to revisit the robots.txt
> file and dial it in so we can allow some stuff out to search engines.
> It just sucks that since Google's crawler is not wiki friendly, other
> crawlers must suffer and pay the price as well.
>
> The blocking-IP approach would have been specific to Google only. Then
> we would not have to worry about dealing with allowing certain
> pages/areas of the wiki, denying others, etc. All because of Google,
> but we can't easily come up with a Google-specific solution using the
> robots.txt file.

snip
Actually, you can have the robots.txt affect only Google: in the User-agent field, put "googlebot" instead of "*". Robots.txt, used effectively, takes a little more time to set up, but it is much more effective than a blanket block.

Kevin

----
Kevin Johnson
Security Consultant
Secure Ideas
http://www.secureideas.net
office - 904-639-6709
cell - 904-403-8024
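For reference, a minimal robots.txt along the lines Kevin describes; the paths here are illustrative, so adjust the Disallow rules to match whatever wiki URLs are actually generating the crawl load:

    # Googlebot matches this group and ignores the "*" group below
    User-agent: googlebot
    Disallow: /

    # All other crawlers remain unrestricted (an empty Disallow allows everything)
    User-agent: *
    Disallow:

With this in place, only Google's crawler is shut out while other search engines can keep indexing the wiki; a narrower Disallow (for example, targeting just the expensive wiki action URLs) would reduce Google's load without delisting the site entirely.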

