On Jan 19, 2011, at 9:32 AM, William L. Thomson Jr. wrote:

> On Wed, 2011-01-19 at 08:43 -0500, Kevin Johnson wrote:
>> 
>> Actually you can have the robots.txt only affect Google.  In the
>> User-Agent field put "googlebot" instead of "*"
> 
> Well that's worth a shot at least, provided that's the only user-agent
> Google uses, which is not guaranteed. Also robots.txt is case sensitive,
> so it would have to be Googlebot ;)
> 

It's not the only user-agent they use, but it is the one they respect in 
robots.txt.  The file is not matched against the request's User-Agent 
header; each robot reads it and looks up the record that applies to itself.
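As a sketch, a robots.txt with a record that only Googlebot reads might look 
like this (the paths are made-up examples, not from this thread):

```
# Record read only by Googlebot (hypothetical paths for illustration)
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /search/

# All other robots: no restrictions
User-agent: *
Disallow:
```

A robot is supposed to use the most specific record matching its name, so 
Googlebot would follow the first record and ignore the `*` one.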

>> Robots.txt, if used carefully (which, yes, takes a little more time),
>> is much more effective.
> 
> I would feel more confident if there were an official standard[1].
> 
> "It is not an official standard backed by a standards body, or owned by
> any commercial organisation (<-their typo). It is not enforced by
> anybody, and there no guarantee that all current and future robots will
> use it. Consider it a common facility the majority of robot authors
> offer the WWW community to protect WWW server against unwanted accesses
> by their robots."
> 
> Also, it seems Google and others support a lot of non-standard features
> and options[2]. Not to mention the following.
> 
> Last section -> Problems with spammers and other user-agents
> 
> "You can verify that a bot accessing your server really is Googlebot by
> using a reverse DNS lookup."
> (if it claims to be Googlebot but has no PTR record, it's a spammer, not Google)
> 
> snip...
> 
> "Google has several other user-agents, including Feedfetcher (user-agent
> Feedfetcher-Google)."
> 
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=182072
> 
> 
> And again -> User-agents and bots
> 
> "Google uses several different bots (user-agents). The bot we use for
> our web search is Googlebot. Our other bots like Googlebot-Mobile and
> Googlebot-Image follow rules you set up for Googlebot, but you can set
> up specific rules for these specific bots as well."
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449


While I agree an "official" standard would be nice, if all we are trying to do 
is tell a well-known spider to stop hogging resources, robots.txt works.
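The reverse-DNS check Google describes above can be sketched roughly like this 
(a hypothetical helper, not Google-supplied code; the hostname suffixes follow 
the googlebot.com/google.com convention from Google's documentation):

```python
import socket

def hostname_is_google(hostname):
    """Check that a PTR hostname falls under Google's crawler domains."""
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip):
    """Verify an IP claiming to be Googlebot: reverse (PTR) lookup,
    check the domain, then forward-resolve the name and confirm it
    maps back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
    except socket.herror:
        return False  # no PTR record: spammer or at least not Google
    if not hostname_is_google(hostname):
        return False
    try:
        return socket.gethostbyname(hostname) == ip  # forward-confirm
    except socket.gaierror:
        return False
```

Any request whose User-Agent claims Googlebot but fails a check like this can 
be treated as an impostor rather than given the Googlebot rules.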

Kevin
----
Kevin Johnson
Security Consultant
Secure Ideas
http://www.secureideas.net
office - 904-639-6709
cell - 904-403-8024


---------------------------------------------------------------------
Archive      http://marc.info/?l=jaxlug-list&r=1&w=2
RSS Feed     http://www.mail-archive.com/[email protected]/maillist.xml
Unsubscribe  [email protected]
