On Jan 19, 2011, at 9:32 AM, William L. Thomson Jr. wrote:

> On Wed, 2011-01-19 at 08:43 -0500, Kevin Johnson wrote:
>>
>> Actually you can have the robots.txt only affect Google. In the
>> User-Agent field put "googlebot" instead of "*".
>
> Well that's worth a shot at least, provided that's the only user-agent
> Google uses, which is not guaranteed. Also robots.txt is case sensitive,
> so it would have to be Googlebot ;)
>
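For reference, a minimal robots.txt along those lines might look like the following; the disallowed path is only illustrative, and the catch-all group is included so other robots keep crawling normally:

```
# Rules for Google's web-search crawler only. Googlebot-Mobile and
# Googlebot-Image fall back to this group unless given one of their own.
User-agent: Googlebot
Disallow: /archives/

# Everyone else: no restrictions.
User-agent: *
Disallow:
```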
It's not the only user-agent they use, but it is the one they respect in
robots.txt. The file is not used to match a request's user-agent; it is
used by the robot to find its own settings.

>> Robots.txt, if used effectively, which yes takes a little more time,
>> is much more effective.
>
> I would feel more confident if there were an official standard[1].
>
> "It is not an official standard backed by a standards body, or owned by
> any commercial organisation (<- their typo). It is not enforced by
> anybody, and there is no guarantee that all current and future robots
> will use it. Consider it a common facility the majority of robot
> authors offer the WWW community to protect WWW servers against unwanted
> accesses by their robots."
>
> Also it seems Google and others support a lot of non-standard features
> and options[2]. Not to mention the following.
>
> Last section -> Problems with spammers and other user-agents
>
> "You can verify that a bot accessing your server really is Googlebot by
> using a reverse DNS lookup."
> (if it claims to be Googlebot with no PTR, it's a spammer or not Google)
>
> snip...
>
> "Google has several other user-agents, including Feedfetcher (user-agent
> Feedfetcher-Google)."
>
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=182072
>
> And again -> User-agents and bots
>
> "Google uses several different bots (user-agents). The bot we use for
> our web search is Googlebot. Our other bots like Googlebot-Mobile and
> Googlebot-Image follow rules you set up for Googlebot, but you can set
> up specific rules for these specific bots as well."
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449

While I agree an "official" standard would be nice, if all we are trying
to do is tell a well-known spider to stop hogging resources, it works.
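The reverse DNS check Google describes can be sketched in Python. The function and helper names here are made up for illustration, and it needs live DNS to be useful; the logic follows Google's advice: do a PTR lookup on the claimed Googlebot IP, check the name ends in googlebot.com or google.com, then forward-confirm the name back to the same IP.

```python
import socket

# Suffixes Google documents for its crawlers' reverse-DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def looks_like_google_host(host):
    # Pure string check on the PTR name, separated out so it is testable
    # without network access.
    return host.endswith(GOOGLE_SUFFIXES)

def is_googlebot(ip):
    """Reverse DNS plus a forward-confirming lookup.

    A spammer can put 'Googlebot' in its User-Agent header, but it
    cannot make Google's DNS point back at its own address.
    """
    try:
        host, _, _ = socket.gethostbyaddr(ip)   # reverse (PTR) lookup
    except socket.herror:
        return False                            # no PTR record at all
    if not looks_like_google_host(host):
        return False
    try:
        # Forward-confirm: the PTR name must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False
```

Note the suffix check uses `.endswith` with a leading dot, so a hostname like `googlebot.com.attacker.example` does not pass.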
Kevin

----
Kevin Johnson
Security Consultant
Secure Ideas
http://www.secureideas.net
office - 904-639-6709
cell - 904-403-8024

---------------------------------------------------------------------
Archive: http://marc.info/?l=jaxlug-list&r=1&w=2
RSS Feed: http://www.mail-archive.com/[email protected]/maillist.xml
Unsubscribe: [email protected]

