Gisle Aas wrote:
> <[EMAIL PROTECTED]> writes:
> 
>> The problem... if I include a space in my robot's user agent, it
>> will fail to recognize robots.txt records targeted to my robot.
> 
> You are not allowed to have space in the user agent name.  See section
> "3.8 Product Tokens" of RFC 2616 [1].  Isn't it an option to just
> rename your spider to something that follows the spec?

Oops!  Yes, of course.  I will rename my spider accordingly.
Patch proposal withdrawn.

> I'm not really opposed to this patch if product names with spaces are
> actually in common use.  Do you have data to suggest it is?

Well, I do... here are some spiders that hit my site last week whose
user agents are of this form:
Syndication Engine/1.1 (http://www.hexlet.com)
Feedster Crawler/1.0; Feedster, Inc.
Jakarta Commons-HttpClient/3.0-rc1
FAST Enterprise Crawler/6.4 (helpdesk at fast.no)
Jakarta HTTP Client/1.0
UPG1 UP/4.0 (compatible; Blazer 1.0)

On the other hand, it's doubtful that any of these use RobotRules.pm, so
they don't imply that a patch is called for.
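
For reference, here is a minimal Python sketch of the product-token rule
from RFC 2616 (token in section 2.2, product in section 3.8). The names
TOKEN, PRODUCT_RE and is_single_product are made up for illustration and
are not part of RobotRules.pm. Note that a User-Agent header may list
several products separated by whitespace, so a name containing a space
parses as two products rather than one.

```python
import re

# RFC 2616 section 2.2: token = 1*<any CHAR except CTLs or separators>
# Separators are ( ) < > @ , ; : \ " / [ ] ? = { } plus space and tab,
# which leaves the printable ASCII characters listed below.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# RFC 2616 section 3.8: product = token ["/" product-version]
PRODUCT_RE = re.compile(rf"{TOKEN}(?:/{TOKEN})?")


def is_single_product(name):
    """Return True if `name` is exactly one product token (hypothetical helper)."""
    return PRODUCT_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("Syndication Engine/1.1", "SyndicationEngine/1.1"):
        print(candidate, is_single_product(candidate))
```

Renaming a spider so that the check passes (for example, dropping the
space) keeps the whole name in a single product token.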

-- 
Matthew.van.Eerde (at) hbinc.com               805.964.4554 x902
Hispanic Business Inc./HireDiversity.com       Software Engineer
