Gisle Aas wrote:
> <[EMAIL PROTECTED]> writes:
>
>> The problem... if I include a space in my robot's user agent, it
>> will fail to recognize robots.txt records targeted to my robot.
>
> You are not allowed to have space in the user agent name. See section
> "3.8 Product Tokens" of RFC 2616 [1]. Isn't it an option to just
> rename your spider to something that follows the spec?

Oops! Yes, of course. I will rename my spider accordingly. Patch proposal withdrawn.

> I'm not really opposed to this patch if product names with spaces are
> actually in common use. Do you have data to suggest it is?

Well, I do... here's some spiders that hit my site last week that are of this form:

Syndication Engine/1.1 (http://www.hexlet.com)
Feedster Crawler/1.0; Feedster, Inc.
Jakarta Commons-HttpClient/3.0-rc1
FAST Enterprise Crawler/6.4 (helpdesk at fast.no)
Jakarta HTTP Client/1.0
UPG1 UP/4.0 (compatible; Blazer 1.0)

On the other hand it's doubtful that any of these use RobotRules.pm, so these don't imply that a patch is called for.

-- 
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com
Software Engineer
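
For anyone picking a spider name, the rule from RFC 2616 can be checked mechanically. Below is a minimal sketch in Python, assuming the grammar `product = token ["/" product-version]` from section 3.8 and the `token` definition from section 2.2. The names `TOKEN_RE` and `is_valid_product` are made up for illustration and are not part of RobotRules.pm or any other library.

```python
import re

# RFC 2616 section 2.2: a token is one or more US-ASCII characters,
# excluding control characters and the separators
# ( ) < > @ , ; : \ " / [ ] ? = { } space and tab.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_valid_product(product: str) -> bool:
    """Return True if `product` has the form  token ["/" token]."""
    name, slash, version = product.partition("/")
    if not TOKEN_RE.fullmatch(name):
        return False
    # The version is optional, but if a slash is present it must be
    # followed by a token as well.
    return not slash or bool(TOKEN_RE.fullmatch(version))


if __name__ == "__main__":
    for candidate in (
        "Jakarta Commons-HttpClient/3.0-rc1",  # space in the name
        "Jakarta-Commons-HttpClient/3.0-rc1",  # hypothetical renamed form
    ):
        print(candidate, "->", is_valid_product(candidate))
```

Under this check a name containing a space is rejected as a single product, while a hyphenated variant passes. Since the User-Agent header is defined as a sequence of products and comments, a header such as "Syndication Engine/1.1" may still be syntactically acceptable; it would just read as two products rather than one name containing a space.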
