>> Jonathan Knoll:
>> >> User-agent: *
>> >> Disallow: /cgi-bin
>> >> Disallow: /site
>>
>> Klaus Johannes Rusch:
>> > /cgi-bin/test.cgi
>> > /siteindex.html
>> > would be excluded.
>> (Me:)
>> But what about these paths (in the same root dir):
>>
>> /foo/cgi-bin/test.cgi
>> /bar/user1/cgi-bin/test.sgi
>> /bar/user2/cgi-bin/test.cgi
>>
>> Does the wildcard function recognize specified strings elsewhere (later)
>> than in the immediate beginning of a path?
>
>Martin Beet:
>The draft specification is quite clear on this: the strings are compared
>octet by octet until the Allow / Disallow string ends, in which case this
>rule matches, or until a mismatch is found. From the spec:
>
>" The matching process compares every octet in the path portion of
>  the URL and the path from the record. [...] The match
>  evaluates positively if and only if the end of the path from the
>  record is reached before a difference in octets is encountered."

Thanks, Martin! To paraphrase briefly: a robot never compares the URL path
beyond the length of the Disallow string. A Disallow string therefore cannot
act as a *free* wildcard; it matches only at the very beginning of the path
("Disallow: /foo" would apply to "/foo/bar" but not to "/bar/foo").

Regards,
Tuomas
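
P.S. For anyone who prefers code to prose, here is a minimal sketch in Python of how I read the matching rule quoted above. The function name is_disallowed and the plain list of Disallow strings are my own assumptions for illustration, not anything taken from the draft. The sketch also skips the %xx decoding and the Allow lines that the draft covers, and treats an empty Disallow value as excluding nothing.

```python
def is_disallowed(url_path, disallow_paths):
    """Return True if url_path begins with any of the Disallow strings.

    Hypothetical helper: the comparison runs from the first character of
    the path and stops at the end of the Disallow string, so a rule can
    only ever match at the beginning of the path.
    """
    # An empty rule is skipped (assumed here to exclude nothing).
    return any(rule and url_path.startswith(rule) for rule in disallow_paths)


rules = ["/cgi-bin", "/site"]  # the Disallow lines from the example record

for path in ("/cgi-bin/test.cgi",
             "/siteindex.html",
             "/foo/cgi-bin/test.cgi",
             "/bar/user2/cgi-bin/test.cgi"):
    print(path, "excluded" if is_disallowed(path, rules) else "allowed")
```

Under this reading the first two paths are excluded and the last two are not, because "/cgi-bin" appears in them only after the beginning of the path.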