In [EMAIL PROTECTED], Sean M. Burke
[EMAIL PROTECTED] writes:
> User-agent: *
> Disallow: /cgi-bin/
> Disallow: /~mojojojo/misc/
>
> So I've changed it to this, and was about to submit it as a patch for the
> next LWP release:
>
> /^\s*Disallow:\s*(.*)/i
> # Silently forgive leading whitespace.
>
> But first, I thought I'd ask the list here: does anyone think this'd break
> anything?
The change should not break anything: files that use leading whitespace for
comments or some other obscure purpose do not comply with the specification
anyway, and will see varying results.
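To see why the change is harmless for compliant files, here is a minimal Perl sketch that compares a strict pattern with the proposed lenient one. The strict pattern /^Disallow:\s*(.*)/i is an assumption about what the parser matched before the patch (the original is not quoted in the message), and the helper disallow_path is hypothetical. The sample paths come from the robots.txt excerpt above.

```perl
use strict;
use warnings;

# ASSUMED pre-patch pattern (hypothetical; not quoted in the message).
my $strict  = qr/^Disallow:\s*(.*)/i;
# The proposed patch: also accept leading whitespace before the field name.
my $lenient = qr/^\s*Disallow:\s*(.*)/i;

# Hypothetical helper: return the path captured from a "Disallow:" line,
# or undef if the pattern does not match.
sub disallow_path {
    my ($line, $pattern) = @_;
    return $line =~ $pattern ? $1 : undef;
}

my @samples = (
    "Disallow: /cgi-bin/",            # compliant: field name at column 0
    "   Disallow: /~mojojojo/misc/",  # non-compliant: leading whitespace
);

for my $line (@samples) {
    my $s = disallow_path($line, $strict);
    my $l = disallow_path($line, $lenient);
    printf "%-32s strict: %-18s lenient: %s\n", "[$line]",
        $s // '(no match)', $l // '(no match)';
}
```

A compliant line yields the same path under both patterns, because \s* matches the empty string when the line starts with "Disallow". Only the indented line differs: it matches under the lenient pattern and not at all under the strict one. The patch therefore only adds matches; it never removes or changes one.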
However, since the standard is sufficiently clear on the correct format, I
would rather not support a non-standard format with leading whitespace:
developers would start relying on the feature and then complain that other,
standards-compliant robots libraries don't support it (the infamous "my page
works in Internet Explorer, so it cannot be broken" attitude).
Rather than modifying the library, I would suggest that any application that
wants to handle this content error gracefully strip leading whitespace before
calling parse().
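The suggested workaround of stripping leading whitespace before calling parse() might look like the following Perl sketch. It assumes the parser in question is WWW::RobotRules from libwww-perl, whose parse() takes the robots.txt URL and its content. The helper strip_leading_whitespace, the agent name, and the example.com URLs are hypothetical. The module is loaded only if it is installed, so the stripping step runs either way.

```perl
use strict;
use warnings;

# Hypothetical helper: remove leading spaces and tabs from every line of a
# robots.txt body, so a strict parser sees the field name at column 0.
sub strip_leading_whitespace {
    my ($txt) = @_;
    $txt =~ s/^[ \t]+//mg;   # /m: ^ matches at the start of each line
    return $txt;
}

my $robots_txt = "User-agent: *\n  Disallow: /cgi-bin/\n\tDisallow: /~mojojojo/misc/\n";
my $clean      = strip_leading_whitespace($robots_txt);
print $clean;

# ASSUMPTION: the parser is WWW::RobotRules; skipped if it is not installed.
if (eval { require WWW::RobotRules; 1 }) {
    my $rules = WWW::RobotRules->new('ExampleBot/0.1');   # hypothetical agent
    $rules->parse('http://www.example.com/robots.txt', $clean);
    my $verdict = $rules->allowed('http://www.example.com/cgi-bin/test')
        ? 'allowed' : 'disallowed';
    print "$verdict\n";
}
```

The pattern strips only spaces and tabs. Using \s+ with /m would also swallow the newline of a blank line and join it to the next line. That would merge records, which robots.txt separates with blank lines.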
--
Klaus Johannes Rusch
[EMAIL PROTECTED]
http://www.atmedia.net/KlausRusch/