On Jul 20, 2008, at 12:09 PM, Jonas Sicking wrote:
Ian Hickson wrote:
On Sat, 19 Jul 2008, Jonas Sicking wrote:
According to the HTML5 spec, space is a valid character inside URLs.
That wasn't intentional -- can you point to where it says that? The
HTML5 spec relies on spaces not being allowed in URLs in various
places.
In section 2.3.2 (Parsing URLs):
# Add all characters with codepoints less than or equal to U+0020 or
# greater than or equal to U+007F to the <unreserved> production.
And RFC 3986 says:
# Characters that are allowed in a URI but do not have a reserved
# purpose are called unreserved. These include uppercase and lowercase
# letters, decimal digits, hyphen, period, underscore, and tilde.
#
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
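
To make the difference between the two quoted rules concrete, here is a minimal Python sketch. The function names are made up for illustration, and the second function is only one reading of the quoted draft text, not an implementation taken from either specification.

```python
import string

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def is_unreserved_rfc3986(ch):
    """True if the single character ch is unreserved per RFC 3986."""
    return ch in RFC3986_UNRESERVED


def is_unreserved_html5_draft(ch):
    """True if ch is unreserved under the quoted draft rule (assumed reading).

    The draft adds every code point <= U+0020 or >= U+007F to the
    RFC 3986 set, which is what lets a space through.
    """
    cp = ord(ch)
    return ch in RFC3986_UNRESERVED or cp <= 0x20 or cp >= 0x7F
```

Under this reading a space (U+0020) is rejected by the first function and accepted by the second, while a reserved character such as "/" is rejected by both.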
That rule is about what conforming HTML5 UAs must do when processing a
URL with error handling; it does not change what is a valid URI. In
any case, even if we use the HTML5 parsing algorithm, splitting on
whitespace before applying it should work. And finally, since we are
not allowing a path, the main convenience reason for the error
handling accepting spaces is gone.
Regards,
Maciej
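
The "split on whitespace first, then parse" approach suggested above might look roughly like the following Python sketch. The function name and the example value are hypothetical, and Python's urlsplit stands in for whichever URL parsing algorithm is actually used. Note that str.split() with no arguments splits on any Unicode whitespace, which may be broader than the set of space characters a spec would define.

```python
from urllib.parse import urlsplit


def parse_url_list(value):
    """Split a whitespace-separated list of URLs, then parse each token.

    Splitting happens before parsing, so no token handed to the URL
    parser can contain a space, whatever the parser's error handling
    would otherwise accept.
    """
    tokens = value.split()  # runs of whitespace act as one separator
    # With no path allowed, scheme and authority are all that is kept
    # (an assumption made for this illustration).
    return [(parts.scheme, parts.netloc) for parts in map(urlsplit, tokens)]
```

For example, parse_url_list("http://example.org  https://example.com:8443") yields two entries, one per token, regardless of how many spaces separate them.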