On Jul 20, 2008, at 12:09 PM, Jonas Sicking wrote:

Ian Hickson wrote:
On Sat, 19 Jul 2008, Jonas Sicking wrote:
According to the HTML5 spec, space is a valid character inside URLs.
That wasn't intentional -- can you point to where it says that? The HTML5 spec relies on spaces not being allowed in URLs in various places.

In section 2.3.2 (Parsing URLs):

# Add all characters with codepoints less than or equal to U+0020 or
# greater than or equal to U+007F to the <unreserved> production.

And RFC 3986 says:

# Characters that are allowed in a URI but do not have a reserved
# purpose are called unreserved. These include uppercase and lowercase
# letters, decimal digits, hyphen, period, underscore, and tilde.
#
#     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

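As a rough illustration of how the two productions quoted above differ, here is a minimal Python sketch. The function names are hypothetical, and the sketch models only the single-character <unreserved> test, not a full URL parser.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def is_rfc3986_unreserved(ch: str) -> bool:
    """True if ch belongs to the RFC 3986 <unreserved> production."""
    return ch in RFC3986_UNRESERVED


def is_html5_unreserved(ch: str) -> bool:
    """Hypothetical model of the HTML5 draft's error-handling rule quoted
    above: the RFC 3986 set plus every code point <= U+0020 or >= U+007F."""
    cp = ord(ch)
    return is_rfc3986_unreserved(ch) or cp <= 0x20 or cp >= 0x7F


print(is_rfc3986_unreserved(" "))  # False: space is not a valid URI character
print(is_html5_unreserved(" "))    # True: accepted only by the error handling
```

Note that the HTML5 rule only widens what a UA tolerates; the strict RFC 3986 check still rejects the space.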
The HTML5 rule is about what conforming HTML5 UAs must do when processing a URL with error handling; it does not change what is a valid URI. In any case, even if we use the HTML5 parsing algorithm, splitting on whitespace before applying it should work. And finally, since we are not allowing a path, the main convenience reason for the error handling to accept spaces is gone.
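A minimal sketch of the split-first approach follows. It assumes, for illustration only, a header value that lists origins separated by whitespace. `parse_origin_list` is a hypothetical helper, and Python's `urllib.parse.urlsplit` stands in for the HTML5 URL parsing algorithm, which it does not implement.

```python
from urllib.parse import urlsplit


def parse_origin_list(value: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split on whitespace first, then parse each token,
    so no space ever reaches the URL parser."""
    origins = []
    # str.split() with no argument splits on runs of whitespace and drops
    # empty strings, so leading, trailing, or repeated spaces are harmless.
    for token in value.split():
        parts = urlsplit(token)
        # Assumed rule for this sketch: an origin is scheme + host[:port].
        if not parts.scheme or not parts.netloc:
            raise ValueError(f"not an origin: {token!r}")
        # No path is allowed, so reject any path, query, or fragment.
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"path/query/fragment not allowed: {token!r}")
        origins.append((parts.scheme.lower(), parts.netloc.lower()))
    return origins


print(parse_origin_list("http://a.example  https://b.example:8443"))
# [('http', 'a.example'), ('https', 'b.example:8443')]
```

Because tokens are split out before parsing, whether the parser would tolerate a space never comes into play.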

Regards,
Maciej

