Re: Problem parsing non-ASCII in query component

Julian Reschke Tue, 27 Dec 2016 08:51:52 -0800

On 2016-12-27 17:19, Oleg Kalnichevski wrote:

On Tue, 2016-12-27 at 10:42 -0500, Jaime Hablutzel Egoavil wrote:

From RFC 3986:



When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set *should be* percent-encoded.
For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".
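
The encoding rule quoted above can be sketched in a few lines of Java. This is only an illustration of the RFC 3986 text, not HttpClient's own code: the class name `PercentEncoder` and the method `encode` are hypothetical, and the unreserved set is taken from RFC 3986 section 2.3 (ALPHA, DIGIT, "-", ".", "_", "~").

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient or the JDK.
final class PercentEncoder {
    // Unreserved characters per RFC 3986 section 2.3.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Encodes the text as UTF-8 octets, then percent-encodes every octet
    // that does not correspond to an unreserved character.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80 && UNRESERVED.indexOf((char) octet) >= 0) {
                out.append((char) octet);
            } else {
                out.append('%').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself ASCII-only.
        System.out.println(encode("A"));        // A
        System.out.println(encode("\u00C0"));   // %C3%80
        System.out.println(encode("\u30A2"));   // %E3%82%A2
        System.out.println(encode("b\u00E1r")); // b%C3%A1r
    }
}
```

Because this sketch also encodes reserved characters such as "=" and "&", it would be applied to an individual parameter name or value, not to a whole query string.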



As you can see it says "should" so it seems to me that it is not an
obligation to percent encode non-ASCII.

A real example where this problem arises is with Firefox invoking custom
URI handlers, for example, if you have something like this in an HTML page:

<a href="myuri:?foo=b%C3%A1r">Invoke myuri handler</a>

The URI handler application will receive

myuri:?foo=bár

Then, during query component parsing HttpClient will fail to parse that
parameter value.


Both HTTP/1.1 and HTTP/2 require message head elements including the
request URI to be ASCII only.

Oleg

The same is true for URIs in general, as can easily be derived from the ABNF in RFC 3986.


Best regards, Julian

