From RFC 3986:

> When a new URI scheme defines a component that represents textual
> data consisting of characters from the Universal Character Set [UCS],
> the data should first be encoded as octets according to the UTF-8
> character encoding [STD63]; then only those octets that do not
> correspond to characters in the unreserved set *should be* percent-encoded.
> For example, the character A would be represented as "A",
> the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
> as "%C3%80", and the character KATAKANA LETTER A would be represented
> as "%E3%82%A2".
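The rule quoted above can be sketched in Java as follows. `Rfc3986Encoding` and its methods are hypothetical names used only for illustration, not an HttpClient API. The sketch turns the text into UTF-8 octets and percent-encodes every octet outside the unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~"). It reproduces the RFC's own examples, and it also shows that "bár" becomes "b%C3%A1r", the form used in the href below.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of RFC 3986, section 2.5: UTF-8 first,
// then percent-encode every octet that is not "unreserved".
final class Rfc3986Encoding {

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte raw : text.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the octet as unsigned
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));        // A
        System.out.println(encode("\u00C0"));   // %C3%80 (LATIN CAPITAL LETTER A WITH GRAVE)
        System.out.println(encode("\u30A2"));   // %E3%82%A2 (KATAKANA LETTER A)
        System.out.println(encode("b\u00E1r")); // b%C3%A1r
    }
}
```

Working on octets rather than chars is what keeps multi-byte characters intact: each UTF-8 byte gets its own "%XX" escape.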
As you can see, it says "should", so it seems to me that percent-encoding non-ASCII characters is not an obligation.

A real example where this problem arises is Firefox invoking custom URI handlers. For example, if you have something like this in an HTML page:

<a href="myuri:?foo=b%C3%A1r">Invoke myuri handler</a>

the URI handler application will receive:

myuri:?foo=bár

Then, while parsing the query component, HttpClient fails to decode that parameter value.

On Tue, Dec 27, 2016 at 10:11 AM, Oleg Kalnichevski wrote:

> On Sat, 2016-12-24 at 18:26 -0500, Jaime Hablutzel Egoavil wrote:
> > Currently something like this:
> >
> >     import java.nio.charset.StandardCharsets;
> >     import java.util.List;
> >
> >     import org.apache.http.NameValuePair;
> >     import org.apache.http.client.utils.URLEncodedUtils;
> >
> >     public class ProblemWithNonAscii {
> >         public static void main(String[] args) {
> >             List<NameValuePair> pairs = URLEncodedUtils.parse("foo=bár",
> >                     StandardCharsets.UTF_8);
> >             System.out.println(pairs);
> >         }
> >     }
> >
> > produces this output:
> >
> >     [foo=b�r]
> >
> > where the 'á' character has been scrambled.
> >
> > I can see that this is related to the following narrowing primitive
> > conversion:
> > https://github.com/apache/httpclient/blob/4.5.2/httpclient/src/main/java/org/apache/http/client/utils/URLEncodedUtils.java#L570
> >
> > This is a bug, isn't it?
>
> Jaime,
>
> URL encoded content is not supposed to have non-ASCII characters in the
> first place, is it not?
>
> Oleg

--
Jaime Hablutzel - RPC 994690880
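To make the narrowing problem concrete, here is a minimal sketch of a decoder for one form-encoded component. It is not HttpClient's code: `LenientQueryDecoder`, `decode` and the `lenient` flag are hypothetical names. With `lenient == false` it narrows every unescaped char to its low 8 bits, as described in the thread, so 'á' (U+00E1) becomes the lone octet 0xE1, which is malformed UTF-8 and decodes to the replacement character. With `lenient == true` it encodes raw non-ASCII characters with the target charset first, so "bár" survives.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not HttpClient's URLEncodedUtils.
final class LenientQueryDecoder {

    // Decodes one form-encoded component (e.g. a parameter value).
    static String decode(String s, Charset charset, boolean lenient) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                // Percent escape: two hex digits form one octet
                int octet = (Character.digit(s.charAt(i + 1), 16) << 4)
                        | Character.digit(s.charAt(i + 2), 16);
                buf.write(octet);
                i += 3;
            } else if (c == '+') {
                // Form encoding uses '+' for a space
                buf.write(' ');
                i++;
            } else if (lenient && c > 0x7F) {
                // Raw non-ASCII character: encode its whole code point with the charset
                int end = i + Character.charCount(s.codePointAt(i));
                byte[] bytes = s.substring(i, end).getBytes(charset);
                buf.write(bytes, 0, bytes.length);
                i = end;
            } else {
                // Narrowing primitive conversion: only the low 8 bits are kept
                buf.write((byte) c);
                i++;
            }
        }
        // Malformed octet sequences decode to U+FFFD, the replacement character
        return new String(buf.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Value as the handler receives it, already unescaped by the browser
        String received = "b\u00E1r";
        System.out.println(decode(received, StandardCharsets.UTF_8, false));   // garbled
        System.out.println(decode(received, StandardCharsets.UTF_8, true));    // intact
        System.out.println(decode("b%C3%A1r", StandardCharsets.UTF_8, false)); // intact
    }
}
```

Properly percent-encoded input decodes the same way in both modes; the flag only changes how raw non-ASCII characters are treated, which is exactly the point under discussion.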
