On Wed, 2019-07-17 at 15:23 +0600, Denis Malyshkin wrote:
> Hello,
>
> After upgrading to HttpClient 4.5.8+ we found that requests with
> Cyrillic characters in the URL are broken. Below is a simple test that
> exposes the issue with HttpClient 4.5.8:
> ===================================
> public void cyrillicSymbolsExtraTest() throws Exception {
>     String urlStr = "http://google.com/кириллица-2019/?q=кириллица-2019";
>     URL url = new URL(urlStr);
>     HttpUriRequest req = new HttpGet(url.toString());
>
>     // Prints "http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019"
>     System.out.println(req.getRequestLine().getUri());
>
>     HttpClientContext context = HttpClientContext.create();
>     HttpClient client = HttpClients.custom().build();
>     HttpResponse resp = client.execute(req, context);
>
>     Assert.assertEquals(req.getRequestLine().getUri(),
>             "http://google.com" + context.getRequest().getRequestLine().getUri());
>     // Expected: http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
>     // Actual:   http://google.com/:8@8;;8F0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
> }
> ===================================
>
> With HttpClient 4.5.7 the test passes.
>
> Yes, I know that non-ASCII characters aren't allowed in URLs. But I am
> concerned about the following aspects of the behavior shown above:
>
> 1. req.getRequestLine().getUri() returns the correctly URL-encoded URI,
> but the request is sent to an address with an incorrect path --
> "http://google.com/:8@8;;8F0-2019/".
>
> 2. If the URL is invalid, it seems very odd to me to send the request
> to a broken URL instead of returning an error.
>
> 3. The URL path and the URL query are encoded inconsistently -- the
> path is corrupted while the query is correctly URL-encoded.
>
This is a classic case of the "garbage in, garbage out" rule. Please do
not use invalid characters in URI components.
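
As a rough illustration of what "valid input" could look like here, the
sketch below uses only java.net.URI from the JDK to percent-encode the
path and query before the string is handed to HttpGet. The class and
method names (AsciiUriExample, toAsciiUri, lowBytesOnly) are made up for
this example and are not part of HttpClient. The second helper only
demonstrates that the garbled path in the quoted test is consistent with
each character being narrowed to its low 8 bits; it is not a claim about
where in HttpClient that happens.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class for illustration; not part of HttpClient.
final class AsciiUriExample {

    // Builds a URI from raw (unencoded) components and returns it as a
    // US-ASCII string.
    static String toAsciiUri(String scheme, String authority, String path, String query) {
        try {
            // The multi-argument constructor quotes illegal characters such
            // as spaces; toASCIIString() then percent-encodes any remaining
            // non-ASCII characters as UTF-8 octets.
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI components", e);
        }
    }

    // Keeps only the low 8 bits of every char, to show what a lossy
    // char-to-byte narrowing does to non-ASCII text.
    static String lowBytesOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Cyrillic word from the quoted test, written with Unicode
        // escapes so the source compiles under any file encoding.
        String word = "\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430-2019";

        // Prints the percent-encoded form that the quoted test expects.
        System.out.println(toAsciiUri("http", "google.com", "/" + word + "/", "q=" + word));

        // Prints ":8@8;;8F0-2019", matching the broken path in the test.
        System.out.println(lowBytesOnly(word));
    }
}
```

The string returned by toAsciiUri contains only ASCII characters, so
passing it to new HttpGet(...) should avoid the problem described above.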
Oleg