Hello,
After upgrading to HttpClient 4.5.8+, we found that requests containing
Cyrillic characters are broken. Below is a simple test that exposes the
issue with HttpClient 4.5.8:
===================================
import java.net.URL;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.HttpClients;
import org.junit.Assert;
import org.junit.Test;

@Test
public void cyrillicSymbolsExtraTest() throws Exception {
    String urlStr = "http://google.com/кириллица-2019/?q=кириллица-2019";
    URL url = new URL(urlStr);
    HttpUriRequest req = new HttpGet(url.toString());
    // Prints "http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019"
    System.out.println(req.getRequestLine().getUri());
    HttpClientContext context = HttpClientContext.create();
    HttpClient client = HttpClients.custom().build();
    HttpResponse resp = client.execute(req, context);
    // Compare with the request line recorded in the execution context
    Assert.assertEquals(req.getRequestLine().getUri(), "http://google.com" +
            context.getRequest().getRequestLine().getUri());
    // Expected: http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
    // Actual:   http://google.com/:8@8;;8F0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
}
===================================
With HttpClient 4.5.7 the test passes.
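For reference, the expected value in the test is just the standard UTF-8
percent-encoding of the Cyrillic path and query. A minimal sketch using only
the JDK's java.net.URI (no HttpClient involved) reproduces it; the class name
ExpectedEncoding and the helper toAscii are hypothetical, chosen only for
illustration:

```java
import java.net.URI;

public class ExpectedEncoding {
    // Hypothetical helper: parse the string as a java.net.URI and return its
    // ASCII-only form, where every non-ASCII character is percent-encoded
    // as its UTF-8 bytes.
    static String toAscii(String uriString) {
        return URI.create(uriString).toASCIIString();
    }

    public static void main(String[] args) {
        // The word from the test, written with Unicode escapes so the
        // source file encoding does not matter.
        String word = "\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430";
        String raw = "http://google.com/" + word + "-2019/?q=" + word + "-2019";
        System.out.println(toAscii(raw));
    }
}
```

This prints the same string as the "Expected" line in the test, for both the
path and the query.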
Yes, I know that non-ASCII characters aren't allowed in URLs. But the
behavior above worries me for the following reasons:
1. req.getRequestLine().getUri() returns the correctly URL-encoded URI, but
the request is sent to an address with a broken path:
"http://google.com/:8@8;;8F0-2019/".
2. If the URL is invalid, it seems very odd to send the request to a broken
URL instead of reporting an error.
3. The path and query parts of the URL are encoded inconsistently: the path
part gets mangled, while the query part is correctly URL-encoded.
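The broken path does not look random: it matches what you get if each
Cyrillic character (a UTF-16 code unit such as U+043A) is truncated to its
low 8 bits instead of being UTF-8 encoded, e.g. "к" (U+043A) becomes ":"
(0x3A) and "р" (U+0440) becomes "@" (0x40). The sketch below only reproduces
the symptom with plain JDK code; it is a guess at the mechanism, not a claim
about where in HttpClient 4.5.8 this happens. The class and method names are
hypothetical:

```java
public class LowByteTruncation {
    // Hypothetical illustration: keep only the low 8 bits of every char,
    // which is what a lossy char-to-byte conversion would produce.
    // ASCII characters pass through unchanged.
    static String truncateToLowByte(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append((char) (c & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The word from the test, written with Unicode escapes.
        String word = "\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430";
        System.out.println(truncateToLowByte(word + "-2019")); // :8@8;;8F0-2019
    }
}
```

The output is exactly the mangled path segment from the "Actual" line, while
the query part, which is percent-encoded from UTF-8 bytes, comes through
intact.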
To sum up, this looks like a bug to me.
Thank you,
Denis Malyshkin.