[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xavier BOURGOUIN updated HTTPCLIENT-2341:
-----------------------------------------
    Description: 
When an HTTP response carries a URI in the Location header that contains 
percent-encoded reserved characters (such as %40), HttpClient replaces them 
with their decoded equivalent ("@" in the case of %40) when it follows the 
redirect. This seems to contradict RFC 3986 
([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]): a reserved character 
and its percent-encoded form do not have the same semantic meaning and are 
therefore not to be treated as equivalent.
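
As a small illustration of that distinction using only the JDK (a sketch; the 
class name RawVsDecodedPath is made up for this example and is not part of 
HttpClient or of the attached reproducer), java.net.URI keeps the raw and the 
decoded path separate and does not consider the two URIs equal:

```java
import java.net.URI;

// Hypothetical illustration class, not part of HttpClient or the reproducer.
class RawVsDecodedPath {

    // Path exactly as written in the URI string (percent-escapes preserved).
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Path with percent-escapes decoded; the encoded/literal distinction is lost.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // java.net.URI compares the raw forms of the path component.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        String encoded = "http://127.0.0.1:8000/foo%40bar";
        String literal = "http://127.0.0.1:8000/foo@bar";
        System.out.println(rawPath(encoded));          // prints /foo%40bar
        System.out.println(decodedPath(encoded));      // prints /foo@bar
        System.out.println(sameUri(encoded, literal)); // prints false
    }
}
```

Any code that rebuilds a request URI from the decoded path rather than the raw 
one would turn %40 into "@", which is the kind of change reported here.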

One impact is that it breaks any server / API that redirects clients to an 
S3-style blob object (AWS S3 for instance) whose URI path happens to contain 
%40 (e.g. Location: https://<endpoint>/<some blob container>/foo%40bar.file).

Disabling URI normalization as shown below seems to work around it:
{code:java}
HttpGet httpget = new HttpGet("http://service-that-redirects");
httpget.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
{code}
However, I'm not sure that is satisfactory if, as suspected above, it is 
always wrong to "normalize" those reserved characters (and normalization is 
enabled by default).

Note that httpclient5 is fine: the percent-encoded %40 is preserved as it 
should be, and there no longer seems to be a toggle for the normalization 
behavior.
 
This past ticket https://issues.apache.org/jira/browse/HTTPCLIENT-2271 
discussed something very similar, except the other way around: some reserved 
characters were replaced by their percent-encoded equivalent. In the lengthy 
comment thread there, a consensus seems to have been reached that, for such 
characters, the percent-encoded form is not equivalent to the literal 
character and thus should not be transformed. I believe that reasoning is 
symmetric and should also apply to the case reported here.
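
For reference, RFC 3986 section 6.2.2.2 limits percent-encoding normalization 
to decoding octets that correspond to unreserved characters (ALPHA, DIGIT, 
"-", ".", "_", "~"). A minimal sketch of that rule follows; the class 
UnreservedOnlyNormalizer and its methods are hypothetical and are not how 
HttpClient 4.x or 5.x implement normalization:

```java
// Hypothetical sketch of RFC 3986 section 6.2.2.2 normalization.
class UnreservedOnlyNormalizer {

    // RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if the character is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes %XX only when it encodes an unreserved character; every other
    // escape (including reserved characters such as %40) is kept, with its
    // hex digits upper-cased as suggested by section 6.2.2.1.
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char ch = raw.charAt(i);
            int hi = (ch == '%' && i + 2 < raw.length()) ? hexValue(raw.charAt(i + 1)) : -1;
            int lo = (hi >= 0) ? hexValue(raw.charAt(i + 2)) : -1;
            if (hi >= 0 && lo >= 0) {
                int octet = hi * 16 + lo;
                if (isUnreserved(octet)) {
                    out.append((char) octet);
                } else {
                    out.append('%')
                       .append(Character.toUpperCase(raw.charAt(i + 1)))
                       .append(Character.toUpperCase(raw.charAt(i + 2)));
                }
                i += 3;
            } else {
                out.append(ch); // ordinary character or malformed escape: copy as-is
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("/foo%40bar"));     // prints /foo%40bar
        System.out.println(normalize("/%7Euser/a%2db")); // prints /~user/a-b
    }
}
```

Under this rule %40 is left untouched in both directions: it is neither 
decoded to "@" nor is a literal "@" ever encoded.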

I worked out a reproducer in the form of a small Maven project, attached to 
this ticket and inspired by the one from that other ticket. It demonstrates 
the issue with httpclient 4.5.14 (all 4.x versions are probably affected the 
same way) and compares it with httpclient5 (5.3.1). It should run directly 
with _mvn exec:java_, and hopefully the output and code are clear enough to 
be self-explanatory.

 
In essence, what it does is:
 * Start a dummy HTTP server with two services: */foo*, which redirects to 
*/foo%40bar*, and one that listens on */foo@bar* and replies with HTTP 200.

 * Test httpclient4 (along with some other clients, to demonstrate the 
differences in behavior) by sending a GET request to '/foo' and observing 
whether and how it follows the redirect to '/foo@bar', which shows whether 
*%40* was replaced by *@*.

 
{code:java}
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Dummy server
public static void main(String[] args) throws IOException, InterruptedException {
    HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
    server.createContext("/foo", new RedirectHttpHandler());
    server.createContext("/foo@bar", new SuccessHttpHandler());
    server.setExecutor(null); // use the default executor
    server.start();

    // [... test client requests]

    server.stop(0); // shut down once the client tests are done
}

// Answers every request with a 302 whose Location contains a percent-encoded "@".
public static class RedirectHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        t.getResponseHeaders().add("Location", "/foo%40bar");
        t.sendResponseHeaders(302, 0);
        OutputStream os = t.getResponseBody();
        os.close();
    }
}

// Logs the request URI as received, then answers with HTTP 200.
public static class SuccessHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
        byte[] response = "You followed the redirect!".getBytes(StandardCharsets.UTF_8);
        t.sendResponseHeaders(200, response.length);
        OutputStream os = t.getResponseBody();
        os.write(response);
        os.close();
    }
}
{code}
And the httpclient4 test looks like this:
{code:java}
CloseableHttpClient client = HttpClients.createDefault();

HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");

CloseableHttpResponse response = client.execute(httpget);

int status = response.getStatusLine().getStatusCode();
if (status == 302) {
    // The redirect was not followed: show where the server pointed us.
    System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
} else if (status == 200) {
    System.out.println("-> Followed the redirect!");
} else {
    throw new RuntimeException("Unexpected response code: " + status);
}
{code}
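
To double-check what the server actually puts on the wire, independently of 
any client's redirect handling, the Location header can be read with redirect 
following switched off. The sketch below uses only JDK classes; the class 
RawLocationProbe is hypothetical, is not part of the attached reproducer, and 
binds an ephemeral port instead of 8000:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical probe, not part of the attached reproducer.
class RawLocationProbe {

    // Starts a throwaway server that redirects /foo to /foo%40bar and returns
    // the Location header value exactly as the client received it.
    static String fetchLocation() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/foo", exchange -> {
                exchange.getResponseHeaders().add("Location", "/foo%40bar");
                exchange.sendResponseHeaders(302, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                HttpURLConnection conn = (HttpURLConnection) URI
                        .create("http://127.0.0.1:" + port + "/foo")
                        .toURL()
                        .openConnection(Proxy.NO_PROXY);
                conn.setInstanceFollowRedirects(false); // inspect the 302 itself
                try {
                    conn.getResponseCode(); // forces the request to be sent
                    return conn.getHeaderField("Location");
                } finally {
                    conn.disconnect();
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Location as received: " + fetchLocation());
    }
}
```

If the header arrives with %40 intact, any "@" seen later by the server was 
introduced by the client while following the redirect.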
 

 

 

 


> DefaultRedirect strategy breaks reserved chars in URI path
> ----------------------------------------------------------
>
>                 Key: HTTPCLIENT-2341
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.14
>         Environment: httpclient4 (4.5.14)
> Linux/Ubuntu 22.04
>            Reporter: Xavier BOURGOUIN
>            Priority: Major
>         Attachments: hc4normalize.tar.gz
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
