[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939511#comment-13939511
 ] 

William Porter commented on HTTPCLIENT-1486:
--------------------------------------------

I am also of the opinion that implementing a more lenient URI parse is not what 
we want.  I think the issue is that the current URI parsing is actually too 
lenient.  The internal java.net.URI representation is unable to parse a host 
and port from that URI, but the org.apache.http.client.utils.URIUtils 
extractHost() has custom parsing logic that leads to this scenario.  Rather 
than parsing a host and port through custom means, maybe an exception should be 
thrown.

This method is already parsing the URI more than once in this code (in 
URIUtils.java extractHost(final URI uri)):

{code}
298            String host = uri.getHost();
299            if (host == null) { // normal parse failed; let's do it ourselves
300                // authority does not seem to care about the valid 
character-set for host names
301                host = uri.getAuthority();
302                if (host != null) {
303                    // Strip off any leading user credentials
304                    final int at = host.indexOf('@');
305                    if (at >= 0) {
306                        if (host.length() > at+1 ) {
307                            host = host.substring(at+1);
308                        } else {
309                            host = null; // @ on its own
310                        }
311                    }
312                    // Extract the port suffix, if present
313                    if (host != null) {
314                        final int colon = host.indexOf(':');
315                        if (colon >= 0) {
316                            final int pos = colon + 1;
317                            int len = 0;
318                            for (int i = pos; i < host.length(); i++) {
319                                if (Character.isDigit(host.charAt(i))) {
320                                    len++;
321                                } else {
322                                    break;
323                                }
324                            }
325                            if (len > 0) {
326                                try {
327                                    port = 
Integer.parseInt(host.substring(pos, pos + len));
328                                } catch (final NumberFormatException ex) {
329                                }
330                            }
331                            host = host.substring(0, colon);
332                        }
333                    }
334                }
335            }
...
{code}

> Quirky Behavior in URIUtils leads to Improper Request Execution
> ---------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1486
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1486
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.3.3
>            Reporter: William Porter
>            Priority: Minor
>
> While executing a HttpUriRequest with a ClosableHttpClient, malformed URIs 
> can lead to HTTP requests being executed for unexpected resources.  The root 
> issue is in the extractHost() method in URIUtils, and is demonstracted by the 
> following example.
> {code:title=Main.java|borderStyle=solid}
> import java.io.IOException;
> import java.net.URI;
> import java.net.URISyntaxException;
> import org.apache.http.HttpHost;
> import org.apache.http.HttpResponse;
> import org.apache.http.client.ClientProtocolException;
> import org.apache.http.client.HttpClient;
> import org.apache.http.client.methods.HttpGet;
> import org.apache.http.client.methods.HttpUriRequest;
> import org.apache.http.client.utils.URIUtils;
> import org.apache.http.impl.client.HttpClientBuilder;
> import org.apache.log4j.BasicConfigurator;
> import org.junit.Assert;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class Main {
>       
>       private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>       public static void main(String [] args) {
>               
>               // Set up Log4J logging
>               BasicConfigurator.configure();
>               
>               try {
>                       
>                       // The following is a strange URI string that is 
> possibly a typo that
>                       // doesn't include the / between the authority and the 
> 'intended' path
>                       final String strangeUriString = 
> "http://www.example.com:80somepath/someresource.html";;
>                       // Whereas it doesn't neccesarily seem like strange 
> behavior to resolve the
>                       // host and port as www.example.com and 80 from the 
> authority, it can have unintended
>                       // consequences at higher levels of indirection
>                       Assert.assertEquals(new HttpHost("www.example.com", 
> 80), URIUtils.extractHost(new URI(strangeUriString)));
>                       
>                       // Now we construct a request with the strange URI 
> String
>                       HttpUriRequest request = new HttpGet(strangeUriString);
>                       
>                       // We create a CloseableHttpClient to execute the 
> request
>                       final HttpClientBuilder builder = 
> HttpClientBuilder.create();
>                       HttpClient client = builder.build();
>                       
>                       // Here, the request is executed, but is actually a GET 
> /someresource.html
>                       // on www.example.com:80 since part of the intended 
> path was considered part 
>                       // of the authority by the URI class, but disregarded 
> by URIUtils
>                       final HttpResponse response = client.execute(request);
>                       LOG.info("Response: {}", 
> response.getStatusLine().toString());
>                       
>                       
>               } catch (final URISyntaxException e) {
>                       LOG.error("UriSyntaxException: {}", e.getMessage());
>               } catch (final ClientProtocolException e) {
>                       LOG.error("ClientProtocolException: {}", 
> e.getMessage());
>               } catch (final IOException e) {
>                       LOG.error("IOException: {}", e.getMessage());
>               }
>               
>       }
> }
> {code}
> This bug may be introduced by the fix for 
> https://issues.apache.org/jira/browse/HTTPCLIENT-1166.  It might be 
> advantageous to throw an exception in this case rather than be lenient with 
> the host and port parsing, but further discussion might be merited based on 
> the comments in the aforementioned issue. 
> Here is some debug output to show the request is actually a GET 
> /someresource.html
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "GET 
> /someresource.html HTTP/1.1[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Host: 
> www.example.com:80[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Connection: 
> Keep-Alive[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "User-Agent: 
> Apache-HttpClient/4.3.3 (java 1.5)[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Accept-Encoding: 
> gzip,deflate[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "[\r][\n]"



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to