I've found what I think is a bug, but could also be expected behavior that's 
surprising from the user's perspective.
https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9
Expected behavior: assume we have some URI https://www.example.com/some/path. 
HttpClient provides overloads of execute that allow the URI to be split into 
host and path components ("https://www.example.com", "/some/path"), or provided 
entirely in the HttpRequest (where request.getScheme() is "https", 
request.getAuthority() is "www.example.com", and request.getPath() is 
"/some/path"). Either approach should produce exactly the same request.

Actual behavior: execute(HttpHost, HttpRequest, ResponseHandler) sets the Host 
header to be www.example.com:443, while execute(HttpRequest, ResponseHandler) 
sets it to www.example.com.

Normally, this behavior has no effect. In fact, https://echo.free.beeceptor.com 
will strip the port in the Host header when echoing back the headers in a 
request. However, I've recently come across a server that rejected some 
requests with "Invalid host header, this site must be accessed as 
https://www.example.com". Investigation revealed that it rejected requests 
where the port was included in the Host header, and would only accept requests 
where a port was not defined.
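
To illustrate the kind of check I suspect the server performs, here is a 
minimal sketch. I have no access to the server's code, so the class and method 
names are hypothetical and the exact-match logic is my assumption. The second 
method shows the more tolerant comparison that most servers seem to apply.

```java
final class HostCheckSketch {
    /** Assumed strict check: the Host value must equal the bare host name. */
    static boolean strictAccepts(String expectedHost, String hostHeader) {
        return expectedHost.equalsIgnoreCase(hostHeader);
    }

    /** Tolerant check: a trailing ":<defaultPort>" is ignored before comparing. */
    static boolean tolerantAccepts(String expectedHost, int defaultPort, String hostHeader) {
        String suffix = ":" + defaultPort;
        String value = hostHeader.endsWith(suffix)
                ? hostHeader.substring(0, hostHeader.length() - suffix.length())
                : hostHeader;
        return expectedHost.equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        // The two Host values produced by the two execute overloads.
        System.out.println(strictAccepts("www.example.com", "www.example.com"));
        System.out.println(strictAccepts("www.example.com", "www.example.com:443"));
        System.out.println(tolerantAccepts("www.example.com", 443, "www.example.com:443"));
    }
}
```

Under this assumption, only the request sent through execute(HttpRequest, 
ResponseHandler) would pass the strict check.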

As far as I can tell, the HTTP spec does not settle this: the client is not 
required to include a scheme-default port in the Host header, nor is the server 
obligated to treat a Host value with an explicit default port the same as one 
without it. This server feels like an outlier; even so, the difference between 
the two HttpClient code paths was unexpected.

It appears that this happens when ProtocolExec, AsyncProtocolExec, and 
MinimalHttpClient are filling in the authority and scheme for a request if it 
didn't have one to begin with. Because they fill from the HttpRoute's target 
HttpHost, this host also contains port information (usually scheme-default) 
when it is set as the request's authority.
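
The effect can be sketched without HttpClient at all. The code below is not 
HttpClient's implementation; HostHeaderSketch and its methods are hypothetical 
names, and I am assuming that a port of -1 means "not set". It only shows how a 
target with a resolved port of 443 yields a different Host value than one with 
no port, and how omitting the scheme-default port would make the two agree.

```java
import java.util.Locale;

final class HostHeaderSketch {
    /** Scheme-default port for http and https; -1 for anything else. */
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http": return 80;
            case "https": return 443;
            default: return -1;
        }
    }

    /** Formats the Host value verbatim: the port is appended whenever it is set. */
    static String hostHeader(String host, int port) {
        return port >= 0 ? host + ":" + port : host;
    }

    /** Formats the Host value but omits a port equal to the scheme default. */
    static String hostHeaderOmittingDefault(String scheme, String host, int port) {
        return port == defaultPort(scheme) ? host : hostHeader(host, port);
    }

    public static void main(String[] args) {
        // Authority supplied on the request, no port set.
        System.out.println(hostHeader("www.example.com", -1));
        // Authority filled in from a target whose port was resolved to 443.
        System.out.println(hostHeader("www.example.com", 443));
        // Same resolved target, with the default port dropped.
        System.out.println(hostHeaderOmittingDefault("https", "www.example.com", 443));
        // A non-default port is kept.
        System.out.println(hostHeaderOmittingDefault("https", "www.example.com", 8443));
    }
}
```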

This bug is very easily worked around by simply setting the request's authority 
from the target before calling execute, but it still seems unusual. Was this 
behavior intended?
