Nicholas O'Connor created HTTPCLIENT-2363:
---------------------------------------------
Summary: execute(HttpHost, HttpRequest, ResponseHandler) adds port
to Host header while execute(HttpRequest, ResponseHandler) does not
Key: HTTPCLIENT-2363
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient (classic)
Affects Versions: 5.4.2, 5.3.1
Reporter: Nicholas O'Connor
I've found what I think is a bug, though it could also be expected behavior
that's surprising from the user's perspective.
[https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9]
Expected behavior: assume we have some URI
{{https://www.example.com/some/path}}. {{HttpClient}} provides overloads
of {{execute}} that allow the URI to be split into host and path
components ("{{https://www.example.com}}", "{{/some/path}}"), or provided all
in the same {{HttpRequest}} (where {{request.getAuthority()}} is
"{{https://www.example.com}}" and {{request.getUri()}} is "/some/path").
I would expect either of these two methods to produce exactly the same result.
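To make the split concrete, here is a minimal JDK-only sketch. It uses
{{java.net.URI}} rather than any HttpClient class, and {{UriSplitSketch}} and
{{split}} are names made up for illustration. The point is only that a URI
written without a port carries no port information of its own.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient.
class UriSplitSketch {

    // Returns {target, path}. The target keeps a port only if the URI
    // text spelled one out; java.net.URI reports -1 for an absent port.
    static String[] split(String text) {
        URI uri = URI.create(text);
        String target = uri.getScheme() + "://" + uri.getHost()
                + (uri.getPort() >= 0 ? ":" + uri.getPort() : "");
        return new String[] { target, uri.getRawPath() };
    }

    public static void main(String[] args) {
        String[] parts = split("https://www.example.com/some/path");
        System.out.println(parts[0]); // https://www.example.com
        System.out.println(parts[1]); // /some/path
    }
}
```

Neither half contains a port, so any port that later shows up in the Host
header has to be added somewhere inside the client.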
Actual behavior: {{execute(HttpHost, HttpRequest, ResponseHandler)}} sets the
Host header to {{www.example.com:443}}, while
{{execute(HttpRequest, ResponseHandler)}} sets it to {{www.example.com}}.
Normally, this difference has no effect. In fact,
https://echo.free.beeceptor.com strips the port from the Host header when
echoing back the headers of a request.
However, I've recently come across a server that rejected some requests with
"Invalid host header, this site must be accessed as
https://www.example.com". Investigation revealed that it rejected requests
where the port was included in the Host header, and would only accept requests
where a port was not defined.
This behavior is not defined by the HTTP spec; the port number is not required
in the Host header sent by the client, nor is the server obligated to respect
the host portion without the port. This case feels like an outlier from usual
behavior; however, this hidden behavior from {{HttpClient}} was unexpected.
It appears that this happens when {{ProtocolExec}}, {{AsyncProtocolExec}}, and
{{MinimalHttpClient}} fill in the authority and scheme for a request that
didn't have them to begin with. They fill from the {{HttpRoute}}'s target
{{HttpHost}}, and because that host also contains port information (usually
the scheme default), the port ends up in the request's authority.
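The mechanism I think I'm seeing can be modeled with a short JDK-only sketch.
This is not HttpClient's code: {{HostHeaderSketch}} and its methods are
hypothetical, and the assumption that the route's target always has its port
resolved to the scheme default is my reading of the behavior, not something I
have confirmed in the source.

```java
// JDK-only model of the suspected behavior; none of these names are HttpClient APIs.
class HostHeaderSketch {

    // Scheme-default ports; -1 means the scheme is not known here.
    static int defaultPort(String scheme) {
        if ("https".equalsIgnoreCase(scheme)) return 443;
        if ("http".equalsIgnoreCase(scheme)) return 80;
        return -1;
    }

    // Assumption: the route's target gets the scheme-default port
    // whenever the caller left the port unset (-1).
    static int routePort(String scheme, int targetPort) {
        return targetPort >= 0 ? targetPort : defaultPort(scheme);
    }

    // Formats host[:port]; the port is omitted when it is unset (-1).
    static String authority(String host, int port) {
        return port < 0 ? host : host + ":" + port;
    }

    // Models the fill-in step: an authority already on the request is used
    // as is; otherwise it is derived from the route's target, port included.
    static String hostHeader(String requestAuthority, String scheme,
                             String host, int targetPort) {
        if (requestAuthority != null) return requestAuthority;
        return authority(host, routePort(scheme, targetPort));
    }

    public static void main(String[] args) {
        // Request already carries its authority: no port in the Host header.
        System.out.println(hostHeader("www.example.com", "https", "www.example.com", -1));
        // Request has no authority: it is filled in from the route's target.
        System.out.println(hostHeader(null, "https", "www.example.com", -1));
    }
}
```

Under these assumptions the first call prints {{www.example.com}} and the
second prints {{www.example.com:443}}, which matches the two Host headers
described above.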
This bug is very easily worked around by simply setting the request's authority
from the target before calling execute, but it still seems unusual. Was this
behavior intended?