jlephay created NIFI-14068:
------------------------------
Summary: Proxified SFTP processors resolve hostnames and pass IP
instead of FQDN in HTTP headers
Key: NIFI-14068
URL: https://issues.apache.org/jira/browse/NIFI-14068
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 1.28.1, 2.0.0, 1.28.0, 1.27.0, 1.26.0
Reporter: jlephay
Attachments: image-2024-12-09-12-10-09-441.png
When using an HTTP proxy configuration with SFTP processors (ListSFTP, for
example), the requests arriving at the proxy use the resolved IP address instead
of the FQDN in the "Host" header. Here is an example:
* *"proxified" ListSFTP configuration* :
!image-2024-12-09-12-10-09-441.png|width=383,height=138!
* *proxy log* : 728465600.445 358 172.17.0.1 TCP_TUNNEL/200 1948 CONNECT
194.108.117.16:22 - HIER_DIRECT/194.108.117.16 -
We should have test.rebex.net in this log instead of 194.108.117.16.
As a comparison, the Linux SFTP client doesn't behave that way, and we get the
FQDN in the "Host" header as expected:
* *command* : sftp -o "ProxyCommand /bin/nc -X connect -x 172.17.0.1:3128 %h
%p" [[email protected]|mailto:[email protected]]
* *proxy log* : 1728506954.780 11365 172.17.0.1 TCP_TUNNEL/200 2068 CONNECT
test.rebex.net:22 - HIER_DIRECT/194.108.117.16 -
After investigation, this only occurs when using a proxy without authentication.
When using a proxy server with a username and password, NiFi uses the [Socket
Broker library|https://github.com/exceptionfactory/socketbroker] and in this
case, the HTTP request uses the DNS address without resolving to an IP address
("expected" behaviour).
When not using a username and password, NiFi uses the standard Java Proxy
Socket behavior defined in
[HttpConnectSocketImpl|https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/net/HttpConnectSocketImpl.java#L106].
In that case if the DNS address is resolvable, the implementation will use the
resolved IP address for the connection ("wrong" behaviour).
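The difference between the two cases can be illustrated with a minimal sketch. It assumes the host selection works as described above: an unresolved address keeps its hostname, a resolved one yields its IP. The class ConnectTargetDemo and its methods are hypothetical names used for illustration only; they are not part of NiFi, Socket Broker, or the JDK. The example address is built with InetAddress.getByAddress so that no DNS lookup is performed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical demo class; not part of NiFi, Socket Broker, or the JDK.
class ConnectTargetDemo {

    // Assumed selection rule: use the hostname when the address is unresolved,
    // otherwise use the literal IP of the resolved address.
    static String connectTarget(InetSocketAddress endpoint) {
        String host = endpoint.isUnresolved()
                ? endpoint.getHostString()
                : endpoint.getAddress().getHostAddress();
        return host + ":" + endpoint.getPort();
    }

    // Builds an already-resolved socket address from a known IP,
    // without performing any DNS lookup.
    static InetSocketAddress resolved(String host, byte[] ip, int port) {
        try {
            return new InetSocketAddress(InetAddress.getByAddress(host, ip), port);
        } catch (UnknownHostException e) {
            // Only thrown when the byte array has an illegal length.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ip = {(byte) 194, 108, 117, 16};

        // Resolved address: the proxy would see the IP.
        System.out.println(connectTarget(resolved("test.rebex.net", ip, 22)));

        // Unresolved address: the proxy would see the FQDN.
        System.out.println(connectTarget(
                InetSocketAddress.createUnresolved("test.rebex.net", 22)));
    }
}
```

Under this assumption, passing an address created with InetSocketAddress.createUnresolved would keep the FQDN in the request sent to the proxy. Whether that is the appropriate fix inside NiFi's SFTP client code is not established here.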
Those behaviours should be aligned, and the header should always reflect the
FQDN used in the configuration. Otherwise it is difficult, for example, to
implement access lists at the proxy level based on the destination domain name.
Here is the Slack exchange where the issue was investigated, for reference:
https://app.slack.com/client/T0L9SDNRZ/C0L9VCD47