jlephay created NIFI-14068:
------------------------------

             Summary: Proxified SFTP processors resolve hostnames and pass IP 
instead of FQDN in HTTP headers
                 Key: NIFI-14068
                 URL: https://issues.apache.org/jira/browse/NIFI-14068
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.28.1, 2.0.0, 1.28.0, 1.27.0, 1.26.0
            Reporter: jlephay
         Attachments: image-2024-12-09-12-10-09-441.png

When using an HTTP proxy configuration with SFTP processors (ListSFTP, for 
example), the requests arriving at the proxy use the resolved IP address instead 
of the FQDN in the "Host" header. Here is an example:
 * *"proxified" ListSFTP configuration* :  
!image-2024-12-09-12-10-09-441.png|width=383,height=138!
 
 * *proxy log* : 728465600.445    358 172.17.0.1 TCP_TUNNEL/200 1948 CONNECT 
194.108.117.16:22 - HIER_DIRECT/194.108.117.16 -

We should have test.rebex.net in this log instead of 194.108.117.16.

As a comparison, the Linux SFTP client does not behave that way, and we get the 
FQDN in the Host header as expected:
 * *command* : sftp -o "ProxyCommand /bin/nc -X connect -x 172.17.0.1:3128 %h 
%p" [[email protected]|mailto:[email protected]]
 * *proxy log* : 1728506954.780  11365 172.17.0.1 TCP_TUNNEL/200 2068 CONNECT 
test.rebex.net:22 - HIER_DIRECT/194.108.117.16 -

After investigation, this only occurs when using a proxy without authentication.

When using a proxy server with a username and password, NiFi uses the [Socket 
Broker library|https://github.com/exceptionfactory/socketbroker], and in this 
case the HTTP request uses the configured hostname without resolving it to an IP 
address ("expected" behaviour).

When not using a username and password, NiFi uses the standard Java Proxy 
Socket behavior defined in 
[HttpConnectSocketImpl|https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/net/HttpConnectSocketImpl.java#L106].
In that case, if the hostname is resolvable, the implementation uses the 
resolved IP address for the connection ("wrong" behaviour).
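
The difference between the two cases can be illustrated with a minimal sketch. It assumes the host selection in HttpConnectSocketImpl amounts to "use the hostname if the address is unresolved, otherwise use the IP literal", which is my reading of the linked line and not a verbatim copy. The class name ConnectTargetSketch and the helper connectTarget are hypothetical names used only for illustration. The resolved address is built from raw bytes, so no DNS lookup is performed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ConnectTargetSketch {

    // Hypothetical helper: approximates how the CONNECT target appears to be
    // chosen for a proxied socket (assumption based on the linked JDK source).
    static String connectTarget(InetSocketAddress endpoint) {
        String host = endpoint.isUnresolved()
                ? endpoint.getHostString()                 // hostname kept as configured
                : endpoint.getAddress().getHostAddress();  // resolved IP literal
        return host + ":" + endpoint.getPort();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Simulate an already-resolved address without touching DNS:
        // getByAddress(host, bytes) does not contact any name service.
        InetAddress ip = InetAddress.getByAddress(
                "test.rebex.net", new byte[] {(byte) 194, 108, 117, 16});
        InetSocketAddress resolved = new InetSocketAddress(ip, 22);

        // An unresolved address carries only the hostname and port.
        InetSocketAddress unresolved =
                InetSocketAddress.createUnresolved("test.rebex.net", 22);

        System.out.println(connectTarget(resolved));   // 194.108.117.16:22
        System.out.println(connectTarget(unresolved)); // test.rebex.net:22
    }
}
```

Under that assumption, a caller that passes an address created with InetSocketAddress.createUnresolved would let the proxy see the FQDN, while an address resolved before the connect call would expose only the IP.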

Those behaviours should be aligned, and the header should always reflect the 
FQDN used in the configuration. Otherwise it is difficult, for example, to 
implement access lists at the proxy level based on the destination domain name.
  
For reference, here is the Slack exchange where the issue was investigated: 
https://app.slack.com/client/T0L9SDNRZ/C0L9VCD47



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
