[ 
https://issues.apache.org/jira/browse/HDFS-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872391#comment-16872391
 ] 

Sebastien Barnoud edited comment on HDFS-14594 at 6/25/19 2:20 PM:
-------------------------------------------------------------------

We did a full step-by-step debugging session.

We have Kerberos enabled on all the clusters.

The command is:

hdfs dfs -ls (s)webhdfs://<server>:<port>/users

The command will call 4 URLs:
{code:java}
URL n°1
sun.net.www.protocol.http.HttpURLConnection:http://dtltstap009.fr.world.socgen:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN&user.name=<the user>

URL n°2
sun.net.www.protocol.http.HttpURLConnection:http://dtltstap009.fr.world.socgen:50070/webhdfs/v1/user?op=GETFILESTATUS&delegation=<the token>

URL n°3
sun.net.www.protocol.http.HttpURLConnection:http://dtltstap009.fr.world.socgen:50070/webhdfs/v1/user?op=LISTSTATUS&delegation=<the token>

URL n°4
sun.net.www.protocol.http.HttpURLConnection:http://dtltstap009.fr.world.socgen:50070/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&user.name=<the user>&token=<the token>{code}
 

The sequence with HTTP statuses is:
{code:java}
URL 1 : 1 cnx = 401 + 1 retry = 200
URL 2 : 0 cnx = 200
URL 3 : 0 cnx = 200
URL 4 : 0 cnx = 401 + 1 retry = 200{code}
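The four-request sequence above can be sketched as plain URL construction. This is only an illustration: the host name, user, and token values are placeholders, not the real ones from the trace.

```java
// Sketch of the four WebHDFS requests issued by a single "hdfs dfs -ls".
// Host, user, and token are illustrative placeholders.
public class WebHdfsSequence {
    static final String BASE = "http://namenode.example.com:50070/webhdfs/v1";

    static String[] requestSequence(String user, String token) {
        return new String[] {
            BASE + "/?op=GETDELEGATIONTOKEN&user.name=" + user,   // SPNEGO: 401 then 200
            BASE + "/user?op=GETFILESTATUS&delegation=" + token,  // token auth: 200
            BASE + "/user?op=LISTSTATUS&delegation=" + token,     // token auth: 200
            BASE + "/?op=CANCELDELEGATIONTOKEN&user.name=" + user
                 + "&token=" + token                              // SPNEGO: 401 then 200
        };
    }

    public static void main(String[] args) {
        for (String url : requestSequence("alice", "TOKEN")) {
            System.out.println(url);
        }
    }
}
```

Only the first and last requests go through the SPNEGO 401 round trip; the two middle requests authenticate with the delegation token directly.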
In plaintext or SSL with keepalive disabled 
(HADOOP_OPTS=-Dhttp.keepAlive=false), we of course get 6 connections.
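The keepalive switch used here is a standard JDK networking property; a minimal sketch of how the client side sees it (the class name is mine, the property and its "true" default are the JDK's):

```java
// The JDK HTTP client consults the http.keepAlive system property,
// which defaults to "true". Passing HADOOP_OPTS=-Dhttp.keepAlive=false
// sets it at JVM startup, disabling connection reuse entirely.
public class KeepAliveFlag {
    static boolean keepAliveEnabled() {
        return Boolean.parseBoolean(System.getProperty("http.keepAlive", "true"));
    }

    public static void main(String[] args) {
        System.out.println("keep-alive enabled: " + keepAliveEnabled());
        // Equivalent to -Dhttp.keepAlive=false on the command line:
        System.setProperty("http.keepAlive", "false");
        System.out.println("keep-alive enabled: " + keepAliveEnabled());
    }
}
```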

In plaintext (webhdfs) with keepalive enabled (the default), we get 3 
connections. That's because *+HttpURLConnection explicitly closes+* the 
connection after the 401 in the authentication negotiation.
{code:java}
URL 1 : 1 cnx = 401 + 1 retry = 200
URL 2 : 0 cnx (reuse last) = 200
URL 3 : 0 cnx (reuse last) = 200
URL 4 : 0 cnx (reuse last) = 401 + 1 retry = 200{code}
 

In SSL (swebhdfs) with keepalive enabled, we get 6 connections; so far I 
haven't been able to explain why we never get connection reuse.

However, this shows that with this implementation +*we cannot get*+ the best 
connection reuse when Kerberos is enabled, because the standard JDK 
implementation chooses to close the connection during the authentication 
negotiation. An strace of curl shows that everything works perfectly when the 
connection is reused (so the server supports it).

The SSLFactory may explain why we get 6 connections, but even if we patch 
that, we will still have 3.

So, IMO, we should change the implementation. What is your opinion?
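One hedged sketch of a possible direction, assuming a newer JDK is available (the issue environment is JDK 8, so this would not apply there as-is): java.net.http.HttpClient (JDK 11+) keeps an internal connection pool and, unlike sun.net HttpURLConnection, is not hard-wired to drop the socket as part of the 401 negotiation, so the authenticated retry has a chance to reuse the same connection. The class name below is mine.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Sketch only: building a pooled HTTP/1.1 client as an alternative to
// per-request HttpURLConnection. Request/response wiring and SPNEGO
// integration are deliberately omitted.
public class PooledClientSketch {
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)   // WebHDFS speaks HTTP/1.1
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        System.out.println("client version: " + client.version());
    }
}
```

Netty or Apache HttpClient would be the equivalent choices on JDK 8, which is what the issue title ("Replace all Http(s)URLConnection") suggests.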

Regards,

 

 


> Replace all Http(s)URLConnection
> --------------------------------
>
>                 Key: HDFS-14594
>                 URL: https://issues.apache.org/jira/browse/HDFS-14594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.7.3
>         Environment: HDP 2.6.5 and HDP 2.6.2
> HotSpot 8u192 and 8u92
> Linux Redhat 3.10.0-862.14.4.el7.x86_64
>            Reporter: Sebastien Barnoud
>            Priority: Major
>
> When authentication is activated, there is no keep-alive on http(s) 
> connections.
> That's because the JDK Http(s)URLConnection explicitly closes the connection 
> after the HTTP 401 that negotiates the authentication.
> This leads to poor performance, especially when encryption is on.
> To see the issue, simply strace both commands and compare the number of 
> connections between the hdfs implementation and curl:
> {code:java}
> $ strace -T -tt -f hdfs dfs -ls 
> swebhdfs://dtltstap009.fr.world.socgen:50470/user 2>&1 | grep 
> "sin_port=htons(50470)" 
> [pid 92879] 15:11:47.019865 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 
> EINPROGRESS (Operation now in progress) <0.000157>
> [pid 92879] 15:11:47.182110 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished 
> ...>
> [pid 92879] 15:11:47.387073 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 
> EINPROGRESS (Operation now in progress) <0.000167>
> [pid 92879] 15:11:47.429716 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished 
> ...>
> [pid 93116] 15:11:47.528073 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 
> EINPROGRESS (Operation now in progress) <0.000110>
> [pid 93116] 15:11:47.566947 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished 
> ...>
> => 6 connect{code}
> {code:java}
> $ strace -T -tt -f curl --negotiate -u: -v 
> https://dtltstap009.fr.world.socgen:50470/webhdfs/v1/user/?op=GETFILESTATUS 
> 2>&1 | grep "sin_port=htons(50470)" 
> 15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), 
> sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now 
> in progress) <0.000118>
> 15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), 
> sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
> 15:10:53.869482 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), 
> sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
> 15:10:53.869576 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), 
> sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000008>
> => only one connect{code}
>  
> In addition, even without encryption, too many connections are used:
> {code:java}
> $ strace -T -tt -f hdfs dfs -ls 
> webhdfs://dtltstap009.fr.world.socgen:50070/user 2>&1 | grep 
> "sin_port=htons(50070)" 
> [pid 99569] 15:13:13.838257 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 
> EINPROGRESS (Operation now in progress) <0.000119>
> [pid 99569] 15:13:13.904255 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished 
> ...>
> [pid 99635] 15:13:14.201236 connect(386, {sa_family=AF_INET, 
> sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished 
> ...>
> => 3 connect{code}
>  
> Looking at the JDK code, 
> https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java
> {code:java}
> serverAuthentication = getServerAuthentication(srvHdr);
> currentServerCredentials = serverAuthentication;
> if (serverAuthentication != null) {
>     disconnectWeb();
>     redirects++; // don't let things loop ad nauseum
>     setCookieHeader();
>     continue;
> }{code}
> disconnectWeb() will close the connection (no keep-alive reuse).
> Finally, we have some unexplained webhdfs commands that are stuck in 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375):
> -) for hdfs dfs commands with the swebhdfs scheme
> -) for some TEZ jobs using the same implementation for the shuffle service 
> when encryption is on
> All other services (typically RPC) work fine on the cluster.
> It really seems that Http(s)URLConnection causes some issues that Netty or 
> HttpClient don't have.
> Regards,
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
