[ https://issues.apache.org/jira/browse/HDFS-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872466#comment-16872466 ]
Gabriel MANCHE commented on HDFS-14594:
---------------------------------------

To be able to reuse a connection, the chunked data must be transferred with the encoding described in RFC 2616 [https://tools.ietf.org/html/rfc2616#page-25]. Reading the last-chunk/trailer/CRLF sequence in _sun.net.www.http.ChunkedInputStream_ moves its internal state to STATE_DONE(5).

On a PLAINTEXT connection, the stream ends with a 7-byte buffer [ 13, 10, 48, 13, 10, 13, 10 ] (i.e. "\r\n0\r\n\r\n"), which matches the RFC (with an empty trailer).

On an SSL connection, we get only 2 bytes at this point: [ 13, 10 ] (i.e. "\r\n"). The chunk is not terminated properly (the state stays at STATE_AWAITING_CHUNK_HEADER(1)), so the final processing cannot run and the connection is never put into the KeepAliveCache.

In detail: when _sun.net.www.http.ChunkedInputStream.closeUnderlying_ is called, in PLAINTEXT we call _this.hc.finished()_ (which does the _putInKeepAliveCache_); in SSL we call _this.hc.closeServer()_, and keep-alive is gone.

That's why we see 6 connects in SSL (and the same in PLAINTEXT with http.keepAlive=false). It sounds like the server does not send a valid chunk end sequence!
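To make the two byte sequences concrete, here is a minimal, self-contained sketch of the terminator comparison described above. It is illustrative only: the class and method names are mine, and it does not reproduce the actual sun.net.www.http.ChunkedInputStream code.

{code:java}
import java.util.Arrays;

public class ChunkTerminatorCheck {
    // RFC 2616 section 3.6.1: a chunked body ends with the last-chunk
    // ("0" CRLF), an optional trailer, and a final CRLF. With an empty
    // trailer, the bytes closing the previous chunk and ending the body
    // are CRLF "0" CRLF CRLF, i.e. [ 13, 10, 48, 13, 10, 13, 10 ].
    static final byte[] LAST_CHUNK = { 13, 10, 48, 13, 10, 13, 10 }; // "\r\n0\r\n\r\n"

    static boolean isComplete(byte[] tail) {
        return Arrays.equals(tail, LAST_CHUNK);
    }

    public static void main(String[] args) {
        byte[] plaintextTail = { 13, 10, 48, 13, 10, 13, 10 }; // what we see in PLAINTEXT
        byte[] sslTail       = { 13, 10 };                     // what we see in SSL

        // true: the parser can reach STATE_DONE and cache the connection
        System.out.println("PLAINTEXT terminator complete: " + isComplete(plaintextTail));
        // false: the parser stays in STATE_AWAITING_CHUNK_HEADER(1) and the
        // connection never reaches the KeepAliveCache
        System.out.println("SSL terminator complete: " + isComplete(sslTail));
    }
}
{code}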
> Replace all Http(s)URLConnection
> --------------------------------
>
>                 Key: HDFS-14594
>                 URL: https://issues.apache.org/jira/browse/HDFS-14594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.7.3
>        Environment: HDP 2.6.5 and HDP 2.6.2
> HotSpot 8u192 and 8u92
> Linux Redhat 3.10.0-862.14.4.el7.x86_64
>            Reporter: Sebastien Barnoud
>            Priority: Major
>
> When authentication is activated there is no keep-alive on http(s) connections.
> That's because the JDK Http(s)URLConnection explicitly closes the connection after the HTTP 401 that negotiates the authentication.
> This leads to poor performance, especially when encryption is on.
> To see the issue, simply strace and compare the number of connections between the hdfs implementation and curl:
> {code:java}
> $ strace -T -tt -f hdfs dfs -ls swebhdfs://dtltstap009.fr.world.socgen:50470/user 2>&1 | grep "sin_port=htons(50470)"
> [pid 92879] 15:11:47.019865 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000157>
> [pid 92879] 15:11:47.182110 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
> [pid 92879] 15:11:47.387073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000167>
> [pid 92879] 15:11:47.429716 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
> [pid 93116] 15:11:47.528073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000110>
> [pid 93116] 15:11:47.566947 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
> => 6 connect{code}
> {code:java}
> $ strace -T -tt -f curl --negotiate -u: -v https://dtltstap009.fr.world.socgen:50470/webhdfs/v1/user/?op=GETFILESTATUS 2>&1 | grep "sin_port=htons(50470)"
> 15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000118>
> 15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
> 15:10:53.869482 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
> 15:10:53.869576 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000008>
> [bash-4.2.46][j:0|h:4961|?:0][2019-06-21 15:10:53][dtlprd05@nazare:~/test-hdfs]
> => only one connect{code}
>
> In addition, even without encryption, too many connections are used:
> {code:java}
> $ strace -T -tt -f hdfs dfs -ls webhdfs://dtltstap009.fr.world.socgen:50070/user 2>&1 | grep "sin_port=htons(50070)"
> [pid 99569] 15:13:13.838257 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000119>
> [pid 99569] 15:13:13.904255 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
> [pid 99635] 15:13:14.201236 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
> => 3 connect{code}
>
> Looking in the JDK code, https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java
> {code:java}
> serverAuthentication = getServerAuthentication(srvHdr);
> currentServerCredentials = serverAuthentication;
> if (serverAuthentication != null) {
>     disconnectWeb();
>     redirects++; // don't let things loop ad nauseum
>     setCookieHeader();
>     continue;
> }{code}
> disconnectWeb() will close the connection (no keep-alive reuse).
> Finally, we have some unexplained webhdfs commands that are stuck in sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375):
> - for hdfs dfs commands with the swebhdfs scheme
> - for some TEZ jobs using the same implementation for the shuffle service when encryption is on
> All other services (typically RPC) are working fine on the cluster.
> It really seems that Http(s)URLConnection causes some issues that Netty or HttpClient don't have.
> Regards,
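On the description's closing point that Netty or HttpClient don't show this behavior: below is a minimal sketch, assuming Apache HttpClient 4.5.x on the classpath, that demonstrates pooled connection reuse against the same WebHDFS endpoint used in the strace examples. The class name is mine and SPNEGO authentication is omitted; this only illustrates the pooling behavior, it is not a proposed patch.

{code:java}
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class WebHdfsKeepAliveCheck {
    public static void main(String[] args) throws Exception {
        // The pooling manager keeps idle connections open for reuse (keep-alive).
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        String url = "http://dtltstap009.fr.world.socgen:50070/webhdfs/v1/user/?op=GETFILESTATUS";
        try (CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build()) {
            for (int i = 0; i < 3; i++) {
                try (CloseableHttpResponse resp = client.execute(new HttpGet(url))) {
                    // Fully consuming the entity releases the connection back
                    // to the pool instead of closing it.
                    EntityUtils.consume(resp.getEntity());
                }
                // "available" staying at 1 across iterations means the same
                // socket is being reused; strace would show a single connect().
                System.out.println("pool: " + cm.getTotalStats());
            }
        }
    }
}
{code}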