[ https://issues.apache.org/jira/browse/ZOOKEEPER-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338280#comment-17338280 ]
Ravi Kishore Valeti edited comment on ZOOKEEPER-4235 at 5/3/21, 9:36 AM:
-------------------------------------------------------------------------

[~dbwong], I am picking this up. Can someone please assign this to me? I can't change the assignee.

was (Author: rvaleti):
[~dbwong], I am picking this up. Can some assign this to me?. I can't change the assignee.

> Java Client SendThread does not clean up created objects during constructor
> of SaslClient and Login
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4235
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4235
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>            Reporter: Daniel Wong
>            Priority: Major
>
> Hi, I am an Apache Phoenix committer and I help manage many ZooKeeper
> clusters at my employer, primarily using ZK for HBase use cases. We
> recently had a production incident where some of our ACLs were not set up,
> preventing connectivity from the client to the ZK nodes, and the failure
> path exposed 2 issues to fix: this Jira and
> https://issues.apache.org/jira/browse/ZOOKEEPER-4236 . This Jira is the more
> important of the 2 and handles the failure we observed: an FD/thread leak
> from the ZK Java client send thread. We had hundreds of threads per JVM
> with the following stack trace.
> {code:java}
> java.lang.Thread.State: RUNNABLE
>     at java.net.PlainSocketImpl.socketConnect(java.base@11.0.4.0.101/Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(java.base@11.0.4.0.101/AbstractPlainSocketImpl.java:399)
>     - locked <0x00000015004fde20> (a java.net.SocksSocketImpl)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(java.base@11.0.4.0.101/AbstractPlainSocketImpl.java:242)
>     at java.net.AbstractPlainSocketImpl.connect(java.base@11.0.4.0.101/AbstractPlainSocketImpl.java:224)
>     at java.net.SocksSocketImpl.connect(java.base@11.0.4.0.101/SocksSocketImpl.java:403)
>     at java.net.Socket.connect(java.base@11.0.4.0.101/Socket.java:609)
>     at sun.security.krb5.internal.TCPClient.<init>(java.security.jgss@11.0.4.0.101/NetClient.java:62)
>     at sun.security.krb5.internal.NetClient.getInstance(java.security.jgss@11.0.4.0.101/NetClient.java:42)
>     at sun.security.krb5.KdcComm$KdcCommunication.run(java.security.jgss@11.0.4.0.101/KdcComm.java:401)
>     at sun.security.krb5.KdcComm$KdcCommunication.run(java.security.jgss@11.0.4.0.101/KdcComm.java:364)
>     at java.security.AccessController.doPrivileged(java.base@11.0.4.0.101/Native Method)
>     at sun.security.krb5.KdcComm.send(java.security.jgss@11.0.4.0.101/KdcComm.java:348)
>     at sun.security.krb5.KdcComm.sendIfPossible(java.security.jgss@11.0.4.0.101/KdcComm.java:253)
>     at sun.security.krb5.KdcComm.send(java.security.jgss@11.0.4.0.101/KdcComm.java:234)
>     at sun.security.krb5.KdcComm.send(java.security.jgss@11.0.4.0.101/KdcComm.java:200)
>     at sun.security.krb5.KrbAsReqBuilder.send(java.security.jgss@11.0.4.0.101/KrbAsReqBuilder.java:326)
>     at sun.security.krb5.KrbAsReqBuilder.action(java.security.jgss@11.0.4.0.101/KrbAsReqBuilder.java:371)
>     at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(jdk.security.auth@11.0.4.0.101/Krb5LoginModule.java:754)
>     at com.sun.security.auth.module.Krb5LoginModule.login(jdk.security.auth@11.0.4.0.101/Krb5LoginModule.java:592)
>     at javax.security.auth.login.LoginContext.invoke(java.base@11.0.4.0.101/LoginContext.java:726)
>     at javax.security.auth.login.LoginContext$4.run(java.base@11.0.4.0.101/LoginContext.java:665)
>     at javax.security.auth.login.LoginContext$4.run(java.base@11.0.4.0.101/LoginContext.java:663)
>     at java.security.AccessController.doPrivileged(java.base@11.0.4.0.101/Native Method)
>     at javax.security.auth.login.LoginContext.invokePriv(java.base@11.0.4.0.101/LoginContext.java:663)
>     at javax.security.auth.login.LoginContext.login(java.base@11.0.4.0.101/LoginContext.java:574)
>     at org.apache.zookeeper.Login.login(Login.java:304)
>     - locked <0x000000151c477148> (a org.apache.zookeeper.Login)
>     at org.apache.zookeeper.Login.<init>(Login.java:106)
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:249)
>     - locked <0x000000151c476f68> (a org.apache.zookeeper.client.ZooKeeperSaslClient)
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.<init>(ZooKeeperSaslClient.java:141)
>     at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:972)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1031)
> {code}
> Note that today both ZooKeeperSaslClient and Login allocate resources in
> their constructors. They therefore cannot be cleaned up or interrupted via
> close/shutdown/disconnect of their parents, because the parent's reference
> to the object is still null while the constructor is running. This leaves
> the threads/sockets at the mercy of the configured KDC retry/timeout
> settings.
> This Jira is intended to break the constructor and the initialization path
> into separate methods and properly clean up the resulting objects.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
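
For illustration only, the separation proposed in the issue description might look roughly like the sketch below. It is not the ZooKeeper code or the eventual patch. The names SlowResource, initialize and SplitInitDemo are hypothetical, and Thread.sleep stands in for the blocking Kerberos login. A real socket connect may not react to Thread.interrupt, so an actual fix would probably also have to close the underlying sockets.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an object such as Login or ZooKeeperSaslClient.
// The constructor does no blocking work, so the owner holds a non-null
// reference before initialization starts and can close it at any time.
final class SlowResource implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private volatile Thread initThread;   // thread currently inside initialize(), if any
    private volatile boolean initialized;

    SlowResource() {
        // Cheap on purpose: no sockets, no threads, no network calls.
    }

    // Simulated blocking initialization (assumption: the sleep models a KDC round trip).
    void initialize(long simulatedDelayMillis) throws InterruptedException {
        initThread = Thread.currentThread();
        try {
            if (closed.get()) {
                throw new InterruptedException("closed before initialization");
            }
            Thread.sleep(simulatedDelayMillis);
            initialized = true;
        } finally {
            initThread = null;
        }
    }

    boolean isInitialized() {
        return initialized;
    }

    // Safe to call from another thread while initialize() is still blocked.
    @Override
    public void close() {
        closed.set(true);
        Thread t = initThread;
        if (t != null) {
            t.interrupt();
        }
    }
}

final class SplitInitDemo {
    // Returns true if close() from the owner ends a pending initialize().
    static boolean closeUnblocksInitialize() {
        SlowResource resource = new SlowResource();   // reference exists before any blocking work
        Thread worker = new Thread(() -> {
            try {
                resource.initialize(60_000);
            } catch (InterruptedException expected) {
                // Initialization was aborted by close().
            }
        });
        worker.start();
        try {
            Thread.sleep(200);
            resource.close();
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && !resource.isInitialized();
    }

    public static void main(String[] args) {
        System.out.println(closeUnblocksInitialize());
    }
}
```

Because the reference is published before initialize() runs, the owner's close path no longer has to wait for the KDC retry/timeout to expire before it can reach the object.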