Daniel Wong created ZOOKEEPER-4235:
--------------------------------------

             Summary: Java Client SendThread does not clean up created objects 
during constructor of SaslClient and Login. 
                 Key: ZOOKEEPER-4235
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4235
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
            Reporter: Daniel Wong


Hi, I am an Apache Phoenix committer, and at my employer I help manage a large
number of ZooKeeper clusters, primarily for HBase use cases.  We recently had a
production incident in which some of our ACLs were not set up, which prevented
clients from connecting to the ZK nodes, and the failure path exposed 2 issues
to fix: this Jira and ZooKeeper-XXXX (TBD).  This Jira is the more important of
the 2.  It covers the failure we observed: an FD/thread leak from the ZK Java
client send thread.  We had hundreds of threads per JVM with the following
stack trace.


{code:java}
java.lang.Thread.State: RUNNABLE
 at java.net.PlainSocketImpl.socketConnect([email protected]/Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect([email protected]/AbstractPlainSocketImpl.java:399)
 - locked <0x00000015004fde20> (a java.net.SocksSocketImpl)
 at java.net.AbstractPlainSocketImpl.connectToAddress([email protected]/AbstractPlainSocketImpl.java:242)
 at java.net.AbstractPlainSocketImpl.connect([email protected]/AbstractPlainSocketImpl.java:224)
 at java.net.SocksSocketImpl.connect([email protected]/SocksSocketImpl.java:403)
 at java.net.Socket.connect([email protected]/Socket.java:609)
 at sun.security.krb5.internal.TCPClient.<init>([email protected]/NetClient.java:62)
 at sun.security.krb5.internal.NetClient.getInstance([email protected]/NetClient.java:42)
 at sun.security.krb5.KdcComm$KdcCommunication.run([email protected]/KdcComm.java:401)
 at sun.security.krb5.KdcComm$KdcCommunication.run([email protected]/KdcComm.java:364)
 at java.security.AccessController.doPrivileged([email protected]/Native Method)
 at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:348)
 at sun.security.krb5.KdcComm.sendIfPossible([email protected]/KdcComm.java:253)
 at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:234)
 at sun.security.krb5.KdcComm.send([email protected]/KdcComm.java:200)
 at sun.security.krb5.KrbAsReqBuilder.send([email protected]/KrbAsReqBuilder.java:326)
 at sun.security.krb5.KrbAsReqBuilder.action([email protected]/KrbAsReqBuilder.java:371)
 at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication([email protected]/Krb5LoginModule.java:754)
 at com.sun.security.auth.module.Krb5LoginModule.login([email protected]/Krb5LoginModule.java:592)
 at javax.security.auth.login.LoginContext.invoke([email protected]/LoginContext.java:726)
 at javax.security.auth.login.LoginContext$4.run([email protected]/LoginContext.java:665)
 at javax.security.auth.login.LoginContext$4.run([email protected]/LoginContext.java:663)
 at java.security.AccessController.doPrivileged([email protected]/Native Method)
 at javax.security.auth.login.LoginContext.invokePriv([email protected]/LoginContext.java:663)
 at javax.security.auth.login.LoginContext.login([email protected]/LoginContext.java:574)
 at org.apache.zookeeper.Login.login(Login.java:304)
 - locked <0x000000151c477148> (a org.apache.zookeeper.Login)
 at org.apache.zookeeper.Login.<init>(Login.java:106)
 at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:249)
 - locked <0x000000151c476f68> (a org.apache.zookeeper.client.ZooKeeperSaslClient)
 at org.apache.zookeeper.client.ZooKeeperSaslClient.<init>(ZooKeeperSaslClient.java:141)
 at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:972)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1031)
{code}
Note that today ZooKeeperSaslClient and Login both allocate resources in their
constructors.  Until the constructor returns, the parent's reference to the
object is still null, so close/shutdown/disconnect on the parent cannot clean
up or interrupt the in-progress initialization.  This leaves the threads and
sockets at the mercy of the configured KDC retry/timeout settings.

This Jira is intended to split construction and initialization into separate
methods so that the resulting objects can be properly cleaned up.
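
As a rough illustration of the split described above (not the actual patch),
here is a minimal sketch.  TwoPhaseResource and its methods are hypothetical
names, not ZooKeeper classes, and Thread.sleep stands in for the blocking KDC
call.  A real blocking socket connect typically does not react to
Thread.interrupt(), so a real fix would more likely need to close the
underlying resource or rely on timeouts.

```java
/** Hypothetical stand-in for an object like Login or ZooKeeperSaslClient. */
class TwoPhaseResource implements AutoCloseable {
    private boolean closed;     // guarded by "this"
    private Thread initThread;  // thread currently inside initialize(), guarded by "this"

    /** The constructor does no blocking work, so the owner holds a non-null reference at once. */
    TwoPhaseResource() {
    }

    /**
     * Blocking phase, kept out of the constructor.
     * Returns true only if initialization completed and close() was not called.
     */
    boolean initialize(long simulatedWorkMillis) {
        synchronized (this) {
            if (closed) {
                return false;   // closed before we started: do not allocate anything
            }
            initThread = Thread.currentThread();
        }
        boolean finished;
        try {
            Thread.sleep(simulatedWorkMillis);  // placeholder for the slow login/connect
            finished = true;
        } catch (InterruptedException e) {
            finished = false;   // close() interrupted the in-progress initialization
        }
        synchronized (this) {
            initThread = null;
            return finished && !closed;
        }
    }

    /** Safe to call at any time, including while initialize() is still running. */
    @Override
    public synchronized void close() {
        closed = true;
        if (initThread != null) {
            initThread.interrupt();
        }
    }
}
```

Because the owner can store the reference before calling initialize(), its
close/shutdown path has something to call close() on even while the slow phase
is still running.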

  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
