[ https://issues.apache.org/jira/browse/OOZIE-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venkat Ranganathan updated OOZIE-2654:
--------------------------------------
    Attachment: OOZIE-2654.diff

> Zookeeper dependent services should not depend on Connectionstate to be valid before cleaning up
> ------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2654
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>         Attachments: OOZIE-2654.diff
>
>
> Currently, in the ZKUtils, ZKLocks and ZKJobsConcurrency services, we don't
> properly tear down the ZooKeeper connections when no callback was received
> from ZooKeeper to change the connection state.
> We can get into this situation if, for example, the ZK session was closed by
> ZK before any callback was received to update the connection state. This can
> cause the Oozie server in HA mode to fail to terminate, with one or more
> sockets left in CLOSE_WAIT state.
> Here is an instance of this issue.
> From the network connections, we have one connection still in CLOSE_WAIT with
> an indefinite wait.
> {quote}
> tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)
> {quote}
> From the ZooKeeper logs,
> {quote}
> 2016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
> 2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710
> {quote}
> and later
> {quote}
> 2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e
> {quote}
> The fix is to not check the connection state during service destroy and to
> tear down the ZK connections unconditionally.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
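
For illustration only, the difference between a destroy path guarded by a cached connection state and one that always tears down can be sketched in Java as below. This is not the attached patch and does not use the Oozie or Curator APIs: FakeZkClient, CachedState, destroyGuarded and destroyUnconditional are hypothetical names. The sketch also assumes that "no callback received" leaves the cached state at a non-valid value such as UNKNOWN.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ZkTeardownSketch {

    /** Hypothetical stand-in for a ZooKeeper client handle (not a real API). */
    static final class FakeZkClient {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        void close() {
            closed.set(true);
        }

        boolean isClosed() {
            return closed.get();
        }
    }

    /**
     * Hypothetical cached connection state. It is only updated by callbacks,
     * so it stays UNKNOWN (assumption) if no callback ever arrives.
     */
    enum CachedState { UNKNOWN, CONNECTED, LOST }

    /** Guarded teardown: cleanup runs only if the cached state looks valid. */
    static void destroyGuarded(FakeZkClient client, CachedState cached) {
        if (cached == CachedState.CONNECTED) {
            client.close();
        }
        // Otherwise the client is never closed and its socket can linger.
    }

    /** Unconditional teardown: the cached state is ignored on destroy. */
    static void destroyUnconditional(FakeZkClient client, CachedState cached) {
        client.close();
    }

    public static void main(String[] args) {
        FakeZkClient guarded = new FakeZkClient();
        destroyGuarded(guarded, CachedState.UNKNOWN);
        System.out.println("guarded destroy closed client: " + guarded.isClosed());

        FakeZkClient unconditional = new FakeZkClient();
        destroyUnconditional(unconditional, CachedState.UNKNOWN);
        System.out.println("unconditional destroy closed client: " + unconditional.isClosed());
    }
}
```

In this sketch the guarded variant leaves the client open whenever the cached state was never updated, which corresponds to the lingering CLOSE_WAIT socket described in the report. The unconditional variant closes it regardless of the cached state.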