Venkat Ranganathan created OOZIE-2654:
-----------------------------------------

             Summary: Zookeeper dependent services should not depend on 
Connectionstate to be valid before cleaning up
                 Key: OOZIE-2654
                 URL: https://issues.apache.org/jira/browse/OOZIE-2654
             Project: Oozie
          Issue Type: Bug
          Components: HA
    Affects Versions: 4.2.0
            Reporter: Venkat Ranganathan
            Assignee: Venkat Ranganathan


Currently, the ZKUtils, ZKLocks and ZKJobsConcurrency services do not properly 
tear down their ZooKeeper connections if no callback was received from 
ZooKeeper to update the connection state.

We can get into this situation if, for example, ZooKeeper closed the ZK session 
before any callback was received to update the connection state. This can 
prevent an Oozie server in HA mode from terminating, leaving one or more 
sockets in the CLOSE_WAIT state.

Here is an instance of this issue:

From the network connections, we have one connection still in CLOSE_WAIT, 
waiting indefinitely.
{quote}
tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)
{quote}

From the ZooKeeper logs:
{quote}
2016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710
{quote}
and later:
{quote}
2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e
{quote}
The fix is to stop checking the connection state during service destroy and 
always tear down the ZK connections.
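
To illustrate the difference, here is a minimal sketch of the two teardown 
patterns. It is not the Oozie or Curator code: FakeZkClient, ZkBackedService, 
CachedState, destroyGuarded and ZkTeardownSketch are hypothetical names, and 
the exact shape of the guard in the affected services is an assumption. The 
only point is that a state cached from callbacks can be stale, so gating 
cleanup on it can leave the client open.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a ZooKeeper client handle (not the Curator or Oozie API).
class FakeZkClient {
    private final AtomicBoolean open = new AtomicBoolean(true);

    void close() {
        open.set(false); // simulates releasing the underlying socket
    }

    boolean isOpen() {
        return open.get();
    }
}

// Hypothetical service that caches the connection state reported by callbacks.
class ZkBackedService {
    enum CachedState { UNKNOWN, CONNECTED, LOST }

    private final FakeZkClient client;
    // Updated only when a callback arrives, so it can be stale.
    private volatile CachedState cachedState = CachedState.UNKNOWN;

    ZkBackedService(FakeZkClient client) {
        this.client = client;
    }

    // Would be invoked by a connection-state listener; may never be called.
    void onStateChanged(CachedState newState) {
        cachedState = newState;
    }

    // Assumed shape of the problematic pattern: cleanup gated on the cached state.
    void destroyGuarded() {
        if (cachedState == CachedState.CONNECTED) {
            client.close();
        }
    }

    // Pattern described by the fix: close regardless of the cached state.
    void destroy() {
        client.close();
    }
}

class ZkTeardownSketch {
    public static void main(String[] args) {
        FakeZkClient leaked = new FakeZkClient();
        new ZkBackedService(leaked).destroyGuarded(); // no callback ever arrived
        System.out.println("guarded destroy, client open: " + leaked.isOpen());

        FakeZkClient closed = new FakeZkClient();
        new ZkBackedService(closed).destroy();
        System.out.println("unconditional destroy, client open: " + closed.isOpen());
    }
}
```

In the sketch, the guarded variant leaves the client open when no callback 
ever updated the cached state, which mirrors the lingering socket described 
above; the unconditional variant always closes it.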



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
