[ https://issues.apache.org/jira/browse/OOZIE-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley updated OOZIE-2654:
-------------------------------------------

Cherry-picked from master to branch-4.3.


> Zookeeper dependent services should not depend on Connectionstate to be valid before cleaning up
> ------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2654
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>             Fix For: 5.0.0b1, 4.3.1
>
>         Attachments: OOZIE-2654.diff
>
>
> Currently the ZKUtils, ZKLocks and ZKJobsConcurrency services do not
> properly tear down their ZooKeeper connections when ZooKeeper never
> delivered the callback that changes the connection state.
> This can happen, for example, if ZK closes the session before any
> callback updating the connection state is received. As a result, an Oozie
> server running in HA mode may fail to terminate, leaving one or more
> sockets in the CLOSE_WAIT state.
> Here is an instance of this issue.
> The network connections show one connection stuck indefinitely in
> CLOSE_WAIT:
> {quote}
> tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)
> {quote}
> From the ZooKeeper logs:
> {quote}
> 2016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
> 2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710
> {quote}
> and later:
> {quote}
> 2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e
> {quote}
> The fix is to skip the connection-state check when the services are destroyed
> and always tear down the ZK connections.
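A minimal Java sketch may make the failure mode and the fix more concrete. It does not reproduce Oozie's actual ZKUtils, ZKLocks or ZKJobsConcurrency code: the class `ZkServiceSketch`, the `FakeZkClient` handle and all method names are hypothetical, and the sketch assumes the old state check amounts to a simple "connected" flag that only a state-change callback ever sets. `destroyGuardedByState()` models the old behaviour (teardown skipped because the callback never arrived); `destroy()` models the fix (teardown performed unconditionally).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical stand-in for a ZooKeeper-backed Oozie service.
 * Not Oozie's real code: it only illustrates gating teardown on a
 * callback-driven connection state versus tearing down unconditionally.
 */
public class ZkServiceSketch {

    /** Hypothetical handle standing in for the underlying ZK client/socket. */
    static final class FakeZkClient implements AutoCloseable {
        private final AtomicBoolean open = new AtomicBoolean(true);

        boolean isOpen() {
            return open.get();
        }

        /** Idempotent: closing an already-closed client is a no-op. */
        @Override
        public void close() {
            open.set(false);
        }
    }

    private final FakeZkClient client = new FakeZkClient();

    // Assumption: this flag is only ever updated by a state-change callback.
    private volatile boolean connected = false;

    /** Simulates delivery of a connection-state callback. */
    void onConnectionStateChanged(boolean nowConnected) {
        connected = nowConnected;
    }

    /** Old pattern: if the callback never arrived, the client is never closed. */
    void destroyGuardedByState() {
        if (connected) {
            client.close();
        }
    }

    /** Fixed pattern: always close, regardless of the last known state. */
    void destroy() {
        client.close();
    }

    boolean clientOpen() {
        return client.isOpen();
    }

    public static void main(String[] args) {
        // The session was closed by the server before any state callback
        // was delivered, so "connected" is still false at destroy time.
        ZkServiceSketch guarded = new ZkServiceSketch();
        guarded.destroyGuardedByState();
        System.out.println("guarded destroy, client still open: " + guarded.clientOpen());

        ZkServiceSketch fixed = new ZkServiceSketch();
        fixed.destroy();
        System.out.println("unconditional destroy, client still open: " + fixed.clientOpen());
    }
}
```

Running `main` prints `true` for the guarded version (the simulated client leaks, analogous to the socket left in CLOSE_WAIT) and `false` for the unconditional one. Keeping `close()` idempotent is what makes it safe for `destroy()` to drop the state check entirely.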



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
