[ 
https://issues.apache.org/jira/browse/STORM-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360845#comment-15360845
 ] 

ASF GitHub Bot commented on STORM-1941:
---------------------------------------

Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1535#discussion_r69409241
  
    --- Diff: storm-core/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java ---
    @@ -219,6 +219,8 @@ public void stateChanged(CuratorFramework curatorFramework, ConnectionState conn
                     LOG.info("Connection state listener invoked, zookeeper connection state has changed to {}", connectionState);
                     if (connectionState.equals(ConnectionState.RECONNECTED)) {
                         LOG.info("Connection state has changed to reconnected so setting nimbuses entry one more time");
    +                    // explicit delete for ephmeral node to ensure this session creates the entry.
    +                    stateStorage.delete_node(ClusterUtils.nimbusPath(nimbusId));
    --- End diff --
    
    @harshach 
    This handles the edge case where an ephemeral node is not deleted even 
    though its session is closed.
    Please refer to http://issues.apache.org/jira/browse/STORM-1941.
    
    I know it's odd, but it is happening, and another article describes the same behavior:
    https://www.box.com/blog/a-gotcha-when-using-zookeeper-ephemeral-nodes/


> Nimbus discovery can fail when zookeeper reconnect happens.
> -----------------------------------------------------------
>
>                 Key: STORM-1941
>                 URL: https://issues.apache.org/jira/browse/STORM-1941
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>
> When a ZooKeeper reconnect happens, the nimbus registry entry can be deleted 
> even though nimbus is alive.
> Below is the ZooKeeper node for the nimbus registry, first as created and 
> then after it was modified.
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x4000005ae
> mtime = Fri Jul 01 11:43:51 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x50000000e
> mtime = Fri Jul 01 11:46:08 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> Below is the transaction log for that node.
> {code}
> 7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae 
> create 
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
> 7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e 
> setData 
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
> {code}
> Please take a look at ctime, mtime, and ephemeralOwner.
> The ephemeral owner session was already closed on the nimbus side, but the 
> node may not be deleted immediately. As a result, the new session does not 
> create a new node; it sets the value on an ephemeral node that still belongs 
> to the other, already closed session.
> {code}
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
> {code}
> We can delete the node first and then create the ephemeral node when the 
> reconnect event handler is called.
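
The failure mode and the proposed fix can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `FakeZk` and `ReconnectSketch` are hypothetical names, not ZooKeeper, Curator, or Storm APIs, and the "buggy" registration path (set data if the node exists, otherwise create it) is inferred from the transaction log above rather than taken from Storm's source. The session ids and the node path are the ones shown in the issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for ZooKeeper (hypothetical; not the real client API). */
class FakeZk {
    /** A node holds its data and the id of the session that created it. */
    static final class Node {
        byte[] data;
        final long ephemeralOwner;
        Node(byte[] data, long ephemeralOwner) {
            this.data = data;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    boolean exists(String path) { return nodes.containsKey(path); }

    void createEphemeral(String path, byte[] data, long sessionId) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        nodes.put(path, new Node(data, sessionId));
    }

    /** Updating data does not change ephemeralOwner, matching the stat output in the issue. */
    void setData(String path, byte[] data) { nodes.get(path).data = data; }

    /** In this toy model, deleting a missing node is a no-op. */
    void delete(String path) { nodes.remove(path); }

    /** Server-side cleanup: drop every ephemeral node owned by the given session. */
    void expireSession(long sessionId) {
        nodes.values().removeIf(n -> n.ephemeralOwner == sessionId);
    }
}

class ReconnectSketch {
    static final String PATH = "/storm/nimbuses/<host>:6627";
    static final long OLD_SESSION = 0x255a62e310c0005L;
    static final long NEW_SESSION = 0x355a647bd8c0000L;

    /** Assumed pre-fix behavior: reuse the lingering node if it is still present. */
    static void registerBuggy(FakeZk zk, byte[] data, long sessionId) {
        if (zk.exists(PATH)) {
            zk.setData(PATH, data);
        } else {
            zk.createEphemeral(PATH, data, sessionId);
        }
    }

    /** Proposed fix: delete first so that the current session owns the entry. */
    static void registerFixed(FakeZk zk, byte[] data, long sessionId) {
        zk.delete(PATH);
        zk.createEphemeral(PATH, data, sessionId);
    }

    /** Returns true if the nimbus entry is still present after the old session is cleaned up. */
    static boolean survivesReconnect(boolean useFix) {
        FakeZk zk = new FakeZk();
        byte[] data = "nimbus-info".getBytes(StandardCharsets.UTF_8);
        zk.createEphemeral(PATH, data, OLD_SESSION);

        // Reconnect with a new session while the old session's node still lingers.
        if (useFix) {
            registerFixed(zk, data, NEW_SESSION);
        } else {
            registerBuggy(zk, data, NEW_SESSION);
        }

        // The server eventually removes the old session's ephemeral nodes.
        zk.expireSession(OLD_SESSION);
        return zk.exists(PATH);
    }

    public static void main(String[] args) {
        System.out.println("buggy path keeps entry: " + survivesReconnect(false));
        System.out.println("fixed path keeps entry: " + survivesReconnect(true));
    }
}
```

In this model the entry written through `setData` disappears once the old session's nodes are removed, because the node never changed owner, while the delete-then-create path leaves an entry owned by the new session.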



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
