[ 
https://issues.apache.org/jira/browse/STORM-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359098#comment-15359098
 ] 

ASF GitHub Bot commented on STORM-1941:
---------------------------------------

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1534

    STORM-1941 Nimbus discovery can fail when zookeeper reconnect happens. (1.x)

    * delete ephemeral node first when reconnected handler is called
    
    This also deletes the node when the session is still alive and has merely
    reconnected. If we really need to avoid deleting the node in that case, we
    could check the ephemeral owner before deleting. ZooKeeper reconnects do not
    happen often, so I think it is fine to skip that check. If you think we
    should add it, please let me know.
    
    By the way, the blobstore also uses ephemeral nodes, so I wonder whether
    they should be recreated too.
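
The optional owner check mentioned above could look roughly like the sketch below. It is a toy in-memory model, not code from this pull request and not the ZooKeeper or Curator API: the class `OwnerCheckSketch`, its `reregister` method, and the `deletes` counter are hypothetical names used only for illustration. In a real client the owner would presumably be read from the node's `ephemeralOwner` stat field.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of "check the ephemeral owner before deleting"; hypothetical, not Storm code. */
class OwnerCheckSketch {
    /** path -> id of the session that owns the ephemeral node. */
    final Map<String, Long> ephemeralOwners = new HashMap<>();
    /** Counts how many stale nodes were deleted, for illustration only. */
    int deletes = 0;

    /**
     * Called from the reconnect handler. Leaves the node alone when the
     * current session already owns it; otherwise deletes any stale node and
     * creates a fresh one. Returns true if the node was (re)created.
     */
    boolean reregister(String path, long session) {
        Long owner = ephemeralOwners.get(path);
        if (owner != null && owner == session) {
            return false; // session survived the reconnect, node is still ours
        }
        if (owner != null) {
            ephemeralOwners.remove(path); // node belongs to another session
            deletes++;
        }
        ephemeralOwners.put(path, session);
        return true;
    }
}
```

The trade-off is one extra read per reconnect in exchange for not touching a node whose session is still valid.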

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1941-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1534
    
----
commit 44ffed3e846597100e8f1c9914811ac1d9907e17
Author: Jungtaek Lim <[email protected]>
Date:   2016-07-01T14:54:15Z

    STORM-1941 Nimbus discovery can fail when zookeeper reconnect happens.
    
    * delete ephemeral node first when reconnected handler is called

----


> Nimbus discovery can fail when zookeeper reconnect happens.
> -----------------------------------------------------------
>
>                 Key: STORM-1941
>                 URL: https://issues.apache.org/jira/browse/STORM-1941
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>
> When a ZooKeeper reconnect happens, the Nimbus registry node can be deleted 
> even though Nimbus is alive.
> Below is the ZooKeeper node for the Nimbus registry, before and after the 
> reconnect.
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x4000005ae
> mtime = Fri Jul 01 11:43:51 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x50000000e
> mtime = Fri Jul 01 11:46:08 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> Below is the transaction log for that node.
> {code}
> 7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae 
> create 
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
> 7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e 
> setData 
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
> {code}
> Please take a look at ctime, mtime, and ephemeralOwner.
> The ephemeral owner session was already closed on the Nimbus side, but the 
> node may not be deleted immediately. The new session therefore does not 
> create a new node; it sets the value on an ephemeral node that belongs to 
> another session, which is already closed.
> {code}
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client 
> for session: 0x255a62e310c0005
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 
> 0x255a62e310c0005 closed
> {code}
> We can fix this by deleting the node first and then setting the ephemeral 
> node when the reconnect event handler is called.
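
The timeline described in the issue can be replayed with a small in-memory model. This is only a sketch under stated assumptions: `ReconnectRaceSketch` and all of its methods are hypothetical stand-ins, not the ZooKeeper client or Storm's actual code, and the "buggy" path is my reading of the behaviour described above (reuse the existing node via setData). The session ids are the two that appear in the transaction log.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for ZooKeeper ephemeral nodes; not the real client API. */
class ReconnectRaceSketch {
    /** A node remembers its data and the session that created it. */
    static final class Node {
        final long ephemeralOwner;
        byte[] data;
        Node(long ephemeralOwner, byte[] data) {
            this.ephemeralOwner = ephemeralOwner;
            this.data = data;
        }
    }

    final Map<String, Node> nodes = new HashMap<>();

    /** Creates an ephemeral node owned by the given session; fails if the path exists. */
    void create(String path, long session, byte[] data) {
        if (nodes.containsKey(path)) throw new IllegalStateException("node exists: " + path);
        nodes.put(path, new Node(session, data));
    }

    /** setData changes the payload only; the owner stays whoever created the node. */
    void setData(String path, byte[] data) {
        Node n = nodes.get(path);
        if (n == null) throw new IllegalStateException("no node: " + path);
        n.data = data;
    }

    void delete(String path) { nodes.remove(path); }

    /** Server-side cleanup of a closed session removes every node that session owns. */
    void expireSession(long session) {
        nodes.values().removeIf(n -> n.ephemeralOwner == session);
    }

    /** Behaviour described in the issue: reuse the node if it is still there. */
    void registerBuggy(String path, long newSession, byte[] data) {
        if (nodes.containsKey(path)) setData(path, data);
        else create(path, newSession, data);
    }

    /** Proposed behaviour: delete first, then create under the new session. */
    void registerFixed(String path, long newSession, byte[] data) {
        delete(path);
        create(path, newSession, data);
    }

    /** Replays the reconnect; returns true if the registry node survives. */
    static boolean nodeSurvivesReconnect(boolean useFix) {
        final String path = "/storm/nimbuses/<host>:6627";
        final long oldSession = 0x255a62e310c0005L;
        final long newSession = 0x355a647bd8c0000L;
        byte[] info = {1}; // placeholder payload
        ReconnectRaceSketch zk = new ReconnectRaceSketch();
        zk.create(path, oldSession, info);   // initial registration
        if (useFix) zk.registerFixed(path, newSession, info);
        else zk.registerBuggy(path, newSession, info);
        zk.expireSession(oldSession);        // delayed cleanup of the old session
        return zk.nodes.containsKey(path);
    }
}
```

In this model `nodeSurvivesReconnect(false)` loses the node once the old session is cleaned up, while `nodeSurvivesReconnect(true)` keeps it, because the recreated node is owned by the new session.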



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
