[ https://issues.apache.org/jira/browse/STORM-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359099#comment-15359099 ]
ASF GitHub Bot commented on STORM-1941:
---------------------------------------
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/storm/pull/1535
STORM-1941 Nimbus discovery can fail when zookeeper reconnect happens.
* delete ephemeral node first when reconnected handler is called
PR for 1.x: #1534
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/storm STORM-1941
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1535.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1535
----
commit 5b19cf0ebdd2aef627aae1ee0edfd09574d55ca6
Author: Jungtaek Lim
Date: 2016-07-01T14:58:05Z
STORM-1941 Nimbus discovery can fail when zookeeper reconnect happens.
* delete ephemeral node first when reconnected handler is called
----
> Nimbus discovery can fail when zookeeper reconnect happens.
> -----------------------------------------------------------
>
> Key: STORM-1941
> URL: https://issues.apache.org/jira/browse/STORM-1941
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0, 1.0.1
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Critical
>
> When a ZooKeeper reconnect happens, the Nimbus registry entry can be deleted
> even though Nimbus is still alive.
> Below is the ZooKeeper node for the Nimbus registry, before and after the
> reconnect.
> {code}
> get /storm/nimbuses/<host>:6627
> <98 bytes of binary node data, not printable>
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x4000005ae
> mtime = Fri Jul 01 11:43:51 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> {code}
> get /storm/nimbuses/<host>:6627
> <98 bytes of binary node data, not printable>
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x50000000e
> mtime = Fri Jul 01 11:46:08 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> Below is the transaction log for that node.
> {code}
> 7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae
> create
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
> 7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e
> setData
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
> {code}
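> Please take a look at ctime, mtime, and ephemeralOwner. The node was created
> by session 0x255a62e310c0005 at 11:43:51 and modified at 11:46:08 by a
> different session, 0x355a647bd8c0000, yet ephemeralOwner still points to the
> original session. That session had already been closed on the Nimbus side,
> but ZooKeeper may not delete its ephemeral node immediately. So the new
> session does not create a new node; it only sets the value on an ephemeral
> node that belongs to another, already-closed session, and the node disappears
> once that session is cleaned up.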
> {code}
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client
> for session: 0x255a62e310c0005
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session:
> 0x255a62e310c0005 closed
> {code}
> We can fix this by deleting the node first and then creating the ephemeral
> node again when the reconnect event handler is called.
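A toy, in-memory model may make the race easier to see. This is a sketch under stated assumptions, not Storm, ZooKeeper, or Curator code: the `ToyZk` class and the `registerBySetData` / `registerByDeleteThenCreate` helpers are hypothetical, and the model captures only two ZooKeeper rules that matter here: `setData` never changes a node's `ephemeralOwner`, and cleaning up a session removes every ephemeral node that session owns. `registerBySetData` assumes the reconnect handler reuses an existing node via setData, as the transaction log above suggests; `registerByDeleteThenCreate` mirrors the proposed fix. The session ids are taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of ZooKeeper ephemeral-node ownership (hypothetical, NOT a real API).
class EphemeralOwnerDemo {

    static final class Node {
        final long ephemeralOwner; // session that created the node; never changes
        byte[] data;

        Node(long ephemeralOwner, byte[] data) {
            this.ephemeralOwner = ephemeralOwner;
            this.data = data;
        }
    }

    static final class ToyZk {
        private final Map<String, Node> nodes = new HashMap<>();

        void create(long session, String path, byte[] data) {
            if (nodes.containsKey(path)) {
                throw new IllegalStateException("NodeExists: " + path);
            }
            nodes.put(path, new Node(session, data));
        }

        // Like ZooKeeper's setData: updates the payload but keeps the old owner.
        void setData(String path, byte[] data) {
            Node n = nodes.get(path);
            if (n == null) {
                throw new IllegalStateException("NoNode: " + path);
            }
            n.data = data;
        }

        // Silently ignores a missing node (a real client would see NoNode).
        void delete(String path) {
            nodes.remove(path);
        }

        // When a session is finally cleaned up, all nodes it owns are removed.
        void expireSession(long session) {
            nodes.values().removeIf(n -> n.ephemeralOwner == session);
        }

        boolean exists(String path) {
            return nodes.containsKey(path);
        }
    }

    // Assumed pre-fix behaviour: reuse the node if it is still there.
    static void registerBySetData(ToyZk zk, long session, String path, byte[] data) {
        if (zk.exists(path)) {
            zk.setData(path, data);
        } else {
            zk.create(session, path, data);
        }
    }

    // Proposed fix: delete first, so the new session becomes the owner.
    static void registerByDeleteThenCreate(ToyZk zk, long session, String path, byte[] data) {
        zk.delete(path);
        zk.create(session, path, data);
    }

    // Returns whether the registry node survives cleanup of the old session.
    static boolean survivesOldSessionExpiry(boolean deleteFirst) {
        long oldSession = 0x255a62e310c0005L;
        long newSession = 0x355a647bd8c0000L;
        String path = "/storm/nimbuses/host:6627";
        byte[] info = {1, 2, 3};

        ToyZk zk = new ToyZk();
        zk.create(oldSession, path, info);           // initial registration
        if (deleteFirst) {                           // reconnect handler runs
            registerByDeleteThenCreate(zk, newSession, path, info);
        } else {
            registerBySetData(zk, newSession, path, info);
        }
        zk.expireSession(oldSession);                // stale session cleaned up
        return zk.exists(path);
    }

    public static void main(String[] args) {
        System.out.println("setData:       " + survivesOldSessionExpiry(false)); // expected: false
        System.out.println("delete+create: " + survivesOldSessionExpiry(true));  // expected: true
    }
}
```

With setData, the node stays owned by the closed session and is lost when that session is cleaned up; with delete-then-create, the live session owns the node and it survives. Real code would issue the delete and create through the ZooKeeper client (Storm uses Curator) and should handle NoNode and NodeExists errors, which this toy model glosses over.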
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)