Jungtaek Lim created STORM-1941:
-----------------------------------

             Summary: Nimbus discovery can fail when zookeeper reconnect happens.
                 Key: STORM-1941
                 URL: https://issues.apache.org/jira/browse/STORM-1941
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.0, 1.0.1
            Reporter: Jungtaek Lim
            Assignee: Jungtaek Lim


When a ZooKeeper reconnect happens, the nimbus registry entry can be deleted even though Nimbus is still alive.

Below is the ZooKeeper node for the nimbus registry, read twice: before and after the reconnect.

{code}
get /storm/nimbuses/<host>:6627
<binary node data (98 bytes), not printable>
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x4000005ae
mtime = Fri Jul 01 11:43:51 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}

{code}
get /storm/nimbuses/<host>:6627
<binary node data (98 bytes), not printable>
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x50000000e
mtime = Fri Jul 01 11:46:08 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}

Below is the transaction log for that node.
{code}
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae create
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10

7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e setData
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
{code}

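Please take a look at ctime, mtime, and ephemeralOwner.
The node was created at 11:43:51 and modified at 11:46:08, but its ephemeralOwner is still 0x255a62e310c0005, the session that created it, while the setData at 11:46:08 came from a different session (0x355a647bd8c0000).
Nimbus had already closed the owning session (see the client log below), but the node is not necessarily deleted immediately. So the new session found the node still present and, instead of creating a new node, set the value on an ephemeral node that belongs to the other, already closed session. setData does not change the owner, so when ZooKeeper removes the old session's ephemeral nodes, the registry entry disappears even though Nimbus is alive.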

{code}
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
{code}

To fix this, when the reconnect event handler is called, we can delete the existing node first and then create the ephemeral node again, so that it is owned by the current session.
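The failure and the proposed fix can be illustrated with a small in-memory model. This is a hypothetical sketch, not ZooKeeper's or Storm's API: the class EphemeralRegistryModel and all of its methods are made up for illustration. It models only the properties that matter here: an ephemeral node keeps the session it was created with as its owner, setData does not change that owner, and removing a session removes every ephemeral node it owns. registerBySetData mirrors the behavior described above; registerByDeleteThenCreate is the proposed delete-then-create approach.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of ZooKeeper ephemeral-node ownership (NOT the real ZooKeeper API).
public class EphemeralRegistryModel {
    // A node's data plus the session that created it (like ZooKeeper's ephemeralOwner).
    static final class Node {
        byte[] data;
        final long ephemeralOwner;

        Node(byte[] data, long ephemeralOwner) {
            this.data = data;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    void createEphemeral(String path, byte[] data, long sessionId) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
        nodes.put(path, new Node(data, sessionId));
    }

    // Replaces the data only; the owner stays the session that created the node.
    void setData(String path, byte[] data) {
        Node node = nodes.get(path);
        if (node == null) {
            throw new IllegalStateException("no node: " + path);
        }
        node.data = data;
    }

    // Simplification: a missing node is silently ignored here.
    void delete(String path) {
        nodes.remove(path);
    }

    // Models the server finally removing a closed session's ephemeral nodes.
    void expireSession(long sessionId) {
        nodes.values().removeIf(node -> node.ephemeralOwner == sessionId);
    }

    // Behavior described in the report: reuse the leftover node via setData.
    static void registerBySetData(EphemeralRegistryModel zk, String path, byte[] data, long sessionId) {
        if (zk.exists(path)) {
            zk.setData(path, data);
        } else {
            zk.createEphemeral(path, data, sessionId);
        }
    }

    // Proposed fix: drop any leftover node, then recreate it under the current session.
    static void registerByDeleteThenCreate(EphemeralRegistryModel zk, String path, byte[] data, long sessionId) {
        zk.delete(path);
        zk.createEphemeral(path, data, sessionId);
    }

    // Replays the incident: register, re-register from a new session, then the old session goes away.
    public static boolean nodeSurvives(boolean useFix) {
        EphemeralRegistryModel zk = new EphemeralRegistryModel();
        String path = "/storm/nimbuses/<host>:6627";
        long oldSession = 0x255a62e310c0005L;
        long newSession = 0x355a647bd8c0000L;
        byte[] summary = new byte[] {1, 2, 3}; // placeholder payload

        zk.createEphemeral(path, summary, oldSession);
        // Nimbus has closed oldSession, but its node has not been removed yet.
        if (useFix) {
            registerByDeleteThenCreate(zk, path, summary, newSession);
        } else {
            registerBySetData(zk, path, summary, newSession);
        }
        zk.expireSession(oldSession);
        return zk.exists(path);
    }

    public static void main(String[] args) {
        System.out.println("setData: node survives = " + nodeSurvives(false));
        System.out.println("delete then create: node survives = " + nodeSurvives(true));
    }
}
{code}

Running main prints false for the setData path and true for delete-then-create. One trade-off the model hides: with a real ZooKeeper, there is a short window between the delete and the create in which the entry is absent, so anything reading /storm/nimbuses at that moment would not see this Nimbus.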



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
