[jira] [Commented] (HBASE-7551) nodeChildrenChange event may happen after the transition to RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be missed in the master side.

Lars Hofhansl (JIRA) Tue, 15 Jan 2013 14:42:20 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554472#comment-13554472
 ]


Lars Hofhansl commented on HBASE-7551:
--------------------------------------

Either way :)
Ram did the work of tracking the issue down...
                
> nodeChildrenChange event may happen after the transition to 
> RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be 
> missed in the master side.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7551
>                 URL: https://issues.apache.org/jira/browse/HBASE-7551
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.4
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: 7551-0.94-test.txt, 7551-0.94-v1.txt, 7551-trunk.txt, 
> 7551-trunk-v2.txt, testSplitTransactionOnCluster-output.txt
>
>
> This came from HBASE-7468.
> I found the issue and am able to reproduce it.
> See the logs:
> {code}
> 2013-01-14 14:37:21,760 INFO  [main] regionserver.SplitTransaction(216): 
> Starting split of region 
> testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.
> 2013-01-14 14:37:21,760 DEBUG [main] regionserver.SplitTransaction(871): 
> regionserver:61665-0x13c384e4e4f0002 Creating ephemeral node for 
> a9e57d09c58b3ef3b949d602232fb2c2 in SPLITTING state
> 2013-01-14 14:37:21,844 DEBUG [main] zookeeper.ZKAssign(757): 
> regionserver:61665-0x13c384e4e4f0002 Attempting to transition node 
> a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,849 DEBUG [Thread-873-EventThread] 
> zookeeper.ZooKeeperWatcher(277): master:62334-0x13c384e4e4f001b Received 
> ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, 
> path=/hbase/unassigned
> 2013-01-14 14:37:21,853 DEBUG [main] zookeeper.ZKUtil(1565): 
> regionserver:61665-0x13c384e4e4f0002 Retrieved 140 byte(s) of data from znode 
> /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2; 
> data=region=testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.,
>  origin=Ram.Home,61665,1358154325430, state=RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,918 DEBUG [main] zookeeper.ZKAssign(820): 
> regionserver:61665-0x13c384e4e4f0002 Successfully transitioned node 
> a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,919 DEBUG [Thread-873-EventThread] zookeeper.ZKUtil(417): 
> master:62334-0x13c384e4e4f001b Set watcher on existing znode 
> /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2
> {code}
> Here we can observe that the SPLITTING node was created first. Then we 
> transition it from SPLITTING to SPLITTING so that the AM can receive the 
> nodeDataChange event. But for the nodeDataChange event to be delivered, the 
> nodeChildrenChange event must happen first so that the master can set a 
> watcher on the node.
> When this hang happens, we can see that the watcher is set by the 
> nodeChildrenChange event only after the transition has happened, so the 
> SPLITTING to SPLITTING event itself is missed.
> Ideally the nodeChildrenChange handler iterates through the list of new 
> znodes under /hbase/unassigned and then sets a watcher on each of them. One 
> reason for the delay could be that there is more than one znode, so setting 
> the watches takes time. The order of execution differs between running from 
> Eclipse and running the mvn tests.
> My conclusion is that the test case reveals the problem, but the same thing 
> can happen in any case where the SPLITTING event can be missed. Maybe some 
> of the SPLIT-related bugs that were raised are due to this? Need to 
> analyse.
> Any suggestions are welcome. We should ensure that the transition from 
> SPLITTING to SPLITTING happens only after the master has set the watch on 
> the znode, and we should be sure of that.
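
The ordering problem described above can be sketched with a small in-memory model. This is only an illustration, not HBase or ZooKeeper code: the Znode class, the masterSeesSplit method and both of its flags are hypothetical names made up for this sketch. It assumes one-shot data watches, and it assumes one possible mitigation (the master also acts on the data it reads at the moment it sets the watch), which is not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the race; not the ZooKeeper or HBase API. */
final class SplitWatchRace {

    static final String SPLITTING = "RS_ZK_REGION_SPLITTING";

    /** Simplified znode: one data value plus a list of one-shot data watchers. */
    static final class Znode {
        private String data;
        private final List<Runnable> dataWatchers = new ArrayList<>();

        Znode(String data) {
            this.data = data;
        }

        /** Writes the data, then fires and clears every registered watcher. */
        void setData(String newData) {
            data = newData;
            List<Runnable> toFire = new ArrayList<>(dataWatchers);
            dataWatchers.clear();
            toFire.forEach(Runnable::run);
        }

        /** Returns the current data and registers a watcher for the next write. */
        String getDataAndWatch(Runnable watcher) {
            dataWatchers.add(watcher);
            return data;
        }
    }

    /**
     * Replays one ordering of the two steps and reports whether the master
     * noticed the SPLITTING state.
     *
     * @param watchBeforeTransition true if the master sets its watch before the
     *                              region server does SPLITTING to SPLITTING
     * @param handleDataOnWatch     true if the master also handles the data it
     *                              reads while setting the watch (assumed
     *                              mitigation, for illustration only)
     */
    static boolean masterSeesSplit(boolean watchBeforeTransition, boolean handleDataOnWatch) {
        boolean[] seen = {false};
        // The region server has already created the node in SPLITTING state.
        Znode node = new Znode(SPLITTING);

        // Master side: what happens once nodeChildrenChange is processed.
        Runnable masterSetsWatch = () -> {
            String current = node.getDataAndWatch(() -> seen[0] = true);
            if (handleDataOnWatch && SPLITTING.equals(current)) {
                seen[0] = true;
            }
        };
        // Region server side: the SPLITTING to SPLITTING transition.
        Runnable regionServerTransition = () -> node.setData(SPLITTING);

        if (watchBeforeTransition) {
            masterSetsWatch.run();
            regionServerTransition.run();
        } else {
            regionServerTransition.run();
            masterSetsWatch.run();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("watch first:            " + masterSeesSplit(true, false));
        System.out.println("transition first:       " + masterSeesSplit(false, false));
        System.out.println("transition first + fix: " + masterSeesSplit(false, true));
    }
}
```

In this model the master sees the split when the watch is set first, misses it when the transition runs first, and sees it again in the late-watch ordering if it handles the data read at watch time. With that mitigation the master can observe the same state twice when the watch does win the race, so a real handler would need to be idempotent.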

