[ https://issues.apache.org/jira/browse/HBASE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554513#comment-13554513 ]
Hudson commented on HBASE-7551:
-------------------------------

Integrated in HBase-0.94 #736 (See [https://builds.apache.org/job/HBase-0.94/736/])
    HBASE-7551 nodeChildrenChange event may happen after the transition to RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be missed in the master side. (Ram, Ted, and Lars H) (Revision 1433700)

    Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

> nodeChildrenChange event may happen after the transition to
> RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be
> missed in the master side.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-7551
>                 URL: https://issues.apache.org/jira/browse/HBASE-7551
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>   Affects Versions: 0.94.4
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: 7551-0.94-test.txt, 7551-0.94-v1.txt, 7551-trunk.txt, 7551-trunk-v2.txt, testSplitTransactionOnCluster-output.txt
>
>
> This came from HBASE-7468.
> I got the issue. I am able to reproduce this.
> See the logs:
> {code}
> 2013-01-14 14:37:21,760 INFO [main] regionserver.SplitTransaction(216): Starting split of region testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.
> 2013-01-14 14:37:21,760 DEBUG [main] regionserver.SplitTransaction(871): regionserver:61665-0x13c384e4e4f0002 Creating ephemeral node for a9e57d09c58b3ef3b949d602232fb2c2 in SPLITTING state
> 2013-01-14 14:37:21,844 DEBUG [main] zookeeper.ZKAssign(757): regionserver:61665-0x13c384e4e4f0002 Attempting to transition node a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,849 DEBUG [Thread-873-EventThread] zookeeper.ZooKeeperWatcher(277): master:62334-0x13c384e4e4f001b Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
> 2013-01-14 14:37:21,853 DEBUG [main] zookeeper.ZKUtil(1565): regionserver:61665-0x13c384e4e4f0002 Retrieved 140 byte(s) of data from znode /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2; data=region=testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2., origin=Ram.Home,61665,1358154325430, state=RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,918 DEBUG [main] zookeeper.ZKAssign(820): regionserver:61665-0x13c384e4e4f0002 Successfully transitioned node a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,919 DEBUG [Thread-873-EventThread] zookeeper.ZKUtil(417): master:62334-0x13c384e4e4f001b Set watcher on existing znode /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2
> {code}
> Here we can observe that the SPLITTING node was created first. Then we
> transition it from SPLITTING to SPLITTING so that the AM gets the
> nodeDataChange event. But for the nodeDataChange event to be delivered, the
> nodeChildrenChange event must happen first so that the master can set a
> watcher on the node.
> Now when this hang happens, we can see that the watcher is set by the
> nodeChildrenChange event only after the transition has already happened
> (14:37:21,919 versus 14:37:21,918 in the log above), and so the SPLITTING to
> SPLITTING event itself is missed.
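
The ordering problem described above can be replayed with a small, self-contained sketch. It does not use the ZooKeeper or HBase APIs: `ToyZnode`, `watchData`, `setData` and `masterSeesTransition` are hypothetical names invented for this illustration, and the only behaviour modelled is that a data watch fires for changes made after it is registered, never for earlier ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay of the race; an in-memory stand-in, not the ZooKeeper API. */
class WatchRaceSketch {

    /** Hypothetical znode: holds a state string and one-shot data watchers. */
    static class ToyZnode {
        private String state;
        private final List<Runnable> dataWatchers = new ArrayList<>();

        ToyZnode(String state) { this.state = state; }

        String getState() { return state; }

        /** Registers a one-shot watcher; it fires on the next setData call only. */
        void watchData(Runnable watcher) { dataWatchers.add(watcher); }

        /** Writes the state, then fires and clears the watchers registered so far. */
        void setData(String newState) {
            state = newState;
            List<Runnable> toFire = new ArrayList<>(dataWatchers);
            dataWatchers.clear();
            for (Runnable w : toFire) {
                w.run();
            }
        }
    }

    /**
     * Replays one of the two orderings. Returns true if the master's watcher
     * saw the SPLITTING -> SPLITTING rewrite, false if the rewrite was missed.
     */
    static boolean masterSeesTransition(boolean watchSetBeforeTransition) {
        // Region server creates the node in SPLITTING state.
        ToyZnode node = new ToyZnode("RS_ZK_REGION_SPLITTING");
        boolean[] seen = {false};
        Runnable masterWatcher = () -> seen[0] = true;

        if (watchSetBeforeTransition) {
            node.watchData(masterWatcher);           // master set its watch in time
            node.setData("RS_ZK_REGION_SPLITTING");  // region server rewrites the node
        } else {
            node.setData("RS_ZK_REGION_SPLITTING");  // rewrite first (14:37:21,918 in the log)
            node.watchData(masterWatcher);           // watch set too late (14:37:21,919)
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("watch first:      seen=" + masterSeesTransition(true));
        System.out.println("transition first: seen=" + masterSeesTransition(false));
    }
}
```

In the second ordering nothing is wrong with either side in isolation; the event is lost purely because the rewrite landed before the watch existed, which matches the hang seen in the test.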
> Ideally the nodeChildrenChange event iterates through the list of new znodes
> under /hbase/unassigned and then sets a watcher on each of them. One reason
> for the delay could be that there is more than one znode, so setting the
> watches takes time. The order of execution differs between running from
> Eclipse and running the mvn tests.
> My conclusion is that the test case actually reveals the problem, but the
> same can happen in any case where the SPLITTING event gets missed. Maybe
> some of the SPLIT-related bugs that were raised are due to this? Need to
> analyse.
> Any suggestions welcome. We should ensure that the transition from SPLITTING
> to SPLITTING happens only after the master has set the watch on the znode,
> and we should be sure of that.
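
One possible way to make the master robust to a late watch, sketched here as an assumption and not as what the attached patches do, is to read the znode's current state in the same step that registers the watch and to act on that state directly. ZooKeeper's `getData` call can return the data and register a watch together, which is what the hypothetical `getDataAndWatch` below imitates; `ToyZnode`, `masterHandles` and the `"read:"`/`"event"` labels are names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch of "read the state when setting the watch"; not HBase code. */
class ReadOnWatchSketch {

    /** Hypothetical znode with one-shot data watchers. */
    static class ToyZnode {
        private String state;
        private final List<Runnable> watchers = new ArrayList<>();

        ToyZnode(String state) { this.state = state; }

        /** Registers a one-shot watcher and returns the current state together. */
        String getDataAndWatch(Runnable watcher) {
            watchers.add(watcher);
            return state;
        }

        /** Writes the state, then fires and clears the watchers registered so far. */
        void setData(String newState) {
            state = newState;
            List<Runnable> toFire = new ArrayList<>(watchers);
            watchers.clear();
            for (Runnable w : toFire) {
                w.run();
            }
        }
    }

    /** Returns what the master handled, in order, for the chosen ordering. */
    static List<String> masterHandles(boolean watchSetBeforeTransition) {
        ToyZnode node = new ToyZnode("RS_ZK_REGION_SPLITTING");
        List<String> handled = new ArrayList<>();

        if (!watchSetBeforeTransition) {
            // Region server rewrites the node before the master gets to it.
            node.setData("RS_ZK_REGION_SPLITTING");
        }

        // Master: register the watch and inspect the state already present.
        String current = node.getDataAndWatch(() -> handled.add("event"));
        if ("RS_ZK_REGION_SPLITTING".equals(current)) {
            handled.add("read:" + current);
        }

        if (watchSetBeforeTransition) {
            // Region server rewrites the node after the watch is in place.
            node.setData("RS_ZK_REGION_SPLITTING");
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println("watch first:      " + masterHandles(true));
        System.out.println("transition first: " + masterHandles(false));
    }
}
```

With this approach the master learns about the SPLITTING state in both orderings, but in the watch-first case it sees the state twice (once from the read, once from the event), so the handler would have to tolerate duplicates.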