[ https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183925#comment-13183925 ]
Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

Here is the relevant portion of the log. The master always gets stuck in this state, even if you restart all the HBase services across the cluster.

{noformat}
2012-01-10 21:28:03,382 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 1028785192 references a server no longer up txa-18.rfiserve.net,60020,1326125886539; letting RIT timeout so will be assigned elsewhere
2012-01-10 21:28:06,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:06,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:56,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
{noformat}

bq. What do you think, Stack: can the master pick up stale ZK state that is not left over from a previous HBase install, in other words stale state created by itself?

By this I was referring to the comment Todd made in the related JIRA, when he said:

bq. Notably, it wasn't clearing ZK between runs. So some leftover RIT data from a previous HBase incarnation may be confusing this one's master.

He floated one possibility: leftover RIT data from a previous incarnation. I am wondering what other possibilities there are.

> If a FS bootstrap, need to also ensure ZK is cleaned
> ----------------------------------------------------
>
> Key: HBASE-3638
> URL: https://issues.apache.org/jira/browse/HBASE-3638
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Minor
>
> In a test environment where a cycle of start, operation, kill hbase (repeat),
> noticed that we were doing a bootstrap on startup but then we were picking up
> the previous cycle's zk state. It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073,
> region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem:
> BOOTSTRAP: creating ROOT and first META regions
> .....
> {code}
> Then after setting watcher on .META., we get a
> {code}
> 2011-03-11 06:42:58,301 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Region in transition
> 1028785192 references a server no longer up X.X.X; letting RIT timeout so
> will be assigned elsewhere
> {code}
> We're all confused.
> Should at least clear our zk if a bootstrap happened.
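A minimal Python sketch of the two checks discussed in this issue, assuming a plain dict as a stand-in for the ZooKeeper unassigned directory. `RegionTransition`, `clean_rit_on_bootstrap` and `references_dead_server` are hypothetical names for illustration only, not HBase's actual API or its eventual fix. The first function drops every leftover region-in-transition entry when the master has just bootstrapped ROOT and META (the "clear our zk if a bootstrap happened" suggestion); the second shows the dead-server test behind the WARN line in the log, comparing the full `host,port,startcode` server name against the set of live servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionTransition:
    """Hypothetical stand-in for the data held in one ZK unassigned znode."""
    state: str   # e.g. "RS_ZK_REGION_OPENING" or "RS_ZK_REGION_OPENED"
    server: str  # "host,port,startcode" of the region server that wrote it
    ts: int      # transition timestamp in milliseconds


def clean_rit_on_bootstrap(unassigned, bootstrapped):
    """If the master just bootstrapped ROOT and META on a fresh filesystem,
    every region-in-transition entry must predate this install, so delete
    them all. `unassigned` maps encoded region name -> RegionTransition
    (a dict standing in for ZooKeeper). Returns the removed names, sorted."""
    if not bootstrapped:
        return []
    removed = sorted(unassigned)
    unassigned.clear()
    return removed


def references_dead_server(rit, live_servers):
    """True if the transition names a server, including its start code,
    that is not among the currently live region servers."""
    return rit.server not in live_servers


# Example built from the log above; the live server's start code is hypothetical.
unassigned = {
    "1028785192": RegionTransition(
        "RS_ZK_REGION_OPENING",
        "txa-18.rfiserve.net,60020,1326125886539",
        1326241230066,
    )
}
live = {"txa-18.rfiserve.net,60020,1326241000000"}  # restarted RS, new start code
print(references_dead_server(unassigned["1028785192"], live))  # True
print(clean_rit_on_bootstrap(unassigned, bootstrapped=True))   # ['1028785192']
print(unassigned)                                              # {}
```

The bootstrap check alone would not cover stale state written by the current incarnation, which is the case raised in the comment above; the dead-server check only flags such an entry and leaves the decision of what to do with it open.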