[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567181#comment-13567181 ]
Jimmy Xiang commented on HBASE-7701:
------------------------------------

That's not because the region state is inconsistent. It's because the timeout monitor wakes up only every 30 seconds.

To fix this issue, we can (1) reduce the timeout monitor checking period from 30 seconds to 1 second, or (2) stop depending on the timeout monitor for reassignment. I prefer (2).

> inconsistent state in AssignmentManager for moving region
> ---------------------------------------------------------
>
>                 Key: HBASE-7701
>                 URL: https://issues.apache.org/jira/browse/HBASE-7701
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>         Attachments: TEST-org.apache.hadoop.hbase.IntegrationTestRebalanceAndKillServersTargeted.xml
>
>
> Closed regions are not removed from assignments. I am not sure if it's a general state problem, or just a small bug; for now, one manifestation is that a moved region is ignored by the SSH of the target server if the target server dies before updating ZK.
> {code}
> 2013-01-22 17:59:00,524 DEBUG [IPC Server handler 3 on 50658] master.AssignmentManager(1475): Sent CLOSE to 10.11.2.92,51231,1358906285048 for region IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
> 2013-01-22 17:59:00,997 DEBUG [RS_CLOSE_REGION-10.11.2.92,51231,1358906285048-1] handler.CloseRegionHandler(167): set region closed state in zk successfully for region IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. sn name: 10.11.2.92,51231,1358906285048
> 2013-01-22 17:59:01,088 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.', STARTKEY => '66666660', ENDKEY => '7333332c', ENCODED => 0200b366bc37c5afd1185f7d487c7dfb,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=CLOSED, ts=1358906341087, server=null} to {IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=OFFLINE, ts=1358906341088, server=null}
> 2013-01-22 17:59:01,128 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.AssignmentManager(1596): Assigning region IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. to 10.11.2.92,50661,1358906192942
> ... (50661 didn't update ZK to OPEN, only OPENING)
> 2013-01-22 17:59:06,605 INFO [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(202): Reassigning 7 region(s) that 10.11.2.92,50661,1358906192942 was carrying (skipping 0 regions(s) that are already in transition)
> 2013-01-22 17:59:06,605 DEBUG [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(219): Skip assigning region IntegrationTestRebalanceAndKillServersTargeted,66666660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. because it has been opened in 10.11.2.92,51231,1358906285048
> {code}
> Note the server in the last line - the one that has long closed the region.
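
For illustration only, here is a rough sketch of the trade-off between options (1) and (2) in the comment at the top of this message. It assumes a monitor that wakes at fixed multiples of its period and computes how long a region that needs reassignment would wait before anything acts on it. The class, the method names, and the 15 ms dispatch latency are hypothetical and are not HBase APIs or measured values.

```java
// Hypothetical model, not HBase code: compares how long a stuck region waits
// under a periodic timeout monitor versus an event-driven reassignment path.
final class ReassignmentDelaySketch {

    /**
     * Wait until a periodic monitor next wakes up, assuming it wakes at
     * t = periodMs, 2 * periodMs, ... and the region gets stuck at stuckAtMs.
     */
    public static long pollingDelayMs(long stuckAtMs, long periodMs) {
        long sinceLastWake = stuckAtMs % periodMs;
        return sinceLastWake == 0 ? 0 : periodMs - sinceLastWake;
    }

    /**
     * Wait when reassignment is triggered directly by the failure event.
     * The only delay in this model is the (assumed) event dispatch latency.
     */
    public static long eventDrivenDelayMs(long dispatchLatencyMs) {
        return dispatchLatencyMs;
    }

    public static void main(String[] args) {
        long stuckAtMs = 5_400; // region gets stuck 5.4 s after the last common tick

        // Option (1): keep polling, but shrink the period from 30 s to 1 s.
        System.out.println("30 s monitor: " + pollingDelayMs(stuckAtMs, 30_000) + " ms");
        System.out.println("1 s monitor:  " + pollingDelayMs(stuckAtMs, 1_000) + " ms");

        // Option (2): do not wait for the monitor at all.
        System.out.println("event-driven: " + eventDrivenDelayMs(15) + " ms");
    }
}
```

In this model the worst-case wait under polling approaches one full period, so shortening the period reduces the delay but makes the monitor wake up more often. The event-driven path does not depend on the period at all.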