[ https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-3741: -------------------------- Fix Version/s: (was: 0.90.5) 0.90.3 > Make HRegionServer aware of the regions it's opening/closing > ------------------------------------------------------------ > > Key: HBASE-3741 > URL: https://issues.apache.org/jira/browse/HBASE-3741 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.1 > Reporter: Jean-Daniel Cryans > Assignee: Jean-Daniel Cryans > Priority: Blocker > Fix For: 0.90.3 > > Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, > HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch > > > This is a serious issue about a race between regions being opened and closed > in region servers. We had this situation where the master tried to unassign a > region for balancing, failed, force unassigned it, force assigned it > somewhere else, failed to open it on another region server (took too long), > and then reassigned it back to the original region server. A few seconds > later, the region server processed the first closed and the region was left > unassigned. > This is from the master log: > {quote} > 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: > Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, > regions=574, usedHeap=3918, maxHeap=6973) for region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:12:10,021 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed > out: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > state=PENDING_CLOSE, ts=1302041477758 > 2011-04-05 15:12:10,021 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Region has been > PENDING_CLOSE for too long, running forced unassign again on > region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > ... > 2011-04-05 15:14:45,783 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > state=CLOSED, ts=1302041685733 > 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x42ec2cece810b68 Creating (or updating) unassigned node for > 1470298961 with OFFLINE state > ... > 2011-04-05 15:14:45,885 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for > region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; > > plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, > src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196 > 2011-04-05 15:14:45,885 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > to sv4borg40,60020,1302041218196 > 2011-04-05 15:15:39,410 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed > out: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > state=PENDING_OPEN, ts=1302041700944 > 2011-04-05 15:15:39,410 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Region has been > PENDING_OPEN for too long, reassigning > region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,410 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > state=PENDING_OPEN, ts=1302041700944 > ... > 2011-04-05 15:15:39,410 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan > was found (or we are ignoring an existing plan) for > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > so generated a random one; > hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, > src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) > available servers > 2011-04-05 15:15:39,410 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > to sv4borg42,60020,1300920459477 > 2011-04-05 15:15:40,951 DEBUG > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: > master:60000-0x42ec2cece810b68 Received ZooKeeper Event, > type=NodeDataChanged, state=SyncConnected, > path=/prodjobs/unassigned/1470298961 > 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode > /prodjobs/unassigned/1470298961 and set watcher; > region=stumbles_by_userid2,'����穗���6,1266566087256, > server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_OPENED > 2011-04-05 15:15:40,952 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENED, server=sv4borg42,60020,1300920459477, > region=1470298961 > 2011-04-05 15:15:42,222 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED > event for 1470298961; deleting unassigned node > ... > 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode > /prodjobs/unassigned/1470298961 and set watcher; > region=stumbles_by_userid2,'����穗���6,1266566087256, > server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING > 2011-04-05 15:15:55,812 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned > node: /prodjobs/unassigned/1470298961 > (region=stumbles_by_userid2,'����穗���6,1266566087256, > server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING) > 2011-04-05 15:15:55,812 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSING, server=sv4borg42,60020,1300920459477, > region=1470298961 > 2011-04-05 15:15:55,812 WARN > org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region > 1470298961 from server sv4borg42,60020,1300920459477 but region was in the > state null and not in expected PENDING_CLOSE or CLOSING states > {quote} > And from sv4borg42: > {quote} > 2011-04-05 15:09:58,755 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:11:17,757 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:12:10,021 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,675 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing > close of > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,700 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Closing > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961: > disabling compactions & flushes > 2011-04-05 15:14:45,701 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Updates disabled for region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,701 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Closed > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,758 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed > region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,410 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,410 DEBUG > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing > open of > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,486 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Opening region: REGION => {NAME => > 'stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256', > STARTKEY => '\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6', > ENDKEY => '\x00'\x9AU\x7F\xFF\xFE\xEBQ\xB0\xC3\xEF\x00Jr\xF2', ENCODED => > 1470298961, TABLE => ... > 2011-04-05 15:15:39,487 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Instantiated > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:40,399 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Onlined > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; > next sequenceid=37627407247 > 2011-04-05 15:15:40,488 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > Updated row > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > in region .META.,,1 with server=sv4borg42:60020, startcode=1300920459477 > 2011-04-05 15:15:40,582 DEBUG > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,776 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing > close of > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Closing > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961: > disabling compactions & flushes > 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Updates disabled for region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Closed > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,842 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed > region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,943 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing > close of > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,943 WARN > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Received > CLOSE for region > stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > but currently not serving > {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira