[ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026020#comment-13026020 ]
Jean-Daniel Cryans commented on HBASE-3669:
-------------------------------------------

I'm starting to think that we should set hbase.master.assignment.timeoutmonitor.timeout equal to the ZK timeout since it causes so many issues.

> Region in PENDING_OPEN keeps being bounced between RS and master
> ----------------------------------------------------------------
>
>                 Key: HBASE-3669
>                 URL: https://issues.apache.org/jira/browse/HBASE-3669
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.90.3, 0.92.0
>
>         Attachments: HBASE-3669-debug-v1.patch
>
>
> After going crazy killing region servers after HBASE-3668, most of the cluster recovered except for 3 regions that kept being refused by the region servers.
> On the master I would see:
> {code}
> 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. so generated a random one; hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) available servers
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. to sv2borg171,60020,1300399357135
> {code}
> Then on the region server:
> {code}
> 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempting to transition node f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
> {code}
> I'm not sure I fully understand what was going on... the master was supposed to OFFLINE the znode but then that's not what the region server was seeing?
> In any case, I was able to recover by doing a force unassign for each region and then assign.
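
The region server warning in the logs above reads like a failed conditional update: the open handler expects the unassigned node to be in M_ZK_REGION_OFFLINE, but the data it retrieves still says RS_ZK_REGION_OPENING with server=sv2borg180, while the master had just assigned the region to sv2borg171. The following is a minimal sketch of that check as a toy model in plain Java. The class name UnassignedNodeSketch and its methods are hypothetical and exist only for illustration; this is not HBase's ZKAssign code, it uses no ZooKeeper client, and it does not explain why the node was not OFFLINE.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: the state names mirror the logs above, everything else
// (class name, methods) is an assumption made for illustration.
class UnassignedNodeSketch {
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Stands in for the data stored under /hbase/unassigned/<encoded region name>.
    private final AtomicReference<State> state;

    UnassignedNodeSketch(State initial) {
        this.state = new AtomicReference<>(initial);
    }

    // Moves the node to 'next' only if it currently holds 'expected'.
    // Returns false and leaves the node untouched otherwise.
    boolean transition(State expected, State next) {
        return state.compareAndSet(expected, next);
    }

    State current() {
        return state.get();
    }

    public static void main(String[] args) {
        // Case from the logs: the node is still in OPENING from an earlier attempt.
        UnassignedNodeSketch stale =
            new UnassignedNodeSketch(State.RS_ZK_REGION_OPENING);
        boolean staleResult =
            stale.transition(State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);
        System.out.println("stale node: transition succeeded = " + staleResult);

        // Case the region server expected: the master has put the node in OFFLINE.
        UnassignedNodeSketch offline =
            new UnassignedNodeSketch(State.M_ZK_REGION_OFFLINE);
        boolean offlineResult =
            offline.transition(State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);
        System.out.println("offline node: transition succeeded = " + offlineResult);
    }
}
```

In this model the first transition is refused and the second succeeds. If the node never reaches OFFLINE, every reassignment triggered by the timeout monitor would be refused the same way, which would be consistent with the region bouncing between the master and the region servers until it was manually unassigned and assigned again.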
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira