[ 
https://issues.apache.org/jira/browse/HBASE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4400:
------------------------------------------

    Attachment: HBASE-4400_0.90_1.patch

> .META. getting stuck if RS hosting it is dead and znode state is in 
> RS_ZK_REGION_OPENED
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-4400
>                 URL: https://issues.apache.org/jira/browse/HBASE-4400
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4400_0.90.patch, HBASE-4400_0.90_1.patch, 
> HBASE-4400_trunk.patch, HBASE-4400_trunk_1.patch
>
>
> Steps to reproduce: start 2 RS.
> .META. is hosted by RS2, but RS2 goes down while the region is being processed.
> Now restart the master and RS1.  The master gets the RS name from the znode, 
> which is in RS_ZK_REGION_OPENED.  But because RS2 is still not online, the 
> master is not able to process .META. at all.  The logs are below:
> {noformat}
> 2011-09-14 16:43:51,949 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=linux76,60020,1315998828523, 
> region=70236052/-ROOT-
> 2011-09-14 16:43:51,968 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- 
> assigned=1, rit=false, location=linux76:60020
> 2011-09-14 16:43:51,970 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,970 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Failed to find 
> linux146,60020,1315998414623 in list of online servers; skipping registration 
> of open of .META.,,1.1028785192
> 2011-09-14 16:43:51,971 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 1028785192/.META.
> 2011-09-14 16:43:51,983 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=linux76,60020,1315998828523, 
> region=70236052/-ROOT-
> 2011-09-14 16:43:51,986 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 70236052; deleting unassigned node
> 2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x13267854032001d Deleting existing unassigned node for 70236052 
> that is in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,998 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x13267854032001d Successfully deleted unassigned node for 
> region 70236052 in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,999 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> -ROOT-,,0.70236052 on linux76,60020,1315998828523
> 2011-09-14 16:44:00,945 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Registering server=linux146,60020,1315998839724, regionCount=0, userLoad=false
> 2011-09-14 16:46:20,003 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  .META.,,1.1028785192 state=OPEN, ts=0
> 2011-09-14 16:46:20,004 ERROR 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
> too long, we don't know where region was opened so can't do anything
> {noformat}
> {code}
>         regionsInTransition.put(encodedRegionName, new RegionState(
>             regionInfo, RegionState.State.OPEN, data.getStamp()));
>           ................
>         } else {
>           HServerInfo hsi = this.serverManager.getServerInfo(sn);
>           if (hsi == null) {
>             LOG.info("Failed to find " + sn +
>               " in list of online servers; skipping registration of open of " +
>               regionInfo.getRegionNameAsString());
>           } else {
>             new OpenedRegionHandler(master, this, regionInfo, hsi).process();
>           }
>         }
> {code}
> So the timeout monitor is not able to do anything here either. For a region 
> in OPEN it only logs an error and updates the region state again:
> {code}
>           LOG.error("Region has been OPEN for too long, " +
>           "we don't know where region was opened so can't do anything");
>           synchronized(regionState) {
>             regionState.update(regionState.getState());
>           }
> {code}
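The failure mode quoted above can be reproduced with a toy model. This is only a sketch: StuckMetaSketch, its fields, and both method variants are hypothetical names made up for illustration, not HBase classes, and the fallback shown is one possible way to handle a dead server, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the master-side bookkeeping in the report; not HBase code. */
class StuckMetaSketch {
    enum State { OPEN, OFFLINE }

    // Hypothetical stand-ins for regionsInTransition, the online-server
    // list, and the final region-to-server assignments.
    final Map<String, State> regionsInTransition = new HashMap<>();
    final Set<String> onlineServers = new HashSet<>();
    final Map<String, String> assignments = new HashMap<>();

    /** Mirrors the reported behaviour: an unknown server is only logged. */
    void processOpenedAsReported(String region, String serverFromZnode) {
        regionsInTransition.put(region, State.OPEN);
        if (onlineServers.contains(serverFromZnode)) {
            assignments.put(region, serverFromZnode);
            regionsInTransition.remove(region);
        }
        // else: nothing registers the open, so the region stays in OPEN.
    }

    /** Assumed alternative: a dead server marks the region for reassignment. */
    void processOpenedWithFallback(String region, String serverFromZnode) {
        regionsInTransition.put(region, State.OPEN);
        if (onlineServers.contains(serverFromZnode)) {
            assignments.put(region, serverFromZnode);
            regionsInTransition.remove(region);
        } else {
            regionsInTransition.put(region, State.OFFLINE);
        }
    }

    /**
     * Simplified timeout monitor. OPEN is left alone, as in the quoted code.
     * OFFLINE is handed to any online server (an assumption of this sketch).
     */
    void timeoutMonitorTick(String region) {
        State state = regionsInTransition.get(region);
        if (state == State.OFFLINE && !onlineServers.isEmpty()) {
            assignments.put(region, onlineServers.iterator().next());
            regionsInTransition.remove(region);
        }
    }

    public static void main(String[] args) {
        StuckMetaSketch master = new StuckMetaSketch();
        master.onlineServers.add("linux76,60020,1315998828523");
        // The znode still names the old, dead instance of the other server.
        master.processOpenedAsReported(".META.", "linux146,60020,1315998414623");
        master.timeoutMonitorTick(".META.");
        System.out.println("in transition: " + master.regionsInTransition);
        System.out.println("assignments:   " + master.assignments);
    }
}
```

With processOpenedAsReported, .META. stays in regionsInTransition in state OPEN no matter how often the monitor runs. With processOpenedWithFallback, the same sequence ends with .META. assigned to the one online server.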

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
