[ https://issues.apache.org/jira/browse/HBASE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3381:
-------------------------

    Attachment: 3381.txt

Here is the patch. I've not been able to repro the condition during the last few hours of testing, so I would like to commit this (need a +1 -- Jon?).

While in here, I did some cleanup of hbck messages and stopped it from claiming an error on an offlined split parent. Also added logging around the fixup of the case where the parent offlining edit got in but not the daughter additions; needed debugging.

> Interrupt of a region open comes across as a successful open
> ------------------------------------------------------------
>
>                 Key: HBASE-3381
>                 URL: https://issues.apache.org/jira/browse/HBASE-3381
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>         Attachments: 3381.txt
>
>
> Meta was offline when the below happened:
> {code}
> 2010-12-21 19:45:23,023 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Attempting to transition node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2010-12-21 19:45:23,046 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Successfully transitioned node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2010-12-21 19:45:26,379 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting thread Thread[PostOpenDeployTasks:337038b50e467fbd6b031f278bbd9c22,5,main]
> 2010-12-21 19:45:26,379 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Attempting to transition node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2010-12-21 19:45:26,381 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=337038b50e467fbd6b031f278bbd9c22
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
>         at org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:146)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1331)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:195)
> ...
> {code}
> So, we timed out trying to open the region, but rather than closing the region because the edit failed, we missed seeing the InterruptedException.
> Here is the suggested fix:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java b/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> index 7bf680d..2b0078c 100644
> --- a/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> +++ b/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> @@ -339,7 +339,7 @@ public class MetaReader {
>      get.addFamily(HConstants.CATALOG_FAMILY);
>      byte [] meta = getCatalogRegionNameForRegion(regionName);
>      Result r = catalogTracker.waitForMetaServerConnectionDefault().get(meta, get);
> -    if(r == null || r.isEmpty()) {
> +    if (r == null || r.isEmpty()) {
>        return null;
>      }
>      return metaRowToRegionPair(r);
> {code}
> Let me try it.
> Without this, what we see is hbck showing that the region is on server X, but in .META. it shows as being on Y (its pre-balance server).
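
The failure mode described above (a post-open task that gets interrupted, yet the open is still reported as successful) can be sketched in isolation. This is not the 3381.txt patch and uses no HBase classes; `InterruptAwareOpen`, `PostOpenTask`, and `runPostOpen` are hypothetical names made up for illustration. The idea it shows is only that the caller should decide success from a flag the worker sets on normal completion, never from the mere fact that the worker thread has ended.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptAwareOpen {
    /** Hypothetical stand-in for post-open work, e.g. a catalog edit that may block. */
    public interface PostOpenTask {
        void run() throws InterruptedException;
    }

    /**
     * Runs the task on a worker thread and waits up to timeoutMs (must be > 0,
     * since Thread.join(0) waits forever). If the worker is still running after
     * the timeout it is interrupted. Returns true only if the task finished
     * without throwing.
     */
    public static boolean runPostOpen(PostOpenTask task, long timeoutMs)
            throws InterruptedException {
        AtomicBoolean succeeded = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                task.run();
                succeeded.set(true); // reached only when no exception was thrown
            } catch (InterruptedException e) {
                // Restore the interrupt flag; 'succeeded' stays false.
                Thread.currentThread().interrupt();
            }
        }, "PostOpenDeployTasks-sketch");
        worker.start();
        worker.join(timeoutMs);
        if (worker.isAlive()) {
            worker.interrupt(); // timed out: ask the worker to stop
            worker.join();      // wait for it to unwind before reading the flag
        }
        // A caller would close the region (rather than mark it OPENED) on false.
        return succeeded.get();
    }

    public static void main(String[] args) throws InterruptedException {
        boolean fast = runPostOpen(() -> { }, 1000);
        boolean slow = runPostOpen(() -> Thread.sleep(60_000), 50);
        System.out.println("fast=" + fast + " slow=" + slow); // fast=true slow=false
    }
}
```

In this sketch the interrupted task yields `false`, so the caller has a clear signal to roll back instead of continuing on to the OPENED transition.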