[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-9387:
-------------------------
    Priority: Critical  (was: Major)

Suggested workaround would be to review all transitions and, on edges such as this one (going from OPENING to OPENED), do a radical abort if the transition fails (not just for the meta region). Then, in another issue, step back and revisit our system for managing region manipulation. It is way too complex, consuming way too many hours of eng. time, and there are holes.

> Region could get lost during assignment
> ---------------------------------------
>
>                 Key: HBASE-9387
>                 URL: https://issues.apache.org/jira/browse/HBASE-9387
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.95.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 9387-v1.txt, hbase-9387.patch, org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed a test timeout running against hadoop 2.1.0 with distributed log replay turned on.
> Looks like the region state for 1588230740 became inconsistent between the master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>       at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}
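
For illustration only, the workaround suggested in the comment above (abort when the OPENING to OPENED edge fails, for any region rather than only meta) might be sketched roughly as follows. This is not code from the attached patches or from HBase itself: TransitionGuard, Abortable, markOpened and the publish callback are hypothetical names invented for this sketch, and the real transition logic involves considerably more state.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names are HBase classes.
class TransitionGuard {
    enum State { OFFLINE, OPENING, OPENED }

    /** Stand-in for whatever shuts the server down; hypothetical. */
    interface Abortable {
        void abort(String why);
    }

    private final Abortable server;
    private State state = State.OPENING;

    TransitionGuard(Abortable server) {
        this.server = server;
    }

    State getState() {
        return state;
    }

    /**
     * Attempts the OPENING -> OPENED edge. The publish callback stands in for
     * the step that records the new state and returns false if it failed.
     * Any failure triggers an abort instead of leaving the region half-open.
     */
    boolean markOpened(BooleanSupplier publish) {
        if (state != State.OPENING) {
            server.abort("unexpected state " + state + " before OPENING -> OPENED");
            return false;
        }
        boolean ok;
        try {
            ok = publish.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false; // treat an exception the same as a failed transition
        }
        if (!ok) {
            server.abort("failed transition OPENING -> OPENED");
            return false;
        }
        state = State.OPENED;
        return true;
    }
}
```

The point of the sketch is only that the failure path is uniform: the local state is never advanced unless the transition was recorded, and a failed edge stops the server rather than leaving master and region server with different views of the region.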