[ https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502635#comment-16502635 ]
stack commented on HBASE-20671:
-------------------------------

Yeah. Good one. Problematic code is in the load of meta after a master crash:
{code}
      State localState = state;
      if (localState == null) {
        // No region state column data in hbase:meta table! Am I doing a rolling upgrade from
        // hbase1 to hbase2? Am I restoring a SNAPSHOT or otherwise adding a region to hbase:meta?
        // In any of these cases, state is empty. For now, presume OFFLINE but there are probably
        // cases where we need to probe more to be sure this is correct; TODO informed by experience.
        LOG.info(regionInfo.getEncodedName() + " regionState=null; presuming " + State.OFFLINE);
        localState = State.OFFLINE;
      }
{code}

The note above allows that there are cases where the presumption that the region is OFFLINE is wrong (per HBASE-19529).

Let me write a test for this one to repro.

We need a FOR_GC state to which we set Regions that are up for deletion.

> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 2.0.1
>
>         Attachments: hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip,
> hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully, was restarted, but then ended up assigning the two merged (parent) regions back out to the cluster. There is a log message which appears to indicate that RegionStates acknowledges that it doesn't know what state this region is in as it replays the pv2 WAL; however, it incorrectly assumes that the region is just OFFLINE and needs to be assigned.
>
> {noformat}
> 2018-05-30 04:26:00,055 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster: Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG [master/ctr-e138-1518143905142-336066-01-000003:20000] procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS; MergeTableRegionsProcedure table=tabletwo_merge, regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b], forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
>
> Eventually, the RS reports in its online regions, and the master tells it to kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000] assignment.AssignmentManager: Killing ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
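One way to read the FOR_GC suggestion in the comment: when a region's hbase:meta row has no state column, the master first checks whether that region is already known to have been merged away, and only otherwise falls back to the OFFLINE presumption. The Java below is a minimal sketch of that decision under stated assumptions, not the actual HBase fix: the `RegionStateOnLoad` class, `resolveLoadedState`, `shouldAssign`, the `knownMergeParents` set (imagined as built elsewhere, for example from the merged region's meta row), and the `FOR_GC` enum value are all hypothetical illustrations.

```java
import java.util.List;
import java.util.Set;

/**
 * Sketch only: resolving a region's state when its hbase:meta row has no state column.
 * All names here are hypothetical; none of them are HBase APIs.
 */
public class RegionStateOnLoad {

  /** Hypothetical states; FOR_GC is the state proposed in the comment, not an HBase 2.0.0 value. */
  enum State { OFFLINE, OPEN, FOR_GC }

  /**
   * Picks the state to use for a region loaded from meta.
   *
   * @param encodedName       encoded region name
   * @param stateFromMeta     value of the state column, or null when the column is missing
   * @param knownMergeParents hypothetical set of regions already merged away (awaiting deletion)
   */
  static State resolveLoadedState(String encodedName, State stateFromMeta,
      Set<String> knownMergeParents) {
    if (stateFromMeta != null) {
      return stateFromMeta; // an explicit state in meta wins
    }
    if (knownMergeParents.contains(encodedName)) {
      return State.FOR_GC; // merged away: leave for cleanup, never reassign
    }
    return State.OFFLINE; // the existing fallback presumption
  }

  /** In this sketch, only regions presumed OFFLINE are candidates for assignment. */
  static boolean shouldAssign(State state) {
    return state == State.OFFLINE;
  }

  public static void main(String[] args) {
    Set<String> mergeParents =
        Set.of("a7dd6606dcacc9daf085fc9fa2aecc0c", "4017a3c778551d4d258c785d455f9c0b");
    // "restoredRegion" is a made-up name for a region with no state column that is not a
    // merge parent (e.g. one added by a snapshot restore).
    for (String name : List.of("a7dd6606dcacc9daf085fc9fa2aecc0c", "restoredRegion")) {
      State s = resolveLoadedState(name, null, mergeParents);
      System.out.println(name + " -> " + s + ", assign=" + shouldAssign(s));
    }
  }
}
```

Run as-is, it prints `a7dd6606dcacc9daf085fc9fa2aecc0c -> FOR_GC, assign=false` and `restoredRegion -> OFFLINE, assign=true`: the merge parent from the logs above is kept out of assignment, while a region with no state and no merge history still gets the original OFFLINE treatment that the rolling-upgrade and snapshot-restore cases rely on.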