[ 
https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502635#comment-16502635
 ] 

stack commented on HBASE-20671:
-------------------------------

Yeah. Good one.  Problematic code is in the load of meta after a master crash:

{code}
        State localState = state;
        if (localState == null) {
          // No region state column data in hbase:meta table! Am I doing a rolling upgrade from
          // hbase1 to hbase2? Am I restoring a SNAPSHOT or otherwise adding a region to hbase:meta?
          // In any of these cases, state is empty. For now, presume OFFLINE but there are probably
          // cases where we need to probe more to be sure this is correct; TODO informed by experience.
          LOG.info(regionInfo.getEncodedName() + " regionState=null; presuming " + State.OFFLINE);
          localState = State.OFFLINE;
        }
{code}

The note in that comment already allows that there are cases where presuming the 
region is OFFLINE is wrong (per HBASE-19529). A merged-away region is one of them.

Let me write a test for this one to repro. We need a FOR_GC state to which we 
set Regions that are up for deletion.
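
A rough sketch of the idea, not the actual patch. FOR_GC is the state proposed 
above; the class name, the resolveState/shouldAssign helpers and the 
mergedParents set are all hypothetical stand-ins (the set represents whatever 
the master can learn from hbase:meta about completed merges). The point is only 
that a null state should not fall through to OFFLINE when we know the region 
was merged away.

```java
import java.util.Set;

// Illustration only: none of these names are HBase API.
final class RegionStateLoadSketch {

  // Simplified stand-in for the master's region State enum.
  // FOR_GC is the proposed state; it is an assumption here.
  enum State { OFFLINE, OPEN, FOR_GC }

  // Decide which state to use for a region row loaded from hbase:meta.
  // persisted may be null when the row carries no state column.
  static State resolveState(State persisted, String encodedName, Set<String> mergedParents) {
    if (persisted != null) {
      return persisted;
    }
    if (mergedParents.contains(encodedName)) {
      // Region was merged into another one: park it for cleanup, never assign it.
      return State.FOR_GC;
    }
    // Old behavior for everything else (rolling upgrade, snapshot restore, ...).
    return State.OFFLINE;
  }

  // Only OFFLINE regions are candidates for assignment in this sketch.
  static boolean shouldAssign(State state) {
    return state == State.OFFLINE;
  }

  public static void main(String[] args) {
    Set<String> mergedParents = Set.of(
        "a7dd6606dcacc9daf085fc9fa2aecc0c",
        "4017a3c778551d4d258c785d455f9c0b");
    State s = resolveState(null, "a7dd6606dcacc9daf085fc9fa2aecc0c", mergedParents);
    System.out.println(s + " assign=" + shouldAssign(s));
  }
}
```

With something like this in place, the two regions from the logs below would 
come up as FOR_GC after the master restart instead of being handed back out.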

> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 2.0.1
>
>         Attachments: 
> hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip, 
> hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully, was restarted, but then 
> ended up assigning the merged-away regions back out to the cluster. There is a 
> log message which appears to indicate that RegionStates acknowledges that it 
> doesn't know what this region is as it's replaying the pv2 WAL; however, it 
> incorrectly assumes that the region is just OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster: 
> Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c 
> and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG 
> [master/ctr-e138-1518143905142-336066-01-000003:20000] 
> procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS; 
> MergeTableRegionsProcedure table=tabletwo_merge, 
> regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b], 
> forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO  
> [master/ctr-e138-1518143905142-336066-01-000003:20000] 
> assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c 
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO  
> [master/ctr-e138-1518143905142-336066-01-000003:20000] 
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! 
> rit=OFFLINE, location=null, table=tabletwo_merge, 
> region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO  
> [master/ctr-e138-1518143905142-336066-01-000003:20000] 
> assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b 
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO  
> [master/ctr-e138-1518143905142-336066-01-000003:20000] 
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! 
> rit=OFFLINE, location=null, table=tabletwo_merge, 
> region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to 
> kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000] 
> assignment.AssignmentManager: Killing 
> ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not 
> online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
