[ https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579011#comment-16579011 ]
stack commented on HBASE-20671:
-------------------------------

Downing priority and moving out of 2.0.2. Josh hasn't seen this recently. We have put in a bit of a workaround too. Leaving open in case we get a fresh instance.

> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>         Attachments: 0001-Test-for-HBASE-20671.patch,
> hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip,
> hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip,
> workaround.txt
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully, was restarted, but then
> ended up assigning the children region back out to the cluster. There is a
> log message which appears to indicate that RegionStates acknowledges that it
> doesn't know what this region is as it's replaying the pv2 WAL; however, it
> incorrectly assumes that the region is just OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster:
> Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c
> and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS;
> MergeTableRegionsProcedure table=tabletwo_merge,
> regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b],
> forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to
> kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000]
> assignment.AssignmentManager: Killing
> ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not
> online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)