[ https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584709#comment-16584709 ]
Tak Lon (Stephen) Wu commented on HBASE-20671:
----------------------------------------------

Hi guys,

I am not 100% sure yet, but I recently set {{hbase.readonly}} to true on hbase-2.1.0 for a read replica cluster and found that {{hbase:namespace}} could not be assigned during startup of the read replica cluster. The master loops forever because {{isTableAssigned}} keeps checking the {{hbase:namespace}} table and always returns false.

The patch for HBASE-20702 skips "empty" rows, but it seems that rows for system tables, e.g. {{hbase:namespace}}, should not be considered empty. I made the band-aid change below, and the cluster was able to start again.

{noformat}
private void loadMeta() throws IOException {
  // TODO: use a thread pool
  regionStateStore.visitMeta(new RegionStateStore.RegionStateVisitor() {
    @Override
    public void visitRegionState(Result result, final RegionInfo regionInfo, final State state,
        final ServerName regionLocation, final ServerName lastHost, final long openSeqNum) {
      if (!regionInfo.getTable().equals(TableName.NAMESPACE_TABLE_NAME)) { // <-- added to unblock the read replica cluster
        if (state == null && regionLocation == null && lastHost == null
            && openSeqNum == SequenceId.NO_SEQUENCE_ID) {
          // This is a row with nothing in it.
          LOG.warn("Skipping empty row={}", result);
          return;
        }
      }
{noformat}

So, do you guys think I should fix this somewhere else instead?

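For anyone who wants to reason about the condition in isolation, here is a minimal standalone sketch of the skip decision in the band-aid above. It is not HBase code: plain strings stand in for {{TableName}}, {{State}} and {{ServerName}}, the class {{EmptyRowCheck}} and method {{shouldSkip}} are hypothetical names, and the value -1 for the "no sequence id" constant is an assumption of this sketch.

```java
// Standalone illustration only; none of these names are HBase API.
class EmptyRowCheck {
    // Assumed stand-in for SequenceId.NO_SEQUENCE_ID.
    static final long NO_SEQUENCE_ID = -1L;
    // Name of the system table that the band-aid exempts from the empty-row skip.
    static final String NAMESPACE_TABLE = "hbase:namespace";

    /**
     * Returns true when a meta row should be skipped as "empty":
     * no state, no location, no last host, and no open sequence id.
     * Rows of hbase:namespace are never skipped, mirroring the band-aid.
     */
    static boolean shouldSkip(String table, String state, String regionLocation,
                              String lastHost, long openSeqNum) {
        if (NAMESPACE_TABLE.equals(table)) {
            return false; // system table row: always load it, even if it looks empty
        }
        return state == null
            && regionLocation == null
            && lastHost == null
            && openSeqNum == NO_SEQUENCE_ID;
    }

    public static void main(String[] args) {
        // An empty-looking namespace row is kept.
        System.out.println(shouldSkip("hbase:namespace", null, null, null, NO_SEQUENCE_ID));
        // An empty-looking user-table row is skipped.
        System.out.println(shouldSkip("tabletwo_merge", null, null, null, NO_SEQUENCE_ID));
        // A populated user-table row is kept.
        System.out.println(shouldSkip("tabletwo_merge", "OPEN", "rs1,16020,1", null, 5L));
    }
}
```

The sketch only shows which rows the extra guard lets through; whether exempting {{hbase:namespace}} alone is enough, or other system tables need the same treatment, is still the open question.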
> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>         Attachments: 0001-Test-for-HBASE-20671.patch,
> hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip,
> hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip,
> workaround.txt
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully, was restarted, but then
> ended up assigning the children region back out to the cluster. There is a
> log message which appears to indicate that RegionStates acknowledges that it
> doesn't know what this region is as it's replaying the pv2 WAL; however, it
> incorrectly assumes that the region is just OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster:
> Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c
> and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS;
> MergeTableRegionsProcedure table=tabletwo_merge,
> regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b],
> forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to
> kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000]
> assignment.AssignmentManager: Killing
> ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not
> online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)