[jira] [Commented] (HBASE-20671) Merged region brought back to life causing RS to be killed by Master

Tak Lon (Stephen) Wu (JIRA) Sat, 18 Aug 2018 01:46:16 -0700


    [ https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584709#comment-16584709 ]


Tak Lon (Stephen) Wu commented on HBASE-20671:
----------------------------------------------

Hi guys, I am not 100% sure yet, but I recently worked on setting 
{{hbase.readonly}} to true on hbase-2.1.0 for a read replica cluster, and 
{{hbase:namespace}} could not be assigned during the read replica cluster's 
startup (an infinite loop, because {{isTableAssigned}} keeps checking for the 
{{hbase:namespace}} table but returns false).

I found that the patch for HBASE-20702 skips "empty" rows, but it seems that 
rows for system tables, e.g. {{hbase:namespace}}, should not be considered 
empty. I made the band-aid change below and the cluster was able to start again.
{noformat}
  private void loadMeta() throws IOException {
    // TODO: use a thread pool
    regionStateStore.visitMeta(new RegionStateStore.RegionStateVisitor() {
      @Override
      public void visitRegionState(Result result, final RegionInfo regionInfo, final State state,
          final ServerName regionLocation, final ServerName lastHost, final long openSeqNum) {
        if (!regionInfo.getTable().equals(TableName.NAMESPACE_TABLE_NAME)) { // <-- added to unblock the read replica cluster
          if (state == null && regionLocation == null && lastHost == null
              && openSeqNum == SequenceId.NO_SEQUENCE_ID) {
            // This is a row with nothing in it.
            LOG.warn("Skipping empty row={}", result);
            return;
          }
        }
        // ... rest of visitRegionState() unchanged ...
      }
    });
  }
{noformat}
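The band-aid boils down to a single predicate: a visited meta row whose state, location, last host and open sequence number are all unset counts as empty and is skipped, unless it belongs to {{hbase:namespace}}. Below is a minimal, self-contained Java sketch of that predicate only, not HBase code: the {{EmptyRowFilter}} class, its {{MetaRow}} holder and the {{shouldSkip}} method are hypothetical stand-ins for the real visitor arguments, and {{NO_SEQUENCE_ID}} is assumed to be -1 to mirror {{SequenceId.NO_SEQUENCE_ID}}.

```java
import java.util.Objects;

/** Hypothetical model of the empty-row check in loadMeta(); not HBase code. */
public class EmptyRowFilter {
  // Assumption: mirrors SequenceId.NO_SEQUENCE_ID.
  static final long NO_SEQUENCE_ID = -1L;
  static final String NAMESPACE_TABLE = "hbase:namespace";

  /** Hypothetical holder for the arguments passed to visitRegionState(). */
  static final class MetaRow {
    final String table;
    final String state;          // null when the row has no state
    final String regionLocation; // null when the row has no location
    final String lastHost;       // null when no previous host is recorded
    final long openSeqNum;

    MetaRow(String table, String state, String regionLocation, String lastHost,
        long openSeqNum) {
      this.table = Objects.requireNonNull(table, "table");
      this.state = state;
      this.regionLocation = regionLocation;
      this.lastHost = lastHost;
      this.openSeqNum = openSeqNum;
    }
  }

  /** True when the row carries no assignment information at all. */
  static boolean isEmpty(MetaRow row) {
    return row.state == null && row.regionLocation == null && row.lastHost == null
        && row.openSeqNum == NO_SEQUENCE_ID;
  }

  /** Band-aid behaviour: skip empty rows unless they belong to hbase:namespace. */
  static boolean shouldSkip(MetaRow row) {
    return !NAMESPACE_TABLE.equals(row.table) && isEmpty(row);
  }

  public static void main(String[] args) {
    MetaRow emptyUserRow = new MetaRow("tabletwo_merge", null, null, null, NO_SEQUENCE_ID);
    MetaRow emptyNamespaceRow = new MetaRow(NAMESPACE_TABLE, null, null, null, NO_SEQUENCE_ID);
    MetaRow openRow = new MetaRow("tabletwo_merge", "OPEN", "rs1,16020,1", null, 42L);
    System.out.println(shouldSkip(emptyUserRow));      // true
    System.out.println(shouldSkip(emptyNamespaceRow)); // false
    System.out.println(shouldSkip(openRow));           // false
  }
}
```

Note that the namespace row is still "empty" by this definition; the band-aid only exempts it from being skipped, which is why the question of where the exemption really belongs stays open.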

So, do you guys think I should fix this somewhere else instead?

> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>         Attachments: 0001-Test-for-HBASE-20671.patch, hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip, hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip, workaround.txt
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master successfully merged two regions into one and was restarted, but then 
> ended up assigning the two pre-merge regions back out to the cluster. A log 
> message appears to indicate that RegionStates acknowledges it doesn't know 
> these regions while replaying the pv2 WAL; however, it incorrectly assumes 
> each region is simply OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO  [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster: Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG [master/ctr-e138-1518143905142-336066-01-000003:20000] procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS; MergeTableRegionsProcedure table=tabletwo_merge, regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b], forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to 
> kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN  [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000] assignment.AssignmentManager: Killing ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}
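The restart path in the quoted description can be reduced to a toy model: a region referenced during pv2 replay that has no known state is presumed OFFLINE and queued for assignment, which is how the two pre-merge regions get brought back to life. The Java sketch below is a hypothetical simplification for illustration only; {{PresumeOfflineModel}}, its maps and the {{mergedAway}} set are assumptions, not HBase's {{RegionStates}} or {{AssignmentManager}} API. The optional guard, which skips regions already recorded as merged away, is one conceivable mitigation and not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of the restart path in this issue; not HBase code. */
public class PresumeOfflineModel {
  enum State { OFFLINE, OPEN }

  /** Region states rebuilt from meta on restart (encoded name -> state). */
  private final Map<String, State> regionStates = new HashMap<>();
  /** Hypothetical record of regions merged away by a completed merge. */
  private final Set<String> mergedAway = new HashSet<>();
  /** Regions the master decides to assign. */
  private final List<String> toAssign = new ArrayList<>();
  private final boolean guardMergedRegions;

  PresumeOfflineModel(boolean guardMergedRegions) {
    this.guardMergedRegions = guardMergedRegions;
  }

  void loadFromMeta(String encodedName, State state) {
    regionStates.put(encodedName, state);
  }

  void recordMerge(String regionA, String regionB) {
    mergedAway.add(regionA);
    mergedAway.add(regionB);
  }

  /**
   * Handles a region referenced while replaying procedure state. Without the
   * guard, an unknown region is presumed OFFLINE and queued for assignment,
   * analogous to the "regionState=null; presuming OFFLINE" log line.
   */
  void onReplayedRegion(String encodedName) {
    if (regionStates.containsKey(encodedName)) {
      return; // already known from meta
    }
    if (guardMergedRegions && mergedAway.contains(encodedName)) {
      return; // merged away: do not bring it back to life
    }
    regionStates.put(encodedName, State.OFFLINE);
    toAssign.add(encodedName);
  }

  List<String> regionsToAssign() {
    return toAssign;
  }

  public static void main(String[] args) {
    for (boolean guard : new boolean[] {false, true}) {
      PresumeOfflineModel model = new PresumeOfflineModel(guard);
      model.recordMerge("a7dd6606dcacc9daf085fc9fa2aecc0c", "4017a3c778551d4d258c785d455f9c0b");
      // After restart neither pre-merge region has a state, but replay still references both.
      model.onReplayedRegion("a7dd6606dcacc9daf085fc9fa2aecc0c");
      model.onReplayedRegion("4017a3c778551d4d258c785d455f9c0b");
      System.out.println("guard=" + guard + " assign=" + model.regionsToAssign());
    }
  }
}
```

With the guard off, both encoded region names end up in the assign list, matching the two "presuming OFFLINE" lines in the logs; with the guard on, the list stays empty.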



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
