[ https://issues.apache.org/jira/browse/HBASE-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guanghao Zhang resolved HBASE-24897. ------------------------------------ Fix Version/s: 2.2.6 Resolution: Fixed > RegionReplicaFlushHandler should handle NoServerForRegionException to avoid > aborting RegionServer > ------------------------------------------------------------------------------------------------- > > Key: HBASE-24897 > URL: https://issues.apache.org/jira/browse/HBASE-24897 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Major > Fix For: 2.2.6 > > > Debug flaky test TestRegionReplicaReplicationEndpoint, I found the RS aborted > because RegionReplicaFlushHandler flush failed. When create a new table with > region replica, the assign order may be: > # assign 0002 replica region and trigger primary region flush. > # assign 0001 replica region and trigger primary region flush. > # assign primary region. > But the primary region flush may failed because the primary region not opened > now. So it may abort the RS...... > > {code:java} > 2020-08-18 16:56:30,041 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354. > 2020-08-18 16:56:30,558 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140. > 2020-08-18 16:56:31,192 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2. > 2020-08-18 16:58:53,857 ERROR > [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] > helpers.MarkerIgnoringBase(159): ***** ABORTING region server > hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception > was thrown ***** > org.apache.hadoop.hbase.client.NoServerForRegionException: No server address > listed in hbase:meta for region > testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. > containing row > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84) > at > org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at > org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129) > at > org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > I thought the fix should be assign primary region firstly when enable region > replica featue. Will check the implmenation of region replica. > -- This message was sent by Atlassian Jira (v8.3.4#803005)