Guanghao Zhang created HBASE-24897:
--------------------------------------

             Summary: Fix flaky test TestRegionReplicaReplicationEndpoint
                 Key: HBASE-24897
                 URL: https://issues.apache.org/jira/browse/HBASE-24897
             Project: HBase
          Issue Type: Bug
            Reporter: Guanghao Zhang


Debug this unti test, I found the RS aborted because RegionReplicaFlushHandler 
flush failed. When create a new table with region replica, the assign order may 
be:
 # assign 0002 replica region and trigger primary region flush.
 # assign 0001 replica region and trigger primary region flush.
 # assign primary region.

But the primary region flush may failed because the primary region not opened 
now. So it may abort the RS......

 
{code:java}
2020-08-18 16:56:30,041 INFO 
[RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] 
handler.AssignRegionHandler(141): Opened 
testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354.
2020-08-18 16:56:30,558 INFO 
[RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] 
handler.AssignRegionHandler(141): Opened 
testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140.
2020-08-18 16:56:31,192 INFO 
[RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] 
handler.AssignRegionHandler(141): Opened 
testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.


2020-08-18 16:58:53,857 ERROR 
[RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] 
helpers.MarkerIgnoringBase(159): ***** ABORTING region server 
hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception was 
thrown *****
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address 
listed in hbase:meta for region 
testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb.
 containing row 
  at 
org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926)
  at 
org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784)
  at 
org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140)
  at 
org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147)
  at 
org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98)
  at 
org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84)
  at 
org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62)
  at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
  at 
org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129)
  at 
org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
{code}
I thought the fix should be assign primary region firstly when enable region 
replica featue. Will check the implmenation of region replica.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to