Szabolcs Bukros created HBASE-23601:
---------------------------------------

             Summary: OutputSink.WriterThread exception gets stuck and repeated 
indefinitely
                 Key: HBASE-23601
                 URL: https://issues.apache.org/jira/browse/HBASE-23601
             Project: HBase
          Issue Type: Bug
          Components: read replicas
            Reporter: Szabolcs Bukros
            Assignee: Szabolcs Bukros
             Fix For: 2.2.2


When a WriterThread runs into an exception (e.g. NotServingRegionException), the 
exception is stored in the controller. It is never removed and cannot be 
overwritten either.

 
{code:java}
public void run() {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    controller.writerThreadError(t);
  }
}{code}
As a result, every time PipelineController.checkForErrors() is called, the 
same stale exception is rethrown.
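
The sticky behavior can be modeled with a small sketch. This is a simplified stand-in, not the HBase source: the names StickyErrorSketch, Controller and countFailures are hypothetical, and the first-error-wins storage via compareAndSet(null, t) is an assumption chosen to match the description above.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the behavior described above; not the HBase source.
class StickyErrorSketch {

  static class Controller {
    // Assumption: the first reported error wins and nothing ever clears it.
    private final AtomicReference<Throwable> thrown = new AtomicReference<>();

    void writerThreadError(Throwable t) {
      thrown.compareAndSet(null, t); // later errors are silently dropped
    }

    void checkForErrors() throws IOException {
      Throwable t = thrown.get();
      if (t != null) {
        throw new IOException("writer thread failed", t);
      }
    }
  }

  // Calls checkForErrors() `attempts` times and counts how many calls rethrow.
  static int countFailures(Controller controller, int attempts) {
    int failures = 0;
    for (int i = 0; i < attempts; i++) {
      try {
        controller.checkForErrors();
      } catch (IOException e) {
        failures++; // a caller that only logs here will log on every iteration
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    Controller controller = new Controller();
    controller.writerThreadError(new IllegalStateException("region not serving"));
    System.out.println(countFailures(controller, 5)); // prints 5
  }
}
```

A single failure reported once makes every later check fail, which is the pattern that floods the log when the caller only logs and retries.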

 

For example, RegionReplicaReplicationEndpoint.replicate has a while loop 
that does the actual replicating. On every iteration it calls 
checkForErrors(), catches the rethrown exception, and logs it, but does nothing 
else about it. In my experience this produces ~2GB of logs in ~5 minutes.

 

I propose clearing the stored exception once it reaches 
RegionReplicaReplicationEndpoint.replicate and making sure the WriterThread 
that died throwing it is restarted.
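
A rough sketch of what that handling could look like, under stated assumptions: RecoverySketch, Controller, takeError, startWriter and recoverIfNeeded are all hypothetical names invented for illustration, and a plain Thread stands in for the WriterThread. It is not a patch against HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposal (clear the error, restart the writer); not HBase code.
class RecoverySketch {

  static class Controller {
    private final AtomicReference<Throwable> thrown = new AtomicReference<>();

    void writerThreadError(Throwable t) {
      thrown.compareAndSet(null, t);
    }

    // Hypothetical addition: return the stored error and clear it in one atomic step.
    Throwable takeError() {
      return thrown.getAndSet(null);
    }
  }

  // Starts a writer that reports any failure to the controller, like run() above.
  static Thread startWriter(Controller controller, Runnable work) {
    Thread t = new Thread(() -> {
      try {
        work.run();
      } catch (Throwable e) {
        controller.writerThreadError(e);
      }
    });
    t.start();
    return t;
  }

  // If an error is stored, clear it and start a replacement writer;
  // otherwise return the current writer unchanged.
  static Thread recoverIfNeeded(Controller controller, Thread writer, Runnable work) {
    Throwable error = controller.takeError();
    if (error == null) {
      return writer;
    }
    // The error would be logged once here, instead of on every loop iteration.
    return startWriter(controller, work);
  }

  public static void main(String[] args) throws InterruptedException {
    Controller controller = new Controller();
    AtomicInteger runs = new AtomicInteger();
    // Fails on the first run only, standing in for a transient error.
    Runnable work = () -> {
      if (runs.incrementAndGet() == 1) {
        throw new IllegalStateException("region not serving");
      }
    };
    Thread writer = startWriter(controller, work);
    writer.join();
    writer = recoverIfNeeded(controller, writer, work);
    writer.join();
    System.out.println(runs.get() + " runs, pending error: " + (controller.takeError() != null));
    // prints "2 runs, pending error: false"
  }
}
```

Taking the error with getAndSet means it is surfaced exactly once, so a retry loop no longer sees the same stale exception on every pass.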



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
