Szabolcs Bukros created HBASE-23601:
---------------------------------------
Summary: OutputSink.WriterThread exception gets stuck and repeated indefinitely
Key: HBASE-23601
URL: https://issues.apache.org/jira/browse/HBASE-23601
Project: HBase
Issue Type: Bug
Components: read replicas
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros
Fix For: 2.2.2

When a WriterThread runs into an exception (e.g. NotServingRegionException), the exception is stored in the controller. It is never removed and cannot be overwritten either.

{code:java}
public void run() {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    controller.writerThreadError(t);
  }
}{code}

As a result, every time PipelineController.checkForErrors() is called, the same old exception is rethrown.

For example, RegionReplicaReplicationEndpoint.replicate contains a while loop that does the actual replicating. On every iteration it calls checkForErrors(), catches the rethrown exception, and logs it, but does nothing about it. In my experience this produces ~2 GB of log files in ~5 minutes.

My proposal is to clear the stored exception when it reaches RegionReplicaReplicationEndpoint.replicate and to make sure we restart the WriterThread that died throwing it.
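The failure mode and the proposed cleanup can be illustrated in isolation. This is a minimal stand-alone sketch, not the HBase code: StickyErrorSketch, SketchController, takeError and runLoop are hypothetical names, and the controller is assumed to keep only the first reported error and to rethrow it on every check, as described above. Restarting the dead writer thread is left as a comment.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

class StickyErrorSketch {

    // Simplified stand-in for the controller described above (assumption, not the HBase class).
    static class SketchController {
        private final AtomicReference<Throwable> thrown = new AtomicReference<>();

        // Stores the error only if none is stored yet, so later errors cannot overwrite it.
        void writerThreadError(Throwable t) {
            thrown.compareAndSet(null, t);
        }

        // Rethrows the stored error; nothing here ever clears it.
        void checkForErrors() throws IOException {
            Throwable t = thrown.get();
            if (t != null) {
                throw new IOException("writer thread failed", t);
            }
        }

        // Hypothetical addition: return the stored error and clear the slot.
        Throwable takeError() {
            return thrown.getAndSet(null);
        }
    }

    // Mimics a replicate-style loop and counts how often the error surfaces.
    static int runLoop(SketchController controller, int iterations, boolean clearOnCatch) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                controller.checkForErrors();
            } catch (IOException e) {
                failures++;
                if (clearOnCatch) {
                    controller.takeError();
                    // A real fix would also restart the writer thread that died.
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        SketchController sticky = new SketchController();
        sticky.writerThreadError(new IllegalStateException("region not serving"));
        System.out.println(runLoop(sticky, 5, false)); // prints 5: the error resurfaces every iteration

        SketchController cleared = new SketchController();
        cleared.writerThreadError(new IllegalStateException("region not serving"));
        System.out.println(runLoop(cleared, 5, true)); // prints 1: the error surfaces once, then is cleared
    }
}
```

Clearing the error at the point where the caller handles it keeps the "first error wins" behaviour for concurrent writers while letting the loop make progress once the failed writer has been replaced.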