Szabolcs Bukros created HBASE-23601:
---------------------------------------
Summary: OutputSink.WriterThread exception gets stuck and repeated indefinitely
Key: HBASE-23601
URL: https://issues.apache.org/jira/browse/HBASE-23601
Project: HBase
Issue Type: Bug
Components: read replicas
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros
Fix For: 2.2.2
When a WriterThread runs into an exception (e.g. NotServingRegionException), the
exception is stored in the controller. It is never removed, and it cannot be
overwritten either.
{code:java}
public void run() {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    // The error is stored and the thread exits; nothing clears it later.
    controller.writerThreadError(t);
  }
}
{code}
As a result, every time PipelineController.checkForErrors() is called, the
same stale exception is rethrown.
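A minimal sketch of this behavior, assuming a controller that keeps only the first reported error in an AtomicReference and rethrows it on every check. StickyErrorController is a hypothetical name and this is not the HBase source; it only reproduces the store-once, rethrow-always pattern described above.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for PipelineController, reproducing only the
// error-handling behavior described in this issue.
class StickyErrorController {
  // First error reported by any writer thread; never cleared.
  private final AtomicReference<Throwable> thrown = new AtomicReference<>();

  // Called from WriterThread.run(): only the first error is kept,
  // later errors are dropped because compareAndSet fails.
  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  // Rethrows the stored error on every call, because nothing resets it.
  void checkForErrors() throws IOException {
    Throwable t = thrown.get();
    if (t == null) {
      return;
    }
    if (t instanceof IOException) {
      throw new IOException(t);
    }
    throw new RuntimeException(t);
  }

  public static void main(String[] args) {
    StickyErrorController controller = new StickyErrorController();
    controller.writerThreadError(new IOException("region not serving"));
    controller.writerThreadError(new IOException("a newer error")); // ignored
    for (int i = 0; i < 3; i++) {
      try {
        controller.checkForErrors();
      } catch (IOException e) {
        // Same cause on every attempt: the original, stale error.
        System.out.println("attempt " + i + ": " + e.getCause().getMessage());
      }
    }
  }
}
```

Every attempt reports "region not serving"; the second error is never seen, and no call ever stops the rethrow.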
For example, RegionReplicaReplicationEndpoint.replicate contains a while loop
that does the actual replication. On every iteration it calls
checkForErrors(), catches the rethrown exception and logs it, but does nothing
else about it. In my experience this produces ~2GB of log files in ~5 minutes.
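The catch-log-continue pattern can be sketched as follows, assuming the loop body simply catches the rethrown exception, logs it and moves on. ReplicateLoopSketch, ErrorCheck and runLoop are hypothetical names, the log text is illustrative, and the real replicate() method does more work per iteration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the loop pattern; not the body of
// RegionReplicaReplicationEndpoint.replicate().
class ReplicateLoopSketch {
  // Hypothetical stand-in for PipelineController.checkForErrors().
  interface ErrorCheck {
    void checkForErrors() throws IOException;
  }

  // Runs a bounded number of iterations and collects what would be logged.
  static List<String> runLoop(ErrorCheck controller, int iterations) {
    List<String> log = new ArrayList<>();
    for (int i = 0; i < iterations; i++) {
      try {
        controller.checkForErrors();
        // ... the actual replication work would happen here ...
      } catch (IOException e) {
        // Logged (illustrative text), but the stored error is neither
        // cleared nor handled, so the next iteration hits it again.
        log.add("checkForErrors failed: " + e.getMessage());
      }
    }
    return log;
  }

  public static void main(String[] args) {
    IOException stale = new IOException("NotServingRegionException");
    // The check always rethrows the same stored error.
    List<String> log = runLoop(() -> { throw new IOException(stale); }, 5);
    log.forEach(System.out::println);
  }
}
```

The number of identical log lines grows with the number of iterations rather than with the number of failures, which is why a tight loop fills the logs so quickly.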
My proposal is to clear the stored exception once it reaches
RegionReplicaReplicationEndpoint.replicate and to make sure the WriterThread
that died throwing it is restarted.
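One possible shape of this proposal, as a sketch: the stored error is taken and reset in one atomic step with AtomicReference.getAndSet(null), and a writer thread that has exited is replaced with a new one. RecoveringPipeline, takeError() and recoverIfNeeded() are hypothetical names, and the simulated doRun() stands in for the real writer work.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposed fix; names are illustrative,
// not existing HBase APIs.
class RecoveringPipeline {
  // Stored writer error; only the first one is kept, as today.
  private final AtomicReference<Throwable> thrown = new AtomicReference<>();
  // Counts doRun() calls so the demo can fail only on the first run.
  private final AtomicInteger runs = new AtomicInteger();
  private Thread writer;

  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  // Proposed change: return the stored error and reset it atomically,
  // so it is reported once instead of on every later check.
  Throwable takeError() {
    return thrown.getAndSet(null);
  }

  // Simulated writer work: the first run fails, later runs succeed.
  private void doRun() throws Exception {
    if (runs.incrementAndGet() == 1) {
      throw new Exception("simulated NotServingRegionException");
    }
  }

  // Same shape as WriterThread.run(): record the error, then exit.
  private Thread newWriter() {
    return new Thread(() -> {
      try {
        doRun();
      } catch (Throwable t) {
        writerThreadError(t);
      }
    }, "writer");
  }

  void startWriter() {
    writer = newWriter();
    writer.start();
  }

  // Demo only: waiting keeps the output deterministic; a real replicate
  // loop would not block on a long-running writer like this.
  private static void awaitQuietly(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Called from the replicate loop: consume the error once and replace
  // the dead writer instead of rethrowing the same error forever.
  Throwable recoverIfNeeded() {
    awaitQuietly(writer);
    Throwable error = takeError();
    if (error != null && !writer.isAlive()) {
      startWriter();
      awaitQuietly(writer);
    }
    return error;
  }

  int runCount() {
    return runs.get();
  }

  public static void main(String[] args) {
    RecoveringPipeline pipeline = new RecoveringPipeline();
    pipeline.startWriter();
    System.out.println("first check: " + pipeline.recoverIfNeeded());
    System.out.println("second check: " + pipeline.recoverIfNeeded());
    System.out.println("writer runs: " + pipeline.runCount());
  }
}
```

Running main prints the simulated exception on the first check, `null` on the second, and `writer runs: 2`. Using getAndSet(null) means two concurrent callers cannot both consume the same error, while a new failure after the reset is stored and surfaced on the next check.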
--
This message was sent by Atlassian Jira
(v8.3.4#803005)