[ https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012320#comment-17012320 ]
Hudson commented on HBASE-23601:
--------------------------------

Results for branch branch-2.1
	[build #1767 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1767/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1767//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1767//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1767//JDK8_Nightly_Build_Report_(Hadoop3)/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1767//console].

> OutputSink.WriterThread exception gets stuck and repeated indefinitely
> ----------------------------------------------------------------------
>
>                 Key: HBASE-23601
>                 URL: https://issues.apache.org/jira/browse/HBASE-23601
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 2.2.2
>            Reporter: Szabolcs Bukros
>            Assignee: Szabolcs Bukros
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.1.9, 2.2.4
>
>
> When a WriterThread runs into an exception (e.g. NotServingRegionException),
> the exception is stored in the controller. It is never removed and cannot be
> overwritten either.
>
> {code:java}
> public void run() {
>   try {
>     doRun();
>   } catch (Throwable t) {
>     LOG.error("Exiting thread", t);
>     controller.writerThreadError(t);
>   }
> }
> {code}
> As a result, every time PipelineController.checkForErrors() is called, the
> same old exception is rethrown.
>
> For example, in RegionReplicaReplicationEndpoint.replicate there is a while
> loop that does the actual replicating. Every time it loops, it calls
> checkForErrors(), catches the rethrown exception, and logs it, but does
> nothing about it. In my experience this results in ~2GB log files in ~5min.
>
> My proposal would be to clean up the stored exception when it reaches
> RegionReplicaReplicationEndpoint.replicate and make sure we restart the
> WriterThread that died throwing it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
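
The sticky-error behaviour described in the issue can be illustrated with a minimal sketch. It assumes the controller keeps the first reported error in an AtomicReference and only sets it when empty; the issue says the exception is never removed or overwritten, but does not show how it is stored. StickyErrorSketch, Controller, countFailures and clearError() are hypothetical names for illustration, not HBase code. clearError() only illustrates the "clean up the stored exception" half of the proposal; restarting the dead WriterThread is not shown.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the controller described in the issue; not the HBase class.
class StickyErrorSketch {

  static class Controller {
    // Holds the first error reported by any writer thread (assumed storage).
    private final AtomicReference<Throwable> thrown = new AtomicReference<>();

    // Stores the error only if none is stored yet, so a later error never overwrites it.
    void writerThreadError(Throwable t) {
      thrown.compareAndSet(null, t);
    }

    // Rethrows the stored error on every call; nothing in this method clears it.
    void checkForErrors() throws IOException {
      Throwable t = thrown.get();
      if (t != null) {
        throw new IOException(t);
      }
    }

    // Assumed addition illustrating the proposal: return the stored error and reset the slot.
    Throwable clearError() {
      return thrown.getAndSet(null);
    }
  }

  // Counts how many of `attempts` calls to checkForErrors() throw.
  static int countFailures(Controller c, int attempts) {
    int failures = 0;
    for (int i = 0; i < attempts; i++) {
      try {
        c.checkForErrors();
      } catch (IOException e) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    Controller c = new Controller();
    c.writerThreadError(new IllegalStateException("region not serving"));
    // Every check fails while the error stays stored.
    System.out.println("failures before clear: " + countFailures(c, 3));
    c.clearError();
    // After the reset, checks pass again.
    System.out.println("failures after clear: " + countFailures(c, 3));
  }
}
```

In this sketch a caller that loops on checkForErrors() without clearing the slot fails on every iteration, which matches the repeated logging described above. A real fix would also have to decide which component owns the reset.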