wchevreuil commented on a change in pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#discussion_r475670615
########## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java ##########
@@ -587,12 +617,25 @@ private void initialize() {
   @Override
   public void startup() {
-    // mark we are running now
-    this.sourceRunning = true;
-    initThread = new Thread(this::initialize);
-    Threads.setDaemonThreadRunning(initThread,
-      Thread.currentThread().getName() + ".replicationSource," + this.queueId,
-      this::uncaughtException);
+    //Flag that signalizes uncaught error happening while starting up the source
+    // and a retry should be attempted
+    AtomicBoolean retryStartup = new AtomicBoolean(false);
+    retryStartup.set(true);
+    do {
+      if(retryStartup.get()) {
+        retryStartup.set(false);
+        // mark we are running now
+        this.sourceRunning = true;
+        initThread = new Thread(this::initialize);
+        Threads.setDaemonThreadRunning(initThread,
+          Thread.currentThread().getName() + ".replicationSource," + this.queueId,
+          (t,e) -> {
+            sourceRunning = false;
+            uncaughtException(t, e, null, null);
+            retryStartup.set(true);
+          });
+      }
+    } while (!this.sourceRunning);

Review comment:
I had second thoughts on this: we can't simply reuse this boolean. _sourceRunning_ is set to true right before the init thread is launched, so if initialization fails, the loop condition may be evaluated before the exception handler has set it back to false; the loop then exits and the retry never happens. I'm bringing back the original _startupOngoing_ flag in the next commit.
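To make the race concrete, here is a minimal, self-contained Java sketch of the approach the comment describes: a separate flag (named _startupOngoing_ after the comment) that only a successful `initialize()` clears, so the wait loop cannot exit while a failing attempt is still unwinding. Everything here is hypothetical: `RetryingStartupSketch`, its simulated `initialize()`, and the `failuresBeforeSuccess` parameter are illustrations, not HBase code, and the sketch uses plain `Thread` APIs instead of HBase's `Threads.setDaemonThreadRunning`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of a retrying startup. NOT the HBase ReplicationSource;
 * field names only mirror the review discussion for readability.
 */
class RetryingStartupSketch {
  private volatile boolean sourceRunning = false;
  // Cleared only by a *successful* initialize(). Unlike sourceRunning (which
  // startup() sets to true before launching each attempt), a failing attempt
  // can never make the wait loop below exit early.
  private final AtomicBoolean startupOngoing = new AtomicBoolean(false);
  // Set by the uncaught-exception handler to request another attempt.
  private final AtomicBoolean retryStartup = new AtomicBoolean(false);
  private final AtomicInteger attempts = new AtomicInteger();
  private final int failuresBeforeSuccess; // hypothetical knob for the demo

  RetryingStartupSketch(int failuresBeforeSuccess) {
    this.failuresBeforeSuccess = failuresBeforeSuccess;
  }

  // Hypothetical stand-in for initialize(): fails the first N attempts.
  private void initialize() {
    int attempt = attempts.incrementAndGet();
    if (attempt <= failuresBeforeSuccess) {
      throw new IllegalStateException("simulated init failure #" + attempt);
    }
    startupOngoing.set(false); // only success ends the startup phase
  }

  public void startup() {
    startupOngoing.set(true);
    retryStartup.set(true);
    do {
      if (retryStartup.compareAndSet(true, false)) {
        sourceRunning = true;
        Thread initThread = new Thread(this::initialize, "replicationSource-init");
        initThread.setDaemon(true);
        initThread.setUncaughtExceptionHandler((t, e) -> {
          sourceRunning = false;  // this attempt failed
          retryStartup.set(true); // written last, so the next attempt starts after this
        });
        initThread.start();
      }
      try {
        Thread.sleep(5); // avoid a hot spin while the init thread runs
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
    } while (startupOngoing.get());
  }

  boolean isSourceRunning() {
    return sourceRunning;
  }

  int attempts() {
    return attempts.get();
  }

  public static void main(String[] args) {
    RetryingStartupSketch source = new RetryingStartupSketch(2);
    source.startup();
    System.out.println("attempts=" + source.attempts()
        + " sourceRunning=" + source.isSourceRunning());
  }
}
```

With `failuresBeforeSuccess = 2`, `startup()` returns only after the third attempt succeeds. The handler writes `sourceRunning = false` before `retryStartup.set(true)`, so a failed attempt's handler has finished with `sourceRunning` before the loop launches the next attempt. The sketch retries forever on persistent failure; a real implementation would likely need a cap or an abort condition.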