[ https://issues.apache.org/jira/browse/RATIS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095225#comment-17095225 ]
runzhiwang edited comment on RATIS-878 at 4/29/20, 8:13 AM:
------------------------------------------------------------

There are 2 options to avoid the infinite restart of LogAppender; they differ only in step 3.

1. If a LogAppender tries to restart while its group is being removed, we refuse the restart; otherwise we continue.
2. If the number of restarts does not exceed the limit, i.e. checkCanRestart is true, both options restart the GrpcLogAppender again.
3. If the limit is exceeded, i.e. checkCanRestart is false, there are 2 options we can choose from:
   a. Close the pipeline and throw a special exception back to the client.
   b. The leader steps down, waits for all the GrpcLogAppenders to die, triggers a leader election, and throws a special exception back to the client.

The pseudocode is as follows.

{code:java}
// Holds the <timestamp, LogAppender> pairs of the most recent restarts (at most 3).
Queue<Pair<TimeStamp, LogAppender>> queue;

restart() {
  // Never restart while the group is being removed.
  if (removeGroup) {
    return;
  }

  if (checkCanRestart()) {
    newLogAppender = restartLogAppender();
    // Keep only the 3 most recent restarts.
    if (queue.size() == 3) {
      queue.poll();
    }
    queue.add(<now, newLogAppender>);
  } else {
    // Stop restarting; there are two options we can choose from:
    //   a. close the pipeline and throw a special exception back to the client
    //   b. leader steps down, waits for all the GrpcLogAppenders to die,
    //      triggers a leader election, and throws a special exception back
  }
}

boolean checkCanRestart() {
  // Limit restarts to 3: with fewer than 3 recorded restarts we can always restart.
  if (queue.size() < 3) {
    return true;
  }

  // Otherwise restart only if the oldest recorded restart is more than 1 hour old
  // and the LogAppender it created is no longer alive.
  if ((now - queue.peek().TimeStamp > 1 hour) && !queue.peek().LogAppender.isAlive) {
    return true;
  }

  return false;
}
{code}

> Infinite restart of LogAppender
> --------------------------------
>
>                 Key: RATIS-878
>                 URL: https://issues.apache.org/jira/browse/RATIS-878
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>
> For details, please refer to [RATIS-840|https://issues.apache.org/jira/browse/RATIS-840].

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
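The restart-limit check from the pseudocode in the comment can be sketched as self-contained Java. This is only an illustration under stated assumptions: the class name RestartLimiter, its constructor parameters, and the use of a BooleanSupplier to stand in for the LogAppender liveness check are all hypothetical and are not part of the Ratis API. The caller passes the current time explicitly so the logic can be exercised without waiting an hour.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not a Ratis class): bounds how often a LogAppender may be restarted. */
final class RestartLimiter {
  /** One recorded restart: when it happened and how to ask whether that appender is still alive. */
  private static final class Entry {
    final long timeMs;
    final BooleanSupplier isAlive;

    Entry(long timeMs, BooleanSupplier isAlive) {
      this.timeMs = timeMs;
      this.isAlive = isAlive;
    }
  }

  private final int maxRestarts;  // 3 in the pseudocode
  private final long windowMs;    // 1 hour in the pseudocode
  private final Deque<Entry> recent = new ArrayDeque<>();

  RestartLimiter(int maxRestarts, long windowMs) {
    if (maxRestarts < 1) {
      throw new IllegalArgumentException("maxRestarts must be >= 1");
    }
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  /** Mirrors checkCanRestart() from the pseudocode. */
  synchronized boolean canRestart(long nowMs) {
    if (recent.size() < maxRestarts) {
      return true;
    }
    // The queue is full: look at the oldest recorded restart.
    Entry oldest = recent.peekFirst();
    return nowMs - oldest.timeMs > windowMs && !oldest.isAlive.getAsBoolean();
  }

  /** Records a restart, evicting the oldest entry once maxRestarts entries are held. */
  synchronized void recordRestart(long nowMs, BooleanSupplier isAlive) {
    if (recent.size() == maxRestarts) {
      recent.pollFirst();
    }
    recent.addLast(new Entry(nowMs, isAlive));
  }
}
```

With a limit of 3 and a 1-hour window, canRestart returns true for the first three restarts; a fourth is allowed only once the oldest recorded restart is more than an hour old and its appender has died. What to do when canRestart returns false (option a or b above) is left to the caller.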