[ https://issues.apache.org/jira/browse/RATIS-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872974#comment-17872974 ]
Tsz-wo Sze commented on RATIS-2137:
-----------------------------------

What happened after 09:03:13,715? Did the problem keep repeating?

> Leader fails to send correct index to follower after timeout exception
> ----------------------------------------------------------------------
>
> Key: RATIS-2137
> URL: https://issues.apache.org/jira/browse/RATIS-2137
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 2.5.1
> Reporter: Kevin Liu
> Priority: Major
>
> I found that after the following log, the follower became unavailable
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
>
> Here is what I found in the leader's log
> 24/08/11 09:03:10,130 WARN [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException
> 24/08/11 09:03:13,714 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795876 -> 34795875
> 24/08/11 09:03:13,715 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795875 -> 34795874
>
> I guess that when the leader called appendEntries for the first time, the
> follower executed it successfully, but the leader did not receive the
> follower's response and timed out. When the leader resent the request, the
> follower rejected it because the entries already existed, and the leader
> started to decreaseNextIndex.
>
> Sometimes it can be fixed automatically by rolling segment log, but not
> always.
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 2@group-47BEDE733167: Failed appendEntries as the first entry (index 35049128) already exists (snapshotIndex: 35070063, commitIndex: 35259248)
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 2@group-47BEDE733167: inconsistency entries. Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
> 24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment log-35250411_35261308 to index:35261308

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
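To make the reporter's hypothesis easier to discuss, here is a toy model of the pattern visible in the quoted logs: the follower already holds entries up to 34795893 and replies with nextIndex=34795894, while the leader starts from 34795876 and steps backwards by one on each failure. This is not Ratis code. The class `NextIndexSketch`, the nested `Follower`, and the method `attemptsUntilSuccess` are hypothetical names invented for illustration, and the leader's last log index (34795900) is an assumed value that does not appear in the report. The "use the reply's nextIndex" branch is included only for contrast and is not claimed to be how Ratis behaves or should be fixed.

```java
/** Toy model only; every name here is hypothetical and none of this is Ratis code. */
public class NextIndexSketch {

    /** Hypothetical follower that accepts an append only if it starts right after its last entry. */
    static final class Follower {
        private long lastIndex;

        Follower(long lastIndex) {
            this.lastIndex = lastIndex;
        }

        /** Returns false when the first entry already exists or would leave a gap. */
        boolean append(long firstIndex, long lastIndexInRequest) {
            if (firstIndex != lastIndex + 1) {
                return false;
            }
            lastIndex = lastIndexInRequest;
            return true;
        }

        /** The nextIndex a follower would report back in its reply. */
        long nextIndexHint() {
            return lastIndex + 1;
        }
    }

    /**
     * Counts appendEntries attempts until the follower accepts, or returns -1
     * if maxAttempts is exhausted. With useHint == false the leader decrements
     * nextIndex by one after each failure; with useHint == true it adopts the
     * follower's reported nextIndex instead.
     */
    public static int attemptsUntilSuccess(long followerLast, long leaderNext,
                                           long leaderLast, boolean useHint,
                                           int maxAttempts) {
        Follower follower = new Follower(followerLast);
        long nextIndex = leaderNext;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (follower.append(nextIndex, leaderLast)) {
                return attempt;
            }
            nextIndex = useHint ? follower.nextIndexHint() : nextIndex - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Follower index and leader nextIndex are taken from the quoted logs;
        // the leader's last index (34795900) is an assumption.
        System.out.println(attemptsUntilSuccess(34795893L, 34795876L, 34795900L, false, 10));
        System.out.println(attemptsUntilSuccess(34795893L, 34795876L, 34795900L, true, 10));
    }
}
```

In this model, decrementing moves the leader's nextIndex further away from the follower's actual position, so the retries never converge within the attempt limit, which is consistent with the repeated "already exists" replies in the logs. Whether that is what actually happens in 2.5.1 depends on the answer to the question above about what followed 09:03:13,715.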