[ https://issues.apache.org/jira/browse/RATIS-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Liu updated RATIS-2137: ----------------------------- Description: I found that after the following log, the follower became unavailable 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893) 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893) 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 Here is what I found in the leader's log 24/08/11 09:03:10,130 WARN [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException 24/08/11 09:03:13,714 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795876 -> 34795875 24/08/11 09:03:13,715 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795875 -> 34795874 I guess when the leader called appendEntries for the first time, the follower had been executed successfully, but the leader did not receive the follower's response and times out. When it resent, it found that it could not match the change and started to decreaseNextIndex. Sometimes it can be fixed automatically by rolling segment log, but not always. 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 2@group-47BEDE733167: Failed appendEntries as the first entry (index 35049128) already exists (snapshotIndex: 35070063, commitIndex: 35259248) 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 2@group-47BEDE733167: inconsistency entries. Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1 24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment log-35250411_35261308 to index:35261308 was: I found that after the following log, the follower became unavailable 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893) 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893) 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 Here is what I found in the leader's log 24/08/11 09:03:10,130 WARN [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException 24/08/11 09:03:13,714 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795876 -> 34795875 24/08/11 09:03:13,715 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795875 -> 34795874 I guess when the leader called appendEntries for the first time, the follower had been executed successfully, but the leader did not receive the follower's response and times out. When it resent, it found that it could not match the change and started to decreaseNextIndex. > Leader fails to send correct index to follower after timeout exception > ---------------------------------------------------------------------- > > Key: RATIS-2137 > URL: https://issues.apache.org/jira/browse/RATIS-2137 > Project: Ratis > Issue Type: Bug > Affects Versions: 2.5.1 > Reporter: Kevin Liu > Priority: Major > > I found that after the following log, the follower became unavailable > 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 1@group-47BEDE733167: Failed appendEntries as the first entry (index > 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893) > 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 1@group-47BEDE733167: inconsistency entries. > Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 > 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 1@group-47BEDE733167: Failed appendEntries as the first entry (index > 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893) > 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 1@group-47BEDE733167: inconsistency entries. > Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1 > Here is what I found in the leader's log > 24/08/11 09:03:10,130 WARN > [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: > 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries > (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException > 24/08/11 09:03:13,714 INFO > [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: > 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally > 34795876 -> 34795875 > 24/08/11 09:03:13,715 INFO > [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: > 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally > 34795875 -> 34795874 > I guess when the leader called appendEntries for the first time, the follower > had been executed successfully, but the leader did not receive the follower's > response and times out. When it resent, it found that it could not match the > change and started to decreaseNextIndex. > Sometimes it can be fixed automatically by rolling segment log, but not > always. > 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 2@group-47BEDE733167: Failed appendEntries as the first entry (index > 35049128) already exists (snapshotIndex: 35070063, commitIndex: 35259248) > 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: > 2@group-47BEDE733167: inconsistency entries. > Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1 > 24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: > 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment > log-35250411_35261308 to index:35261308 -- This message was sent by Atlassian Jira (v8.20.10#820010)