[ 
https://issues.apache.org/jira/browse/RATIS-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liu updated RATIS-2137:
-----------------------------
    Description: 
I found that after the following log, the follower became unavailable

24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1

Here is what I found in the leader's log

24/08/11 09:03:10,130 WARN 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): 
org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795876 -> 34795875
24/08/11 09:03:13,715 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795875 -> 34795874

I guess when the leader called appendEntries for the first time, the follower 
had been executed successfully, but the leader did not receive the follower's 
response and times out. When it resent, it found that it could not match the 
change and started to decreaseNextIndex.

Sometimes it can be fixed automatically by rolling segment log, but not always.
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: Failed appendEntries as the first entry (index 35049128) 
already exists (snapshotIndex: 35070063, commitIndex: 35259248)
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: inconsistency entries. 
Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
log-35250411_35261308 to index:35261308

  was:
I found that after the following log, the follower became unavailable

24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1

Here is what I found in the leader's log

24/08/11 09:03:10,130 WARN 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): 
org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795876 -> 34795875
24/08/11 09:03:13,715 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795875 -> 34795874

I guess when the leader called appendEntries for the first time, the follower 
had been executed successfully, but the leader did not receive the follower's 
response and times out. When it resent, it found that it could not match the 
change and started to decreaseNextIndex.


> Leader fails to send correct index to follower after timeout exception
> ----------------------------------------------------------------------
>
>                 Key: RATIS-2137
>                 URL: https://issues.apache.org/jira/browse/RATIS-2137
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Kevin Liu
>            Priority: Major
>
> I found that after the following log, the follower became unavailable
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: inconsistency entries. 
> Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: inconsistency entries. 
> Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> Here is what I found in the leader's log
> 24/08/11 09:03:10,130 WARN 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
> 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries 
> (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException
> 24/08/11 09:03:13,714 INFO 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
> 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
> 34795876 -> 34795875
> 24/08/11 09:03:13,715 INFO 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
> 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
> 34795875 -> 34795874
> I guess when the leader called appendEntries for the first time, the follower 
> had been executed successfully, but the leader did not receive the follower's 
> response and times out. When it resent, it found that it could not match the 
> change and started to decreaseNextIndex.
> Sometimes it can be fixed automatically by rolling segment log, but not 
> always.
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 2@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 35049128) already exists (snapshotIndex: 35070063, commitIndex: 35259248)
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 2@group-47BEDE733167: inconsistency entries. 
> Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
> 24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
> log-35250411_35261308 to index:35261308



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to