[ 
https://issues.apache.org/jira/browse/RATIS-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liu updated RATIS-2137:
-----------------------------
    Description: 
*I found that after the following log, the follower became unavailable. The 
follower received incorrect entries repeatedly for about 10min, then got 
installSnapshot failed and started to election. After two hours, it succeed to 
install snapshot, but failed to updateLastAppliedTermIndex. After that, it 
repeated 'receive installSnapshot and installSnapshot failed' for several hours 
until I restarted the server.*

24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
*....(repeat 'Failed appendEntries')*
24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34465382) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2892557:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:15:42,230 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:bbe49073-5dad-4499-9051-58a0e53b0658,0
24/08/11 09:15:42,231 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
24/08/11 09:15:42,233 WARN [nioEventLoopGroup-3-3] DefaultChannelPipeline: An 
exceptionCaught() event was fired, and it reached at the tail of the pipeline. 
It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] FollowerState: 
1@group-47BEDE733167-FollowerState: change to CANDIDATE, 
lastRpcElapsedTime:7874610911ns, electionTimeout:3353ms
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
shutdown 1@group-47BEDE733167-FollowerState
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] 
RaftServer$Division: 1@group-47BEDE733167: changes role from  FOLLOWER to 
CANDIDATE at term 59 for changeToCandidate
24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] 
RaftServerConfigKeys: raft.server.leaderelection.pre-vote = true (default)
24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
start 1@group-47BEDE733167-LeaderElection5
24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
RaftServer$Division: 1@group-47BEDE733167: change Leader from 3 to null at term 
59 for PRE_VOTE
24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: submit 
vote requests at term 59 for 34233595: 
peers:[1|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
 
2|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
 
3|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5: PRE_VOTE REJECTED 
received 2 response(s) and 0 exception(s):
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection:   Response 0: 1<-2#0:FAIL-t59
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection:   Response 1: 1<-3#0:FAIL-t59
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: result 
REJECTED
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
RaftServer$Division: 1@group-47BEDE733167: changes role from CANDIDATE to 
FOLLOWER at term 59 for REJECTED
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 1: 
shutdown 1@group-47BEDE733167-LeaderElection5
24/08/11 09:15:50,111 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 1: 
start 1@group-47BEDE733167-FollowerState
24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:802b8106-076b-4ffc-8cf1-1ace04952c7a,0
24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
installSnapshot, leader elected after 1967ms
24/08/11 09:15:52,075 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
*(repeat 'election, fail, install snapshot fail' for more than 2 hours)*
*(still election fail, but this time install snapshot successfully, but failed 
to updateLastAppliedTermIndex)*
24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0
24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
installSnapshot, leader elected after 3610ms
24/08/11 11:52:39,434 INFO [nioEventLoopGroup-3-3] SnapshotManager: Installing 
snapshot:3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0, to tmp 
dir:/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
24/08/11 11:52:39,455 INFO [nioEventLoopGroup-3-3] SnapshotManager: Installed 
snapshot, renaming temporary dir 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
 to /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Reinitializing state machine.
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] RaftLog: 
1@group-47BEDE733167-SegmentedRaftLog: snapshotIndex: updateIncreasingly 
33870802 -> 34870063
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: obsolete snapshot provided: (t:59, i:34670809)
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: flushIndex: setUnconditionally 
34795893 -> 34870063
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: safeCacheEvictIndex: 
setUnconditionally 34783743 -> 34870063
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater: appliedIndex: 
setUnconditionally 34795893 -> 34670809
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: Closing segment 
log_inprogress_34783744 to index: 34795893
24/08/11 11:52:40,837 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: successfully install the entire snapshot-34870063
24/08/11 11:52:40,877 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: reply installSnapshot: 
3<-1#0:OK-t59,SUCCESS,requestIndex=0
24/08/11 11:52:40,902 INFO [1@group-47BEDE733167-SegmentedRaftLogWorker] 
SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker: Rolled log 
segment from 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_inprogress_34783744
 to 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_34783744-34795893
24/08/11 11:52:41,160 ERROR [1@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater caught a 
Throwable.
java.lang.IllegalStateException: 1: Failed updateLastAppliedTermIndex: newTI = 
(t:59, i:34670810) < oldTI = (t:59, i:34795893)
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
RaftServer$Division: 1@group-47BEDE733167: shutdown
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
JmxRegister: Successfully un-registered JMX Bean with object name 
Ratis:service=RaftServer,group=group-47BEDE733167,id=1
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] RoleInfo: 
1: shutdown 1@group-47BEDE733167-FollowerState
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-FollowerState] FollowerState: 
1@group-47BEDE733167-FollowerState was interrupted
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
RaftServer$Division: 1@group-47BEDE733167: closes. applyIndex: 34670810
24/08/11 11:52:42,340 INFO [1@group-47BEDE733167-StateMachineUpdater] 
SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker close()
24/08/11 11:52:48,170 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:7179265b-d007-4835-9b5f-fa9e4d853b72,0
24/08/11 11:52:48,170 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
24/08/11 11:52:58,171 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:7f56f35f-20f3-4b42-bdb6-c2d52db646fc,0
24/08/11 11:52:58,171 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
*(repeat 'receive installSnapshot and installSnapshot failed' for serval hours 
until I restarted the server)*

*Here is what I found in the leader's log*

24/08/11 09:03:10,130 WARN 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): 
org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795876 -> 34795875
24/08/11 09:03:13,715 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795875 -> 34795874

*I guess when the leader called appendEntries for the first time, the follower 
had been executed successfully, but the leader did not receive the follower's 
response and times out. When it resent, it found that it could not match the 
change and started to decreaseNextIndex.*

*Sometimes it can be fixed automatically by rolling segment log, but not 
always.*
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: Failed appendEntries as the first entry (index 35049128) 
already exists (snapshotIndex: 35070063, commitIndex: 35259248)
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: inconsistency entries. 
Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
log-35250411_35261308 to index:35261308
24/08/12 11:54:46,402 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Taking a snapshot to file 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1723434886330_4407392949522239377.dat.
24/08/12 11:54:46,403 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Saving digest 272e2c87a8e62785588ab383c0b8c290 for snapshot file 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_35270066.
24/08/12 11:54:46,458 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Renaming a snapshot file 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1723434886330_4407392949522239377.dat
 to 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_35270066.
24/08/12 11:54:46,487 INFO [2@group-47BEDE733167-StateMachineUpdater] 
SimpleStateMachineStorage: Deleting old snapshot at 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_34670063
24/08/12 11:54:46,530 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 2@group-47BEDE733167-StateMachineUpdater: Took a snapshot 
at index 35270066
24/08/12 11:54:46,530 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 2@group-47BEDE733167-StateMachineUpdater: snapshotIndex: 
updateIncreasingly 35070063 -> 35270066
24/08/12 11:57:16,592 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
log-35261309_35271592 to index:35271592

  was:
*I found that after the following log, the follower became unavailable. The 
follower received incorrect entries repeatedly for about 10min, then got 
installSnapshot failed and started to election. After two hours, it succeed to 
install snapshot, but failed to updateLastAppliedTermIndex. After that, it 
repeated 'receive installSnapshot and installSnapshot failed' for several hours 
until I restarted the server.*

24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
*....(repeat 'Failed appendEntries')*
24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34465382) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2892557:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:15:42,230 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:bbe49073-5dad-4499-9051-58a0e53b0658,0
24/08/11 09:15:42,231 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
24/08/11 09:15:42,233 WARN [nioEventLoopGroup-3-3] DefaultChannelPipeline: An 
exceptionCaught() event was fired, and it reached at the tail of the pipeline. 
It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] FollowerState: 
1@group-47BEDE733167-FollowerState: change to CANDIDATE, 
lastRpcElapsedTime:7874610911ns, electionTimeout:3353ms
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
shutdown 1@group-47BEDE733167-FollowerState
24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] 
RaftServer$Division: 1@group-47BEDE733167: changes role from  FOLLOWER to 
CANDIDATE at term 59 for changeToCandidate
24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] 
RaftServerConfigKeys: raft.server.leaderelection.pre-vote = true (default)
24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
start 1@group-47BEDE733167-LeaderElection5
24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
RaftServer$Division: 1@group-47BEDE733167: change Leader from 3 to null at term 
59 for PRE_VOTE
24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: submit 
vote requests at term 59 for 34233595: 
peers:[1|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
 
2|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
 
3|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5: PRE_VOTE REJECTED 
received 2 response(s) and 0 exception(s):
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection:   Response 0: 1<-2#0:FAIL-t59
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection:   Response 1: 1<-3#0:FAIL-t59
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: result 
REJECTED
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
RaftServer$Division: 1@group-47BEDE733167: changes role from CANDIDATE to 
FOLLOWER at term 59 for REJECTED
24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 1: 
shutdown 1@group-47BEDE733167-LeaderElection5
24/08/11 09:15:50,111 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 1: 
start 1@group-47BEDE733167-FollowerState
24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:802b8106-076b-4ffc-8cf1-1ace04952c7a,0
24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
installSnapshot, leader elected after 1967ms
24/08/11 09:15:52,075 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
34795893, last included index in snapshot is 34670057
*(repeat 'election, fail, install snapshot fail' for more than 2 hours)*
*(still election fail, but this time install snapshot successfully, but failed 
to updateLastAppliedTermIndex)*
24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0
24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
installSnapshot, leader elected after 3610ms
24/08/11 11:52:39,434 INFO [nioEventLoopGroup-3-3] SnapshotManager: Installing 
snapshot:3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0, to tmp 
dir:/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
24/08/11 11:52:39,455 INFO [nioEventLoopGroup-3-3] SnapshotManager: Installed 
snapshot, renaming temporary dir 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
 to /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Reinitializing state machine.
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] RaftLog: 
1@group-47BEDE733167-SegmentedRaftLog: snapshotIndex: updateIncreasingly 
33870802 -> 34870063
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: obsolete snapshot provided: (t:59, i:34670809)
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: flushIndex: setUnconditionally 
34795893 -> 34870063
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: safeCacheEvictIndex: 
setUnconditionally 34783743 -> 34870063
24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater: appliedIndex: 
setUnconditionally 34795893 -> 34670809
24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
1@group-47BEDE733167-SegmentedRaftLogWorker: Closing segment 
log_inprogress_34783744 to index: 34795893
24/08/11 11:52:40,837 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: successfully install the entire snapshot-34870063
24/08/11 11:52:40,877 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: reply installSnapshot: 
3<-1#0:OK-t59,SUCCESS,requestIndex=0
24/08/11 11:52:40,902 INFO [1@group-47BEDE733167-SegmentedRaftLogWorker] 
SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker: Rolled log 
segment from 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_inprogress_34783744
 to 
/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_34783744-34795893
24/08/11 11:52:41,160 ERROR [1@group-47BEDE733167-StateMachineUpdater] 
StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater caught a 
Throwable.
java.lang.IllegalStateException: 1: Failed updateLastAppliedTermIndex: newTI = 
(t:59, i:34670810) < oldTI = (t:59, i:34795893)
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
RaftServer$Division: 1@group-47BEDE733167: shutdown
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
JmxRegister: Successfully un-registered JMX Bean with object name 
Ratis:service=RaftServer,group=group-47BEDE733167,id=1
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] RoleInfo: 
1: shutdown 1@group-47BEDE733167-FollowerState
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-FollowerState] FollowerState: 
1@group-47BEDE733167-FollowerState was interrupted
24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
RaftServer$Division: 1@group-47BEDE733167: closes. applyIndex: 34670810
24/08/11 11:52:42,340 INFO [1@group-47BEDE733167-StateMachineUpdater] 
SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker close()
24/08/11 11:52:48,170 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:7179265b-d007-4835-9b5f-fa9e4d853b72,0
24/08/11 11:52:48,170 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
24/08/11 11:52:58,171 INFO [nioEventLoopGroup-3-3] SnapshotInstallationHandler: 
1@group-47BEDE733167: receive installSnapshot: 
3->1#0-t59,chunk:7f56f35f-20f3-4b42-bdb6-c2d52db646fc,0
24/08/11 11:52:58,171 ERROR [nioEventLoopGroup-3-3] 
SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
*(repeat 'receive installSnapshot and installSnapshot failed' for serval hours 
until I restarted the server)*

*Here is what I found in the leader's log*

24/08/11 09:03:10,130 WARN 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): 
org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795876 -> 34795875
24/08/11 09:03:13,715 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795875 -> 34795874

*I guess when the leader called appendEntries for the first time, the follower 
had been executed successfully, but the leader did not receive the follower's 
response and times out. When it resent, it found that it could not match the 
change and started to decreaseNextIndex.*

*Sometimes it can be fixed automatically by rolling segment log, but not 
always.*
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: Failed appendEntries as the first entry (index 35049128) 
already exists (snapshotIndex: 35070063, commitIndex: 35259248)
24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
2@group-47BEDE733167: inconsistency entries. 
Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
log-35250411_35261308 to index:35261308


> Leader fails to send correct index to follower after timeout exception
> ----------------------------------------------------------------------
>
>                 Key: RATIS-2137
>                 URL: https://issues.apache.org/jira/browse/RATIS-2137
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Kevin Liu
>            Priority: Major
>         Attachments: image-2024-08-13-11-28-16-250.png
>
>
> *I found that after the following log, the follower became unavailable. The 
> follower received incorrect entries repeatedly for about 10min, then got 
> installSnapshot failed and started to election. After two hours, it succeed 
> to install snapshot, but failed to updateLastAppliedTermIndex. After that, it 
> repeated 'receive installSnapshot and installSnapshot failed' for several 
> hours until I restarted the server.*
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: inconsistency entries. 
> Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: inconsistency entries. 
> Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> *....(repeat 'Failed appendEntries')*
> 24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 34465382) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
> 24/08/11 09:15:41,827 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: inconsistency entries. 
> Reply:3<-1#2892557:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
> 24/08/11 09:15:42,230 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: receive installSnapshot: 
> 3->1#0-t59,chunk:bbe49073-5dad-4499-9051-58a0e53b0658,0
> 24/08/11 09:15:42,231 ERROR [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
> java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
> 34795893, last included index in snapshot is 34670057
> 24/08/11 09:15:42,233 WARN [nioEventLoopGroup-3-3] DefaultChannelPipeline: An 
> exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
> 34795893, last included index in snapshot is 34670057
> 24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] 
> FollowerState: 1@group-47BEDE733167-FollowerState: change to CANDIDATE, 
> lastRpcElapsedTime:7874610911ns, electionTimeout:3353ms
> 24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
> shutdown 1@group-47BEDE733167-FollowerState
> 24/08/11 09:15:50,105 INFO [1@group-47BEDE733167-FollowerState] 
> RaftServer$Division: 1@group-47BEDE733167: changes role from  FOLLOWER to 
> CANDIDATE at term 59 for changeToCandidate
> 24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] 
> RaftServerConfigKeys: raft.server.leaderelection.pre-vote = true (default)
> 24/08/11 09:15:50,106 INFO [1@group-47BEDE733167-FollowerState] RoleInfo: 1: 
> start 1@group-47BEDE733167-LeaderElection5
> 24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
> RaftServer$Division: 1@group-47BEDE733167: change Leader from 3 to null at 
> term 59 for PRE_VOTE
> 24/08/11 09:15:50,107 INFO [1@group-47BEDE733167-LeaderElection5] 
> LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: submit 
> vote requests at term 59 for 34233595: 
> peers:[1|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> 2|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> 3|rpc:xxx:9862|admin:|client:xxx:9087|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
>  old=null
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
> LeaderElection: 1@group-47BEDE733167-LeaderElection5: PRE_VOTE REJECTED 
> received 2 response(s) and 0 exception(s):
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
> LeaderElection:   Response 0: 1<-2#0:FAIL-t59
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
> LeaderElection:   Response 1: 1<-3#0:FAIL-t59
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
> LeaderElection: 1@group-47BEDE733167-LeaderElection5 PRE_VOTE round 0: result 
> REJECTED
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] 
> RaftServer$Division: 1@group-47BEDE733167: changes role from CANDIDATE to 
> FOLLOWER at term 59 for REJECTED
> 24/08/11 09:15:50,110 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 
> 1: shutdown 1@group-47BEDE733167-LeaderElection5
> 24/08/11 09:15:50,111 INFO [1@group-47BEDE733167-LeaderElection5] RoleInfo: 
> 1: start 1@group-47BEDE733167-FollowerState
> 24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: receive installSnapshot: 
> 3->1#0-t59,chunk:802b8106-076b-4ffc-8cf1-1ace04952c7a,0
> 24/08/11 09:15:52,075 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
> installSnapshot, leader elected after 1967ms
> 24/08/11 09:15:52,075 ERROR [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
> java.lang.IllegalStateException: 1@group-47BEDE733167 log's commit index is 
> 34795893, last included index in snapshot is 34670057
> *(repeat 'election, fail, install snapshot fail' for more than 2 hours)*
> *(still election fail, but this time install snapshot successfully, but 
> failed to updateLastAppliedTermIndex)*
> 24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: receive installSnapshot: 
> 3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0
> 24/08/11 11:52:38,168 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 1@group-47BEDE733167: change Leader from null to 3 at term 59 for 
> installSnapshot, leader elected after 3610ms
> 24/08/11 11:52:39,434 INFO [nioEventLoopGroup-3-3] SnapshotManager: 
> Installing snapshot:3->1#0-t59,chunk:dcc73f7c-c17d-44e5-96b6-f6f1570e68a7,0, 
> to tmp 
> dir:/data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
> 24/08/11 11:52:39,455 INFO [nioEventLoopGroup-3-3] SnapshotManager: Installed 
> snapshot, renaming temporary dir 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/snapshot-dcc73f7c-c17d-44e5-96b6-f6f1570e68a7
>  to /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm
> 24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> StateMachine: Reinitializing state machine.
> 24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] RaftLog: 
> 1@group-47BEDE733167-SegmentedRaftLog: snapshotIndex: updateIncreasingly 
> 33870802 -> 34870063
> 24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> StateMachine: obsolete snapshot provided: (t:59, i:34670809)
> 24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 1@group-47BEDE733167-SegmentedRaftLogWorker: flushIndex: setUnconditionally 
> 34795893 -> 34870063
> 24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 1@group-47BEDE733167-SegmentedRaftLogWorker: safeCacheEvictIndex: 
> setUnconditionally 34783743 -> 34870063
> 24/08/11 11:52:40,829 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater: appliedIndex: 
> setUnconditionally 34795893 -> 34670809
> 24/08/11 11:52:40,829 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 1@group-47BEDE733167-SegmentedRaftLogWorker: Closing segment 
> log_inprogress_34783744 to index: 34795893
> 24/08/11 11:52:40,837 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: successfully install the 
> entire snapshot-34870063
> 24/08/11 11:52:40,877 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: reply installSnapshot: 
> 3<-1#0:OK-t59,SUCCESS,requestIndex=0
> 24/08/11 11:52:40,902 INFO [1@group-47BEDE733167-SegmentedRaftLogWorker] 
> SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker: Rolled 
> log segment from 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_inprogress_34783744
>  to 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/current/log_34783744-34795893
> 24/08/11 11:52:41,160 ERROR [1@group-47BEDE733167-StateMachineUpdater] 
> StateMachineUpdater: 1@group-47BEDE733167-StateMachineUpdater caught a 
> Throwable.
> java.lang.IllegalStateException: 1: Failed updateLastAppliedTermIndex: newTI 
> = (t:59, i:34670810) < oldTI = (t:59, i:34795893)
> 24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> RaftServer$Division: 1@group-47BEDE733167: shutdown
> 24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> JmxRegister: Successfully un-registered JMX Bean with object name 
> Ratis:service=RaftServer,group=group-47BEDE733167,id=1
> 24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> RoleInfo: 1: shutdown 1@group-47BEDE733167-FollowerState
> 24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-FollowerState] 
> FollowerState: 1@group-47BEDE733167-FollowerState was interrupted
> 24/08/11 11:52:41,161 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> RaftServer$Division: 1@group-47BEDE733167: closes. applyIndex: 34670810
> 24/08/11 11:52:42,340 INFO [1@group-47BEDE733167-StateMachineUpdater] 
> SegmentedRaftLogWorker: 1@group-47BEDE733167-SegmentedRaftLogWorker close()
> 24/08/11 11:52:48,170 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: receive installSnapshot: 
> 3->1#0-t59,chunk:7179265b-d007-4835-9b5f-fa9e4d853b72,0
> 24/08/11 11:52:48,170 ERROR [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
> org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> 1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
> 24/08/11 11:52:58,171 INFO [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: receive installSnapshot: 
> 3->1#0-t59,chunk:7f56f35f-20f3-4b42-bdb6-c2d52db646fc,0
> 24/08/11 11:52:58,171 ERROR [nioEventLoopGroup-3-3] 
> SnapshotInstallationHandler: 1@group-47BEDE733167: installSnapshot failed
> org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> 1@group-47BEDE733167 is not in [STARTING, RUNNING]: current state is CLOSED
> *(repeat 'receive installSnapshot and installSnapshot failed' for serval 
> hours until I restarted the server)*
> *Here is what I found in the leader's log*
> 24/08/11 09:03:10,130 WARN 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
> 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries 
> (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException
> 24/08/11 09:03:13,714 INFO 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
> 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
> 34795876 -> 34795875
> 24/08/11 09:03:13,715 INFO 
> [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
> 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
> 34795875 -> 34795874
> *I guess when the leader called appendEntries for the first time, the 
> follower had been executed successfully, but the leader did not receive the 
> follower's response and times out. When it resent, it found that it could not 
> match the change and started to decreaseNextIndex.*
> *Sometimes it can be fixed automatically by rolling segment log, but not 
> always.*
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 2@group-47BEDE733167: Failed appendEntries as the first entry (index 
> 35049128) already exists (snapshotIndex: 35070063, commitIndex: 35259248)
> 24/08/12 11:47:53,351 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
> 2@group-47BEDE733167: inconsistency entries. 
> Reply:3<-2#3583083:FAIL-t59,INCONSISTENCY,nextIndex=35259249,followerCommit=35259248,matchIndex=-1
> 24/08/12 11:47:55,132 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
> log-35250411_35261308 to index:35261308
> 24/08/12 11:54:46,402 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> StateMachine: Taking a snapshot to file 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1723434886330_4407392949522239377.dat.
> 24/08/12 11:54:46,403 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> StateMachine: Saving digest 272e2c87a8e62785588ab383c0b8c290 for snapshot 
> file 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_35270066.
> 24/08/12 11:54:46,458 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> StateMachine: Renaming a snapshot file 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1723434886330_4407392949522239377.dat
>  to 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_35270066.
> 24/08/12 11:54:46,487 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> SimpleStateMachineStorage: Deleting old snapshot at 
> /data0/celeborn/ha_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.59_34670063
> 24/08/12 11:54:46,530 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> StateMachineUpdater: 2@group-47BEDE733167-StateMachineUpdater: Took a 
> snapshot at index 35270066
> 24/08/12 11:54:46,530 INFO [2@group-47BEDE733167-StateMachineUpdater] 
> StateMachineUpdater: 2@group-47BEDE733167-StateMachineUpdater: snapshotIndex: 
> updateIncreasingly 35070063 -> 35270066
> 24/08/12 11:57:16,592 INFO [nioEventLoopGroup-3-3] SegmentedRaftLogWorker: 
> 2@group-47BEDE733167-SegmentedRaftLogWorker: Rolling segment 
> log-35261309_35271592 to index:35271592



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to