Roman Puchkovskiy created IGNITE-18495:
------------------------------------------

             Summary: Fix RAFT snapshot installation hang due to response swap 
on retry
                 Key: IGNITE-18495
                 URL: https://issues.apache.org/jira/browse/IGNITE-18495
             Project: Ignite
          Issue Type: Bug
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The scenario is as follows:
 # An InstallSnapshot request is sent and its processing hangs indefinitely (it 
will only be cancelled in step 3)
 # After a timeout, a second InstallSnapshot request is sent with the same 
index+term as the first; in JRaft, this triggers special handling (the 
processing of the previous request is NOT cancelled)
 # After a timeout, a third InstallSnapshot request is sent with a DIFFERENT 
index, so it cancels the first snapshot processing, effectively unblocking the 
first thread

In the original JRaft implementation, the first thread fails to clean up after 
being unblocked, so subsequent retries always see a phantom of an unfinished 
snapshot, and the snapshotting process is jammed. Also, node stop might get 
stuck because one 'download' task remains unfinished forever.
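
The three-step scenario can be illustrated with a small single-threaded toy 
model. This is only a sketch based on the description above, not JRaft or 
Ignite code: the class SnapshotRetrySketch, its Request and Outcome types, and 
the methods register, finishBuggy and finishFixed are all hypothetical names 
introduced for illustration. It assumes a single "slot" that tracks the 
request considered in progress, that a retry with the same index+term replaces 
the slot contents (the response swap), and that the buggy cleanup only clears 
the slot when the finishing thread still owns it. finishFixed shows one 
possible cleanup, not necessarily the fix adopted for this issue.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the scenario; all names here are hypothetical. */
final class SnapshotRetrySketch {
    /** Stand-in for one InstallSnapshot request. */
    static final class Request {
        final long index;
        final long term;

        Request(long index, long term) {
            this.index = index;
            this.term = term;
        }
    }

    enum Outcome { STARTED, RETRY_ATTACHED, BUSY_CANCELLING }

    /** The request currently considered "in progress" (null when idle). */
    private final AtomicReference<Request> slot = new AtomicReference<>();

    Outcome register(Request r) {
        Request prev = slot.get();
        if (prev == null) {
            slot.set(r);
            return Outcome.STARTED;
        }
        if (prev.index == r.index && prev.term == r.term) {
            // Step 2: same index+term. The previous processing is not
            // cancelled, but the slot now points at the retry.
            slot.set(r);
            return Outcome.RETRY_ATTACHED;
        }
        // Step 3: different snapshot. The running download is asked to stop
        // (not modelled here) and this request is rejected as "busy".
        return Outcome.BUSY_CANCELLING;
    }

    /** Buggy cleanup: does nothing if the slot was swapped by a retry. */
    void finishBuggy(Request owner) {
        if (slot.get() != owner) {
            return; // the phantom stays in the slot
        }
        slot.set(null);
    }

    /** Possible fix: the finishing worker always releases the slot
     *  (safe in this model because only one worker runs at a time). */
    void finishFixed(Request owner) {
        slot.set(null);
    }

    boolean isJammed() {
        return slot.get() != null;
    }

    /** Replays steps 1-3, then lets the first thread finish. */
    static SnapshotRetrySketch replay(boolean fixed) {
        SnapshotRetrySketch s = new SnapshotRetrySketch();
        Request first = new Request(10, 1);
        s.register(first);               // step 1: starts and hangs
        s.register(new Request(10, 1));  // step 2: retry, response swap
        s.register(new Request(11, 1));  // step 3: cancels the first
        if (fixed) {
            s.finishFixed(first);
        } else {
            s.finishBuggy(first);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println("buggy cleanup jammed: " + replay(false).isJammed());
        System.out.println("fixed cleanup jammed: " + replay(true).isJammed());
    }
}
```

In this model, after the buggy cleanup every later request with a different 
index is answered with BUSY_CANCELLING because the slot is never cleared, 
which corresponds to the "phantom" described above.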



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
