Roman Puchkovskiy created IGNITE-18495:
------------------------------------------
Summary: Fix RAFT snapshot installation hang due to response swap
on retry
Key: IGNITE-18495
URL: https://issues.apache.org/jira/browse/IGNITE-18495
Project: Ignite
Issue Type: Bug
Reporter: Roman Puchkovskiy
Assignee: Roman Puchkovskiy
Fix For: 3.0.0-beta2
The scenario is as follows:
# An InstallSnapshot request is sent; its processing hangs indefinitely (it
will be cancelled in step 3)
# After a timeout, a second InstallSnapshot request is sent with the same
index+term as the first; in JRaft, this triggers special handling (processing
of the previous request is NOT cancelled)
# After a timeout, a third InstallSnapshot request is sent with a DIFFERENT
index, so it cancels processing of the first snapshot, effectively unblocking
the first thread
In the original JRaft implementation, after being unblocked, the first thread
fails to clean up, so subsequent retries always see a phantom of an unfinished
snapshot and the snapshotting process is jammed. Also, node stop might get
stuck because one 'download' task remains unfinished forever.
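
The scenario above can be modelled with a minimal Java sketch. It is an
illustration under stated assumptions, not JRaft's actual code and not
necessarily the fix applied for this issue: SnapshotSlotSketch, Session,
Outcome and the responder string are all hypothetical names. The idea shown is
that a retry with the same index+term only redirects the response inside the
existing session object, so the identity check made by the first thread during
cleanup still matches once that thread is unblocked.

```java
/** Hypothetical single-slot model of snapshot installation; not JRaft's API. */
public class SnapshotSlotSketch {
    enum Outcome { STARTED, RETRY_ATTACHED, CANCELLED_PREVIOUS }

    /** One in-flight installation, identified by index and term. */
    static final class Session {
        final long index;
        final long term;
        volatile String responder;   // hypothetical: who receives the final response
        volatile boolean cancelled;  // set when a different snapshot arrives

        Session(long index, long term, String responder) {
            this.index = index;
            this.term = term;
            this.responder = responder;
        }
    }

    private Session current; // guarded by "this"

    /** Handles an incoming InstallSnapshot-like request. */
    synchronized Outcome onRequest(long index, long term, String responder) {
        if (current == null) {
            current = new Session(index, term, responder);
            return Outcome.STARTED;
        }
        if (current.index == index && current.term == term) {
            // Retry: keep the same Session object and only redirect the response,
            // so the processing thread still owns the slot.
            current.responder = responder;
            return Outcome.RETRY_ATTACHED;
        }
        // Different snapshot: ask the running one to stop; its thread cleans up.
        current.cancelled = true;
        return Outcome.CANCELLED_PREVIOUS;
    }

    /** Returns the session that currently occupies the slot, or null. */
    synchronized Session current() {
        return current;
    }

    /** Called by the processing thread when it unblocks, finished or cancelled. */
    synchronized void finish(Session session) {
        if (current == session) {
            current = null; // release the slot so no phantom session remains
        }
    }

    synchronized boolean isBusy() {
        return current != null;
    }
}
```

In this model, replaying the three requests and then calling finish from the
first thread leaves the slot free, so a later InstallSnapshot request can
start instead of seeing an unfinished snapshot.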