Xinyu Tan created RATIS-2236:
--------------------------------
Summary: Fixed bug where manual triggerSnapshot would never finish
Key: RATIS-2236
URL: https://issues.apache.org/jira/browse/RATIS-2236
Project: Ratis
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2025-01-07-19-20-29-805.png,
image-2025-01-07-19-20-48-233.png, image-2025-01-07-19-21-02-451.png
Currently, some integration tests (ITs) fail on IoTDB's CI. A jstack dump
showed that the failures were caused by a deadlock triggered by a Ratis
snapshot.
!image-2025-01-07-19-21-02-451.png!
Further investigation of the jstack output and the stateMachineUpdater source
code revealed the cause. If the stateMachineUpdater is waiting in
waitForCommit and there are no new writes, a manually triggered snapshot never
returns. The stateMachineUpdater stays in waitForCommit indefinitely and never
gets to take the snapshot.
!image-2025-01-07-19-20-48-233.png!
!image-2025-01-07-19-20-29-805.png!
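To make the failure mode concrete, here is a minimal Java sketch of the pattern described above. It is a hypothetical, heavily simplified model, not the real stateMachineUpdater code: the `Updater` class, its fields and methods, and the `snapshotCompletes` helper are illustrative assumptions. The model's wait loop only exits when a new entry is committed (or on shutdown), and the snapshot flag is only inspected after that loop exits, so a snapshot request with no following write is never served.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the hang; NOT the Ratis StateMachineUpdater code.
public class SnapshotHangDemo {
  static final class Updater implements Runnable {
    private final Object lock = new Object();
    private long committedIndex = 0;          // advanced by new writes
    private long appliedIndex = 0;            // advanced by the updater thread
    private boolean snapshotRequested = false;
    private boolean running = true;           // guarded by lock
    final CountDownLatch snapshotTaken = new CountDownLatch(1);

    void commit() { synchronized (lock) { committedIndex++; lock.notifyAll(); } }
    void requestSnapshot() { synchronized (lock) { snapshotRequested = true; lock.notifyAll(); } }
    void stop() { synchronized (lock) { running = false; lock.notifyAll(); } }

    // Buggy wait: exits only for new commits or shutdown, never for a snapshot request.
    private boolean waitForCommit() throws InterruptedException {
      synchronized (lock) {
        while (running && appliedIndex >= committedIndex) {
          lock.wait();
        }
        return running;
      }
    }

    @Override public void run() {
      try {
        while (waitForCommit()) {
          synchronized (lock) {
            appliedIndex = committedIndex;    // "apply" the committed entries
            if (snapshotRequested) {          // flag is only checked after applying
              snapshotRequested = false;
              snapshotTaken.countDown();
            }
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  // Requests a snapshot, optionally followed by one write; returns whether it completed in time.
  static boolean snapshotCompletes(boolean writeAfterRequest, long timeoutMs) {
    Updater u = new Updater();
    Thread t = new Thread(u);
    t.start();
    try {
      u.requestSnapshot();
      if (writeAfterRequest) {
        u.commit();
      }
      boolean done = u.snapshotTaken.await(timeoutMs, TimeUnit.MILLISECONDS);
      u.stop();
      t.join();
      return done;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("no writes: " + snapshotCompletes(false, 500)); // stuck in waitForCommit
    System.out.println("one write: " + snapshotCompletes(true, 500));  // woken up by the commit
  }
}
```

In this model the snapshot only completes because an unrelated write happens to wake the updater, which matches the observation that the hang occurs when there are no new writes.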
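The solution is to also check for pending snapshot requests, so that the
stateMachineUpdater does not keep waiting in waitForCommit while a snapshot has
been requested.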
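A minimal sketch of one way such a check could work, again as a hypothetical, simplified Java model rather than the actual Ratis patch (the `Updater` class, its fields, and the `snapshotCompletesWithoutWrites` helper are illustrative assumptions). The wait predicate also exits when a snapshot request is pending, so the updater can take the snapshot even when no new entries are committed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the fix; NOT the actual Ratis patch.
public class SnapshotFixDemo {
  static final class Updater implements Runnable {
    private final Object lock = new Object();
    private long committedIndex = 0;          // advanced by new writes
    private long appliedIndex = 0;            // advanced by the updater thread
    private boolean snapshotRequested = false;
    private boolean running = true;           // guarded by lock
    final CountDownLatch snapshotTaken = new CountDownLatch(1);

    void commit() { synchronized (lock) { committedIndex++; lock.notifyAll(); } }
    void requestSnapshot() { synchronized (lock) { snapshotRequested = true; lock.notifyAll(); } }
    void stop() { synchronized (lock) { running = false; lock.notifyAll(); } }

    // Fixed wait: also exits when a snapshot request is pending.
    private boolean waitForCommit() throws InterruptedException {
      synchronized (lock) {
        while (running && appliedIndex >= committedIndex && !snapshotRequested) {
          lock.wait();
        }
        return running;
      }
    }

    @Override public void run() {
      try {
        while (waitForCommit()) {
          synchronized (lock) {
            appliedIndex = committedIndex;    // may be a no-op when nothing new was committed
            if (snapshotRequested) {
              snapshotRequested = false;
              snapshotTaken.countDown();      // snapshot taken even without a new write
            }
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  // Requests a snapshot with no following write; returns whether it completed in time.
  static boolean snapshotCompletesWithoutWrites(long timeoutMs) {
    Updater u = new Updater();
    Thread t = new Thread(u);
    t.start();
    try {
      u.requestSnapshot();
      boolean done = u.snapshotTaken.await(timeoutMs, TimeUnit.MILLISECONDS);
      u.stop();
      t.join();
      return done;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("no writes: " + snapshotCompletesWithoutWrites(2000));
  }
}
```

The key change compared with a commit-only wait is the extra `!snapshotRequested` term in the loop condition. Because `requestSnapshot` already calls `notifyAll`, the waiting thread wakes up, re-evaluates the condition, and leaves the loop.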
--
This message was sent by Atlassian Jira
(v8.20.10#820010)