Xinyu Tan created RATIS-2236:
--------------------------------

             Summary: Fixed bug where manual triggerSnapshot would never finish
                 Key: RATIS-2236
                 URL: https://issues.apache.org/jira/browse/RATIS-2236
             Project: Ratis
          Issue Type: Improvement
            Reporter: Xinyu Tan
            Assignee: Xinyu Tan
         Attachments: image-2025-01-07-19-20-29-805.png, 
image-2025-01-07-19-20-48-233.png, image-2025-01-07-19-21-02-451.png

Some integration tests (ITs) can currently fail on IoTDB's CI. A jstack dump 
showed that the failures are caused by a deadlock triggered by a Ratis 
snapshot.

!image-2025-01-07-19-21-02-451.png!  

Further investigation of the jstack dump and the source code of 
stateMachineUpdater revealed the cause: if the stateMachineUpdater is in the 
waitForCommit function and there are no new writes, a manually triggered 
snapshot never returns, because the stateMachineUpdater stays in the 
waitForCommit function forever.

!image-2025-01-07-19-20-48-233.png! 
!image-2025-01-07-19-20-29-805.png!

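The solution is to add a check for the existence of a pending snapshot 
request, so that the stateMachineUpdater does not keep waiting in 
waitForCommit once a snapshot has been triggered.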
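
To illustrate the idea, here is a minimal, self-contained sketch of the wait loop with and without such a check. It is a toy model and not the actual Ratis code: the class SnapshotWaitSketch, its fields, the 100 ms poll interval, and the helper snapshotFinishes are all hypothetical names chosen for illustration. Only the names waitForCommit and triggerSnapshot are borrowed from the description above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of an updater loop (hypothetical; not the Ratis implementation). */
class SnapshotWaitSketch {
  private final Object lock = new Object();
  private final boolean checkSnapshotRequest; // true = wait loop includes the extra check
  private long committed = 0;                 // highest committed index (guarded by lock)
  private long applied = 0;                   // highest applied index (guarded by lock)
  private volatile boolean running = true;
  private CompletableFuture<Long> snapshotRequest; // pending manual request (guarded by lock)

  SnapshotWaitSketch(boolean checkSnapshotRequest) {
    this.checkSnapshotRequest = checkSnapshotRequest;
  }

  /** Manual trigger: records the request and wakes up the updater thread. */
  CompletableFuture<Long> triggerSnapshot() {
    synchronized (lock) {
      snapshotRequest = new CompletableFuture<>();
      lock.notifyAll();
      return snapshotRequest;
    }
  }

  void stop() {
    synchronized (lock) {
      running = false;
      lock.notifyAll();
    }
  }

  /** Waits until there is something new to apply. */
  private void waitForCommit() throws InterruptedException {
    synchronized (lock) {
      while (running && applied >= committed) {
        // Without this check, a wake-up with no new commits just waits again.
        if (checkSnapshotRequest && snapshotRequest != null) {
          return;
        }
        lock.wait(100);
      }
    }
  }

  /** Updater loop: wait, apply, then serve a pending snapshot request. */
  void run() {
    try {
      while (running) {
        waitForCommit();
        synchronized (lock) {
          if (!running) {
            break;
          }
          applied = committed;
          if (snapshotRequest != null) {
            snapshotRequest.complete(applied); // stands in for taking the snapshot
            snapshotRequest = null;
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if a manual snapshot completes within one second with no writes. */
  static boolean snapshotFinishes(boolean checkSnapshotRequest) throws Exception {
    SnapshotWaitSketch updater = new SnapshotWaitSketch(checkSnapshotRequest);
    Thread thread = new Thread(updater::run, "updater");
    thread.setDaemon(true);
    thread.start();
    try {
      updater.triggerSnapshot().get(1, TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    } finally {
      updater.stop();
      thread.join();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("without check: " + snapshotFinishes(false)); // false
    System.out.println("with check:    " + snapshotFinishes(true));  // true
  }
}
```

In this model no writes ever arrive, so the variant without the check keeps looping inside waitForCommit and the caller of triggerSnapshot only gets out through the timeout, while the variant with the check leaves the wait and completes the request.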




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
