[ https://issues.apache.org/jira/browse/RATIS-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze updated RATIS-2146:
------------------------------
    Component/s: server

> Fixed possible issues caused by concurrent deletion and election when member changes
> ------------------------------------------------------------------------------------
>
>                 Key: RATIS-2146
>                 URL: https://issues.apache.org/jira/browse/RATIS-2146
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Xinyu Tan
>            Assignee: Xinyu Tan
>            Priority: Major
>         Attachments: image-2024-08-28-14-53-23-259.png, image-2024-08-28-14-53-27-637.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> During this process, we encountered some concurrency issues:
> * After the member change is complete, node D will no longer be a member of this consensus group. It will attempt to initiate an election but receive a NOT_IN_CONF response, after which it will close itself.
> * During the removal of member D, it will also close itself first, and then proceed to delete the file directory.
> These two CLOSE operations may occur concurrently, which could result in the directory being deleted while the StateMachineUpdater thread has not yet closed, ultimately leading to unexpected errors.
> !image-2024-08-28-14-53-23-259.png!
> !image-2024-08-28-14-53-27-637.png!
> I believe there are two possible solutions for this issue:
> * Add concurrency control to the close function, such as adding the synchronized keyword to the function.
> * Add some checks before deleting the directory to ensure that the callback functions in the close process have already been executed before the directory is deleted.
> What's your opinion? [~szetszwo]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
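
For illustration only, the two proposals in the description (make close safe under concurrent callers, and delete the directory only after close has fully finished) could be combined roughly as follows. This is a minimal sketch, not the Ratis implementation: `CloseOnceSketch`, `start`, `stopUpdater`, `removeAndDelete`, and `shutdownRuns` are hypothetical names, and the background thread is only a stand-in for the StateMachineUpdater thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

/** Hypothetical stand-in for a group member; NOT a Ratis class. */
class CloseOnceSketch {
  private final Path dir;
  // Ensures the shutdown steps are started by exactly one caller.
  private final AtomicBoolean closeStarted = new AtomicBoolean();
  // Completed only after the shutdown steps have finished.
  private final CompletableFuture<Void> closed = new CompletableFuture<>();
  // Counts how often the shutdown steps actually ran (for the demo).
  final AtomicInteger shutdownRuns = new AtomicInteger();

  private volatile boolean running = true;
  // Placeholder for a background thread that still uses the directory.
  private final Thread updater = new Thread(() -> {
    while (running) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        return;
      }
    }
  }, "updater-sketch");

  CloseOnceSketch(Path dir) {
    this.dir = dir;
  }

  void start() {
    updater.start();
  }

  /** May be called from both the election path and the removal path. */
  CompletableFuture<Void> close() {
    if (closeStarted.compareAndSet(false, true)) {
      try {
        stopUpdater();
        closed.complete(null);
      } catch (Throwable t) {
        closed.completeExceptionally(t);
      }
    }
    // Later callers get the same future and can wait for the first close.
    return closed;
  }

  private void stopUpdater() throws InterruptedException {
    shutdownRuns.incrementAndGet();
    running = false;
    updater.interrupt();
    updater.join(); // wait until the background thread has exited
  }

  /** Removal path: wait for close to finish, then delete the directory. */
  void removeAndDelete() throws IOException {
    close().join(); // throws if close failed, so nothing is deleted
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order deletes children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.deleteIfExists(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

With this shape, whichever path calls `close()` first runs the shutdown steps, and the other path simply waits on the same future. A plain `synchronized` close would serialize the two callers as well, but the second caller would still need a flag like `closeStarted` to avoid repeating the shutdown steps.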