Xinyu Tan created IOTDB-5835:
--------------------------------

Summary: Fix wal accumulation caused by datanode restart
Key: IOTDB-5835
URL: https://issues.apache.org/jira/browse/IOTDB-5835
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-04-28-11-08-43-542.png, image-2023-04-28-11-08-51-622.png, image-2023-04-28-11-08-57-549.png, image-2023-04-28-11-09-03-902.png
When the cluster is running normally and replica A of a consensus group is the Leader, A continuously sends logs to the Followers and, after sending them, updates the safelyDeletedSearchIndex of its WAL. WAL files are deleted asynchronously, so if the node restarts, some WAL files whose logs have already been synchronized to the other replicas may not have been deleted yet. After the restart, another replica B may become the Leader, and replica A becomes a Follower that receives logs. Because IoTConsensus currently does not use its recovered syncIndex to set the safelyDeletedSearchIndex of the underlying WAL node at startup, replica A can no longer delete these WAL files, and they accumulate. Write requests of all regions on the node are affected.

!image-2023-04-28-11-08-43-542.png|thumbnail! !image-2023-04-28-11-08-51-622.png|thumbnail! !image-2023-04-28-11-08-57-549.png|thumbnail! !image-2023-04-28-11-09-03-902.png|thumbnail!

The fix is to update the reader's safelyDeletedSearchIndex at startup, using the recovered syncIndex.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
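A minimal sketch of the failure mode and the fix described in this issue, written in Python for brevity. It rests on simplifying assumptions: `WalFile`, `WalNode`, `recover_replica`, `leftover_wal`, and the `NO_INDEX` sentinel are hypothetical names, not IoTDB's actual classes or API; a WAL file is treated as deletable once every entry in it has a search index at or below safelyDeletedSearchIndex; and any other conditions the real WAL node checks before deleting a file are ignored. The scenario mirrors the description: replica A had synced up to index 250 as Leader, the asynchronous deletion had not run, and A restarts as a Follower.

```python
from dataclasses import dataclass, field

# Hypothetical sentinel meaning "no index is known to be safe yet".
NO_INDEX = -1


@dataclass
class WalFile:
    name: str
    max_search_index: int  # largest search index of any log entry in this file


@dataclass
class WalNode:
    files: list[WalFile] = field(default_factory=list)
    # Assumption: entries with a search index <= this value were synced to all peers.
    safely_deleted_search_index: int = NO_INDEX

    def set_safely_deleted_search_index(self, index: int) -> None:
        # Never move the safe index backwards.
        self.safely_deleted_search_index = max(self.safely_deleted_search_index, index)

    def delete_outdated_files(self) -> list[str]:
        """Drop files whose entries are all at or below the safe index; return their names."""
        safe = self.safely_deleted_search_index
        deleted = [f.name for f in self.files if f.max_search_index <= safe]
        self.files = [f for f in self.files if f.max_search_index > safe]
        return deleted


def recover_replica(wal: WalNode, recovered_sync_index: int, apply_fix: bool) -> None:
    """Simulate startup. With the fix, the recovered syncIndex seeds the safe index."""
    if apply_fix:
        wal.set_safely_deleted_search_index(recovered_sync_index)
    # Without the fix the safe index stays at NO_INDEX; in this sketch a
    # Follower never advances it, so the leftover files are never deleted.


def leftover_wal() -> WalNode:
    # Replica A synced up to index 250 as Leader, but the asynchronous deletion
    # had not run before the restart, so all three files are still on disk.
    return WalNode([WalFile("wal-0", 100), WalFile("wal-1", 200), WalFile("wal-2", 300)])


if __name__ == "__main__":
    before = leftover_wal()
    recover_replica(before, recovered_sync_index=250, apply_fix=False)
    print(before.delete_outdated_files())  # [] : the WAL files accumulate

    after = leftover_wal()
    recover_replica(after, recovered_sync_index=250, apply_fix=True)
    print(after.delete_outdated_files())  # ['wal-0', 'wal-1']
```

Seeding the safe index from the recovered syncIndex is enough in this model because every entry at or below that index was already delivered to the other replicas before the restart; only `wal-2`, which still holds entries above 250, is kept.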