Xinyu Tan created IOTDB-5835:
--------------------------------

             Summary: Fix wal accumulation caused by datanode restart
                 Key: IOTDB-5835
                 URL: https://issues.apache.org/jira/browse/IOTDB-5835
             Project: Apache IoTDB
          Issue Type: Improvement
            Reporter: Xinyu Tan
            Assignee: Xinyu Tan
         Attachments: image-2023-04-28-11-08-43-542.png, 
image-2023-04-28-11-08-51-622.png, image-2023-04-28-11-08-57-549.png, 
image-2023-04-28-11-09-03-902.png

When the cluster is running normally and replica A of a consensus group is the 
Leader, it continuously sends logs to the followers and, after sending them, 
updates the WAL's safelyDeletedSearchIndex. WAL files are deleted 
asynchronously, so if a restart occurs, some logs that have already been 
synchronized to other nodes may not have been deleted yet. After the restart, 
another replica B may become the Leader, and replica A then becomes a 
Follower that only receives logs.
Because IoTConsensus currently does not use its recovered syncIndex to set 
the safelyDeletedSearchIndex of the underlying WAL node at startup, and a 
Follower no longer sends logs and so never advances that index, replica A 
cannot delete these WAL files, which results in the accumulation of WAL 
files. Write requests of all regions on the node are affected.
 !image-2023-04-28-11-08-43-542.png|thumbnail! 
 !image-2023-04-28-11-08-51-622.png|thumbnail! 
 !image-2023-04-28-11-08-57-549.png|thumbnail!  
!image-2023-04-28-11-09-03-902.png|thumbnail! 
The solution to this problem is to update the safelyDeletedSearchIndex of 
the reader at startup, using the recovered syncIndex.
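
The effect of that startup step can be illustrated with a minimal sketch. 
This is a toy model, not IoTDB code: ToyWalNode, WalRestartSketch, the 
NOTHING_DELETABLE sentinel, the file names and the index values are all 
hypothetical, and the assumption is that a file may be deleted once every 
entry in it is at or below safelyDeletedSearchIndex.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: these classes are hypothetical, not IoTDB's real WAL or consensus code.
final class ToyWalNode {
    // Assumed sentinel meaning "no entry is known to be safe to delete yet".
    static final long NOTHING_DELETABLE = -1L;

    // WAL files keyed by the largest search index each file contains.
    private final TreeMap<Long, String> filesByLastSearchIndex = new TreeMap<>();
    private long safelyDeletedSearchIndex = NOTHING_DELETABLE;

    void addFile(long lastSearchIndex, String fileName) {
        filesByLastSearchIndex.put(lastSearchIndex, fileName);
    }

    void setSafelyDeletedSearchIndex(long searchIndex) {
        this.safelyDeletedSearchIndex = searchIndex;
    }

    // Deletes every file whose entries are all at or below the safe index.
    // Returns the number of files removed.
    int deleteOutdatedFiles() {
        NavigableMap<Long, String> deletable =
                filesByLastSearchIndex.headMap(safelyDeletedSearchIndex, true);
        int removed = deletable.size();
        deletable.clear(); // clearing the view removes the entries from the backing map
        return removed;
    }

    int fileCount() {
        return filesByLastSearchIndex.size();
    }
}

final class WalRestartSketch {
    // Builds the WAL state of replica A right after a restart: three files survive
    // on disk, and the leader-side syncIndex recovered from disk is passed in.
    static ToyWalNode restartedReplica(boolean applyFix, long recoveredSyncIndex) {
        ToyWalNode wal = new ToyWalNode();
        wal.addFile(10, "wal-0");
        wal.addFile(20, "wal-1");
        wal.addFile(30, "wal-2");
        if (applyFix) {
            // The proposed fix: seed the safe-delete index from the recovered syncIndex.
            wal.setSafelyDeletedSearchIndex(recoveredSyncIndex);
        }
        return wal;
    }

    public static void main(String[] args) {
        ToyWalNode withoutFix = restartedReplica(false, 20);
        System.out.println("without fix, deleted: " + withoutFix.deleteOutdatedFiles()); // 0
        ToyWalNode withFix = restartedReplica(true, 20);
        System.out.println("with fix, deleted: " + withFix.deleteOutdatedFiles()); // 2
    }
}
```

In this model the replica that restarts as a Follower never advances the 
index on its own, so without the startup step all three files stay on disk. 
With it, the two files already covered by the recovered syncIndex become 
deletable.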



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
