shuwenwei opened a new pull request, #15341:
URL: https://github.com/apache/iotdb/pull/15341

   ## Description
   Currently, during the settle phase of file selection, all files in a 
partition are traversed and categorized based on the amount of remaining data. 
Files are classified as either fully dirty or partial dirty. A fully dirty file 
is one from which all data can be deleted, while a partial dirty file contains 
only some deletable data. In the final compaction tasks, fully dirty files are 
expected to be deleted first, followed by the cleanup of partial dirty files 
through inner-space compaction tasks.
   
   A large number of partial dirty files may be selected within a single 
partition. These files are not all submitted in one compaction task. Instead, 
they are split into multiple tasks based on their size and count. The current 
splitting strategy submits all fully dirty files along with the first group of 
partial dirty files as one task, and submits each subsequent group of partial 
dirty files as its own task.
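   
   The splitting strategy described above can be sketched roughly as follows. This is a simplified illustration, not the IoTDB implementation: `DirtyFile`, `Task`, `split`, and the `maxCount` / `maxBytes` thresholds are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SettleTaskSplitSketch {
    // Hypothetical stand-in for a file selected by the settle phase (not an IoTDB class).
    record DirtyFile(String name, long sizeBytes, boolean fullyDirty) {}

    // Hypothetical task: fully dirty files to delete plus partial dirty files to compact.
    record Task(List<DirtyFile> fullyDirty, List<DirtyFile> partiallyDirty) {}

    // Groups partial dirty files by count and total size, then attaches ALL fully
    // dirty files to the first group only, as the description says the current
    // strategy does.
    static List<Task> split(List<DirtyFile> fully, List<DirtyFile> partial,
                            int maxCount, long maxBytes) {
        List<List<DirtyFile>> groups = new ArrayList<>();
        List<DirtyFile> current = new ArrayList<>();
        long bytes = 0;
        for (DirtyFile f : partial) {
            boolean full = current.size() >= maxCount || bytes + f.sizeBytes() > maxBytes;
            if (!current.isEmpty() && full) {
                groups.add(current);
                current = new ArrayList<>();
                bytes = 0;
            }
            current.add(f);
            bytes += f.sizeBytes();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }

        List<Task> tasks = new ArrayList<>();
        if (groups.isEmpty()) {
            // No partial dirty files: a single task that only deletes fully dirty files.
            if (!fully.isEmpty()) {
                tasks.add(new Task(List.copyOf(fully), List.of()));
            }
            return tasks;
        }
        for (int i = 0; i < groups.size(); i++) {
            List<DirtyFile> toDelete = (i == 0) ? List.copyOf(fully) : List.<DirtyFile>of();
            tasks.add(new Task(toDelete, List.copyOf(groups.get(i))));
        }
        return tasks;
    }
}
```

   With partial dirty files F1, F2, F3, F5, F7, fully dirty files F4 and F6, and a hypothetical `maxCount` of 3, this sketch yields a first task holding F4, F6 and F1 to F3, and a second task holding only F5 and F7.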
   
   This leads to a problem, as shown in the diagram: the second task contains 
File 5 and File 7, with a fully dirty File 6 in between. If the fully dirty 
File 6 has not yet been deleted by another task when Task 2 is executed, an 
overlap may occur between File 6 and the target files produced by the 
compaction, resulting in an error.
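   
   The overlap condition can be illustrated with a small check. This is a sketch under simplifying assumptions, not IoTDB code: `FileRange` and `overlappingLeftovers` are hypothetical, each file is reduced to a single time range, and the compaction target is assumed to span from the earliest start to the latest end of its source files.

```java
import java.util.List;

public class OverlapCheckSketch {
    // Hypothetical stand-in for a file and the time range it covers.
    record FileRange(String name, long startTime, long endTime) {}

    // Returns files still on disk that are not sources of the task but whose
    // range intersects the assumed target range [min start, max end] of the sources.
    static List<FileRange> overlappingLeftovers(List<FileRange> sources,
                                                List<FileRange> stillOnDisk) {
        long targetStart = sources.stream().mapToLong(FileRange::startTime).min().orElseThrow();
        long targetEnd = sources.stream().mapToLong(FileRange::endTime).max().orElseThrow();
        return stillOnDisk.stream()
            .filter(f -> !sources.contains(f))
            .filter(f -> f.startTime() <= targetEnd && f.endTime() >= targetStart)
            .toList();
    }
}
```

   If Task 2 compacts File 5 and File 7 while File 6 is still on disk, the check reports File 6; once File 6 has been deleted, it reports nothing.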
   <img width="1127" alt="Screenshot 2025-04-14 18 27 50" 
src="https://github.com/user-attachments/assets/c8fb4f9d-eae9-4949-9bcc-c539407010f8"
 />
   
   This PR changes how fully dirty files are submitted.
   <img width="1101" alt="Screenshot 2025-04-14 18 32 14" 
src="https://github.com/user-attachments/assets/0b9619a1-2a7e-45b1-b439-67a375ab1a71"
 />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
