TheR1sing3un commented on code in PR #17784:
URL: https://github.com/apache/hudi/pull/17784#discussion_r2663482063


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/LSMTimelineWriter.java:
##########
@@ -382,16 +380,32 @@ private HoodieLSMTimelineManifest.LSMFileEntry getFileEntry(String fileName) thr
   private List<String> getCandidateFiles(List<HoodieLSMTimelineManifest.LSMFileEntry> files, int filesBatch) throws IOException {
     List<String> candidates = new ArrayList<>();
     long totalFileLen = 0L;
-    for (int i = 0; i < filesBatch; i++) {
+    // try to find at most one group of files to compact
+    // 1. files num in the group should be at least 2
+    // 2. files num in the group should not exceed the batch size
+    // 3. the group's total file size should not exceed the threshold
+    // 4. all files in the group should be consecutive in instant order
+    for (int i = 0; i < files.size(); i++) {
       HoodieLSMTimelineManifest.LSMFileEntry fileEntry = files.get(i);
-      if (totalFileLen > MAX_FILE_SIZE_IN_BYTES) {
-        return candidates;
-      }
       // we may also need to consider a single file that is very close to the threshold in size,
       // to avoid the write amplification,
       // for e.g, two 800MB files compact into a 1.6GB file.
       totalFileLen += fileEntry.getFileLen();
       candidates.add(fileEntry.getFileName());
+      if (candidates.size() >= filesBatch) {
+        // stop once we reach the batch size
+        break;
+      }
+      if (totalFileLen > writeConfig.getTimelineArchivedFileMaxSize()) {

Review Comment:
   > usually the file size in one layer should be almost the same, and it is almost impossible that one file size exceeds the `writeConfig.getTimelineArchivedFileMaxSize`, if it is, then the other files in the layer should also be very large now.
   
   We encountered this problem in our production environment.
   Our archival is asynchronous and runs once a day.
   On one particular day, a table performed a very large number of commits, leaving a very large number of instants to be archived.
   When archival was triggered that day, it produced a very large L0 file.
   Once this oversized file existed, the timeline of this table could no longer be compacted in any way.
   It is not only a large number of commits that can produce oversized L0 files: particularly large transaction metadata or plans (compaction, clean) can also make a single L0 file very large, thereby blocking all subsequent timeline compaction.
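   To make the intended grouping rules concrete, here is a small standalone sketch of the selection logic described above (the class, method, and field names below are illustrative stand-ins, not the actual `LSMTimelineWriter` API). The key behavior is that a single file exceeding the size threshold is skipped instead of blocking every later group:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class CandidateSelection {

     // Illustrative stand-in for HoodieLSMTimelineManifest.LSMFileEntry.
     static final class FileEntry {
       final String name;
       final long len;
       FileEntry(String name, long len) {
         this.name = name;
         this.len = len;
       }
     }

     // Picks at most one group of consecutive files to compact:
     // at least 2 files, at most filesBatch files, total size <= maxSize.
     // An oversized single file is skipped rather than blocking later groups.
     static List<String> candidates(List<FileEntry> files, int filesBatch, long maxSize) {
       List<String> group = new ArrayList<>();
       long total = 0L;
       for (FileEntry f : files) {
         if (total + f.len > maxSize) {
           if (group.size() >= 2) {
             break; // current group already forms a valid candidate
           }
           // drop the partial group; an oversized file must not block later groups
           group.clear();
           total = 0L;
           if (f.len > maxSize) {
             continue; // this file alone exceeds the threshold: skip it entirely
           }
         }
         group.add(f.name);
         total += f.len;
         if (group.size() >= filesBatch) {
           break; // stop once we reach the batch size
         }
       }
       return group.size() >= 2 ? group : new ArrayList<>();
     }
   }
   ```

   With this shape, an L0 layer like `[2000, 100, 100]` against a 500-byte threshold still yields the `[100, 100]` group instead of returning nothing forever.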



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
