danny0405 commented on code in PR #17784:
URL: https://github.com/apache/hudi/pull/17784#discussion_r2663658547


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/LSMTimelineWriter.java:
##########
@@ -382,16 +380,32 @@ private HoodieLSMTimelineManifest.LSMFileEntry getFileEntry(String fileName) thr
   private List<String> getCandidateFiles(List<HoodieLSMTimelineManifest.LSMFileEntry> files, int filesBatch) throws IOException {
     List<String> candidates = new ArrayList<>();
     long totalFileLen = 0L;
-    for (int i = 0; i < filesBatch; i++) {
+    // try to find at most one group of files to compact
+    // 1. files num in the group should be at least 2
+    // 2. files num in the group should not exceed the batch size
+    // 3. the group's total file size should not exceed the threshold
+    // 4. all files in the group should be consecutive in instant order
+    for (int i = 0; i < files.size(); i++) {
       HoodieLSMTimelineManifest.LSMFileEntry fileEntry = files.get(i);
-      if (totalFileLen > MAX_FILE_SIZE_IN_BYTES) {
-        return candidates;
-      }
       // we may also need to consider a single file that is very close to the threshold in size,
       // to avoid the write amplification,
       // for e.g, two 800MB files compact into a 1.6GB file.
       totalFileLen += fileEntry.getFileLen();
       candidates.add(fileEntry.getFileName());
+      if (candidates.size() >= filesBatch) {
+        // stop once we reach the batch size
+        break;
+      }
+      if (totalFileLen > writeConfig.getTimelineArchivedFileMaxSize()) {

Review Comment:
   okay, makes sense somehow.
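
   For reference, the four rules listed in the new code comment can be
   sketched as below. This is a minimal, standalone illustration and not
   the code from this PR: the hunk above is cut off before the size check
   is handled, so the restart-on-overflow behavior is an assumption, as is
   the strict reading of rule 3 (the group never exceeds the threshold).
   `CandidateGroupSketch`, `FileEntry`, `select` and `maxBytes` are
   hypothetical names; `maxBytes` stands in for the value returned by
   `writeConfig.getTimelineArchivedFileMaxSize()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CandidateGroupSketch {

  /** Hypothetical stand-in for HoodieLSMTimelineManifest.LSMFileEntry. */
  static final class FileEntry {
    final String name;
    final long len;

    FileEntry(String name, long len) {
      this.name = name;
      this.len = len;
    }
  }

  /**
   * Returns at most one group of consecutive files to compact, or an empty
   * list if no group of at least two files fits.
   * Assumption: `files` is already sorted in instant order.
   */
  static List<String> select(List<FileEntry> files, int filesBatch, long maxBytes) {
    List<String> group = new ArrayList<>();
    long total = 0L;
    for (FileEntry f : files) {
      if (total + f.len > maxBytes) {
        // Adding this file would break rule 3 (total size threshold).
        if (group.size() >= 2) {
          // Rule 1 is already satisfied, so the current group is the result.
          return group;
        }
        // Assumed behavior: drop the too-small group and restart at this file,
        // which keeps the group consecutive (rule 4).
        group.clear();
        total = 0L;
        if (f.len > maxBytes) {
          // A single file above the threshold can never join a group.
          continue;
        }
      }
      group.add(f.name);
      total += f.len;
      if (group.size() >= filesBatch) {
        // Rule 2: stop once the batch size is reached.
        break;
      }
    }
    return group.size() >= 2 ? group : Collections.<String>emptyList();
  }
}
```

   With a threshold of 1000 and files of sizes 800, 800 and 100, the sketch
   skips the first file and returns the second and third, which avoids
   compacting the two 800 files into one oversized file.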


