yunlou11 commented on issue #5724:
URL: https://github.com/apache/paimon/issues/5724#issuecomment-2975678264

   Write "sno: 8 (-D)" to Kafka:
   ```json
   {
         "before": { "sno": 8, "name": "dyl5", "address": "hefei", "email": "1...@qq.com" },
         "after": null,
         "op":"d"
   }
   ```
   
   MergeTreeCompactManager.triggerCompaction():
   ```java
   
   boolean dropDelete =
        unit.outputLevel() != 0
            && (unit.outputLevel() >= levels.nonEmptyHighestLevel()
             || dvMaintainer != null);
   
   ```
   When "deletion-vectors.enabled" is true and "unit.outputLevel()" is not zero, "dvMaintainer" is never null, so "dropDelete" is always true.
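
   A minimal, self-contained sketch of that predicate may make the truth table easier to see. Plain parameters stand in for "unit.outputLevel()", "levels.nonEmptyHighestLevel()" and "dvMaintainer != null"; the class, method and parameter names are hypothetical and this is not Paimon code:

   ```java
   class DropDeleteSketch {
       // Same boolean shape as the expression quoted above, with plain inputs
       // (hypothetical parameters) instead of Paimon objects.
       static boolean dropDelete(int outputLevel, int nonEmptyHighestLevel, boolean hasDvMaintainer) {
           return outputLevel != 0
                   && (outputLevel >= nonEmptyHighestLevel || hasDvMaintainer);
       }

       public static void main(String[] args) {
           // Output level 1 is below the highest non-empty level 5:
           System.out.println(dropDelete(1, 5, true));  // dvMaintainer present -> true
           System.out.println(dropDelete(1, 5, false)); // dvMaintainer absent  -> false
           System.out.println(dropDelete(0, 5, true));  // output level 0       -> false
       }
   }
   ```

   With a "dvMaintainer" present, the level comparison no longer matters for any non-zero output level.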
   
   ChangelogMergeTreeRewriter.rewriteOrProduceChangelog():
   ```java
    while (iterator.hasNext()) {
           ChangelogResult result = iterator.next();
           KeyValue keyValue = result.result();
           if (compactFileWriter != null
                   && keyValue != null
                   && (!dropDelete || keyValue.isAdd())) {
               compactFileWriter.write(keyValue);
           }
           ......
   ```
   When "compactFileWriter" is not null and "dropDelete" is true, "compactFileWriter.write(keyValue)" is never executed for a delete record such as "sno: 8 (-D)", because "keyValue.isAdd()" is false.
   So "compactFileWriter.result()" is empty.
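
   The effect of that write condition can be sketched in isolation. "Row" below is a hypothetical stand-in for "KeyValue", and a plain list stands in for the compact file writer; only the filter condition is taken from the snippet above:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   class DropDeleteFilterSketch {
       // Hypothetical stand-in for KeyValue: a key plus an add/delete flag.
       static final class Row {
           final int sno;
           final boolean add;

           Row(int sno, boolean add) {
               this.sno = sno;
               this.add = add;
           }

           boolean isAdd() {
               return add;
           }
       }

       // Applies the same condition as the loop above and returns what would be written.
       static List<Row> rewrite(List<Row> merged, boolean dropDelete) {
           List<Row> written = new ArrayList<>();
           for (Row row : merged) {
               if (row != null && (!dropDelete || row.isAdd())) {
                   written.add(row);
               }
           }
           return written;
       }

       public static void main(String[] args) {
           List<Row> merged = new ArrayList<>();
           merged.add(new Row(8, false)); // the "sno: 8 (-D)" record

           System.out.println(rewrite(merged, true).size());  // 0: the delete is dropped
           System.out.println(rewrite(merged, false).size()); // 1: the delete is kept
       }
   }
   ```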
   ```java
   List<DataFileMeta> before = extractFilesFromSections(sections);
   List<DataFileMeta> after =
           compactFileWriter != null
                   ? compactFileWriter.result()
                   : before.stream()
                           .map(x -> x.upgrade(outputLevel))
                           .collect(Collectors.toList());
   ```
   So the "after" variable is empty, which ultimately means that the parameter "List runs" of "UniversalCompaction.forcePickL0" (equals levels.numberOfLevels() of the MergeTreeCompactManager object) cannot contain the "sno: 8 (-D)" record.
   Therefore, as mentioned earlier, "data-c5ad2524-733a-4405-a54e-78838925f501-2.parquet (sno: 8 +I)" is still referenced, so a new record that is exactly the same as "sno: 8 (+I)" does not generate a changelog.
   If this "dropDelete" result is the expected behavior, then an additional check is needed when determining whether newly added data is the same as the old data: whether the old data has already been deleted in "dvMaintainer", rather than only comparing against "after" in "levels".
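
   One possible shape for such a check, as a rough sketch only. The method, its parameters, and the idea of passing a boolean "oldRowDeleted" flag are all hypothetical; how the flag would actually be derived from "dvMaintainer" is not shown and is not an existing Paimon API:

   ```java
   import java.util.Objects;

   class ChangelogCheckSketch {
       // Decides whether a newly written value should produce a changelog entry.
       // oldRowDeleted is a hypothetical flag meaning "the old row is already
       // marked as deleted in the deletion vector".
       static boolean shouldEmitChangelog(String oldValue, String newValue, boolean oldRowDeleted) {
           if (oldValue == null || oldRowDeleted) {
               // No visible old row: treat the new record as an insert.
               return true;
           }
           // A visible old row exists: emit only if the content changed.
           return !Objects.equals(oldValue, newValue);
       }

       public static void main(String[] args) {
           // Same content, old row still visible: nothing to emit.
           System.out.println(shouldEmitChangelog("dyl5", "dyl5", false)); // false
           // Same content, but the old row was deleted: emit an insert.
           System.out.println(shouldEmitChangelog("dyl5", "dyl5", true));  // true
       }
   }
   ```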
   


