yunlou11 commented on issue #5724: URL: https://github.com/apache/paimon/issues/5724#issuecomment-2975678264
Write "sno: 8 (-D)" to Kafka:

```json
{
  "before": {
    "sno": 8,
    "name": "dyl5",
    "address": "hefei",
    "email": "1...@qq.com"
  },
  "after": null,
  "op": "d"
}
```

MergeTreeCompactManager.triggerCompaction():

```java
boolean dropDelete =
        unit.outputLevel() != 0
                && (unit.outputLevel() >= levels.nonEmptyHighestLevel()
                        || dvMaintainer != null);
```

When `deletion-vectors.enabled` is true, `dvMaintainer` is never null, so `dropDelete` is always true whenever `unit.outputLevel()` is not zero.

ChangelogMergeTreeRewriter.rewriteOrProduceChangelog():

```java
while (iterator.hasNext()) {
    ChangelogResult result = iterator.next();
    KeyValue keyValue = result.result();
    if (compactFileWriter != null
            && keyValue != null
            && (!dropDelete || keyValue.isAdd())) {
        compactFileWriter.write(keyValue);
    }
    ......
```

When `compactFileWriter` is not null and `dropDelete` is true, `compactFileWriter.write(keyValue)` is never executed for the delete record, so `compactFileWriter.result()` is empty.

```java
List<DataFileMeta> before = extractFilesFromSections(sections);
List<DataFileMeta> after =
        compactFileWriter != null
                ? compactFileWriter.result()
                : before.stream()
                        .map(x -> x.upgrade(outputLevel))
                        .collect(Collectors.toList());
```

So the `after` variable is empty, which ultimately means that the parameter `List runs` of `UniversalCompaction.forcePickL0` (equals `levels.numberOfLevels()` of the `MergeTreeCompactManager` object) cannot contain the record "sno: 8 (-D)". Therefore, as mentioned earlier, "data-c5ad2524-733a-4405-a54e-78838925f501-2.parquet (sno: 8 +I)" is still referenced, and a new record with exactly the same "sno: 8 (+I)" cannot generate a changelog.

If this `dropDelete` behavior is expected, then an additional check is needed when determining whether newly added data is the same as the old data: whether the data has already been deleted in `dvMaintainer`, rather than only comparing against `after` in `levels`.
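
The two conditions quoted above can be reproduced in isolation. The following is only a minimal sketch and does not use any Paimon classes: `DropDeleteSketch`, `Record`, `dropDelete(...)`, and `rewrite(...)` are hypothetical stand-ins, and the level numbers (output level 1, highest non-empty level 5) are assumed values chosen for illustration. It shows that with a deletion-vector maintainer present, a lone delete record is filtered out and the rewritten output is empty.

```java
import java.util.ArrayList;
import java.util.List;

public class DropDeleteSketch {

    /** Hypothetical stand-in for a KeyValue: a primary key plus an add/delete flag. */
    public static final class Record {
        public final int sno;
        public final boolean add;

        public Record(int sno, boolean add) {
            this.sno = sno;
            this.add = add;
        }
    }

    /** Same boolean expression as the quoted dropDelete, with plain values as inputs. */
    public static boolean dropDelete(
            int outputLevel, int nonEmptyHighestLevel, boolean hasDvMaintainer) {
        return outputLevel != 0
                && (outputLevel >= nonEmptyHighestLevel || hasDvMaintainer);
    }

    /** Same filter as the quoted write condition: with dropDelete set, only adds are kept. */
    public static List<Record> rewrite(List<Record> merged, boolean dropDelete) {
        List<Record> written = new ArrayList<>();
        for (Record r : merged) {
            if (r != null && (!dropDelete || r.add)) {
                written.add(r);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // Assumed levels: output level 1 is below the highest non-empty level 5,
        // so only the deletion-vector maintainer makes dropDelete true here.
        boolean drop = dropDelete(1, 5, true);

        List<Record> merged = new ArrayList<>();
        merged.add(new Record(8, false)); // sno: 8 (-D)

        List<Record> after = rewrite(merged, drop);
        System.out.println("dropDelete=" + drop + ", after.size()=" + after.size());
        // prints "dropDelete=true, after.size()=0"
    }
}
```

With `hasDvMaintainer` set to false and the same levels, `dropDelete` is false and the delete record is kept, which is the difference in behavior described in this comment.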