jasonf20 commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1746650336
##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() {
assertThat(listManifestFiles()).hasSize(4);
}
+ @TestTemplate
+ public void testRewriteDataAndAssignOldSequenceNumbersShouldNotDropDeleteFiles() {
+ assumeThat(formatVersion)
+ .as("Sequence number is only supported in iceberg format v2 or later")
+ .isGreaterThan(1);
+ assertThat(listManifestFiles()).isEmpty();
+
+ commit(table, table.newRowDelta().addRows(FILE_A).addDeletes(FILE_A2_DELETES), branch);
+
+ long firstRewriteSequenceNumber = latestSnapshot(table, branch).sequenceNumber();
+
+ commit(
+ table,
+ table.newRowDelta().addRows(FILE_B).addRows(FILE_B).addDeletes(FILE_B2_DELETES),
+ branch);
+ commit(
+ table,
+ table.newRowDelta().addRows(FILE_B).addRows(FILE_C).addDeletes(FILE_C2_DELETES),
+ branch);
+
+ long secondRewriteSequenceNumber = latestSnapshot(table, branch).sequenceNumber();
+
+ commit(
+ table,
+ table
+ .newRewrite()
+ .addFile(FILE_D)
+ .deleteFile(FILE_B)
+ .deleteFile(FILE_C)
+ .dataSequenceNumber(secondRewriteSequenceNumber),
+ branch);
+
+ TableMetadata base = readMetadata();
+ Snapshot baseSnap = latestSnapshot(base, branch);
+ long baseSnapshotId = baseSnap.snapshotId();
+
+ Comparator<ManifestFile> sequenceNumberOrdering =
+ new Comparator<>() {
+ @Override
+ public int compare(ManifestFile o1, ManifestFile o2) {
+ return (int) (o1.sequenceNumber() - o2.sequenceNumber());
+ }
+ };
+
+ // FILE_B2_DELETES and FILE_A2_DELETES should not be removed as the rewrite specifies
+ // `firstRewriteSequenceNumber`
+ // explicitly which is the same as that of A2_DELETES and before B2_DELETES
+
+ // Technically A1_DELETES could be removed since it's an equality delete and should apply on
Review Comment:
I meant to say FILE_A2_DELETES. The existing snapshot producer check drops
files only if their sequence number is strictly smaller than the minimum
sequence number, so it leaves in place equality deletes whose SN is equal to
the minimum SN. Equality deletes, however, are only applied to data files
whose SNs are strictly smaller than the delete file's SN. So you can actually
drop equality delete files that have the same sequence number as the minimum
sequence number of the new snapshot.
In an even broader sense, FILE_A2_DELETES could be ignored completely when
doing the initial commit here: this is the first commit, so there are no data
files before it, and effectively it doesn't do anything.
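To make the strict vs. non-strict comparison concrete, here is a minimal
sketch of the two predicates. The class and method names are hypothetical
and are not the actual snapshot producer code; it only models the sequence
number comparisons described above.

```java
// Hypothetical sketch, not Iceberg's implementation: it models only the
// sequence number comparisons discussed in this comment.
class EqualityDeleteSketch {

  // An equality delete applies only to data files with a strictly smaller SN.
  static boolean equalityDeleteApplies(long deleteSeq, long dataSeq) {
    return dataSeq < deleteSeq;
  }

  // The existing check as described above: drop only if strictly older
  // than the minimum sequence number.
  static boolean droppedByExistingCheck(long deleteSeq, long minSeq) {
    return deleteSeq < minSeq;
  }

  // What would be safe for equality deletes: if every remaining data file has
  // SN >= minSeq, a delete with SN <= minSeq cannot apply to any of them.
  static boolean equalityDeleteIsDroppable(long deleteSeq, long minSeq) {
    return deleteSeq <= minSeq;
  }

  public static void main(String[] args) {
    long minSeq = 1L;    // assumed minimum sequence number of the new snapshot
    long deleteSeq = 1L; // equality delete committed at that same SN

    System.out.println("applies to data at minSeq: "
        + equalityDeleteApplies(deleteSeq, minSeq));
    System.out.println("dropped by existing check: "
        + droppedByExistingCheck(deleteSeq, minSeq));
    System.out.println("droppable in principle: "
        + equalityDeleteIsDroppable(deleteSeq, minSeq));
  }
}
```

With both values equal, the delete does not apply to any data file at or
above the minimum SN, yet the strict check still keeps it.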
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]