jackye1995 commented on a change in pull request #3454:
URL: https://github.com/apache/iceberg/pull/3454#discussion_r741539664



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/actions/TestNewRewriteDataFilesAction.java
##########
@@ -177,6 +187,59 @@ public void testBinPackWithFilter() {
     assertEquals("Rows must match", expectedRecords, actualRecords);
   }
 
+  @Test
+  public void testBinPackWithDeletes() throws Exception {
+    Table table = createTablePartitioned(4, 2);
+    table.updateProperties().set(TableProperties.FORMAT_VERSION, "2").commit();
+    shouldHaveFiles(table, 8);
+    table.refresh();
+
+    CloseableIterable<FileScanTask> tasks = table.newScan().planFiles();
+    List<DataFile> dataFiles = Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file));
+    GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec(),
+        null, null, null);
+    int total = (int) dataFiles.stream().mapToLong(ContentFile::recordCount).sum();
+
+    RowDelta rowDelta = table.newRowDelta();
+    // remove 2 rows for odd files, 1 row for even files
+    for (int i = 0; i < dataFiles.size(); i++) {
+      DataFile dataFile = dataFiles.get(i);
+      EncryptedOutputFile outputFile = EncryptedFiles.encryptedOutput(
+          table.io().newOutputFile(table.locationProvider().newDataLocation(UUID.randomUUID().toString())),
+          EncryptionKeyMetadata.EMPTY);
+      PositionDeleteWriter<Record> posDeleteWriter = appenderFactory.newPosDeleteWriter(
+          outputFile, FileFormat.PARQUET, dataFile.partition());
+      posDeleteWriter.delete(dataFile.path(), 0);
+      posDeleteWriter.close();
+      rowDelta.addDeletes(posDeleteWriter.toDeleteFile());
+
+      if (i % 2 != 0) {
+        outputFile = EncryptedFiles.encryptedOutput(
+            table.io().newOutputFile(table.locationProvider().newDataLocation(UUID.randomUUID().toString())),
+            EncryptionKeyMetadata.EMPTY);
+        posDeleteWriter = appenderFactory.newPosDeleteWriter(outputFile, FileFormat.PARQUET, dataFile.partition());
+        posDeleteWriter.delete(dataFile.path(), 1);
+        posDeleteWriter.close();
+        rowDelta.addDeletes(posDeleteWriter.toDeleteFile());
+      }

Review comment:
       I know there is some repeated code here for generating deletes. So far
I am still not sure where the right boundary for util methods is. I am
planning to refactor after I add more tests for the `RewriteDeleteStrategy`.
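
One possible boundary, shown here only as a sketch without any Iceberg types, is to separate the decision of which positions to delete from the writer plumbing. The names `DeletePlanSketch`, `positionsFor`, and `totalDeletes` are hypothetical and are not part of this PR or of Iceberg. The rule encoded below is the one in the loop above: row 0 for every file, plus row 1 for files at odd indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models only the "which rows get deleted" rule from the test,
// not the actual delete-file writing.
public class DeletePlanSketch {

  // Positions to delete in the file at the given index:
  // row 0 for every file, plus row 1 when the index is odd.
  public static List<Long> positionsFor(int fileIndex) {
    List<Long> positions = new ArrayList<>();
    positions.add(0L);
    if (fileIndex % 2 != 0) {
      positions.add(1L);
    }
    return positions;
  }

  // Total number of position deletes produced across fileCount files.
  public static long totalDeletes(int fileCount) {
    long deletes = 0;
    for (int i = 0; i < fileCount; i++) {
      deletes += positionsFor(i).size();
    }
    return deletes;
  }

  public static void main(String[] args) {
    // 8 files: 8 deletes at row 0, plus 4 deletes at row 1 for the odd indexes.
    System.out.println(totalDeletes(8)); // prints 12
  }
}
```

With the positions computed separately, the block that creates an output file, calls `posDeleteWriter.delete(dataFile.path(), pos)`, and adds the result to the `RowDelta` would only need to be written once and run per position.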




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


