singhpk234 commented on code in PR #15975:
URL: https://github.com/apache/iceberg/pull/15975#discussion_r3132499146


##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -367,7 +369,8 @@ public static RewriteResult<DataFile> rewriteDataManifest(
    * Rewrite a delete manifest, replacing path references.
    *
    * @param manifestFile source delete manifest to rewrite
-   * @param snapshotIds snapshot ids for filtering returned delete manifest 
entries
+   * @param snapshotIds snapshot ids for filtering returned delete manifest 
entries. When null, all
+   *     live entries are included in the copy plan regardless of their 
snapshot id.

Review Comment:
   is null being reused to keep the public api as it is ? i wonder if there is 
a better way to do it, just have a signature which say include all ? default 
false 



##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -310,8 +310,12 @@ private Result rebuildMetadata() {
 
     // rebuild manifest files
     Set<ManifestFile> metaFiles = rewriteManifestListResult.toRewrite();
+    // For full copy (no start version), pass null so all live entries are 
included
+    // in the copy plan. Current snapshot IDs may not cover entries whose 
original
+    // adding snapshot has been expired, but those files can still be live.
+    Set<Snapshot> snapshotFilter = startMetadata == null ? null : 
deltaSnapshots;

Review Comment:
   filter sounds like it should return true or false right ? but its not the 
case do you wanna wrap this to Predicate ? 



##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -429,10 +432,7 @@ private static RewriteResult<DataFile> writeDataFileEntry(
     DataFile newDataFile =
         
DataFiles.builder(spec).copy(entry.file()).withPath(targetDataFilePath).build();
     appendEntryWithFile(entry, writer, newDataFile);
-    // keep the following entries in metadata but exclude them from copyPlan
-    // 1) deleted data files
-    // 2) entries not changed by snapshotIds

Review Comment:
   why did we remove this comment : this is still valid for incremental copies 
right ? where snapshotIds is not null ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to