singhpk234 commented on code in PR #15975:
URL: https://github.com/apache/iceberg/pull/15975#discussion_r3132499146
##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -367,7 +369,8 @@ public static RewriteResult<DataFile> rewriteDataManifest(
* Rewrite a delete manifest, replacing path references.
*
* @param manifestFile source delete manifest to rewrite
- * @param snapshotIds snapshot ids for filtering returned delete manifest
entries
+ * @param snapshotIds snapshot ids for filtering returned delete manifest
entries. When null, all
+ * live entries are included in the copy plan regardless of their
snapshot id.
Review Comment:
is null being reused to keep the public api as it is ? i wonder if there is
a better way to do it, just have a signature which say include all ? default
false
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -310,8 +310,12 @@ private Result rebuildMetadata() {
// rebuild manifest files
Set<ManifestFile> metaFiles = rewriteManifestListResult.toRewrite();
+ // For full copy (no start version), pass null so all live entries are
included
+ // in the copy plan. Current snapshot IDs may not cover entries whose
original
+ // adding snapshot has been expired, but those files can still be live.
+ Set<Snapshot> snapshotFilter = startMetadata == null ? null :
deltaSnapshots;
Review Comment:
filter sounds like it should return true or false right ? but its not the
case do you wanna wrap this to Predicate ?
##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -429,10 +432,7 @@ private static RewriteResult<DataFile> writeDataFileEntry(
DataFile newDataFile =
DataFiles.builder(spec).copy(entry.file()).withPath(targetDataFilePath).build();
appendEntryWithFile(entry, writer, newDataFile);
- // keep the following entries in metadata but exclude them from copyPlan
- // 1) deleted data files
- // 2) entries not changed by snapshotIds
Review Comment:
why did we remove this comment : this is still valid for incremental copies
right ? where snapshotIds is not null ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]