sreejasahithi commented on code in PR #10145:
URL: https://github.com/apache/ozone/pull/10145#discussion_r3159853381
##########
hadoop-ozone/iceberg/src/main/java/org/apache/hadoop/ozone/iceberg/RewriteTablePathOzoneAction.java:
##########
@@ -251,4 +271,109 @@ private Set<Pair<String, String>> rewriteVersionFile(TableMetadata metadata, Str
return result;
}
+
+ private Set<String> manifestsToRewrite(Set<Snapshot> deltaSnapshots, TableMetadata startMetadata) {
+ Table endStaticTable = RewriteTablePathOzoneUtils.newStaticTable(endVersionName, table.io());
+
+ final Set<Long> deltaSnapshotIds;
+ if (startMetadata != null) {
Review Comment:
When startMetadata is not provided, deltaSnapshots equals the full set of
snapshots collected from all version files during the version-file rewrite
phase. When startMetadata is provided, deltaSnapshots contains only the
snapshots that are not tracked by the start version, i.e. the snapshots that
appear in the intermediate version files between start and end, minus the
snapshots already present in the start version's metadata.
Because deltaSnapshots is built by reading each intermediate version file's
JSON as it was written at that point in time, it can include snapshots that
were subsequently expired.
The manifest lists of expired snapshots can no longer be read, so we do not
iterate over deltaSnapshots. Instead, we iterate over the snapshots collected
from the endVersion metadata file.
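
To illustrate the point above, here is a minimal, self-contained sketch (not the PR's code) of how a delta built from intermediate version files can contain an expired snapshot. Snapshot IDs are plain `Long` values, and the class and method names (`DeltaSnapshotSketch`, `deltaSnapshotIds`, `expiredInDelta`) are hypothetical, chosen only for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DeltaSnapshotSketch {

  // Hypothetical helper: union of the snapshot IDs seen in each version file,
  // minus the IDs already tracked by the start version (when one is given).
  static Set<Long> deltaSnapshotIds(List<Set<Long>> versionFileSnapshotIds,
                                    Set<Long> startSnapshotIds) {
    Set<Long> delta = new TreeSet<>();
    for (Set<Long> ids : versionFileSnapshotIds) {
      delta.addAll(ids);
    }
    if (startSnapshotIds != null) {
      delta.removeAll(startSnapshotIds);
    }
    return delta;
  }

  // Hypothetical helper: delta IDs that no longer exist in the end version,
  // i.e. snapshots that were expired somewhere between start and end.
  static Set<Long> expiredInDelta(Set<Long> delta, Set<Long> endSnapshotIds) {
    Set<Long> expired = new TreeSet<>(delta);
    expired.removeAll(endSnapshotIds);
    return expired;
  }

  public static void main(String[] args) {
    Set<Long> start = Set.of(1L);              // start version tracks snapshot 1
    List<Set<Long>> versionFiles = List.of(
        Set.of(1L, 2L),                        // intermediate version: adds 2
        Set.of(1L, 2L, 3L),                    // intermediate version: adds 3
        Set.of(1L, 3L, 4L));                   // end version: 2 expired, adds 4
    Set<Long> end = Set.of(1L, 3L, 4L);

    Set<Long> delta = deltaSnapshotIds(versionFiles, start);
    System.out.println("delta   = " + delta);                      // [2, 3, 4]
    System.out.println("expired = " + expiredInDelta(delta, end)); // [2]
  }
}
```

In this toy example snapshot 2 is in the delta but absent from the end version, so iterating over the delta would attempt to read a manifest list that may already be gone.
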
In `manifestsToRewrite`, deltaSnapshots is used only to skip manifest files
that were already rewritten in a previous incremental run. The snapshot_id
field on each manifest entry identifies the snapshot that originally created
it. By keeping only the manifests whose snapshot_id is in deltaSnapshotIds, we
select the manifests that are new since the start version and exclude those
that were inherited from before it.
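
The filtering described above can be sketched roughly as follows. This is a simplified stand-in, not the actual method in the PR: `ManifestEntry` is a hypothetical class holding a path and the ID of the snapshot that added the manifest, and the manifest lists are passed in as plain Java lists rather than read through Iceberg.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ManifestFilterSketch {

  // Hypothetical stand-in for one manifest-list entry.
  static final class ManifestEntry {
    final String path;
    final long snapshotId; // snapshot that originally created the manifest

    ManifestEntry(String path, long snapshotId) {
      this.path = path;
      this.snapshotId = snapshotId;
    }
  }

  // Walk the manifest lists of the snapshots in the END version (all of which
  // are still readable) and keep only manifests created by a delta snapshot.
  static Set<String> manifestsToRewrite(List<List<ManifestEntry>> endManifestLists,
                                        Set<Long> deltaSnapshotIds) {
    Set<String> paths = new TreeSet<>();
    for (List<ManifestEntry> manifestList : endManifestLists) {
      for (ManifestEntry manifest : manifestList) {
        if (deltaSnapshotIds.contains(manifest.snapshotId)) {
          paths.add(manifest.path);
        }
      }
    }
    return paths;
  }

  public static void main(String[] args) {
    ManifestEntry m1 = new ManifestEntry("m1.avro", 1L); // inherited from start
    ManifestEntry m3 = new ManifestEntry("m3.avro", 3L);
    ManifestEntry m4 = new ManifestEntry("m4.avro", 4L);

    // The end version tracks snapshots 3 and 4; snapshot 2 has expired.
    List<List<ManifestEntry>> endManifestLists =
        List.of(List.of(m1, m3), List.of(m1, m3, m4));

    // The delta still contains the expired ID 2, which simply matches nothing.
    Set<Long> delta = Set.of(2L, 3L, 4L);

    System.out.println(manifestsToRewrite(endManifestLists, delta)); // [m3.avro, m4.avro]
  }
}
```

Here `m1.avro` is skipped because it was created by a snapshot already covered by the start version, and the expired snapshot ID in the delta is harmless because it is only used as a membership filter.
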
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]