sreejasahithi commented on code in PR #10145:
URL: https://github.com/apache/ozone/pull/10145#discussion_r3159853381


##########
hadoop-ozone/iceberg/src/main/java/org/apache/hadoop/ozone/iceberg/RewriteTablePathOzoneAction.java:
##########
@@ -251,4 +271,109 @@ private Set<Pair<String, String>> rewriteVersionFile(TableMetadata metadata, Str
 
     return result;
   }
+
+  private Set<String> manifestsToRewrite(Set<Snapshot> deltaSnapshots, TableMetadata startMetadata) {
+    Table endStaticTable = RewriteTablePathOzoneUtils.newStaticTable(endVersionName, table.io());
+
+    final Set<Long> deltaSnapshotIds;
+    if (startMetadata != null) {

Review Comment:
   When startMetadata is not provided, deltaSnapshots equals the full set of
snapshots collected across all version files during the version file rewrite
phase. When startMetadata is provided, deltaSnapshots contains only the
snapshots that are not tracked by the start version, i.e. the snapshots that
appear in the intermediate version files between start and end, minus those
already present in the start version's metadata.
   Because deltaSnapshots is built by reading each intermediate version file's
JSON as it was written at that point in time, it can include snapshots that
have since been expired.
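
   As a rough illustration of that set arithmetic (not the PR's actual code: `DeltaSnapshotsSketch`, `versionFileSnapshotIds`, and `startSnapshotIds` are hypothetical stand-ins, and snapshots are reduced to plain IDs):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeltaSnapshotsSketch {
  // Union of the snapshot IDs seen in every version file, minus the IDs
  // tracked by the start version. A null startSnapshotIds models the case
  // where startMetadata is not provided.
  static Set<Long> deltaSnapshotIds(List<Set<Long>> versionFileSnapshotIds,
                                    Set<Long> startSnapshotIds) {
    Set<Long> delta = new LinkedHashSet<>();
    for (Set<Long> ids : versionFileSnapshotIds) {
      delta.addAll(ids);
    }
    if (startSnapshotIds != null) {
      delta.removeAll(startSnapshotIds);
    }
    return delta;
  }

  public static void main(String[] args) {
    // Assumed history: one version file adds snapshot 2, the next adds
    // snapshot 3, and the end version has already expired snapshot 2.
    List<Set<Long>> versions =
        List.of(Set.of(1L, 2L), Set.of(1L, 2L, 3L), Set.of(1L, 3L));
    // With a start version tracking snapshot 1, the delta still holds 2.
    System.out.println(deltaSnapshotIds(versions, Set.of(1L)));
    // Without a start version, the delta is every snapshot ever seen.
    System.out.println(deltaSnapshotIds(versions, null));
  }
}
```

   In this made-up history, snapshot 2 stays in the delta even though the end version no longer tracks it.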
   
   So we do not iterate over deltaSnapshots. Instead, we iterate over the
snapshots in the endVersion metadata file, because the manifest lists of
expired snapshots can no longer be read.
   In `manifestsToRewrite`, deltaSnapshots is used only to skip manifest files
that were already rewritten in a previous incremental run. The snapshot_id
field on each manifest entry identifies the snapshot that originally created
it. By keeping only the manifests whose snapshot_id is in deltaSnapshotIds, we
select the manifests that are new since the start version and exclude those
inherited from before it.
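
   A minimal sketch of that filter, assuming simplified stand-in types rather than the Iceberg classes (`ManifestRef`, `SnapshotRef`, and the file names are hypothetical; records need Java 16 or later):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ManifestFilterSketch {
  // Hypothetical stand-ins: a manifest entry carrying the ID of the snapshot
  // that created it, and a snapshot with its readable manifest list.
  record ManifestRef(String path, long snapshotId) {}
  record SnapshotRef(long id, List<ManifestRef> manifests) {}

  // Iterate over the end version's snapshots (their manifest lists still
  // exist) and keep only manifests created by a snapshot in the delta set.
  static Set<String> manifestsToRewrite(List<SnapshotRef> endSnapshots,
                                        Set<Long> deltaSnapshotIds) {
    Set<String> paths = new LinkedHashSet<>();
    for (SnapshotRef snapshot : endSnapshots) {
      for (ManifestRef manifest : snapshot.manifests()) {
        if (deltaSnapshotIds.contains(manifest.snapshotId())) {
          paths.add(manifest.path());
        }
      }
    }
    return paths;
  }

  public static void main(String[] args) {
    ManifestRef m1 = new ManifestRef("m1.avro", 1L); // from the start version
    ManifestRef m2 = new ManifestRef("m2.avro", 2L); // snapshot 2, now expired
    ManifestRef m3 = new ManifestRef("m3.avro", 3L);
    // The end version tracks snapshots 1 and 3 only.
    List<SnapshotRef> endSnapshots = List.of(
        new SnapshotRef(1L, List.of(m1)),
        new SnapshotRef(3L, List.of(m1, m2, m3)));
    // Delta IDs {2, 3}: m1 is skipped, m2 and m3 are selected.
    System.out.println(manifestsToRewrite(endSnapshots, Set.of(2L, 3L)));
    // prints [m2.avro, m3.avro]
  }
}
```

   Under these assumptions, the manifest created by the expired snapshot 2 is still picked up through snapshot 3's manifest list, while the manifest inherited from the start version is skipped.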



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

