sreejasahithi commented on code in PR #10145:
URL: https://github.com/apache/ozone/pull/10145#discussion_r3159853381
##########
hadoop-ozone/iceberg/src/main/java/org/apache/hadoop/ozone/iceberg/RewriteTablePathOzoneAction.java:
##########
@@ -251,4 +271,109 @@ private Set<Pair<String, String>> rewriteVersionFile(TableMetadata metadata, Str
return result;
}
+
+ private Set<String> manifestsToRewrite(Set<Snapshot> deltaSnapshots, TableMetadata startMetadata) {
+ Table endStaticTable = RewriteTablePathOzoneUtils.newStaticTable(endVersionName, table.io());
+
+ final Set<Long> deltaSnapshotIds;
+ if (startMetadata != null) {
Review Comment:
When startMetadata is not provided, deltaSnapshots equals the full set of
snapshots collected across all version files during the version file rewrite
phase. When startMetadata is provided, deltaSnapshots contains only the
snapshots that are not tracked by the start version, i.e. the snapshots that
appeared in intermediate version files between start and end, minus the
snapshots already present in the start version's metadata.
Because deltaSnapshots is built by reading each intermediate version file's
JSON as it was written at that point in time, it can include snapshots that
have since been expired.
We therefore do not iterate over deltaSnapshots. Instead we iterate over the
snapshots read from the endVersion metadata file, because the manifest lists
of expired snapshots can no longer be read.
In `manifestsToRewrite` we use deltaSnapshots only to avoid including
manifest files that were already rewritten in a previous incremental run. The
snapshot_id field on each manifest entry identifies the snapshot that
originally created it. By keeping only the manifests whose snapshot_id is in
deltaSnapshotIds, we select the manifests that are new since the start version
and exclude those inherited from before it.
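To make the filtering concrete, here is a minimal, self-contained sketch of the
idea. It is not the code in this PR and does not use the Iceberg API:
`DeltaManifestFilter`, `ManifestInfo`, and `selectManifests` are hypothetical
names, and each inner list stands in for the manifest list of one snapshot that
is still present in the end version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeltaManifestFilter {

  // Hypothetical stand-in for one manifest list entry: the manifest path
  // plus the ID of the snapshot that originally created the manifest.
  public static final class ManifestInfo {
    final String path;
    final long snapshotId;

    public ManifestInfo(String path, long snapshotId) {
      this.path = path;
      this.snapshotId = snapshotId;
    }
  }

  // Walk the manifest lists of the snapshots still live in the end version,
  // and keep only manifests created by a snapshot in the delta set.
  public static Set<String> selectManifests(
      List<List<ManifestInfo>> endVersionManifestLists, Set<Long> deltaSnapshotIds) {
    Set<String> result = new LinkedHashSet<>();
    for (List<ManifestInfo> manifestList : endVersionManifestLists) {
      for (ManifestInfo manifest : manifestList) {
        if (deltaSnapshotIds.contains(manifest.snapshotId)) {
          result.add(manifest.path);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    ManifestInfo m1 = new ManifestInfo("m1.avro", 1L); // created before the start version
    ManifestInfo m2 = new ManifestInfo("m2.avro", 2L);
    ManifestInfo m3 = new ManifestInfo("m3.avro", 3L);

    // Snapshots 2 and 3 are live in the end version; both inherit m1.
    List<List<ManifestInfo>> lists =
        Arrays.asList(Arrays.asList(m1, m2), Arrays.asList(m1, m2, m3));

    // Snapshot 99 was seen in an intermediate version file but later expired.
    // It only matters as a filter value, so it is harmless here.
    Set<Long> deltaIds = new HashSet<>(Arrays.asList(2L, 3L, 99L));

    System.out.println(selectManifests(lists, deltaIds)); // prints [m2.avro, m3.avro]
  }
}
```

Note that the expired ID 99 is never dereferenced: the iteration is driven by
the end version's snapshots, and the delta set is used only for membership
checks.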
##########
hadoop-ozone/iceberg/src/main/java/org/apache/hadoop/ozone/iceberg/RewriteTablePathOzoneAction.java:
##########
@@ -57,11 +69,15 @@ public class RewriteTablePathOzoneAction implements RewriteTablePath {
private String stagingDir;
private int parallelism;
+ private ExecutorService executorService;
+ private static final int MAX_INFLIGHT_MULTIPLIER = 4;
+ private static final int DEFAULT_THREAD_COUNT = 10;
+
private final Table table;
public RewriteTablePathOzoneAction(Table table) {
this.table = table;
- this.parallelism = Runtime.getRuntime().availableProcessors();
+ this.parallelism = DEFAULT_THREAD_COUNT;
Review Comment:
Yes, we can pass the thread count via the command, in which case
`public RewriteTablePathOzoneAction(Table table, int parallelism)` will be
used. We can add this when the CLI command for the rewrite is added.
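For illustration only, the constructor pairing could look roughly like the
sketch below. `RewriteActionSketch` is a hypothetical class, the `Table`
argument is omitted to keep it self-contained, and the validation and the
fixed-size pool are assumptions rather than what the PR does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RewriteActionSketch {
  private static final int DEFAULT_THREAD_COUNT = 10;

  private final int parallelism;

  // Default path: no thread count supplied by the caller.
  public RewriteActionSketch() {
    this(DEFAULT_THREAD_COUNT);
  }

  // Path a future CLI option could use to pass an explicit thread count.
  public RewriteActionSketch(int parallelism) {
    if (parallelism <= 0) {
      throw new IllegalArgumentException("parallelism must be positive: " + parallelism);
    }
    this.parallelism = parallelism;
  }

  public int parallelism() {
    return parallelism;
  }

  // Assumption: a fixed-size pool sized by parallelism. The caller must shut it down.
  public ExecutorService newExecutor() {
    return Executors.newFixedThreadPool(parallelism);
  }
}
```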
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]