dramaticlly commented on code in PR #15003:
URL: https://github.com/apache/iceberg/pull/15003#discussion_r2752486001
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -645,6 +645,35 @@ protected boolean cleanupAfterCommit() {
return true;
}
+ /**
Review Comment:
I guess it's a bit difficult to read with auto formater, but LINE 103 of
https://github.com/apache/iceberg/blob/026ec35b7969539b535af03944fae53d059cd233/core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java#L103
does populate the KEPT_MANIFESTS_COUNT and it's taken cared of.
##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -351,6 +351,62 @@ public void testDeleteFilesOnIndependentBranches() {
statuses(Status.EXISTING, Status.DELETED, Status.DELETED));
}
+ @TestTemplate
+ public void testDeleteFilesWithManifestSnapshotSummary() {
Review Comment:
I added assertions to existing tests which cares about manifest, also for
the TestRowDelta
##########
core/src/main/java/org/apache/iceberg/ManifestMergeManager.java:
##########
@@ -200,8 +217,9 @@ private ManifestFile createManifest(int specId,
List<ManifestFile> bin) {
ManifestFile manifest = writer.toManifestFile();
- // update the cache
+ // update the cache and track replaced manifests
mergedManifests.put(bin, manifest);
+ replacedManifestsCount.addAndGet(bin.size());
Review Comment:
I think this piece of code is a bit tricky but we would actually overcount
the replaced manifest if this count is moved in line 86.
Let me share a bit more, not all manifests in a group are actually merged
for `toBeMerged`. Say we have a bin with size =1, of any bin with size <
minCountToMerge (default to 100), those manifest will be kept as-is and we
expect the manifest replaced count = 0.
The manifest will only be merged when criteria met, and that's the reason
why replaced count is tracked inside `createManifest` to handle cache hits on
retries.
I also improve the code to only track the replaced manifest from previous
snapshot in ManifestMergeManager
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -645,6 +645,34 @@ protected boolean cleanupAfterCommit() {
return true;
}
+ /**
+ * Builds a snapshot summary with manifest counts.
+ *
+ * @param manifests the list of manifests in the new snapshot
+ * @param replacedManifestsCount the count of manifests that were replaced
(rewritten)
+ * @return a summary builder with manifest count metrics set
+ */
+ protected SnapshotSummary.Builder buildManifestCountSummary(
+ List<ManifestFile> manifests, int replacedManifestsCount) {
+ SnapshotSummary.Builder summaryBuilder = SnapshotSummary.builder();
+ int manifestsCreated = 0;
+ int manifestsKept = 0;
+
+ for (ManifestFile manifest : manifests) {
+ if (snapshotId() == manifest.snapshotId()) {
Review Comment:
I guess if the manifest is written by ManifestWriter, it shall have snapshot
configured on constructor. There's no strong reason to fail when we generate
the optional snapshot summary, if you feel there's the need, we can only
increment the kept-manifest count if `manifest.snapshotId()` is non-null?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]