aokolnychyi commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849434684
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -307,12 +331,22 @@ private void invalidateFilteredCache() {
/**
* @return a ManifestReader that is a filtered version of the input manifest.
*/
- private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
+ private ManifestFile filterManifest(
+ Schema tableSchema, ManifestFile manifest, boolean trustReferencedManifests) {
ManifestFile cached = filteredManifests.get(manifest);
if (cached != null) {
return cached;
}
+ boolean manifestIsReferenced = manifestsReferencedForDeletes.contains(manifest.path());
+
+ // The manifest does not need to be rewritten if the referenced set can be trusted and the
+ // manifest is not referenced
+ if (trustReferencedManifests && !manifestIsReferenced) {
Review Comment:
I wonder whether we can restructure this a bit, as there are now separate
branches that all basically skip the rewrite. What about having a common
`canContainDeletedFiles` check and just doing something like this?
```
if (!canContainDeletedFiles(manifest, trustManifestReferences)) {
  filteredManifests.put(manifest, manifest);
  return manifest;
}

try (ManifestReader<F> reader = newManifestReader(manifest)) {
  PartitionSpec spec = reader.spec();
  PartitionAndMetricsEvaluator evaluator =
      new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression);

  if (manifestHasDeletedFiles(evaluator, reader)) {
    return filterManifestWithDeletedFiles(evaluator, manifest, reader);
  } else {
    filteredManifests.put(manifest, manifest);
    return manifest;
  }
} catch (IOException e) {
  throw new RuntimeIOException(e, "Failed to close manifest: %s", manifest);
}
```
With helper methods:
```
private boolean canContainDeletedFiles(
    ManifestFile manifest, boolean trustManifestReferences) {
  if (hasNoLiveFiles(manifest)) {
    return false;
  }

  if (trustManifestReferences) {
    return manifestsWithDeletes.contains(manifest.path());
  }

  return canContainDroppedFiles(manifest)
      || canContainExpressionDeletes(manifest)
      || canContainDroppedPartitions(manifest);
}

private boolean hasNoLiveFiles(ManifestFile manifest) {
  return !manifest.hasAddedFiles() && !manifest.hasExistingFiles();
}
```
And an extra check in `manifestHasDeletedFiles`:
```
private boolean manifestHasDeletedFiles(
    PartitionAndMetricsEvaluator evaluator, ManifestReader<F> reader) {
  if (manifestsWithDeletes.contains(reader.file().location())) {
    return true;
  }
  ...
}
```
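To make the intended decision order concrete, here is a minimal standalone sketch of the `canContainDeletedFiles` idea outside of Iceberg. The `Manifest` class and the `fallbackCheck` predicate are hypothetical stand-ins (for `ManifestFile` and for the three `canContain*` checks, respectively), and `manifestsWithDeletes` is passed as a parameter only to keep the example self-contained. It is not the actual implementation.

```java
import java.util.Set;
import java.util.function.Predicate;

final class ManifestSkipSketch {

  /** Hypothetical stand-in for ManifestFile; only the fields this sketch needs. */
  static final class Manifest {
    final String path;
    final boolean hasAddedFiles;
    final boolean hasExistingFiles;

    Manifest(String path, boolean hasAddedFiles, boolean hasExistingFiles) {
      this.path = path;
      this.hasAddedFiles = hasAddedFiles;
      this.hasExistingFiles = hasExistingFiles;
    }
  }

  static boolean hasNoLiveFiles(Manifest manifest) {
    return !manifest.hasAddedFiles && !manifest.hasExistingFiles;
  }

  /**
   * Returns false when the manifest can be kept as is without opening it.
   * fallbackCheck is an assumed placeholder for the dropped-file, expression
   * and dropped-partition checks used when the referenced set is not trusted.
   */
  static boolean canContainDeletedFiles(
      Manifest manifest,
      boolean trustManifestReferences,
      Set<String> manifestsWithDeletes,
      Predicate<Manifest> fallbackCheck) {
    // 1. A manifest without live files has nothing left to delete.
    if (hasNoLiveFiles(manifest)) {
      return false;
    }

    // 2. If the referenced set is trusted, membership alone decides.
    if (trustManifestReferences) {
      return manifestsWithDeletes.contains(manifest.path);
    }

    // 3. Otherwise fall back to the metadata-based checks.
    return fallbackCheck.test(manifest);
  }

  public static void main(String[] args) {
    Set<String> manifestsWithDeletes = Set.of("m1.avro");
    Manifest referenced = new Manifest("m1.avro", true, false);
    Manifest unreferenced = new Manifest("m2.avro", true, true);
    Predicate<Manifest> conservative = m -> true; // always "may contain deletes"

    // Trusted set: only the referenced manifest needs to be opened.
    System.out.println(
        canContainDeletedFiles(referenced, true, manifestsWithDeletes, conservative));
    System.out.println(
        canContainDeletedFiles(unreferenced, true, manifestsWithDeletes, conservative));
    // Untrusted set: the fallback check decides.
    System.out.println(
        canContainDeletedFiles(unreferenced, false, manifestsWithDeletes, conservative));
  }
}
```

The point of ordering the checks this way is that the cheap "no live files" test runs first, the trusted set short-circuits everything else, and the metadata-based checks are only needed when references cannot be trusted.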