ulmako commented on a change in pull request #4307:
URL: https://github.com/apache/iceberg/pull/4307#discussion_r840689777
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseDeleteOrphanFilesSparkAction.java
##########
@@ -273,4 +285,35 @@ private static void listDirRecursively(
return files.iterator();
};
}
+
+ @VisibleForTesting
+ static class PartitionAwareHiddenPathFilter implements PathFilter, Serializable {
+
+ private final Set<String> hiddenPathPartitionNames;
+
+ PartitionAwareHiddenPathFilter(Set<String> hiddenPathPartitionNames) {
+ this.hiddenPathPartitionNames = hiddenPathPartitionNames;
+ }
+
+ @Override
+ public boolean accept(Path path) {
+ boolean isHiddenPartitionPath = hiddenPathPartitionNames.stream().anyMatch(path.getName()::startsWith);
+ return isHiddenPartitionPath || HiddenPathFilter.get().accept(path);
+ }
+
+ static PathFilter build(Map<Integer, PartitionSpec> specs) {
+ if (specs == null) {
+ return HiddenPathFilter.get();
+ }
+
+ Set<String> partitionNames = specs.values().stream()
+ .map(PartitionSpec::fields)
+ .flatMap(List::stream)
+ .filter(partitionField -> partitionField.name().startsWith("_") || partitionField.name().startsWith("."))
+ .map(PartitionField::name)
+ .collect(Collectors.toSet());
Review comment:
This is a good point. I initially considered taking the partition field
hierarchy into account in the filter, but I was put off by the complexity,
or rather by the time it would take to implement. Matching on the `=` is a
good compromise. Maybe I can find some time in the future to take another
look at this and open a follow-up PR.
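
For reference, here is a minimal sketch of how I understand the `=` compromise.
It assumes the idea is to match a directory name against `<fieldName>=` rather
than against the bare field name, so that a hidden directory that merely shares
a prefix with a partition field is still filtered out. The class and method
names are hypothetical, and plain strings stand in for Hadoop `Path` objects so
the snippet runs on its own. It is not the code in this PR.

```java
import java.util.Set;

public class PartitionAwareFilterSketch {

  // Stand-in for the usual hidden-path rule: names starting with "_" or "." are hidden.
  static boolean isHidden(String dirName) {
    return dirName.startsWith("_") || dirName.startsWith(".");
  }

  // Accept a directory if it looks like "<hiddenPartitionField>=<value>",
  // otherwise fall back to the plain hidden-path rule.
  static boolean accept(Set<String> hiddenPartitionNames, String dirName) {
    boolean isHiddenPartitionDir = hiddenPartitionNames.stream()
        .anyMatch(name -> dirName.startsWith(name + "="));
    return isHiddenPartitionDir || !isHidden(dirName);
  }

  public static void main(String[] args) {
    Set<String> names = Set.of("_c1"); // hypothetical partition field name

    System.out.println(accept(names, "_c1=2021")); // partition directory of a hidden field
    System.out.println(accept(names, "_c1_tmp"));  // hidden directory sharing the prefix
    System.out.println(accept(names, "data"));     // ordinary visible directory
  }
}
```

With the bare-name `startsWith` check, `_c1_tmp` would be accepted as if it were
a partition directory. Requiring the trailing `=` avoids that without having to
track where each field sits in the partition hierarchy.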
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]