kbendick commented on a change in pull request #4307:
URL: https://github.com/apache/iceberg/pull/4307#discussion_r828139584



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseDeleteOrphanFilesSparkAction.java
##########
@@ -205,15 +211,27 @@ private String jobDesc() {
     JavaRDD<String> subDirRDD = sparkContext().parallelize(subDirs, parallelism);
 
     Broadcast<SerializableConfiguration> conf = sparkContext().broadcast(hadoopConf);
-    JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirsRecursively(conf, olderThanTimestamp));
+    JavaRDD<String> matchingLeafFileRDD =
+        subDirRDD.mapPartitions(listDirsRecursively(conf, olderThanTimestamp, filter));
 
     JavaRDD<String> completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING()).toDF("file_path");
   }
 
+  private PathFilter pathFilter(PartitionSpec spec) {
+    List<String> partitionNames = Lists.newArrayList();
+    for (PartitionField field : spec.fields()) {
+      if (field.name().startsWith("_") || field.name().startsWith(".")) {
+        partitionNames.add(field.name());
+      }
+    }
+
+    return (partitionNames.isEmpty()) ? HiddenPathFilter.get() : new PartitionAwareHiddenPathFilter(partitionNames);
+  }
+
   private static void listDirRecursively(
-      String dir, Predicate<FileStatus> predicate, Configuration conf, int maxDepth,
-      int maxDirectSubDirs, List<String> remainingSubDirs, List<String> matchingFiles) {
+          String dir, Predicate<FileStatus> predicate, Configuration conf, int maxDepth,
+          int maxDirectSubDirs, List<String> remainingSubDirs, PathFilter filter, List<String> matchingFiles) {

Review comment:
      Nit: This is over-indented. The original indentation is correct: 2 spaces
for the start of the function declaration, and then an additional 4 spaces (a
continuation indent) for any code that continues onto the next line (so the
argument list here).
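   To make the convention concrete, here is a minimal sketch of the expected
layout. It uses a hypothetical, simplified version of the method: the Hadoop
`Predicate<FileStatus>`, `Configuration`, and `PathFilter` parameters are
swapped for plain Java types so it compiles on its own, and the body is made
up rather than taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class IndentationExample {

  // Declaration: 2 spaces (one indent level inside the class body).
  // Wrapped argument list: 2 + 4 = 6 spaces (a 4-space continuation indent).
  // HYPOTHETICAL simplified signature; the real method takes Hadoop types.
  static void listDirRecursively(
      String dir, int maxDepth, int maxDirectSubDirs,
      List<String> remainingSubDirs, List<String> matchingFiles) {
    // Body: ordinary 2-space block indent relative to the declaration (4 spaces).
    // Made-up logic so the example runs; maxDirectSubDirs is unused here.
    if (maxDepth <= 0) {
      remainingSubDirs.add(dir);
    } else {
      matchingFiles.add(dir);
    }
  }

  public static void main(String[] args) {
    List<String> remaining = new ArrayList<>();
    List<String> matching = new ArrayList<>();
    listDirRecursively("/warehouse/db/tbl", 0, 10, remaining, matching);
    System.out.println(remaining + " " + matching);
  }
}
```
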
   
   If you use IntelliJ (or some other IDEs), there is a code formatter that
will handle things like indentation for you:
https://iceberg.apache.org/community/#setting-up-ide-and-code-style
   
   Personally, I sometimes have to set it per subproject (e.g. once for Flink,
once for Spark, etc.), but that could be because of the way I set it up.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


