[GitHub] [hudi] teeyog commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

GitBox Wed, 03 Feb 2021 18:01:26 -0800


teeyog commented on a change in pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#discussion_r569889546




##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
##########
@@ -84,6 +86,39 @@ public static String getTablePath(FileSystem fs, Path[] userProvidedPaths) throw
     throw new TableNotFoundException("Unable to find a hudi table for the user provided paths.");
   }
 
+  public static Option<String> getOnePartitionPath(FileSystem fs, Path tablePath) throws IOException {
+    // When the table is not partitioned
+    if (HoodiePartitionMetadata.hasPartitionMetadata(fs, tablePath)) {
+      return Option.of(tablePath.toString());
+    }
+    FileStatus[] statuses = fs.listStatus(tablePath);
+    for (FileStatus status : statuses) {
+      if (status.isDirectory()) {
+        if (HoodiePartitionMetadata.hasPartitionMetadata(fs, status.getPath())) {
+          return Option.of(status.getPath().toString());
+        } else {
+          Option<String> partitionPath = getOnePartitionPath(fs, status.getPath());
+          if (partitionPath.isPresent()) {
+            return partitionPath;

Review comment:
       Thank you for your review. This method of obtaining a partition is very
fast because it returns as soon as it finds one partition path.
FSUtils.getAllPartitionPaths, by contrast, collects all partition paths, which
is very time-consuming.
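
       To illustrate the trade-off described above, here is a minimal,
self-contained sketch that uses java.nio.file on a local directory tree
instead of the Hadoop FileSystem API. It is not the code from this pull
request: the class PartitionProbe, its method names, and the marker file name
are assumptions made for illustration, with hasMarker standing in for
HoodiePartitionMetadata.hasPartitionMetadata. findOnePartition stops at the
first match, while findAllPartitions has to visit every directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PartitionProbe {
  // Assumed marker file name, used here only to identify a partition directory.
  static final String MARKER = ".hoodie_partition_metadata";

  // Hypothetical stand-in for HoodiePartitionMetadata.hasPartitionMetadata.
  static boolean hasMarker(Path dir) {
    return Files.isRegularFile(dir.resolve(MARKER));
  }

  /** Depth-first search that returns as soon as one partition directory is found. */
  static Optional<Path> findOnePartition(Path dir) throws IOException {
    if (hasMarker(dir)) {
      return Optional.of(dir);
    }
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          Optional<Path> found = findOnePartition(child);
          if (found.isPresent()) {
            return found; // early exit: remaining siblings are never listed
          }
        }
      }
    }
    return Optional.empty();
  }

  /** Full traversal that collects every partition directory, for comparison. */
  static List<Path> findAllPartitions(Path dir) throws IOException {
    List<Path> result = new ArrayList<>();
    collect(dir, result);
    return result;
  }

  private static void collect(Path dir, List<Path> out) throws IOException {
    if (hasMarker(dir)) {
      out.add(dir);
      return; // a partition directory is treated as a leaf
    }
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          collect(child, out);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    System.out.println("first partition: " + findOnePartition(root));
    System.out.println("partition count: " + findAllPartitions(root).size());
  }
}
```

       In the sketch, the early exit means the cost depends on how deep the
first partition sits rather than on the total number of partitions. Which
partition is returned depends on the listing order, so it should only be used
where any single partition will do.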




