[GitHub] [hudi] vinothchandar commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

GitBox Wed, 17 Feb 2021 08:26:37 -0800


vinothchandar commented on a change in pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#discussion_r577755265




##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -74,6 +78,19 @@ class DefaultSource extends RelationProvider
     val tablePath = DataSourceUtils.getTablePath(fs, globPaths.toArray)
     log.info("Obtained hudi table path: " + tablePath)
 
+    val sparkEngineContext = new HoodieSparkEngineContext(sqlContext.sparkContext)
+    val fsBackedTableMetadata =
+      new FileSystemBackedTableMetadata(sparkEngineContext, new SerializableConfiguration(fs.getConf), tablePath, false)
+    val partitionPaths = fsBackedTableMetadata.getAllPartitionPaths
+    val onePartitionPath = if(!partitionPaths.isEmpty && !StringUtils.isEmpty(partitionPaths.get(0))) {
+        tablePath + "/" + partitionPaths.get(0)
+      } else {
+        tablePath
+      }
+    val dataPath = DataSourceUtils.getDataPath(tablePath, onePartitionPath)
+    log.info("Obtained hudi data path: " + dataPath)
+    parameters += "path" -> dataPath

Review comment:
   @teeyog Sorry, it's still not clear to me. I supplied a globbed path `2015/*/*/*`, and even then the path gets overridden with `path -> tablePath/*/*/*/*`.
   
   Won't this incur reading all partitions under the tablePath, as opposed to only 2015's?
   
   
![image](https://user-images.githubusercontent.com/1179324/108234541-bde4fb00-70f9-11eb-8611-58579636b51b.png)
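
   To make the concern concrete, here is a minimal sketch in plain Scala with no Hudi or Spark calls. The table location, the file names, and the `matches` helper are all hypothetical, and the segment-wise matcher is only a stand-in for real glob resolution. It assumes a `yyyy/mm/dd` partition layout and compares how many files a 2015-only glob selects against the overridden `tablePath/*/*/*/*` glob.

```scala
object GlobScopeSketch {
  // Hypothetical table layout: <tablePath>/<yyyy>/<mm>/<dd>/<file>
  val tablePath = "/tmp/hudi_trips"
  val files: Seq[String] = Seq(
    s"$tablePath/2015/03/16/a.parquet",
    s"$tablePath/2015/03/17/b.parquet",
    s"$tablePath/2016/03/15/c.parquet"
  )

  // Simplified glob: "*" matches exactly one whole path segment,
  // any other segment must match literally.
  def matches(glob: String, path: String): Boolean = {
    val g = glob.split("/")
    val p = path.split("/")
    g.length == p.length && g.zip(p).forall { case (gs, ps) => gs == "*" || gs == ps }
  }

  def main(args: Array[String]): Unit = {
    val userGlob = s"$tablePath/2015/*/*/*"    // what the caller asked for
    val overriddenGlob = s"$tablePath/*/*/*/*" // what "path" becomes after the override

    println(s"user glob selects ${files.count(matches(userGlob, _))} files")
    println(s"overridden glob selects ${files.count(matches(overriddenGlob, _))} files")
  }
}
```

   With this toy layout the user glob selects only the two 2015 files, while the overridden glob selects every file in the table, which is the extra scan being asked about.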
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
