[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix

2023-04-17 Thread via GitHub


codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1169476798


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   // prefix to try to reduce the scope of the required file-listing
   val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
 
-  if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+  if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
   It does have `getAllPartitionPaths`, but that is based on the state of the 
MDT (metadata table), not on `fs.exists`. If I understand correctly, this PR 
fixes the issue that occurs when metadata is disabled and 
partitionPathPrefixAnalysis is enabled. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix

2023-04-13 Thread via GitHub


codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165824450


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   // prefix to try to reduce the scope of the required file-listing
   val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
 
-  if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+  if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
   Got it 👍 






[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix

2023-04-12 Thread via GitHub


codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165076903


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   // prefix to try to reduce the scope of the required file-listing
   val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
 
-  if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+  if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
   The `fs.exists` call is costly and will impact latency. How often do we run 
into this scenario? The FS cache is invalidated on each refresh anyway, so I am 
wondering whether we really need the `fs.exists` check every time.
   Could we simply catch the exception and continue instead? 
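
   For illustration only, a minimal sketch of the catch-and-continue idea, 
not the change made in this PR. It uses `java.nio.file` as a stand-in for the 
Hadoop `FileSystem` API so that it runs on its own, and it assumes Scala 2.13. 
`PartitionListingSketch` and `listUnderPrefix` are hypothetical names that do 
not exist in Hudi. The point is that a missing prefix directory is handled on 
the failure path, so the common case pays for one listing call and no separate 
existence probe.

```scala
import java.nio.file.{Files, NoSuchFileException, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical sketch: java.nio.file stands in for the Hadoop FileSystem API.
object PartitionListingSketch {

  // Lists the entries directly under basePath/relativePrefix.
  // A missing directory is treated as "no matching partitions" by catching
  // the exception, instead of calling an exists() check before every listing.
  def listUnderPrefix(basePath: Path, relativePrefix: String): Seq[Path] = {
    val prefixPath = basePath.resolve(relativePrefix)
    try {
      // Files.list returns a stream that must be closed after use.
      Using.resource(Files.list(prefixPath)) { stream =>
        stream.iterator().asScala.toList
      }
    } catch {
      // Thrown when prefixPath does not exist; continue with an empty result.
      case _: NoSuchFileException => Seq.empty
    }
  }
}
```

   With the Hadoop API the analogous exception would be 
`java.io.FileNotFoundException` from `FileSystem.listStatus`; whether the 
listing path in `SparkHoodieTableFileIndex` surfaces it in a form that can be 
caught this cleanly is an assumption that would need checking.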


