[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix
codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1169476798

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala: ##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       // prefix to try to reduce the scope of the required file-listing
       val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
-      if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+      if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
It does have `getAllPartitionPaths`, but that is based on the state of the MDT (metadata table), not on `fs.exists`. If I understand correctly, this PR fixes the issue that occurs when metadata is disabled and partitionPathPrefixAnalysis is enabled.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix
codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165824450

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala: ##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       // prefix to try to reduce the scope of the required file-listing
       val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
-      if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+      if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
Got it 👍
[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix
codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165076903

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala: ##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       // prefix to try to reduce the scope of the required file-listing
       val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
-      if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+      if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
The `fs.exists` call is costly and will impact latency. How often do we run into this scenario? The FS cache is invalidated on each refresh anyway, so I am wondering whether we really need to do the `fs.exists` check every time. Can we not simply catch the exception and continue?
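
For illustration only, the "catch the exception and continue" alternative raised in the comment above could look roughly like the sketch below. It is not the code from the PR: `PrefixListingSketch` and `listUnderPrefix` are hypothetical names, and `java.nio.file` stands in for the Hadoop `FileSystem` that Hudi actually uses, so that the example runs without extra dependencies. The point is only that the listing is attempted directly and a missing prefix directory is treated as an empty result, which avoids a separate existence check on every call.

```scala
import java.nio.file.{Files, NoSuchFileException, Path}

object PrefixListingSketch {
  // Hypothetical helper (not part of Hudi): lists the entries directly under
  // basePath/relativePrefix. No up-front existence check is made; a missing
  // directory surfaces as NoSuchFileException and is mapped to an empty result.
  def listUnderPrefix(basePath: Path, relativePrefix: String): Seq[Path] = {
    val prefixPath = basePath.resolve(relativePrefix)
    try {
      val stream = Files.list(prefixPath) // throws if prefixPath does not exist
      try stream.toArray().toSeq.map(_.asInstanceOf[Path])
      finally stream.close()
    } catch {
      // Assumption for this sketch: an absent partition prefix simply means
      // there is nothing to list, so the caller can continue.
      case _: NoSuchFileException => Seq.empty[Path]
    }
  }
}
```

The trade-off is that the common case (the prefix exists) pays for one listing call instead of an existence check plus a listing, while the rare case is handled through the exception path. Whether this matches the exception actually thrown in the scenario the PR addresses would need to be confirmed against the Hudi code.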