Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
danny0405 merged PR #11150:
URL: https://github.com/apache/hudi/pull/11150

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
danny0405 commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1593352816

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ##
@@ -457,9 +457,10 @@ private Dataset readRecordsForGroupAsRow(JavaSparkContext jsc,
 String readPathString = String.join(",", Arrays.stream(paths).map(StoragePath::toString).toArray(String[]::new));
+String globPathString = String.join(",", Arrays.stream(paths).map(StoragePath::getParent).map(StoragePath::toString).distinct().toArray(String[]::new));
 params.put("hoodie.datasource.read.paths", readPathString);
 // Building HoodieFileIndex needs this param to decide query path
-params.put("glob.paths", readPathString);
+params.put("glob.paths", globPathString);

Review Comment:
Fine, let's merge it first.
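The change above maps each full file path to its parent directory and removes duplicates before joining the results with commas. The following is a minimal, self-contained sketch of that transformation, not the Hudi code itself. It makes a few assumptions. Plain `String` paths stand in for Hudi's `StoragePath`. A hypothetical `parentOf` helper stands in for `StoragePath::getParent`. Paths are assumed to be absolute and '/'-separated. The class name `GlobPathSketch` and the sample file paths are illustrative only.

```java
import java.util.Arrays;

public class GlobPathSketch {

  // Hypothetical stand-in for StoragePath::getParent: returns everything before
  // the last '/'. Assumes absolute, '/'-separated paths without a trailing slash.
  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    if (idx < 0) {
      throw new IllegalArgumentException("No parent directory in: " + path);
    }
    return idx == 0 ? "/" : path.substring(0, idx);
  }

  // Same shape as the new globPathString line: map each file to its parent,
  // drop duplicate directories (several files often share one partition), and
  // join with commas. distinct() keeps the first-seen order of the array.
  static String globPathString(String[] filePaths) {
    return String.join(",", Arrays.stream(filePaths)
        .map(GlobPathSketch::parentOf)
        .distinct()
        .toArray(String[]::new));
  }

  public static void main(String[] args) {
    // Illustrative file paths only.
    String[] paths = {
        "/tmp/hudi_table/2024/05/01/file-a.parquet",
        "/tmp/hudi_table/2024/05/01/file-b.parquet",
        "/tmp/hudi_table/2024/05/02/file-c.parquet"
    };
    // Full file list, analogous to readPathString.
    System.out.println("read paths = " + String.join(",", paths));
    // Prints: glob paths = /tmp/hudi_table/2024/05/01,/tmp/hudi_table/2024/05/02
    System.out.println("glob paths = " + globPathString(paths));
  }
}
```

In the diff, only `glob.paths` switches to the parent directories. `hoodie.datasource.read.paths` still receives the full list of file paths.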
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
the-other-tim-brown commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1591567763

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ##

Review Comment:
Looks like it is already covered there
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
danny0405 commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590454993

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ##

Review Comment:
Not sure whether `TestHoodieSparkMergeOnReadTableClustering` is the candidate.
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
the-other-tim-brown commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590187272

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ##

Review Comment:
I can't find a test class matching this class name. Is there a clustering test suite I should look in?
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
danny0405 commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590186996

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ##

Review Comment:
do we have any test cases?
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094365713

## CI report:

* 353708c54b454bf3749596f74267970f1c332b7b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23660)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094339138

## CI report:

* 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
* 353708c54b454bf3749596f74267970f1c332b7b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23660)
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094337137

## CI report:

* 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
* 353708c54b454bf3749596f74267970f1c332b7b UNKNOWN
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094335218

## CI report:

* 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094313377

## CI report:

* 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]
hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094309158

## CI report:

* 11abd3eb1b9418d9013f820e3779f56c50810dfd UNKNOWN