Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
nsivabalan closed pull request #10909: [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce URL: https://github.com/apache/hudi/pull/10909 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092181639 ## CI report: * 78efc7ca1cc033e445086b925cae48204d214871 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23642) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092056966 ## CI report: * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990) * 78efc7ca1cc033e445086b925cae48204d214871 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23642)
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092052645 ## CI report: * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990) * 78efc7ca1cc033e445086b925cae48204d214871 UNKNOWN
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
yihua commented on code in PR #10909:
URL: https://github.com/apache/hudi/pull/10909#discussion_r1537958935

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RowCustomColumnsSortPartitioner.java: ##

@@ -51,7 +51,7 @@ public RowCustomColumnsSortPartitioner(String[] columnNames, HoodieWriteConfig c
   public Dataset repartitionRecords(Dataset records, int outputSparkPartitions) {
     return records
         .sort(Arrays.stream(sortColumnNames).map(Column::new).toArray(Column[]::new))
-        .coalesce(outputSparkPartitions);
+        .repartition(outputSparkPartitions);

Review Comment:
Should the `.coalesce` happen before `.sort` to control the parallelism? Is that the reason why the parallelism is not honored in the global sorting?
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014766403 ## CI report: * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014693276 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986) * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014679290 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986) * b5ebcf8de8abc367918e5ab570be4bcd52b33208 UNKNOWN
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014572622 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014505672 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014497841 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 UNKNOWN
[PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]
nsivabalan opened a new pull request, #10909:
URL: https://github.com/apache/hudi/pull/10909

### Change Logs

Fix `RowCustomColumnsSortPartitioner` to use `repartition` instead of `coalesce`. With `coalesce`, the user-defined shuffle parallelism may not be honored.

### Impact

The user-defined shuffle parallelism will be honored with `RowCustomColumnsSortPartitioner`.

### Risk level (write none, low medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
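
To illustrate why the parallelism "may not be honored" with `coalesce`, here is a minimal sketch in plain Java. It is a toy model of partition counts only, not Spark or Hudi code: the class `PartitionCountModel`, its method names, and the numbers 200 and 500 are all hypothetical. It relies on the documented behavior that `Dataset.coalesce(n)` avoids a shuffle and therefore cannot raise the partition count, while `Dataset.repartition(n)` shuffles and yields exactly `n` partitions.

```java
// Toy model of partition counts only; it does not use or depend on Spark.
// All names and numbers here are hypothetical, chosen for illustration.
final class PartitionCountModel {

    // Models Dataset.coalesce(n): merges existing partitions without a
    // shuffle, so it can lower the partition count but never raise it.
    static int coalesce(int currentPartitions, int requested) {
        return Math.min(currentPartitions, requested);
    }

    // Models Dataset.repartition(n): performs a full shuffle and always
    // produces exactly the requested number of partitions.
    static int repartition(int currentPartitions, int requested) {
        return requested;
    }

    public static void main(String[] args) {
        // Assumption: the sort stage leaves 200 partitions, while the
        // user asked for 500 output partitions.
        int afterSort = 200;
        int requested = 500;
        System.out.println("coalesce:    " + coalesce(afterSort, requested));
        System.out.println("repartition: " + repartition(afterSort, requested));
    }
}
```

Under these assumed numbers the `coalesce` path stays at 200 partitions while the `repartition` path reaches the requested 500. The model tracks counts only; a real `repartition` reshuffles rows, so it says nothing about whether the ordering produced by the preceding `sort` survives.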