Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-05-09 Thread via GitHub


nsivabalan closed pull request #10909: [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce
URL: https://github.com/apache/hudi/pull/10909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-05-02 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092181639

   
   ## CI report:
   
   * 78efc7ca1cc033e445086b925cae48204d214871 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23642)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-05-02 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092056966

   
   ## CI report:
   
   * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
   * 78efc7ca1cc033e445086b925cae48204d214871 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23642)
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-05-02 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2092052645

   
   ## CI report:
   
   * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
   * 78efc7ca1cc033e445086b925cae48204d214871 UNKNOWN
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-25 Thread via GitHub


yihua commented on code in PR #10909:
URL: https://github.com/apache/hudi/pull/10909#discussion_r1537958935


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RowCustomColumnsSortPartitioner.java:
##
@@ -51,7 +51,7 @@ public RowCustomColumnsSortPartitioner(String[] columnNames, HoodieWriteConfig c
   public Dataset repartitionRecords(Dataset records, int outputSparkPartitions) {
     return records
         .sort(Arrays.stream(sortColumnNames).map(Column::new).toArray(Column[]::new))
-        .coalesce(outputSparkPartitions);
+        .repartition(outputSparkPartitions);

Review Comment:
   Should the `.coalesce` happen before `.sort` to control the parallelism? Is 
that the reason why the parallelism is not honored in the global sorting?
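
   The three orderings discussed in this thread (sort then coalesce, sort then
   repartition, and coalesce then sort) can be compared with a small model of
   partition counts. This is only a sketch and does not call Spark. It assumes
   that a global sort shuffles into a fixed number of partitions (the
   hypothetical constant `SHUFFLE_PARTITIONS`, standing in for
   `spark.sql.shuffle.partitions`), that `coalesce(n)` never increases the
   partition count, and that `repartition(n)` always yields `n`. Adaptive query
   execution may change the post-shuffle count in a real job, so treat the
   numbers as illustrative. The class name and all values are made up.

```java
public class PartitionCountSketch {
    // Hypothetical stand-in for spark.sql.shuffle.partitions.
    public static final int SHUFFLE_PARTITIONS = 200;

    // Model of a global sort: the sort's own shuffle decides the output
    // partition count, independent of the number of input partitions.
    public static int sort(int currentPartitions) {
        return SHUFFLE_PARTITIONS;
    }

    // Model of coalesce(n): merges partitions without a shuffle, so it can
    // lower the partition count but never raise it.
    public static int coalesce(int currentPartitions, int requested) {
        return Math.min(currentPartitions, requested);
    }

    // Model of repartition(n): a full shuffle into exactly n partitions.
    public static int repartition(int currentPartitions, int requested) {
        return requested;
    }

    public static void main(String[] args) {
        int inputPartitions = 50;  // hypothetical input partition count
        int requested = 500;       // hypothetical user-defined parallelism

        // Ordering before the PR: sort, then coalesce.
        System.out.println("sort -> coalesce:    "
                + coalesce(sort(inputPartitions), requested));
        // Ordering proposed in the PR: sort, then repartition.
        System.out.println("sort -> repartition: "
                + repartition(sort(inputPartitions), requested));
        // Ordering raised in the review: coalesce, then sort.
        System.out.println("coalesce -> sort:    "
                + sort(coalesce(inputPartitions, requested)));
    }
}
```

   Under these assumptions only the sort-then-repartition ordering reaches the
   requested 500 partitions; the other two end at 200. The model tracks
   partition counts only and says nothing about whether row order survives
   each step.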






Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014766403

   
   ## CI report:
   
   * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014693276

   
   ## CI report:
   
   * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
   * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22990)
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014679290

   
   ## CI report:
   
   * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
   * b5ebcf8de8abc367918e5ab570be4bcd52b33208 UNKNOWN
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014572622

   
   ## CI report:
   
   * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014505672

   
   ## CI report:
   
   * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22986)
   
   



Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub


hudi-bot commented on PR #10909:
URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014497841

   
   ## CI report:
   
   * 5f6135593aab6329b060ea1ee30388eb22c0dc97 UNKNOWN
   
   



[PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-21 Thread via GitHub


nsivabalan opened a new pull request, #10909:
URL: https://github.com/apache/hudi/pull/10909

   ### Change Logs
   
   Fixes `RowCustomColumnsSortPartitioner` to use `repartition` instead of `coalesce`. With `coalesce`, the user-defined shuffle parallelism may not be honored.
   
   ### Impact
   
   The user-defined shuffle parallelism will be honored by `RowCustomColumnsSortPartitioner`.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

