[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 2023-01-09  (was: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> Within the clean planning phase, we do a map() over every partition and then
> trigger planning for each partition within it.
>
> For a very large number of partitions, if the cleaner shuffle parallelism is
> small, much of the planning runs sequentially. We can improve this by
> switching to a mapPartitions call.
>
> --
> This message was sent by Atlassian Jira
> (v8.20.10#820010)
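The map() versus mapPartitions distinction in the issue description can be sketched without Spark. This is only an illustration under stated assumptions, not the Hudi patch: the ticket does not say where the cost comes from, so the sketch assumes a fixed setup cost each time planning is invoked. `Planner`, `chunk`, `plan_with_map` and `plan_with_map_partitions` are hypothetical names, and a plain list split stands in for how an engine would distribute table partitions across tasks.

```python
from typing import List


def chunk(items: List[str], parallelism: int) -> List[List[str]]:
    """Split items into at most `parallelism` contiguous chunks.

    Stand-in for how an engine would slice the partition list across tasks.
    """
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    size = max(1, -(-len(items) // parallelism))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


class Planner:
    """Hypothetical planner; counts how many times its setup runs."""

    setups = 0

    def __init__(self) -> None:
        # Assumed fixed cost, e.g. loading timeline or file-system view state.
        Planner.setups += 1

    def plan(self, partition_path: str) -> str:
        return f"clean-plan:{partition_path}"


def plan_with_map(partition_paths: List[str], parallelism: int) -> List[str]:
    """map()-style: the function runs once per table partition."""
    out: List[str] = []
    for task in chunk(partition_paths, parallelism):
        # Setup is repeated for every element handled by the task.
        out.extend(Planner().plan(p) for p in task)
    return out


def plan_with_map_partitions(partition_paths: List[str], parallelism: int) -> List[str]:
    """mapPartitions-style: the function runs once per task, over an iterator."""
    out: List[str] = []
    for task in chunk(partition_paths, parallelism):
        planner = Planner()  # setup paid once per task
        out.extend(planner.plan(p) for p in task)
    return out
```

Both functions return the same plans; under the assumed cost model the difference is only how often the setup runs (once per table partition versus once per task).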
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5012:
----------------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint  (was: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Status: Open  (was: Patch Available)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5012:
----------------------------------
    Fix Version/s: 0.13.0  (was: 0.12.2)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/06  (was: 2022/10/04, 2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29  (was: 2022/10/04, 2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15  (was: 2022/10/04, 2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5012:
-----------------------------
    Sprint: 2022/10/04, 2022/10/18  (was: 2022/10/04)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaojing Yu updated HUDI-5012:
------------------------------
    Fix Version/s: 0.12.2  (was: 0.12.1)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.2
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Status: Patch Available  (was: In Progress)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.1
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Status: In Progress  (was: Open)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.1
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5012:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.1
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Story Points: 2

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.12.1
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Sprint: 2022/10/04

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.12.1
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Priority: Critical  (was: Major)

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Priority: Critical
[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions
[ https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5012:
--------------------------------------
    Fix Version/s: 0.12.1

> Fix clean planning for very large partitions
> --------------------------------------------
>
>                 Key: HUDI-5012
>                 URL: https://issues.apache.org/jira/browse/HUDI-5012
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.12.1