[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-20 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 2023-01-09  (was: 
2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Within the clean planning phase, we call map() over the list of partitions and 
> trigger planning for each partition inside that call. 
>  
> For a very large number of partitions, if the cleaner shuffle parallelism is 
> small, this results in more sequential planning. We can improve this by using a 
> mapPartitions call instead.
>  
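The issue text does not show Hudi's planner code. The following is therefore only a minimal plain-Java sketch of the general map() versus mapPartitions() trade-off. It assumes that per-task setup is the expensive part of per-partition planning, for example building a file-system view. `PlannerContext`, `planWithMap` and `planWithMapPartitions` are hypothetical names. A fixed thread pool stands in for Spark tasks running at a given cleaner parallelism. None of this is Hudi's or Spark's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanPlanningSketch {

  // Counts how many times the (hypothetical) per-task setup runs.
  static final AtomicInteger SETUP_COUNT = new AtomicInteger();

  // Hypothetical stand-in for expensive per-task state, e.g. a file-system view.
  static final class PlannerContext {
    PlannerContext() { SETUP_COUNT.incrementAndGet(); }
    String plan(String partitionPath) { return partitionPath + ":plan"; }
  }

  // map()-style: the setup is repeated for every partition path
  // (run sequentially here; only the setup count matters for the comparison).
  static List<String> planWithMap(List<String> partitions) {
    List<String> plans = new ArrayList<>();
    for (String p : partitions) {
      plans.add(new PlannerContext().plan(p));
    }
    return plans;
  }

  // mapPartitions()-style: split the paths into at most `parallelism` slices;
  // each task builds one context and reuses it for every path in its slice.
  static List<String> planWithMapPartitions(List<String> partitions, int parallelism)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      int sliceSize = (partitions.size() + parallelism - 1) / parallelism;
      for (int start = 0; start < partitions.size(); start += sliceSize) {
        List<String> slice =
            partitions.subList(start, Math.min(start + sliceSize, partitions.size()));
        futures.add(pool.submit(() -> {
          PlannerContext ctx = new PlannerContext(); // once per slice, not per path
          List<String> out = new ArrayList<>();
          for (String p : slice) {
            out.add(ctx.plan(p));
          }
          return out;
        }));
      }
      // Collect results in submission order so the plan order matches the input.
      List<String> plans = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        plans.addAll(f.get());
      }
      return plans;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> partitions = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      partitions.add("date=" + i);
    }

    SETUP_COUNT.set(0);
    planWithMap(partitions);
    System.out.println("map setups: " + SETUP_COUNT.get());

    SETUP_COUNT.set(0);
    List<String> plans = planWithMapPartitions(partitions, 4);
    System.out.println("mapPartitions setups: " + SETUP_COUNT.get() + ", plans: " + plans.size());
  }
}
```

With 1000 partition paths and a parallelism of 4, the map()-style version prints `map setups: 1000`. The mapPartitions()-style version prints `mapPartitions setups: 4, plans: 1000`, because it builds one context per slice. Whether this matches the actual cost profile of Hudi's clean planning is an assumption.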



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-19 Thread Alexey Kudinkin (Jira)



Alexey Kudinkin updated HUDI-5012:
--
Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint 
 (was: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-19 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Status: Open  (was: Patch Available)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-08 Thread Alexey Kudinkin (Jira)



Alexey Kudinkin updated HUDI-5012:
--
Fix Version/s: 0.13.0
   (was: 0.12.2)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-02 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18, 
2022/11/01, 2022/11/29)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-12-02 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/06  (was: 2022/10/04, 
2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-11-24 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29  (was: 2022/10/04, 
2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-11-24 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18, 
2022/11/01, 2022/11/15)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-11-15 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15  (was: 2022/10/04, 
2022/10/18, 2022/11/01)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-11-01 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18, 2022/11/01  (was: 2022/10/04, 2022/10/18)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-19 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-5012:
-
Sprint: 2022/10/04, 2022/10/18  (was: 2022/10/04)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-17 Thread Zhaojing Yu (Jira)



Zhaojing Yu updated HUDI-5012:
--
Fix Version/s: 0.12.2
   (was: 0.12.1)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-13 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Status: Patch Available  (was: In Progress)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-13 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Status: In Progress  (was: Open)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-11 Thread ASF GitHub Bot (Jira)



ASF GitHub Bot updated HUDI-5012:
-
Labels: pull-request-available  (was: )

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-11 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Story Points: 2

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.1


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-11 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Sprint: 2022/10/04

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.1


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-11 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Priority: Critical  (was: Major)

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Priority: Critical


[jira] [Updated] (HUDI-5012) Fix clean planning for very large partitions

2022-10-11 Thread sivabalan narayanan (Jira)



sivabalan narayanan updated HUDI-5012:
--
Fix Version/s: 0.12.1

> Fix clean planning for very large partitions
> 
>
> Key: HUDI-5012
> URL: https://issues.apache.org/jira/browse/HUDI-5012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.1