[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724125#comment-17724125
 ] 

Zhanghao Chen commented on FLINK-32124:
---

Hi [~gyfora]. I confirmed that the operator does align partitions; an internal 
code change in our production environment broke it. Sorry for the confusion. 
I'll close this ticket.

> Add option to enable partition alignment for sources
> 
>
> Key: FLINK-32124
> URL: https://issues.apache.org/jira/browse/FLINK-32124
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Reporter: Zhanghao Chen
> Priority: Major
>
> Currently, the autoscaler does not consider balancing partitions among source 
> tasks. In our production environment, partition skew has proven to be a 
> severe problem for many jobs. Especially in a job topology with only forward 
> or rescale shuffles, partition skew on the source side can further lead to 
> data imbalance in downstream operators.
> We should add an option to enable partition alignment for sources. It should 
> be disabled by default, because the partition count usually has only a few 
> factors, so enabling alignment greatly limits our scaling choices. Also, if 
> data is already imbalanced across partitions, partition alignment won't help 
> either (this is not a common case inside our company, though).
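
To make the partition skew described above concrete, here is a minimal Python sketch. It assumes partitions are spread across source subtasks as evenly as possible by count, which is an illustrative simplification and not necessarily how a given connector assigns them. The function names `partitions_per_subtask` and `skew_ratio` are hypothetical and do not come from Flink.

```python
def partitions_per_subtask(num_partitions: int, parallelism: int) -> list[int]:
    """Number of partitions each subtask gets under an even-by-count assignment.

    Assumption for illustration: the first (num_partitions % parallelism)
    subtasks each receive one extra partition.
    """
    base, extra = divmod(num_partitions, parallelism)
    return [base + 1 if i < extra else base for i in range(parallelism)]


def skew_ratio(num_partitions: int, parallelism: int) -> float:
    """Ratio between the most and least loaded subtask (1.0 means balanced)."""
    counts = partitions_per_subtask(num_partitions, parallelism)
    if min(counts) == 0:
        # Some subtasks are idle because parallelism exceeds the partition count.
        return float("inf")
    return max(counts) / min(counts)


if __name__ == "__main__":
    for parallelism in (4, 5):
        print(parallelism,
              partitions_per_subtask(10, parallelism),
              skew_ratio(10, parallelism))
```

With 10 partitions, a parallelism of 4 leaves two subtasks reading 3 partitions and two reading 2 (ratio 1.5), while a parallelism of 5 gives every subtask exactly 2 (ratio 1.0).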



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723821#comment-17723821
 ] 

Gyula Fora commented on FLINK-32124:


[~Zhanghao Chen] can you please confirm that the current behaviour actually 
always aligns partitions? How could we align them even better?



[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723820#comment-17723820
 ] 

Gyula Fora commented on FLINK-32124:


My bad, this is what I meant: https://issues.apache.org/jira/browse/FLINK-32119

 



[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Fang Yong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723819#comment-17723819
 ] 

Fang Yong commented on FLINK-32124:
---

[~gyfora] Is the link https://issues.apache.org/jira/browse/FLINK-32124 wrong? 
It's just the link to the current issue.



[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723801#comment-17723801
 ] 

Zhanghao Chen commented on FLINK-32124:
---

Thanks [~gyfora]. I'll follow up there.



[jira] [Commented] (FLINK-32124) Add option to enable partition alignment for sources

2023-05-18 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723796#comment-17723796
 ] 

Gyula Fora commented on FLINK-32124:


This is related to https://issues.apache.org/jira/browse/FLINK-32124

But I think we currently do consider partition balance. We set the max 
parallelism to the number of partitions and only select parallelisms that are 
divisors of it (so there is always balance).
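
The divisor rule described in this comment can be sketched as follows. This is an illustration only, not the autoscaler's actual code: the function name `aligned_parallelism` and the choice to round the target up to the next divisor are assumptions made for the example.

```python
def aligned_parallelism(target: int, num_partitions: int) -> int:
    """Smallest divisor of num_partitions that is >= target.

    num_partitions plays the role of the max parallelism here. Rounding up
    (rather than down) is an assumption made for this sketch.
    """
    # Clamp the requested parallelism into the valid range [1, num_partitions].
    target = max(1, min(target, num_partitions))
    for candidate in range(target, num_partitions + 1):
        if num_partitions % candidate == 0:
            return candidate
    return num_partitions  # Unreachable: num_partitions always divides itself.


if __name__ == "__main__":
    for target in (4, 5, 7, 20):
        print(target, "->", aligned_parallelism(target, 12))
```

For a 12-partition source, a target of 5 is rounded up to 6 and a target of 7 jumps to 12, which shows both why the result is always balanced and why the set of usable parallelisms is small.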
