[ https://issues.apache.org/jira/browse/FLINK-36576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lei Yang updated FLINK-36576:
-----------------------------
Description:
Currently, the DefaultVertexParallelismAndInputInfosDecider implements a
balanced data distribution algorithm based on the amount of data and the
number of subpartitions. However, it has some limitations:
# The Decider selects the data distribution algorithm solely from the AllToAll
or Pointwise attribute of the input, which limits the operator's ability to
dynamically change the data distribution algorithm.
# It does not support data volume-based balanced distribution for Pointwise
inputs.
# For AllToAll inputs, it does not support splitting the data corresponding to
a specific key, i.e., it cannot resolve data skew caused by a single hot key.
To address these, we plan to introduce the following improvements:
# Introduce InterInputsKeyCorrelation and IntraInputKeyCorrelation to the
input characterisation, allowing the operator to flexibly choose the balanced
data distribution algorithm.
# Introduce a data volume-based balanced distribution algorithm for Pointwise
inputs.
# Introduce the ability to split the data corresponding to a specific key, to
optimise the data volume-based balanced distribution algorithm for AllToAll
inputs.
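A minimal sketch of how improvements 1 and 3 could interact, under a simplified model that is an assumption rather than the actual Decider code: bytes[p][s] is the volume upstream partition p produced for subpartition s, and a single boolean stands in for IntraInputKeyCorrelation. The class SkewAwareSplitSketch, its Slice record and the assign method are hypothetical. Subpartitions are packed greedily into contiguous ranges up to a target byte count. When records of the same key are not required to stay together, a subpartition that alone exceeds the target is split across contiguous ranges of upstream partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink code: bytes[p][s] = bytes that upstream
// partition p produced for subpartition s (all rows have the same length).
final class SkewAwareSplitSketch {

    // One consumer task reads subpartitions [subStart, subEnd] from
    // upstream partitions [partStart, partEnd].
    record Slice(int subStart, int subEnd, int partStart, int partEnd, long bytes) {}

    static List<Slice> assign(long[][] bytes, long targetBytesPerTask,
                              boolean intraInputKeyCorrelated) {
        int numParts = bytes.length;
        int numSubs = numParts == 0 ? 0 : bytes[0].length;
        List<Slice> slices = new ArrayList<>();
        int start = 0;  // first subpartition of the pending range
        long acc = 0;   // bytes accumulated in the pending range
        for (int s = 0; s < numSubs; s++) {
            long total = 0;
            for (long[] row : bytes) {
                total += row[s];
            }
            if (!intraInputKeyCorrelated && total > targetBytesPerTask) {
                // Flush the pending range, then split this hot subpartition
                // greedily over contiguous ranges of upstream partitions.
                if (s > start) {
                    slices.add(new Slice(start, s - 1, 0, numParts - 1, acc));
                }
                int pStart = 0;
                long pAcc = 0;
                for (int p = 0; p < numParts; p++) {
                    if (p > pStart && pAcc + bytes[p][s] > targetBytesPerTask) {
                        slices.add(new Slice(s, s, pStart, p - 1, pAcc));
                        pStart = p;
                        pAcc = 0;
                    }
                    pAcc += bytes[p][s];
                }
                slices.add(new Slice(s, s, pStart, numParts - 1, pAcc));
                start = s + 1;
                acc = 0;
                continue;
            }
            // Subpartition is indivisible here: close the range if it would overflow.
            if (s > start && acc + total > targetBytesPerTask) {
                slices.add(new Slice(start, s - 1, 0, numParts - 1, acc));
                start = s;
                acc = 0;
            }
            acc += total;
        }
        if (start < numSubs) {
            slices.add(new Slice(start, numSubs - 1, 0, numParts - 1, acc));
        }
        return slices;
    }
}
```

For example, with four upstream partitions that each produce {5, 40, 5} bytes for subpartitions 0..2 and a 50-byte target, intraInputKeyCorrelated = true leaves the hot subpartition 1 (160 bytes) on a single task. With false, it is split into four 40-byte slices, one per upstream partition. The sketch does not split a single upstream partition's share any further.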
> Improving amount-based data balancing distribution algorithm for
> DefaultVertexParallelismAndInputInfosDecider
> -------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-36576
> URL: https://issues.apache.org/jira/browse/FLINK-36576
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Lei Yang
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)