nobleyd created FLINK-18799:
-------------------------------
Summary: Improve slot allocation to balance resources among machines.
Key: FLINK-18799
URL: https://issues.apache.org/jira/browse/FLINK-18799
Project: Flink
Issue Type: Improvement
Components: API / Core, Client / Job Submission
Reporter: nobleyd
I have a complete job in which each vertex may have a different parallelism.
What troubles me is that the 'cpu used' metric differs among machines.
The situation improved after I upgraded to Flink 1.10 and added
'cluster.evenly-spread-out-slots: true' to the Flink configuration. This helps,
but sometimes it is not enough.
For example, I have 5 TaskManagers, each deployed on its own machine, and a
job whose vertices have the following parallelism:
||vertex||parallelism||
|A|1|
|B|15|
|C|20|
|D|1|
|E|15|
In this case, resource usage is sometimes poorly balanced. What I expect is
that the subtasks of vertices B, C and E are distributed evenly among the 5
TaskManagers. Vertices A and D have a parallelism of only 1; they just consume
a configuration stream.
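To make "distributed evenly" concrete: a vertex with parallelism p spread over n TaskManagers can be called evenly distributed when every TaskManager hosts either floor(p/n) or ceil(p/n) of its subtasks. For B and E (15 subtasks on 5 TaskManagers) that means exactly 3 per TaskManager, and for C (20 subtasks) exactly 4. The Java sketch below checks this criterion for a single vertex. The class and method names are hypothetical and not part of Flink, and the per-TaskManager counts in `main` are made-up illustrations, not measurements.

```java
import java.util.Arrays;

class EvenSpreadCheck {

    // perTm[i] = number of subtasks of ONE vertex placed on TaskManager i.
    // Returns true if every TaskManager hosts floor(p/n) or ceil(p/n) of them,
    // where p is the vertex's parallelism and n the number of TaskManagers.
    static boolean isEvenlySpread(int[] perTm) {
        int n = perTm.length;
        int p = Arrays.stream(perTm).sum();
        int lo = p / n;           // floor(p / n)
        int hi = (p + n - 1) / n; // ceil(p / n)
        return Arrays.stream(perTm).allMatch(c -> c == lo || c == hi);
    }

    public static void main(String[] args) {
        // Illustrative placements of a vertex with parallelism 15 on 5 TaskManagers.
        System.out.println(isEvenlySpread(new int[] {3, 3, 3, 3, 3})); // true
        System.out.println(isEvenlySpread(new int[] {5, 4, 2, 2, 2})); // false
        // A vertex with parallelism 1 (like A or D) is trivially balanced.
        System.out.println(isEvenlySpread(new int[] {1, 0, 0, 0, 0})); // true
    }
}
```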
Expected allocation strategy: for each vertex, allocate its slots evenly among
the TaskManagers, then move on to the next vertex and repeat. For example, this
would produce the following result:
||TaskManager1||TaskManager2||TaskManager3||TaskManager4||TaskManager5||
|{color:#FF0000}A1{color}|{color:#00875a}B1{color}|{color:#00875a}B2{color}|{color:#00875a}B3{color}|{color:#00875a}B4{color}|
|{color:#00875a}B5{color}|{color:#00875a}B6{color}|{color:#00875a}B7{color}|{color:#00875a}B8{color}|{color:#00875a}B9{color}|
|{color:#00875a}B10{color}|{color:#00875a}B11{color}|{color:#00875a}B12{color}|{color:#00875a}B13{color}|{color:#00875a}B14{color}|
|{color:#00875a}B15{color}|{color:#ff8b00}C1{color}|{color:#ff8b00}C2{color}|{color:#ff8b00}C3{color}|{color:#ff8b00}C4{color}|
|{color:#ff8b00}C5{color}|{color:#ff8b00}C6{color}|{color:#ff8b00}C7{color}|{color:#ff8b00}C8{color}|{color:#ff8b00}C9{color}|
|{color:#ff8b00}C10{color}|{color:#ff8b00}C11{color}|{color:#ff8b00}C12{color}|{color:#ff8b00}C13{color}|{color:#ff8b00}C14{color}|
|{color:#ff8b00}C15{color}|{color:#ff8b00}C16{color}|{color:#ff8b00}C17{color}|{color:#ff8b00}C18{color}|{color:#ff8b00}C19{color}|
|{color:#ff8b00}C20{color}|{color:#403294}D1{color}|E1|E2|E3|
|E4|E5|E6|E7|E8|
|E9|E10|E11|E12|E13|
|E14|E15| | | |
The allocation order here is A -> B -> C -> D -> E, but any other order would
work just as well. The key point is that all parallel subtasks of one vertex
should be allocated together before the next vertex is considered. With this
strategy, vertices A and D cannot disturb the even distribution of the other
vertices.
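The vertex-by-vertex strategy can be sketched as a round-robin placement that finishes one vertex before starting the next. This is a minimal simulation under the issue's own simplification that each subtask occupies one slot; it ignores slot sharing groups and Flink's actual slot-request handling, and the names `Vertex`, `Subtask`, `allocate` and `countPerTaskManager` are hypothetical (the sketch uses Java 16+ records). With the five vertices listed above and 5 TaskManagers, it reproduces the expected allocation table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VertexByVertexAllocation {

    // Hypothetical job vertex: a name and its parallelism.
    record Vertex(String name, int parallelism) {}

    // Hypothetical subtask: vertex name plus 1-based index, printed as e.g. "B7".
    record Subtask(String vertex, int index) {
        @Override
        public String toString() {
            return vertex + index;
        }
    }

    // Places all subtasks of one vertex before moving on to the next one,
    // walking the TaskManagers round-robin. The cursor carries over between
    // vertices instead of restarting at TaskManager 1.
    static List<List<Subtask>> allocate(List<Vertex> vertices, int numTaskManagers) {
        List<List<Subtask>> taskManagers = new ArrayList<>();
        for (int i = 0; i < numTaskManagers; i++) {
            taskManagers.add(new ArrayList<>());
        }
        int cursor = 0;
        for (Vertex v : vertices) {
            for (int s = 1; s <= v.parallelism(); s++) {
                taskManagers.get(cursor).add(new Subtask(v.name(), s));
                cursor = (cursor + 1) % numTaskManagers;
            }
        }
        return taskManagers;
    }

    // Counts how many subtasks of the given vertex ended up on each TaskManager.
    static int[] countPerTaskManager(List<List<Subtask>> taskManagers, String vertexName) {
        int[] counts = new int[taskManagers.size()];
        for (int i = 0; i < taskManagers.size(); i++) {
            for (Subtask subtask : taskManagers.get(i)) {
                if (subtask.vertex().equals(vertexName)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Vertex> job = List.of(
                new Vertex("A", 1), new Vertex("B", 15), new Vertex("C", 20),
                new Vertex("D", 1), new Vertex("E", 15));
        List<List<Subtask>> taskManagers = allocate(job, 5);
        // TaskManager1: [A1, B5, B10, B15, C5, C10, C15, C20, E4, E9, E14]
        for (int i = 0; i < taskManagers.size(); i++) {
            System.out.println("TaskManager" + (i + 1) + ": " + taskManagers.get(i));
        }
        // Prints e.g. "B -> [3, 3, 3, 3, 3]" and "C -> [4, 4, 4, 4, 4]".
        for (Vertex v : job) {
            System.out.println(v.name() + " -> "
                    + Arrays.toString(countPerTaskManager(taskManagers, v.name())));
        }
    }
}
```

The cursor is deliberately not reset between vertices. As a result, single-subtask vertices such as A and D, and the leftover subtasks of vertices whose parallelism is not a multiple of the TaskManager count, land on different TaskManagers instead of piling up on the first one. Any run of p consecutive positions modulo n visits each TaskManager floor(p/n) or ceil(p/n) times, so the per-vertex balance holds regardless of the vertex order.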
--
This message was sent by Atlassian Jira
(v8.3.4#803005)