[ https://issues.apache.org/jira/browse/FLINK-18799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169874#comment-17169874 ]
Zhu Zhu commented on FLINK-18799:
---------------------------------

Thanks for the explanation [~nobleyd]

In your case there would be 20 shared slots allocated to tasks. Usually the slots would be shaped like S1={A1, B1, C1, D1, E1}, S2={B2, C2, E2}, S3={B3, C3, E3}, ..., S15={B15, C15, E15}, S16={C16}, ..., S20={C20}.

I think what you want is to have S1-S15 evenly distributed across task managers. However, this is not supported currently, because the resource manager does not see tasks and thus cannot tell the difference between S1 and S20. I think it would not be easy work in the current framework.

A workaround would be to set the parallelism of B/E to 20 as well, so that the even distribution of slots results in an even distribution of B/E.

> improve slot allocation to make resource balance among machines.
> ----------------------------------------------------------------
>
>                 Key: FLINK-18799
>                 URL: https://issues.apache.org/jira/browse/FLINK-18799
>             Project: Flink
>          Issue Type: Improvement
>      Components: API / Core, Client / Job Submission
>            Reporter: nobleyd
>            Priority: Major
>
> I have a complete job where each vertex may have a different parallelism, and what troubles me is that the 'cpu used' metric differs among machines.
> Things got better when I upgraded to Flink 1.10 and added 'cluster.evenly-spread-out-slots: true' to the Flink config. This helps, but sometimes it is not enough.
> For example, I have 5 task managers (each deployed on one machine) and a job whose vertices have the parallelism below:
>
> ||vertex||parallelism||
> |A|1|
> |B|15|
> |C|20|
> |D|1|
> |E|15|
>
> In this case, resources are sometimes not balanced very well. What I expect is that vertices B/C/E are distributed evenly among the 5 task managers. Vertices A and D have parallelism 1 and only carry some config stream.
> Expected allocation strategy: for each vertex, allocate its slots evenly among task managers, then move on to the next vertex and repeat.
> For example, the result could look like the table below:
>
> ||TaskManager1||TaskManager2||TaskManager3||TaskManager4||TaskManager5||
> |A1|B1|B2|B3|B4|
> |B5|B6|B7|B8|B9|
> |B10|B11|B12|B13|B14|
> |B15|C1|C2|C3|C4|
> |C5|C6|C7|C8|C9|
> |C10|C11|C12|C13|C14|
> |C15|C16|C17|C18|C19|
> |C20|D1|E1|E2|E3|
> |E4|E5|E6|E7|E8|
> |E9|E10|E11|E12|E13|
> |E14|E15| | | |
>
> The allocation order (A -> B -> C -> D -> E, or some other order) does not matter. The key point is that all parallel subtasks of one vertex should be allocated at one time, before the next vertex is considered. With this strategy, vertices A/D will not disturb the distribution equilibrium of the other vertices.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
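The per-vertex round-robin strategy described in the issue can be sketched as follows. This is an illustrative sketch only, not Flink's actual scheduler code; the function name `allocate_per_vertex` and the data shapes are assumptions made for the example. Note that the round-robin cursor carries over between vertices, which reproduces the table above exactly.

```python
# Sketch of the proposed per-vertex round-robin slot allocation.
# NOT Flink's actual scheduler; names and data shapes are illustrative.

def allocate_per_vertex(vertices, num_task_managers):
    """Assign each vertex's subtasks round-robin across task managers,
    finishing one vertex completely before starting the next.

    vertices: list of (name, parallelism) pairs, e.g. [("A", 1), ("B", 15)]
    Returns: a list of per-task-manager subtask lists.
    """
    task_managers = [[] for _ in range(num_task_managers)]
    cursor = 0  # continue round-robin where the previous vertex stopped
    for name, parallelism in vertices:
        for i in range(1, parallelism + 1):
            task_managers[cursor % num_task_managers].append(f"{name}{i}")
            cursor += 1
    return task_managers

if __name__ == "__main__":
    layout = allocate_per_vertex(
        [("A", 1), ("B", 15), ("C", 20), ("D", 1), ("E", 15)], 5)
    for tm, slots in enumerate(layout, start=1):
        print(f"TaskManager{tm}: {slots}")
```

With the parallelisms from the issue (1, 15, 20, 1, 15 = 52 subtasks on 5 task managers), every task manager ends up with 10 or 11 subtasks, and the low-parallelism vertices A and D do not skew the placement of B/C/E.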