[ https://issues.apache.org/jira/browse/FLINK-18799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169874#comment-17169874 ]
Zhu Zhu commented on FLINK-18799:
---------------------------------

Thanks for the explanation [~nobleyd]

In your case there would be 20 shared slots allocated to tasks. Usually the slots would be shaped like S1={A1, B1, C1, D1, E1}, S2={B2, C2, E2}, S3={B3, C3, E3}, ..., S15={B15, C15, E15}, S16={C16}, ..., S20={C20}.

I think what you want is to have S1-S15 evenly distributed across task managers. However, this is not supported currently, because the resource manager does not see tasks and thus cannot tell the difference between S1 and S20. I think it would not be easy work in the current framework.

A workaround would be to set the parallelism of B/E to 20 as well, so that the even distribution of slots results in an even distribution of B/E.

> improve slot allocation to make resource balance among machines.
> ----------------------------------------------------------------
>
>                 Key: FLINK-18799
>                 URL: https://issues.apache.org/jira/browse/FLINK-18799
>             Project: Flink
>          Issue Type: Improvement
>      Components: API / Core, Client / Job Submission
>            Reporter: nobleyd
>            Priority: Major
>
> I have a complete job where each vertex may have a different parallelism, and what troubles me is that the 'cpu used' metric differs among machines.
> Things got better when I upgraded to Flink 1.10 and added 'cluster.evenly-spread-out-slots: true' to the Flink config. This helps, but sometimes it is not enough.
> For example, I have 5 task managers (each deployed on one machine) and a job whose vertices have the parallelism below:
>
> ||vertex||parallelism||
> |A|1|
> |B|15|
> |C|20|
> |D|1|
> |E|15|
>
> In this case, resources are sometimes not balanced very well. What I expect is that vertices B/C/E are distributed evenly among the 5 task managers. Vertices A and D have parallelism 1 and only carry some config stream.
> Expected allocation strategy: for each vertex, allocate its slots evenly among task managers, then move on to the next vertex and repeat.
> For example, the result could look like the table below:
>
> ||TaskManager1||TaskManager2||TaskManager3||TaskManager4||TaskManager5||
> |A1|B1|B2|B3|B4|
> |B5|B6|B7|B8|B9|
> |B10|B11|B12|B13|B14|
> |B15|C1|C2|C3|C4|
> |C5|C6|C7|C8|C9|
> |C10|C11|C12|C13|C14|
> |C15|C16|C17|C18|C19|
> |C20|D1|E1|E2|E3|
> |E4|E5|E6|E7|E8|
> |E9|E10|E11|E12|E13|
> |E14|E15| | | |
>
> The allocation order (A -> B -> C -> D -> E, or some other order) does not matter. The key point is that all parallel subtasks of one vertex should be allocated at one time, before the next vertex is considered. With this strategy, vertices A/D will not disturb the distribution equilibrium of the other vertices.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
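The per-vertex round-robin strategy described in the issue can be sketched as follows. This is an illustrative sketch only, not Flink's actual scheduler code; the function name `allocate_per_vertex` and the data shapes are assumptions made for the example. Note that the round-robin cursor carries over between vertices, which reproduces the table above exactly.

```python
# Sketch of the proposed per-vertex round-robin slot allocation.
# NOT Flink's actual scheduler; names and data shapes are illustrative.

def allocate_per_vertex(vertices, num_task_managers):
    """Assign each vertex's subtasks round-robin across task managers,
    finishing one vertex completely before starting the next.

    vertices: list of (name, parallelism) pairs, e.g. [("A", 1), ("B", 15)]
    Returns: a list of per-task-manager subtask lists.
    """
    task_managers = [[] for _ in range(num_task_managers)]
    cursor = 0  # continue round-robin where the previous vertex stopped
    for name, parallelism in vertices:
        for i in range(1, parallelism + 1):
            task_managers[cursor % num_task_managers].append(f"{name}{i}")
            cursor += 1
    return task_managers

if __name__ == "__main__":
    layout = allocate_per_vertex(
        [("A", 1), ("B", 15), ("C", 20), ("D", 1), ("E", 15)], 5)
    for tm, slots in enumerate(layout, start=1):
        print(f"TaskManager{tm}: {slots}")
```

With the parallelisms from the issue (1, 15, 20, 1, 15 = 52 subtasks on 5 task managers), every task manager ends up with 10 or 11 subtasks, and the low-parallelism vertices A and D do not skew the placement of B/C/E.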