[ https://issues.apache.org/jira/browse/FLINK-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755894#comment-17755894 ]
Zhu Zhu commented on FLINK-15959:
---------------------------------

Thanks for reviving this work. [~xiangyu0xf]
Note that a FLIP and a public discussion on the dev ML are needed before adding new config options, since this changes the public interface.

> Add min number of slots configuration to limit total number of slots
> --------------------------------------------------------------------
>
>                 Key: FLINK-15959
>                 URL: https://issues.apache.org/jira/browse/FLINK-15959
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.0
>            Reporter: YufeiLiu
>            Assignee: xiangyu feng
>            Priority: Major
>              Labels: auto-deprioritized-major, auto-deprioritized-minor,
>                      auto-unassigned, pull-request-available
>
> Flink removed the `-n` option after FLIP-6; the ResourceManager now starts a
> new worker only when required. But I think maintaining a certain number of
> slots is necessary. These workers would start immediately when the
> ResourceManager starts and would not be released even if all slots are free.
> Here are some reasons:
> # Users usually know how many resources are needed when running a single job,
> so initializing all workers when the cluster starts can speed up the startup
> process.
> # Jobs are scheduled in topological order: the next operator is not scheduled
> until the slot for the prior execution has been allocated. In some cases the
> TaskExecutors therefore start in several batches, which can slow down startup.
> # Flink supports
> [FLINK-12122|https://issues.apache.org/jira/browse/FLINK-12122] [Spread out
> tasks evenly across all available registered TaskManagers], but it only takes
> effect if all TMs are registered. Starting all TMs at the beginning can solve
> this problem.
> *suggestion:*
> * Add the configs "taskmanager.minimum.numberOfTotalSlots" and
> "taskmanager.maximum.numberOfTotalSlots"; the default behavior stays the same
> as before.
> * Start enough workers to satisfy the minimum number of slots when the
> ResourceManager accepts leadership (subtracting recovered workers).
> * Don't complete slot requests until the minimum number of slots is
> registered, and throw an exception when the maximum is exceeded.
> *update*
> Finally, we'd like to introduce three config options related to the minimum
> resources restriction:
> * slotmanager.min-total-resource.cpu
> * slotmanager.min-total-resource.memory
> * slotmanager.number-of-slots.min
> Note that these configurations do not take effect for standalone clusters,
> where the number of allocated slots is not controlled by Flink. These configs
> are best effort, and Flink will not block job progress even if the minimum
> resources are not guaranteed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
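
For readers following the issue, the second suggestion in the description (start enough workers to reach the minimum slot count, subtracting recovered workers) can be sketched roughly as below. This is only an illustration of the arithmetic, not Flink's implementation. The class `MinSlotsSketch`, the method `workersToStart`, and the assumption that every worker offers the same number of slots are all hypothetical.

```java
// Hypothetical sketch, not Flink code: how many workers to request so that
// the total number of slots reaches a configured minimum.
public class MinSlotsSketch {

    /**
     * @param minTotalSlots    configured minimum number of slots (assumed >= 0)
     * @param slotsPerWorker   slots offered by each worker (assumed uniform)
     * @param recoveredWorkers workers already recovered when leadership is granted
     * @return number of additional workers to start
     */
    static int workersToStart(int minTotalSlots, int slotsPerWorker, int recoveredWorkers) {
        if (slotsPerWorker <= 0) {
            throw new IllegalArgumentException("slotsPerWorker must be positive");
        }
        // Slots that recovered workers already provide.
        int recoveredSlots = recoveredWorkers * slotsPerWorker;
        // Slots still missing to reach the minimum (never negative).
        int missingSlots = Math.max(0, minTotalSlots - recoveredSlots);
        // Ceiling division: a partially used worker still has to be started.
        return (missingSlots + slotsPerWorker - 1) / slotsPerWorker;
    }

    public static void main(String[] args) {
        // Minimum of 10 slots, 4 slots per worker, 1 recovered worker.
        System.out.println(workersToStart(10, 4, 1));
    }
}
```

Because the options in the update are described as best effort, a real implementation would request these workers without blocking job progress while they register.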