[jira] [Commented] (YUNIKORN-99) Enhanced FIFO scheduling for batch workloads
[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136306#comment-17136306 ]

Wilfred Spiegelenburg commented on YUNIKORN-99:
---

Updated design document, with all issues fixed, attached in PDF form: [^FIFO Scheduling for batch workloads.pdf]

> Enhanced FIFO scheduling for batch workloads
>
> Key: YUNIKORN-99
> URL: https://issues.apache.org/jira/browse/YUNIKORN-99
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler
> Reporter: Weiwei Yang
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9
> Attachments: FIFO Scheduling for batch workloads.pdf
>
> An enhanced version of FIFO scheduling for batch workloads

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-99) Enhanced FIFO scheduling for batch workloads
[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129974#comment-17129974 ]

Wilfred Spiegelenburg commented on YUNIKORN-99:
---

PR created for documentation
[jira] [Commented] (YUNIKORN-99) Enhanced FIFO scheduling for batch workloads
[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114130#comment-17114130 ]

Wilfred Spiegelenburg commented on YUNIKORN-99:
---

Code changes committed; leaving the issue open to add documentation.
[jira] [Commented] (YUNIKORN-99) Enhanced FIFO scheduling for batch workloads
[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112208#comment-17112208 ]

Wilfred Spiegelenburg commented on YUNIKORN-99:
---

First part of the documentation, which will need to be added when the work is done:

*Application states*
* New: a new app that is still incomplete. The app transitions to the Accepted state once it is completed (i.e. when an ask is added).
* Accepted: the application is ready and part of the scheduling cycle. On allocation of the first ask the app moves to the Starting state.
* Starting: the app has exactly one running container/pod. The application transitions to Running after more allocations are added to the app, or if no more pods are expected (a timeout, or the app has just one container/pod).
* Running: the normal state for the application. Containers/pods can start and stop, and are scheduled when requested.
* Waiting: an app that has no pending asks and no running containers/pods waits for new asks to be added, then moves back to Running.
* Completed: the app has signalled that it is done and is not expected to return.
* Killed: removed by the shim at the request of an admin or user.
* Rejected: the application was rejected when it was added to the scheduler. This only happens when a shim tries to add an app, at the point where it would be created in the New state, and the scheduler rejects the creation.
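To make the state list above easier to follow, here is a minimal sketch of it as a transition table. It is an illustration only and not the YuniKorn implementation: the names `AppState`, `TRANSITIONS`, `next_state` and all event names are hypothetical. It also assumes that a kill or a completion is accepted from any non-terminal state, which the comment does not specify.

```python
from enum import Enum


class AppState(Enum):
    NEW = "New"
    ACCEPTED = "Accepted"
    STARTING = "Starting"
    RUNNING = "Running"
    WAITING = "Waiting"
    COMPLETED = "Completed"
    KILLED = "Killed"
    REJECTED = "Rejected"


# States that an application does not leave again.
TERMINAL = {AppState.COMPLETED, AppState.KILLED, AppState.REJECTED}

# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (AppState.NEW, "ask_added"): AppState.ACCEPTED,
    (AppState.NEW, "rejected"): AppState.REJECTED,
    (AppState.ACCEPTED, "first_allocation"): AppState.STARTING,
    (AppState.STARTING, "more_allocations"): AppState.RUNNING,
    (AppState.STARTING, "timeout"): AppState.RUNNING,
    (AppState.RUNNING, "idle"): AppState.WAITING,  # no asks, no allocations
    (AppState.WAITING, "ask_added"): AppState.RUNNING,
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if state in TERMINAL:
        raise ValueError(f"{state.value} is a terminal state")
    # Assumption: kill and completion are allowed from any non-terminal state.
    if event == "killed":
        return AppState.KILLED
    if event == "completed":
        return AppState.COMPLETED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not valid in state {state.value}"
        ) from None
```

A table keeps every legal move visible in one place, so an event that is not listed for a state fails loudly instead of being silently ignored.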
[jira] [Commented] (YUNIKORN-99) Enhanced FIFO scheduling for batch workloads
[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106362#comment-17106362 ]

Wilfred Spiegelenburg commented on YUNIKORN-99:
---

PR opened with the base changes for the state aware scheduling of applications.

Two new states have been added to the app: waiting and starting. An app now moves from the new state to the accepted state when the first ask is added to the application. As soon as that ask is allocated the app moves from accepted to starting. From starting the app moves to running: an app can stay in the starting state for a maximum of 5 minutes, and it moves earlier if more allocations are added to the application. The rest of the time the app will spend in the running state.

An app can leave the running state and move to waiting if there are no outstanding asks and no allocations for that app. That means the app is not done but there is nothing scheduled for the app. If a new ask gets added the app moves back to running and gets scheduled as normal.

Applications can be killed or marked as completed by the RM if it can determine the state. The scheduler does not move the app to those states itself.

The state aware policy for applications in a queue leverages the new states to sort the applications. The logic for sorting applications in a queue is as follows:
- only apps with pending resources are scheduled
- apps are sorted based on submission time, oldest app first
- all running applications are candidates
- a maximum of one app in the starting state will be added to the list of running apps
- if the queue contains no (0) starting apps, with or without pending resources, the oldest app in the accepted state will be added

The queue will then use that list of apps to schedule.

On recovery, apps that have existing allocations are considered to be in a running state.
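The sorting rules above could be sketched roughly as follows. This is a simplified illustration and not the code from the PR: the `App` record, its fields and the function `state_aware_candidates` are hypothetical, pending resources are reduced to a single number, and the choice of the oldest starting app with pending resources is an assumption, since the comment only says that at most one starting app is added.

```python
from dataclasses import dataclass


@dataclass
class App:
    app_id: str
    state: str              # "accepted", "starting", "running", "waiting", ...
    submission_time: float  # smaller value means submitted earlier
    pending: int            # outstanding requested resources (simplified)


def state_aware_candidates(apps):
    """Return the apps a queue would try to schedule, oldest first."""
    by_age = sorted(apps, key=lambda a: a.submission_time)

    # All running apps with pending resources are candidates.
    candidates = [a for a in by_age if a.state == "running" and a.pending > 0]

    starting = [a for a in by_age if a.state == "starting"]
    if starting:
        # At most one starting app is added. Assumption: pick the oldest
        # one that still has pending resources, if there is any.
        with_pending = [a for a in starting if a.pending > 0]
        candidates.extend(with_pending[:1])
    else:
        # No starting app at all, with or without pending resources:
        # let the oldest accepted app in.
        accepted = [
            a for a in by_age if a.state == "accepted" and a.pending > 0
        ]
        candidates.extend(accepted[:1])

    # Keep the final list ordered by submission time, oldest first.
    candidates.sort(key=lambda a: a.submission_time)
    return candidates
```

Note that a starting app without pending resources still blocks the next accepted app in this sketch, which matches the "with or without pending resources" wording of the last rule.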