[ 
https://issues.apache.org/jira/browse/YUNIKORN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310333#comment-17310333
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-614:
------------------------------------------------

In a deployment without the admission controller you can already have multiple 
schedulers, even multiple YuniKorn schedulers. You need to set the 
{{/spec/schedulerName}} in the pod spec to route to the different schedulers. 
That is just a tiny part of the problem.
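
As a minimal sketch of that routing, the snippet below builds a pod manifest as a plain dictionary and sets {{/spec/schedulerName}}. The helper {{pod_manifest}}, the pod name and the image are made up for illustration, and "yunikorn" is assumed to be the name the YuniKorn scheduler registers under in your deployment.

```python
import json


def pod_manifest(name, image, scheduler_name=None):
    """Build a minimal Pod manifest (hypothetical helper, not a YuniKorn API).

    scheduler_name ends up in /spec/schedulerName. When it is left out,
    Kubernetes fills in "default-scheduler" on its own.
    """
    spec = {"containers": [{"name": name, "image": image}]}
    if scheduler_name is not None:
        spec["schedulerName"] = scheduler_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # "yunikorn" is an assumed scheduler name; use whatever your deployment registers.
    print(json.dumps(pod_manifest("sleep", "alpine:latest", "yunikorn"), indent=2))
```

Two pods built this way with different scheduler names are picked up by different schedulers, but nothing in the manifest keeps them off the same nodes.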

We looked at this before and had a number of problems with the co-existence. 
The default scheduler assumes it is the only one in the cluster.

Pods can be pushed to a scheduler via a simple setting in the pod spec (see 
above). Picking up a limited set of pods is thus simple, although routing pods 
dynamically and changing that routing becomes a lot more tricky (it needs a 
complex admission controller). Scheduling decisions are made per pod, but the 
nodes are still shared, which can lead to node conflicts. Currently any node 
conflict that arises will just cause scheduling to fail for both pods. Then 
the race starts for both schedulers to get their pod on that node (this is by 
design).
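
To make the shared-node problem concrete, here is a toy model, not YuniKorn or Kubernetes code. It assumes each scheduler only accounts for the pods it placed itself, so two schedulers can both decide that the same node has room. The {{Node}} and {{Scheduler}} classes and all numbers are hypothetical, and what happens after the conflict (failure, retry) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_capacity: int
    bound: list = field(default_factory=list)  # (pod name, cpu) pairs actually bound

    def used(self):
        return sum(cpu for _, cpu in self.bound)


@dataclass
class Scheduler:
    name: str
    # node name -> CPU this scheduler believes is in use (its own pods only)
    view: dict = field(default_factory=dict)

    def try_place(self, pod_name, cpu, node):
        """Bind the pod if this scheduler's own view says it fits."""
        believed_free = node.cpu_capacity - self.view.get(node.name, 0)
        if cpu > believed_free:
            return False
        self.view[node.name] = self.view.get(node.name, 0) + cpu
        node.bound.append((pod_name, cpu))
        return True


def overcommitted(node):
    """True when the pods bound to the node ask for more CPU than it has."""
    return node.used() > node.cpu_capacity


if __name__ == "__main__":
    node = Node("node-1", cpu_capacity=4)
    first = Scheduler("scheduler-a")
    second = Scheduler("scheduler-b")
    print(first.try_place("pod-a", 3, node))   # fits in scheduler-a's view
    print(second.try_place("pod-b", 3, node))  # also fits in scheduler-b's view
    print(overcommitted(node))                 # the node itself is now over capacity
```

With a single scheduler the second placement would be refused, because its view already includes the first pod.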

Picking up a subset of nodes is far more difficult. Node labels could be used 
to split a cluster. Static splits look simple, but how do you tell the default 
scheduler to ignore nodes? From a YuniKorn perspective we could implement 
something like this, as we have control over what we do.
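
A static split by node label could look roughly like the sketch below. It only shows the filtering a scheduler under our own control might apply to its node list. The label key {{example.com/scheduler}} and the function name are hypothetical, and nothing here makes the default scheduler honour the split.

```python
def partition_nodes(nodes, label_key):
    """Group node names by the value of label_key.

    Nodes without the label are collected under None, which makes it
    visible that they are not owned by any scheduler in the split.
    """
    groups = {}
    for node in nodes:
        owner = node.get("labels", {}).get(label_key)
        groups.setdefault(owner, []).append(node["name"])
    return groups


if __name__ == "__main__":
    # Hypothetical label key and node list, for illustration only.
    key = "example.com/scheduler"
    nodes = [
        {"name": "node-1", "labels": {key: "yunikorn"}},
        {"name": "node-2", "labels": {key: "default-scheduler"}},
        {"name": "node-3", "labels": {}},
    ]
    print(partition_nodes(nodes, key))
```

A scheduler that only looks at its own group stays inside the split. The unlabeled group shows where a static split needs a decision about ownership.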

There will still be issues that we might never be able to solve. For instance, 
the autoscaler would need to become aware of multiple schedulers and/or node 
splits. Pod affinity will only work within a single scheduler, and if the 
nodes are shared that might break.

> support co-deployment of YuniKorn with other schedulers, or multiple 
> YuniKorn, in a single cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-614
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-614
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler, deployment
>            Reporter: Bowen Li
>            Priority: Major
>
> IIUC, YuniKorn right now cannot be co-deployed with other schedulers, like 
> the default K8S scheduler, nor can users deploy multiple YuniKorn instances 
> in a single cluster.
> We have use cases of such requirements:
>  # we want to have YuniKorn as the "active" scheduler to take job requests, 
> and still be able to route requests to another scheduler as "backup" in case 
> of any issues.
>  # deploy and test a new YuniKorn version to take like 20% traffic to a K8S 
> cluster, while keeping an old, stable version taking the remaining 80% traffic
> It seems we cannot do either at the moment, and a workaround is to deploy 
> another cluster for the above use cases, which inevitably brings in huge 
> DevOps costs.
> Ideally, we should support co-deployment of YuniKorn with other schedulers, 
> or multiple YuniKorn, in a single cluster. We can further break down this 
> ticket to sub-tasks to handle each of the above use cases.
> Traffic routing to different schedulers can be scoped and determined by some 
> K8S native concepts like namespaces and node groups, or some custom concepts 
> of YuniKorn which can be mapped to K8S native concepts for easier management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

