[ 
https://issues.apache.org/jira/browse/YUNIKORN-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1347.
----------------------------------
    Resolution: Implemented

> Yunikorn triggers EKS auto-scaling even when pod requests have exceeded the 
> queue limit 
> ------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-1347
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1347
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Anthony Wu
>            Priority: Major
>
> Hi guys,
> We are trying to use Yunikorn to manage our AWS EKS infrastructure and limit 
> resource usage for different users and groups. We also use the k8s cluster 
> autoscaler 
> ([https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler]) 
> to auto-scale the cluster when necessary.
> *Environment*
>  * AWS EKS on k8s 1.21
>  * Yunikorn 1.1 running as a k8s scheduler plugin for maximum compatibility
>  * cluster-autoscaler V1.21.0
> *Issues*:
> Let's say we have a queue with the limit below:
> {code:yaml}
> queues:               
> - name: dev
>   submitacl: "*"
>   resources: 
>     max: 
>       memory: 100Gi
>       vcore: 10 
> {code}
>  
> Then we try to create 4 pods in the `dev` queue, each requiring 5 cores and 
> 50Gi of memory.
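> (For reference, each of the 4 pods is submitted roughly like the sketch 
> below; the pod name, labels and image are illustrative placeholders, not our 
> exact manifests.)
> {code:yaml}
> # Sketch of one of the 4 pods; name/image/labels are placeholders
> apiVersion: v1
> kind: Pod
> metadata:
>   name: dask-user-example
>   labels:
>     applicationId: dask-user-example
>     queue: root.dev            # submit into the dev queue
> spec:
>   schedulerName: yunikorn
>   containers:
>   - name: worker
>     image: example/worker:latest
>     resources:
>       requests:
>         cpu: "5"
>         memory: 50Gi
> {code}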
> We then get 2 pods {{Running}} and 2 pods {{Pending}}, because the queue has 
> reached its limit of 100Gi memory and 10 vcores.
> We would expect the queued pods not to trigger EKS auto-scaling, as they 
> cannot be allocated until other resources have been released in the queue.
> But what we see is that the queued pods still trigger the cluster 
> auto-scaling regardless, as shown in the example below:
> {code}
> Status:       Pending
> ...
> Conditions:
>   Type           Status
>   PodScheduled   False
> Events:
>   Type     Reason            Age    From                Message
>   ----     ------            ----   ----                -------
>   Warning  FailedScheduling  3m5s   yunikorn            0/147 nodes are 
> available: 147 Pod is not ready for scheduling.
>   Warning  FailedScheduling  3m5s   yunikorn            0/147 nodes are 
> available: 147 Pod is not ready for scheduling.
>   Normal   Scheduling        3m3s   yunikorn            
> yunikorn/dask-user-07ff5f3b-8qjkl8 is queued and waiting for allocation
>   Normal   TriggeredScaleUp  2m53s  cluster-autoscaler  pod triggered 
> scale-up: 
> [{eksctl-cluster-nodegroup-spot-xlarge-compute-1-NodeGroup-8VURTD4WKCYV 0->4 
> (max: 16)}]
> {code}
> So eventually, EKS added some hosts that were never actually used or 
> allocated, since the pods had not yet been approved for scheduling.
> We also tried gang scheduling with the pods in a task group, but it has a 
> similar issue: even when the whole gang is not ready to be scheduled, 
> Yunikorn creates the placeholder pods, which triggers auto-scaling of the EKS 
> cluster.
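> (For reference, the gang setup we tried is roughly the sketch below on the 
> pod metadata; the group name and minMember/minResource values are 
> illustrative.)
> {code:yaml}
> # Sketch of the gang scheduling annotations used; values are illustrative
> metadata:
>   annotations:
>     yunikorn.apache.org/task-group-name: "dask-workers"
>     yunikorn.apache.org/task-groups: |-
>       [{
>         "name": "dask-workers",
>         "minMember": 4,
>         "minResource": {"cpu": "5", "memory": "50Gi"}
>       }]
> {code}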
> *Causes and potential solutions*
> We looked at the source code of both the auto-scaler and Yunikorn, and we 
> think the reason is simply that the auto-scaler does not know about 
> Yunikorn-specific events and the state of a Pod (Pending but not yet quota 
> approved). It searches for all Pods with `PodScheduled=False` and then checks 
> whether it needs to add resources for them.
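> (In other words, a queued-but-not-quota-approved pod still carries a 
> standard "unschedulable"-looking status like the sketch below, which is all 
> the auto-scaler sees; the exact reason string may differ.)
> {code:yaml}
> # Sketch of the pod status the auto-scaler keys on (reason string may differ)
> status:
>   phase: Pending
>   conditions:
>   - type: PodScheduled
>     status: "False"
>     reason: Unschedulable
> {code}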
> The issue could be resolved from either side:
>  - To solve it on the auto-scaler side, the auto-scaler would need to know 
> about Yunikorn's special events and pod states
>  - To solve it on the Yunikorn side, I think Yunikorn would need to not 
> create the pod, or at least not leave it in the `Pending` phase, until it is 
> quota approved 
>  ** not sure how hard this is to achieve, but as long as a pod is created 
> and goes to Pending, the auto-scaler will try to pick it up
> We think solving it on the Yunikorn side would be cleaner, since the 
> auto-scaler should not need to know about the k8s scheduler implementation 
> in order to make a decision. Also, other auto-scaler alternatives such as 
> AWS Karpenter could suffer from the same issue when interacting with 
> Yunikorn.
> Wondering whether this issue report makes sense to you. Let us know if there 
> are any other solutions and whether this can be solved in the future :)
> Thanks a lot!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
