Chaoran Yu created YUNIKORN-1085:
------------------------------------

             Summary: DaemonSet pods may fail to be scheduled on new nodes added during autoscaling
                 Key: YUNIKORN-1085
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1085
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
    Affects Versions: 0.12.2
         Environment: Amazon EKS, K8s 1.20, Cluster Autoscaler
            Reporter: Chaoran Yu


After YUNIKORN-704 was done, YuniKorn should use the same mechanism as the 
default scheduler when scheduling DaemonSet pods. That is the case most of the 
time in our deployments. Recently, however, DaemonSet scheduling became 
problematic again. When the K8s Cluster Autoscaler adds new nodes in response 
to pending pods in the cluster, EKS automatically creates a pod of the CNI 
DaemonSet (Amazon's container networking module) on each newly created node, 
but YuniKorn could not schedule these pods. There were no informative error 
messages, and the default queue that these pods belong to had available 
resources. Because the pods could not be scheduled, EKS refused to mark the 
new nodes as ready, and the nodes got stuck in the NotReady state. This issue 
is not always reproducible, but it has happened a few times. The root cause 
needs further investigation.
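
Since the failure leaves no informative error, one way to narrow it down the 
next time it happens is to list the DaemonSet pods that are still unscheduled 
together with the node each one targets. The sketch below is only a diagnostic 
aid, not part of YuniKorn. It assumes the pod list comes from 
`kubectl get pods --all-namespaces -o json`, and that each DaemonSet pod 
carries the required node affinity on the `metadata.name` field that the 
DaemonSet controller normally sets. The function names are hypothetical.

```python
def target_node(pod):
    """Return the node name a DaemonSet pod is pinned to, or None.

    Assumes the pod carries a required node affinity with a matchFields
    entry on "metadata.name", as the DaemonSet controller normally sets.
    """
    spec = pod.get("spec") or {}
    node_affinity = (spec.get("affinity") or {}).get("nodeAffinity") or {}
    required = node_affinity.get(
        "requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms") or []:
        for field in term.get("matchFields") or []:
            if field.get("key") == "metadata.name" and field.get("operator") == "In":
                values = field.get("values") or []
                if values:
                    return values[0]
    return None


def pending_daemonset_pods(pod_list):
    """Return (pod name, target node) pairs for unscheduled DaemonSet pods.

    pod_list is the parsed JSON of a Kubernetes PodList.
    """
    results = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        owners = meta.get("ownerReferences") or []
        if not any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # not created by a DaemonSet
        if (pod.get("status") or {}).get("phase") != "Pending":
            continue  # already running or finished
        if (pod.get("spec") or {}).get("nodeName"):
            continue  # already bound to a node, so the scheduler did its part
        results.append((meta.get("name"), target_node(pod)))
    return results
```

Feeding the parsed `kubectl` output (for example via `json.load`) into 
`pending_daemonset_pods` and comparing the target nodes against the NotReady 
nodes would show whether the stuck pods are exactly the ones pinned to the 
newly added nodes.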



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
