[ https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882842#comment-17882842 ]
Wilfred Spiegelenburg commented on YUNIKORN-2784:
-------------------------------------------------

Correct, there is no instant way to move. That is why we are looking at the change in YUNIKORN-2791. It will expose all pods inside YuniKorn, even the ones not scheduled by YuniKorn. Instead of those pods showing up only as usage on the node, we see the pod itself and can look at possible preemption. This is the case for all pod types, not just daemon sets.

You have a limit range set on your cluster. The pods might be tiny when you create them, but they are not when you schedule them. The pod asks for 3GB of memory because each container is given a minimum of 1GB. Check the pod for details: an annotation on the pod records that the container resources were changed.

The limit range is applied to every pod in the cluster. That means a pod with 3 containers each asking for 100MB of memory, 300MB in total for the pod, needs 3GB when scheduling once the limit range has been applied. That is a 10-fold increase. If that happens for all your pods, you waste a huge amount of resources. It could also explain why the node is seen as "full" when you expect it to be empty.

> Scheduler stuck
> ---------------
>
>                 Key: YUNIKORN-2784
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Dmitry
>            Priority: Major
>         Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending (screenshot 1). Also all other ones, but these are the most visible and should be running 100%.
> After restarting the scheduler, all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.
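
The limit range arithmetic in the comment (3 containers asking for 100MB each, each given a minimum of 1GB) can be sketched in Python. This is only a rough model of the effect as the comment describes it: the function name and the 1000MB floor (1GB in decimal units) are assumptions made for illustration, and the code does not call any Kubernetes or YuniKorn API or reproduce the actual limit range logic.

```python
# Hypothetical model of the effect described in the comment. The names and
# the floor value are illustrative assumptions, not cluster settings.
MIN_CONTAINER_MEMORY_MB = 1000  # assumed per-container minimum of 1GB


def effective_pod_memory_mb(container_requests_mb, floor_mb=MIN_CONTAINER_MEMORY_MB):
    """Sum container memory after raising each request to at least floor_mb."""
    return sum(max(request, floor_mb) for request in container_requests_mb)


requested = [100, 100, 100]  # three containers asking for 100MB each
requested_total = sum(requested)
scheduled_total = effective_pod_memory_mb(requested)
increase = scheduled_total / requested_total

print(f"requested: {requested_total}MB")   # requested: 300MB
print(f"scheduled: {scheduled_total}MB")   # scheduled: 3000MB
print(f"increase: {increase:.0f}x")        # increase: 10x
```

Running the same function over the container requests of every pod on a node would give a rough idea of how much of the node's "full" state comes from the per-container minimum rather than from what the pods originally asked for.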