Rainie Li created YUNIKORN-1988:
-----------------------------------

             Summary: Preemption happens when a queue lower than its guaranteed capacity
                 Key: YUNIKORN-1988
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1988
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Rainie Li
            Assignee: Rainie Li


*Background:* 
We use tier-based priorityClasses and run YuniKorn 1.3 with the admission 
controller in production (our prod cluster has hundreds of EKS nodes). 
Many production tier2 jobs were preempted unexpectedly. The application logs 
showed that the driver pods had all been shut down.

Most of the failed jobs were from the same queue. The queue that was preempted 
has 300G of guaranteed memory, and every driver pod requires 24G of memory. We 
disabled the preemption feature in production to mitigate the issue.

*Investigation:* 

I reproduced the issue in a dev environment: preemption can happen when a 
queue is below its guaranteed capacity.
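
To make the expected behavior concrete, here is a minimal sketch of the kind of guard one would expect before a pod is chosen as a preemption victim. It is not YuniKorn's implementation: the function name `may_preempt_from`, the rule that the queue must stay at or above its guarantee after the eviction, the pod counts, and reading "G" as GiB are all assumptions for illustration. Only the 300G guarantee and the 24G driver size come from this report.

```python
GIB = 1024 ** 3  # assumption: the report's "G" is treated as GiB


def may_preempt_from(queue_used: int, queue_guaranteed: int, victim_request: int) -> bool:
    """Hypothetical guard: allow taking a victim from a queue only if the
    queue would still be at or above its guaranteed capacity afterwards."""
    return queue_used - victim_request >= queue_guaranteed


guaranteed = 300 * GIB  # guaranteed memory of the affected queue (from the report)
driver = 24 * GIB       # memory required by each driver pod (from the report)

# Hypothetical load: ten drivers use 240G, which is below the 300G guarantee,
# so no driver in this queue should be selected as a victim.
below = may_preempt_from(10 * driver, guaranteed, driver)

# Hypothetical load: fourteen drivers use 336G; evicting one leaves 312G,
# which is still above the guarantee, so a victim may be taken.
above = may_preempt_from(14 * driver, guaranteed, driver)

print(below, above)
```

Under these assumptions the first case must never yield a victim, which is the property the reproduced behavior appears to violate.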

I am investigating how to fix the issue. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
