[jira] [Commented] (YUNIKORN-2030) Check Headroom checking doesn't prevent failure to allocate resource due to max resource limit exceeded

Wilfred Spiegelenburg (Jira) Thu, 12 Oct 2023 18:49:57 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774755#comment-17774755
 ]


Wilfred Spiegelenburg commented on YUNIKORN-2030:
-------------------------------------------------

There will never be more than one allocation in progress at the same time. 
Allocation processing is by nature single threaded. There are a number of 
points in the allocation process that make running allocations in parallel 
difficult. We do not currently need parallelism either: performance is more 
than good enough with a single goroutine.
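
To illustrate why a single goroutine rules out two allocations interleaving, 
here is a minimal sketch. It is not the YuniKorn scheduler code: the request 
type, runScheduler, tryAllocate and the channel-based design are all 
hypothetical, chosen only to show that when one goroutine owns the usage 
counter, the limit check and the usage update cannot be split by another 
allocation.

```go
package main

import "fmt"

// request is a hypothetical allocation request: an amount plus a reply channel.
type request struct {
	amount int64
	reply  chan bool
}

// runScheduler is the only goroutine that reads or writes `used`, so the
// check against the limit and the update of `used` happen as one step.
func runScheduler(limit int64, requests <-chan request) {
	var used int64
	for req := range requests {
		ok := used+req.amount <= limit
		if ok {
			used += req.amount
		}
		req.reply <- ok
	}
}

// tryAllocate sends one request to the scheduler goroutine and waits for the answer.
func tryAllocate(requests chan<- request, amount int64) bool {
	reply := make(chan bool)
	requests <- request{amount: amount, reply: reply}
	return <-reply
}

func main() {
	requests := make(chan request)
	go runScheduler(10, requests)

	fmt.Println(tryAllocate(requests, 6)) // true: 6 fits under the limit of 10
	fmt.Println(tryAllocate(requests, 6)) // false: 6+6 would exceed 10
	fmt.Println(tryAllocate(requests, 4)) // true: 6+4 is exactly 10
	close(requests)
}
```

With this shape, an "over maximum" failure after a successful check can only 
come from the tracked usage itself being wrong, not from two allocations 
racing each other.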

The message reported in the details is most likely the result of the race 
condition that was fixed in YUNIKORN-1993. That race condition causes the 
allocated resources of the queue(s) to not be updated correctly. Once the 
scheduler is in that state, it will not resolve itself until you restart. 
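
For reference, the numbers in the reported message can be checked with a 
short sketch. The Resource type and exceededTypes function below are 
hypothetical stand-ins, not the YuniKorn core types, and treating a resource 
type that is missing from the maximum as unlimited is an assumption. Only the 
values are taken from the log line in the description.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a resource map (not the YuniKorn type).
type Resource map[string]int64

// exceededTypes returns the resource types for which usage+ask goes over the limit.
// Types that are absent from limit are treated as unlimited (an assumption).
func exceededTypes(limit, usage, ask Resource) []string {
	var over []string
	for name, maxValue := range limit {
		if usage[name]+ask[name] > maxValue {
			over = append(over, name)
		}
	}
	return over
}

func main() {
	// Values copied from the reported WARN message.
	limit := Resource{"memory": 3300011278336, "vcore": 390584}
	usage := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}

	fmt.Println(exceededTypes(limit, usage, ask)) // [memory]
	fmt.Println(limit["memory"] - usage["memory"]) // 8027897856
}
```

With the usage shown in the message, the remaining memory headroom is 
8027897856 bytes while the ask needs 37580963840 bytes, so only memory goes 
over the maximum; vcore stays well under its limit.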

> Check Headroom checking doesn't prevent failure to allocate resource due to 
> max resource limit exceeded
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2030
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>            Priority: Major
>
> As reported in YUNIKORN-1996, we are seeing many messages like below from 
> time to time:
> {code:java}
>  WARN    objects/application.go:1504     queue update failed unexpectedly     
>    {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts 
> queue 'root.test-queue' over maximum allocation (map[memory:3300011278336 
> vcore:390584]), current usage (map[memory:3291983380480 pods:91 
> vcore:186000])"}{code}
> Restarting YuniKorn stops it. Creating this Jira to investigate 
> why it happened, because it's not supposed to happen as we check if there is 
> enough resource headroom before calling 
>  
> {code:java}
> func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation 
> {code}
> which printed the above message, and only call it when there is enough 
> headroom.
> There may be a bug in headroom checking?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
