[ https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774755#comment-17774755 ]
Wilfred Spiegelenburg commented on YUNIKORN-2030:
-------------------------------------------------

There will never be more than one allocation in progress at the same time. Allocation processing is by nature single threaded. There are a number of points in the allocation process that make running allocations in parallel difficult. We currently do not need it either: performance is more than good enough with a single goroutine.

The message reported in the details is most likely the result of the race condition that was fixed in YUNIKORN-1993. The race condition causes the allocated resources of the queue(s) to not be updated correctly. Once you are in a state like that, it will not resolve itself until you restart.

> Check Headroom checking doesn't prevent failure to allocate resource due to
> max resource limit exceeded
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2030
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>            Priority: Major
>
> As reported in YUNIKORN-1996, we are seeing many messages like the one below
> from time to time:
> {code:java}
> WARN objects/application.go:1504 queue update failed unexpectedly
> {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts
> queue 'root.test-queue' over maximum allocation (map[memory:3300011278336
> vcore:390584]), current usage (map[memory:3291983380480 pods:91
> vcore:186000])"}{code}
> Restarting YuniKorn stops it. Creating this Jira to investigate why it
> happened. It is not supposed to happen, because we check whether there is
> enough resource headroom before calling
>
> {code:java}
> func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation
> {code}
> which printed the above message, and only call it when there is enough
> headroom.
> There may be a bug in headroom checking?
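
For reference, the numbers in the quoted log line can be checked with a small stand-alone sketch. This is not YuniKorn code: the `Resource` type and the `headroom` and `fitsIn` helpers are hypothetical names made up for illustration, and treating a resource type without a configured maximum (here `pods`) as unlimited is an assumption. It only shows the arithmetic of a headroom check against the values from the message.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a scheduler resource:
// a map from resource name to quantity. It is not the YuniKorn type.
type Resource map[string]int64

// headroom returns max - used for every resource type that has a limit in max.
func headroom(max, used Resource) Resource {
	out := Resource{}
	for name, limit := range max {
		out[name] = limit - used[name]
	}
	return out
}

// fitsIn reports whether ask fits in room. Resource types that are missing
// from room are treated as unlimited (an assumption of this sketch).
func fitsIn(room, ask Resource) bool {
	for name, want := range ask {
		if avail, limited := room[name]; limited && want > avail {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the log message in the issue description.
	max := Resource{"memory": 3300011278336, "vcore": 390584}
	used := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}

	room := headroom(max, used)
	fmt.Println("memory headroom:", room["memory"])
	fmt.Println("ask fits:", fitsIn(room, ask))
}
```

With the usage shown in the log, the memory headroom is 8027897856 while the ask needs 37580963840, so a check against that usage would reject the ask. That the allocation still reached the queue update suggests the earlier headroom check saw a different usage value, which would be consistent with the queue accounting having gone out of sync as described in the comment.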