Weiwei Yang created YUNIKORN-677:
------------------------------------

             Summary: Potential resource leak when complete and allocate pod happens simultaneously
                 Key: YUNIKORN-677
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-677
             Project: Apache YuniKorn
          Issue Type: Bug
            Reporter: Weiwei Yang

Let's say we have an app with 1 pod that needs scheduling. The shim submits the app to the core and starts scheduling the pod. On the shim side, this is a task in the Scheduling state. We then have a race if the following things happen simultaneously:

# The user deletes the pod. This triggers a CompleteTask event on the shim side, and the shim sends a ReleaseAllocationAskRequest to the core.
# Before handling the ReleaseAllocationAskRequest from the shim, the core makes an allocation for the given pod and sends an Allocation to the shim.

The resulting sequence is: the core generates an allocation on a node; the core then receives the release request and deletes the pending ask; the shim receives the new allocation, but since the pod has already been deleted, the shim ignores it. In this case the allocation is left over, causing the resource leak.
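
The interleaving described above can be replayed deterministically in a toy model. This is only a sketch to make the ordering concrete: the `Core` and `Shim` classes, their methods, and the `release_unknown` flag are hypothetical stand-ins, not YuniKorn code, and the "release unknown allocations" branch is just one conceivable mitigation, not the fix chosen for this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy stand-in for the scheduler core (hypothetical, not YuniKorn code)."""
    pending_asks: set = field(default_factory=set)
    allocations: dict = field(default_factory=dict)  # task id -> node name

    def allocate(self, task_id, node):
        # The ask is satisfied: it leaves the pending set and becomes an allocation.
        self.pending_asks.discard(task_id)
        self.allocations[task_id] = node
        return task_id

    def release_ask(self, task_id):
        # Models handling ReleaseAllocationAskRequest: only the pending ask is removed.
        self.pending_asks.discard(task_id)


@dataclass
class Shim:
    """Toy stand-in for the shim (hypothetical)."""
    live_tasks: set = field(default_factory=set)

    def complete_task(self, task_id):
        # Pod deleted: forget the task and return the release request to send.
        self.live_tasks.discard(task_id)
        return task_id

    def on_allocation(self, task_id, core, release_unknown=False):
        if task_id in self.live_tasks:
            return "bound"
        if release_unknown:
            # Assumed mitigation: hand the unexpected allocation back to the core.
            core.allocations.pop(task_id, None)
            return "released"
        return "ignored"


def run_race(release_unknown=False):
    core = Core(pending_asks={"task-1"})
    shim = Shim(live_tasks={"task-1"})
    release_request = shim.complete_task("task-1")  # user deletes the pod
    allocated = core.allocate("task-1", "node-1")   # core allocates first
    core.release_ask(release_request)               # release arrives too late
    shim.on_allocation(allocated, core, release_unknown)
    return core.allocations


print(run_race())                      # {'task-1': 'node-1'}
print(run_race(release_unknown=True))  # {}
```

In the first run the core still holds an allocation for a task the shim no longer tracks, which is the leak in this report. The second run shows that the leak disappears in this model if the shim returns allocations it does not recognize.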