Peter Bacsko created YUNIKORN-2520:
--------------------------------------

             Summary: PVC errors in AssumePod() is not handled properly
                 Key: YUNIKORN-2520
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2520
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
            Reporter: Peter Bacsko


When a volume operation fails in {{AssumePod()}}, the allocation on the core 
side is not removed.

Although we check the result of UpdateAllocation, the error handling only logs 
the failure:
{noformat}
if err := callback.UpdateAllocation(response); err != nil {
    rmp.handleUpdateResponseError(rmID, err)
}
...

func (rmp *RMProxy) handleUpdateResponseError(rmID string, err error) {
    log.Log(log.RMProxy).Error("failed to handle response",
        zap.String("rmID", rmID),
        zap.Error(err))
}
{noformat}
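To make the consequence concrete, here is a minimal, self-contained sketch of the problem. It is a simplified model, not YuniKorn code: fakeCore, commit and failingAssumePod are hypothetical names introduced only for illustration. The core records the allocation, the shim callback fails, and because the error is only logged the allocation stays in place.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCore is a hypothetical stand-in for the scheduler core; it only
// tracks which allocation IDs it believes are active.
type fakeCore struct {
	allocations map[string]bool
}

// commit records the allocation and then notifies the shim through the
// callback. As in the snippet above, a callback error is only logged,
// so nothing rolls the allocation back.
func (c *fakeCore) commit(allocationID string, updateAllocation func(string) error) {
	c.allocations[allocationID] = true
	if err := updateAllocation(allocationID); err != nil {
		fmt.Println("failed to handle response:", err)
	}
}

// failingAssumePod simulates a volume error on the shim side (assumption:
// the real failure would surface as an error from the callback).
func failingAssumePod(allocationID string) error {
	return errors.New("volume binding failed for " + allocationID)
}

func main() {
	c := &fakeCore{allocations: map[string]bool{}}
	c.commit("alloc-1", failingAssumePod)
	// The allocation is still tracked even though the shim rejected it.
	fmt.Println("core still holds alloc-1:", c.allocations["alloc-1"])
}
```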
I suggest moving the volume-related code to {{Task.postTaskAllocated}}. That 
way the task transitions to the "Failed" state and the allocationID is 
available, so we can release both the ask and the allocation:
{noformat}
func (task *Task) releaseAllocation() {
    ...
    var releaseRequest *si.AllocationRequest
    s := TaskStates()
    switch task.GetTaskState() {
    case s.New, s.Pending, s.Scheduling, s.Rejected:
        // release ask + allocation if possible
        releaseRequest = common.CreateReleaseAskRequestForTask(
            task.applicationID, task.taskID, task.application.partition)
    default:
        if task.allocationID == "" {
            ... log error ...
            return
        }
        releaseRequest = common.CreateReleaseAllocationRequestForTask(
            task.applicationID, task.taskID, task.allocationID,
            task.application.partition, task.terminationType)
    }
    ...
{noformat}
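The state-based choice in releaseAllocation can be modelled in isolation. The sketch below is an assumption-laden simplification, not the shim's actual code: TaskState, ReleaseKind and chooseRelease are hypothetical names, and the state list only mirrors the switch shown above. It illustrates why reaching "Failed" with an allocationID set leads to an allocation release rather than an ask release.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskState is a simplified, hypothetical stand-in for the shim's task states.
type TaskState string

const (
	New        TaskState = "New"
	Pending    TaskState = "Pending"
	Scheduling TaskState = "Scheduling"
	Rejected   TaskState = "Rejected"
	Allocated  TaskState = "Allocated"
	Failed     TaskState = "Failed"
)

// ReleaseKind names which release request would be built.
type ReleaseKind string

const (
	ReleaseAsk        ReleaseKind = "release-ask"
	ReleaseAllocation ReleaseKind = "release-allocation"
)

// chooseRelease mirrors the switch in releaseAllocation: early states
// release the ask, every other state needs an allocationID to release
// the allocation.
func chooseRelease(state TaskState, allocationID string) (ReleaseKind, error) {
	switch state {
	case New, Pending, Scheduling, Rejected:
		return ReleaseAsk, nil
	default:
		if allocationID == "" {
			return "", errors.New("no allocationID, nothing to release")
		}
		return ReleaseAllocation, nil
	}
}

func main() {
	// A task that failed after allocation, with its allocationID known.
	kind, err := chooseRelease(Failed, "alloc-1")
	fmt.Println(kind, err)
}
```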
 



