Weiwei Yang created YUNIKORN-528:
------------------------------------

             Summary: Nil pointer exception while getting both termination and delete pod event
                 Key: YUNIKORN-528
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-528
             Project: Apache YuniKorn
          Issue Type: Sub-task
          Components: shim - kubernetes
            Reporter: Weiwei Yang


During testing, I observed that the scheduler occasionally panics with a nil pointer dereference like the one below:

{code}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x190cf5d]

goroutine 114 [running]:
github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc004250000, 0xc0038e01b0, 0x24, 0x0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/objects/application.go:1026 +0xcd
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc0026de600, 0xc0003c0a10, 0x0, 0x0, 0x0, 0x0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/partition.go:1137 +0x14b5
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc0001400f0, 0xc0066400c0, 0x1, 0x1, 0x7ffeefbff80f, 0x9)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:683 +0x150
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocations(0xc0001400f0, 0xc006730000)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:606 +0x185
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processRMUpdateEvent(0xc0001400f0, 0xc0066ee0b8)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:213 +0x77
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc00000e3c0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/scheduler.go:112 +0x416
created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/scheduler.go:54 +0xa2
make: *** [run] Error 2
{code}

The root cause: when the shim deletes a placeholder, it can sometimes trigger two events:
* Pod Update
* Pod Delete

The shim sends a release request to the core both when a pod is updated to the TERMINATED state and when a pod is DELETED. By the time the second release request arrives, the first one has already removed the allocation, so the core dereferences a nil pointer. The shim must not send a second release request for a pod that has already been released.
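Avoiding the duplicate release amounts to making the shim's release path idempotent per allocation. Below is a minimal sketch in Go, assuming the shim can key each release on an allocation UUID. All names here (`releaseTracker`, `fakeCore`, `shim`, `onPodTerminated`, `onPodDeleted`) are hypothetical and for illustration only; they are not YuniKorn's actual types or API.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseTracker is a hypothetical shim-side helper that remembers which
// allocations already had a release request sent to the core.
type releaseTracker struct {
	mu       sync.Mutex
	released map[string]struct{}
}

func newReleaseTracker() *releaseTracker {
	return &releaseTracker{released: make(map[string]struct{})}
}

// markReleased records allocationUUID and reports whether this is the first
// release for it. Only the first caller for a given UUID gets true.
func (t *releaseTracker) markReleased(allocationUUID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, seen := t.released[allocationUUID]; seen {
		return false
	}
	t.released[allocationUUID] = struct{}{}
	return true
}

type allocation struct{ uuid string }

// fakeCore is a hypothetical stand-in for the scheduler core: it holds live
// allocations and counts the release requests it receives.
type fakeCore struct {
	allocations map[string]*allocation
	releases    int
}

func (c *fakeCore) release(uuid string) error {
	c.releases++
	alloc, ok := c.allocations[uuid]
	if !ok {
		// A missing key yields a nil *allocation; using it without this
		// check would be a nil pointer dereference like the one in the trace.
		return fmt.Errorf("allocation %s already removed", uuid)
	}
	delete(c.allocations, alloc.uuid)
	return nil
}

// shim models the two pod event handlers that can both trigger a release.
type shim struct {
	core    *fakeCore
	tracker *releaseTracker
}

func (s *shim) releaseOnce(uuid string) error {
	if !s.tracker.markReleased(uuid) {
		return nil // already released: skip the duplicate request
	}
	return s.core.release(uuid)
}

func (s *shim) onPodTerminated(uuid string) error { return s.releaseOnce(uuid) }
func (s *shim) onPodDeleted(uuid string) error    { return s.releaseOnce(uuid) }

func main() {
	core := &fakeCore{allocations: map[string]*allocation{"ph-1": {uuid: "ph-1"}}}
	s := &shim{core: core, tracker: newReleaseTracker()}
	fmt.Println(s.onPodTerminated("ph-1"), s.onPodDeleted("ph-1"), core.releases)
}
```

Running it prints `<nil> <nil> 1`: both pod events are handled, but only one release request reaches the core. In a real shim the tracked entries would also need to be cleaned up once the pod is fully gone, otherwise the map grows without bound.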



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

