[ https://issues.apache.org/jira/browse/YUNIKORN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2543. ------------------------------------ Fix Version/s: 1.6.0 Resolution: Fixed > Fix locking in RMProxy > ---------------------- > > Key: YUNIKORN-2543 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2543 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > After merging YUNIKORN-2539, we already saw a potential issue with > {{rmproxy.RMProxy}} and {{cache.Context}}: > Gourutine 1: > {noformat} > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:307 > rmproxy.(*RMProxy).GetResourceManagerCallback ??? <<<<< > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:306 > rmproxy.(*RMProxy).GetResourceManagerCallback ??? > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:359 > rmproxy.(*RMProxy).UpdateNode ??? > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:1603 > cache.(*Context).updateNodeResources ??? > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:484 > cache.(*Context).updateNodeOccupiedResources ??? > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:392 > cache.(*Context).updateForeignPod ??? > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:286 > cache.(*Context).UpdatePod ??? > {noformat} > Goroutine 2: > {noformat} > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:847 > cache.(*Context).ForgetPod ??? <<<<< > github.com/apache/yunikorn-k8shim/pkg/cache/context.go:846 > cache.(*Context).ForgetPod ??? > github.com/apache/yunikorn-k8shim/pkg/cache/scheduler_callback.go:104 > cache.(*AsyncRMCallback).UpdateAllocation ??? > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:162 > rmproxy.(*RMProxy).triggerUpdateAllocation ??? > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:150 > rmproxy.(*RMProxy).processRMReleaseAllocationEvent ??? > github.com/apache/yunikorn-core@v0.0.0-20240405160823-c94a7d938c41/pkg/rmproxy/rmproxy.go:234 > rmproxy.(*RMProxy).handleRMEvents ??? > {noformat} > Right now this seems to be safe because we only call {{RLock()}} in the > {{RMProxy}} methods. However, should any of this change, we're in trouble due > to lock ordering (Cache->RMProxy and RMProxy->Cache). > We need to investigate why we use only {{RLock()}} and whether it's needed at > all. If nothing is modified, then we can drop the mutex completely. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org