[jira] [Created] (YUNIKORN-2575) Make logging for IsPodFitNode clear
Wilfred Spiegelenburg created YUNIKORN-2575:
---

Summary: Make logging for IsPodFitNode clear
Key: YUNIKORN-2575
URL: https://issues.apache.org/jira/browse/YUNIKORN-2575
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg

The logging in {{IsPodFitNode()}} logs the same message for a missing pod and node. We should log clearly which thing is missing: the node or the pod.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2544) [UMBRELLA] Fix Yunikorn potential locking issues
[ https://issues.apache.org/jira/browse/YUNIKORN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko resolved YUNIKORN-2544.

Resolution: Fixed

All subtasks have been resolved, closing ticket.

> [UMBRELLA] Fix Yunikorn potential locking issues
>
> Key: YUNIKORN-2544
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2544
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> The Go tool [go-deadlock|https://github.com/sasha-s/go-deadlock/] identified several potential deadlocks in YuniKorn.
> Some of these do not cause problems right now, but a future lock-related change could trigger a deadlock.
[jira] [Resolved] (YUNIKORN-2563) [shim] Enable deadlock detection during unit tests
[ https://issues.apache.org/jira/browse/YUNIKORN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai resolved YUNIKORN-2563.

Fix Version/s: 1.6.0
Resolution: Fixed

> [shim] Enable deadlock detection during unit tests
>
> Key: YUNIKORN-2563
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2563
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: shim - kubernetes, test - unit
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
[jira] [Created] (YUNIKORN-2574) totalPartitionResource should not be mutated with AddTo/SubFrom
Peter Bacsko created YUNIKORN-2574:
---

Summary: totalPartitionResource should not be mutated with AddTo/SubFrom
Key: YUNIKORN-2574
URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Affects Versions: 1.5.0, 1.4.0
Reporter: Peter Bacsko
Assignee: Peter Bacsko

There is a potential data race in PartitionContext: the field "totalPartitionResource" is mutated in place. The problem is that the method {{GetTotalPartitionResource()}} does not clone it.

{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
    pc.RLock()
    defer pc.RUnlock()

    return pc.totalPartitionResource
}
{noformat}

In general, we should prefer the immutable approach for variables like this, just like in {{objects.Queue}}:

{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
    // check this queue: failure stops checks if the allocation is not part of a node addition
    newAllocated := resources.Add(sq.allocatedResource, alloc)

    [ ... removed ... ]

    sq.Lock()
    defer sq.Unlock()
    // all OK update this queue
    sq.allocatedResource = newAllocated
    sq.updateAllocatedResourceMetrics()
    return nil
}

// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
    // update the parent
    if sq.parent != nil {
        sq.parent.incPendingResource(delta)
    }
    // update this queue
    sq.Lock()
    defer sq.Unlock()
    sq.pending = resources.Add(sq.pending, delta)
    sq.updatePendingResourceMetrics()
}
{noformat}
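The immutable approach described above can be sketched in isolation as follows. This is a minimal illustration, not YuniKorn code: {{Resource}}, {{Add}}, {{Partition}}, {{GetTotal}} and {{AddNodeCapacity}} are simplified, hypothetical stand-ins for {{resources.Resource}}, {{resources.Add}} and {{PartitionContext}}. The point it tries to show is that when writers always replace the stored value with a newly built one, a getter can hand out the current value without cloning, because nobody modifies it afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a simplified stand-in for resources.Resource (assumption).
type Resource map[string]int64

// Add returns a new Resource holding the sum; neither input is modified.
func Add(a, b Resource) Resource {
	out := make(Resource, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

// Partition is a hypothetical, reduced PartitionContext.
type Partition struct {
	sync.RWMutex
	total Resource
}

// GetTotal returns the current value. Writers never mutate the stored map,
// so the caller can keep reading it after the lock has been released.
func (p *Partition) GetTotal() Resource {
	p.RLock()
	defer p.RUnlock()
	return p.total
}

// AddNodeCapacity swaps in a freshly built value instead of updating the
// existing one in place.
func (p *Partition) AddNodeCapacity(delta Resource) {
	p.Lock()
	defer p.Unlock()
	p.total = Add(p.total, delta)
}

func main() {
	p := &Partition{total: Resource{"vcore": 4}}
	before := p.GetTotal()
	p.AddNodeCapacity(Resource{"vcore": 2})
	// The snapshot taken earlier is unaffected by the later update.
	fmt.Println(before["vcore"], p.GetTotal()["vcore"])
}
```

With an in-place update, the value returned by the getter would be changing underneath a reader that no longer holds the lock, which is the race the ticket describes.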
[jira] [Resolved] (YUNIKORN-2547) Queue: clean up logic when adding application
[ https://issues.apache.org/jira/browse/YUNIKORN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai resolved YUNIKORN-2547.

Fix Version/s: 1.6.0
Resolution: Fixed

> Queue: clean up logic when adding application
>
> Key: YUNIKORN-2547
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2547
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
>
> We found two issues when adding an application to a queue:
> # Inside {{Queue.AddApplication()}}, we parse and process "quota" and "guaranteed" from the application tags, then we set them on the queue if they have a valid value. We shouldn't be doing this inside {{AddApplication()}}, but rather when we're constructing the application object. That way, they're already available when the app is being added.
> # We add an application to the queue, but this can be reverted immediately if the placeholder doesn't fit or the "sortType" is not FIFO.
[jira] [Resolved] (YUNIKORN-2541) Fix CVE-2023-45288
[ https://issues.apache.org/jira/browse/YUNIKORN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu-Lin Chen resolved YUNIKORN-2541.

Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master. Thanks to [~targetoee] for the contribution.

> Fix CVE-2023-45288
>
> Key: YUNIKORN-2541
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2541
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: JiaChi Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> Update golang.org/x/net to 0.23.0.
[jira] [Resolved] (YUNIKORN-2562) Nil pointer panic in Application.ReplaceAllocation()
[ https://issues.apache.org/jira/browse/YUNIKORN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko resolved YUNIKORN-2562.

Fix Version/s: 1.6.0, 1.5.1
Resolution: Fixed

> Nil pointer panic in Application.ReplaceAllocation()
>
> Key: YUNIKORN-2562
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2562
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0, 1.5.1
>
> The following panic was generated during placeholder replacement:
> {noformat}
> 2024-04-16T13:46:58.583Z INFO shim.cache.task cache/task.go:542 releasing allocations {"numOfAsksToRelease": 1, "numOfAllocationsToRelease": 1}
> 2024-04-16T13:46:58.583Z INFO shim.fsm cache/task_state.go:380 Task state transition {"app": "application-spark-abrdrsmo8no2", "task": "cd73be15-af61-4248-89e1-d3296e72214e", "taskAlias": "obem-spark/tg-application-spark-abrdrsmo8n-spark-driver-y71h0amzo5", "source": "Bound", "destination": "Completed", "event": "CompleteTask"}
> 2024-04-16T13:46:58.584Z INFO core.scheduler.application objects/application.go:616 ask removed successfully from application {"appID": "application-spark-abrdrsmo8no2", "ask": "cd73be15-af61-4248-89e1-d3296e72214e", "pendingDelta": "map[]"}
> 2024-04-16T13:46:58.584Z INFO core.scheduler.partition scheduler/partition.go:1281 replacing placeholder allocation {"appID": "application-spark-abrdrsmo8no2", "allocationID": "cd73be15-af61-4248-89e1-d3296e72214e"}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17e1255]
> goroutine 117 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc008c46600, {0xc007710cf0, 0x24})
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/objects/application.go:1745 +0x615
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0x?, 0xc009786700)
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/partition.go:1284 +0x28b
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc00be64ba0?, {0xc00bb1af90, 0x1, 0x40a0fa?}, {0x1e0d902, 0x9})
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/context.go:870 +0x9e
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc0005f5f58?, 0xc0071a3f10?)
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/context.go:750 +0xa5
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc000700540)
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/scheduler.go:133 +0x1c5
> created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
>     github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/scheduler.go:60 +0x9c
> {noformat}