Thomas created YUNIKORN-3038:
--------------------------------
Summary: Nil pointer dereference on GetQueuePath
Key: YUNIKORN-3038
URL: https://issues.apache.org/jira/browse/YUNIKORN-3038
Project: Apache YuniKorn
Issue Type: Bug
Reporter: Thomas
We're observing frequent occurrences of the following panic:
{code:java}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b037c9]

goroutine 81 [running]:
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).GetQueuePath(0x0)
	github.com/apache/[email protected]/pkg/scheduler/objects/queue.go:548 +0x29
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc00b4d0b60, 0xc0135d9a40)
	github.com/apache/[email protected]/pkg/scheduler/partition.go:1501 +0xf1c
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc0000fa080, {0xc0146546a0, 0x1, 0xc0000a82a0?}, {0xc00645a340, 0x9})
	github.com/apache/[email protected]/pkg/scheduler/context.go:780 +0xa9
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc00b9ebf88?, 0xc015e3ff08?)
	github.com/apache/[email protected]/pkg/scheduler/context.go:716 +0x5d
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc0004002d0)
	github.com/apache/[email protected]/pkg/scheduler/scheduler.go:133 +0x18e
created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
	github.com/apache/[email protected]/pkg/scheduler/scheduler.go:60 +0x9c
{code}
The panic is always preceded by this warning:
{code:java}
2025-03-03T14:35:07.138Z WARN core.scheduler.partition scheduler/partition.go:1483 failed to release resources from queue {"appID": "spark-045e89205af548a6b2661e82fd3a0704", "allocationKey": "bb7808f0-4a77-4469-a394-86a9de766609", "error": "queue is nil"}
{code}
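Read together, the warning and the stack trace suggest this sequence: the release step reports `queue is nil` (partition.go:1483), processing continues, and a later `GetQueuePath` call on the same nil queue (partition.go:1501) panics. The Go sketch below reproduces that pattern in isolation. Every type and function in it (`Queue`, `release`, `removeAllocation`, `SafeQueuePath`) is a simplified, hypothetical stand-in rather than actual yunikorn-core code, and the nil guard is only one possible mitigation, not a confirmed fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Queue is a hypothetical stand-in for the scheduler's queue object.
type Queue struct {
	QueuePath string
}

// GetQueuePath reads a field through the receiver, so calling it on a
// nil *Queue triggers a nil pointer dereference at runtime.
func (q *Queue) GetQueuePath() string {
	return q.QueuePath
}

// SafeQueuePath is a hypothetical nil-safe accessor: it returns an empty
// path instead of dereferencing a nil queue.
func SafeQueuePath(q *Queue) string {
	if q == nil {
		return ""
	}
	return q.GetQueuePath()
}

// release mimics a release step that reports a nil queue as an error.
func release(q *Queue) error {
	if q == nil {
		return errors.New("queue is nil")
	}
	return nil
}

// removeAllocation logs the release error but does not return early, then
// reads the queue path. The deferred recover only exists so this demo can
// report the panic instead of crashing.
func removeAllocation(q *Queue) (path string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	if err := release(q); err != nil {
		fmt.Println("WARN failed to release resources from queue:", err)
		// No early return: execution continues with a nil queue.
	}
	return q.GetQueuePath(), false
}

func main() {
	_, panicked := removeAllocation(nil)
	fmt.Println("panicked on nil queue:", panicked)
	fmt.Println("nil-safe path is empty:", SafeQueuePath(nil) == "")
}
```

Whether the right fix is an early return after the warning, a nil-safe accessor, or preventing the nil queue in the first place depends on why the application has no queue at release time, which this report does not establish.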
On inspection of the mentioned appID, we do see a queue defined:
{code:java}
annotations:
  yunikorn.apache.org/task-groups: '[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"5120Mi"},"labels":{"deploy_env":"prod","driver-type":"batch","job_id":"j-250303112642423baa5a25458a7b2037","name":"2503031126-driver","openeo-role":"batch-driver","openeo_component":"batchjobs","queue":"root.default","role":"driver","user_id":"9e001a2a-1186-4b46-8f90-0f44cbcb13a9","version":"3.2.0"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"500.0m","memory":"6920Mi"},"labels":{"deploy_env":"prod","job_id":"j-250303112642423baa5a25458a7b2037","openeo-role":"executor","openeo_component":"batchjobs","queue":"root.default","user_id":"9e001a2a-1186-4b46-8f90-0f44cbcb13a9","version":"3.2.0"}}]'
{code}
The label `queue: root.default` is also present.
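For anyone who wants to repeat this inspection on other applications, the following Go sketch extracts the `queue` label from each entry of a task-groups annotation value. It is only a checking aid: the `taskGroup` struct and `queueLabels` helper are hypothetical names, the struct models just the two fields read here, the sample annotation is a shortened version of the one above, and nothing in it reproduces how YuniKorn itself resolves an application's queue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskGroup models only the fields this check needs; all other fields in
// the annotation (minMember, minResource, ...) are ignored when decoding.
type taskGroup struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels"`
}

// queueLabels maps each task group name to the value of its "queue" label.
// A task group without that label maps to the empty string.
func queueLabels(annotation string) (map[string]string, error) {
	var groups []taskGroup
	if err := json.Unmarshal([]byte(annotation), &groups); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(groups))
	for _, g := range groups {
		out[g.Name] = g.Labels["queue"]
	}
	return out, nil
}

func main() {
	// Shortened sample based on the annotation in this report.
	annotation := `[{"name":"spark-driver","minMember":1,"labels":{"queue":"root.default","role":"driver"}},` +
		`{"name":"spark-executor","minMember":2,"labels":{"queue":"root.default"}}]`

	queues, err := queueLabels(annotation)
	if err != nil {
		fmt.Println("cannot parse annotation:", err)
		return
	}
	for _, name := range []string{"spark-driver", "spark-executor"} {
		fmt.Printf("%s -> %q\n", name, queues[name])
	}
}
```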
{*}Yunikorn version{*}: 1.6.1
{*}Yunikorn config{*}:
{code:java}
queues.yaml: |
  partitions:
    - name: default
      placementrules:
        - name: provided
          create: true
      queues:
        - name: root
          parent: true
          submitacl: "*"
          queues:
            - name: default
              parent: false
              properties:
                preemption.policy: disabled
            - name: cdse-prod
              parent: true
              queues:
                - name: batch
                  childtemplate:
                    maxapplications: 2
                    properties:
                      preemption.policy: disabled
service.clusterId: cdse-prod
{code}
{*}Kubernetes version{*}: 1.25.7