[ https://issues.apache.org/jira/browse/YUNIKORN-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Antal updated YUNIKORN-390:
--------------------------------
    Description: 
The scheduler crashes if the parent queue specified for the tag placement rule does not exist.

The bug is in this line ({{core/pkg/scheduler/placement/tag_rule.go#93}}):
{code:go}
if info.GetQueue(parentName).IsLeafQueue() {
	return "", fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
}
{code}
{{info.GetQueue(parentName)}} returns nil, which causes the crash.

Full stack trace:
{noformat}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x198b707]

goroutine 116 [running]:
github.com/apache/incubator-yunikorn-core/pkg/cache.(*QueueInfo).IsLeafQueue(...)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/cache/queue_info.go:198
github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*tagRule).placeApplication(0xc005d50050, 0xc000494100, 0xc0006bc210, 0xc00644a300, 0x2, 0x2, 0x10502b1)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/tag_rule.go:93 +0xb47
github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*AppPlacementManager).PlaceApplication(0xc005d50000, 0xc000494100, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/placement.go:141 +0x485
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*partitionSchedulingContext).addSchedulingApplication(0xc0002e20e0, 0xc005b36120, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_partition.go:108 +0x892
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterSchedulingContext).addSchedulingApplication(0xc000012000, 0xc005b36120, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_context.go:114 +0x1d5
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).addNewApplication(0xc000390000, 0xc000494100, 0xc000738121, 0x9)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:209 +0x277
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).processApplicationUpdateEvent(0xc000390000, 0xc00a7541e0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:447 +0x9ec
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleSchedulerEvent(0xc000390000)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:596 +0x40a
created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:67 +0x9e
{noformat}

I also attach the placement rule, but note that I was working on YUNIKORN-368, so the code is not 100% the same:
{noformat}
partitions:
  - name: default
    placementrules:
      - name: tag
        value: namespace
        create: true
        parent:
          name: tag
          value: "namespace.parentqueue"
          create: true
    queues:
      - name: root
        submitacl: '*'
        queues:
          - name: default
            submitacl: '*'
{noformat}
where {{namespace.parentqueue}} is set to "root.special".

My proposal is that the scheduler should not crash even if the queue does not exist. Let's add a check before using the {{QueueInfo}} object.

  was:
The scheduler has crashed if the parent specified for the tag placement rule is not existing.

The bug is in this line:
{code:go}
if info.GetQueue(parentName).IsLeafQueue() {
	return "", fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
}
{code}
{{info.GetQueue(parentName)}} returns nil, which causes the crash.

Full stack trace:
{noformat}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x198b707]

goroutine 116 [running]:
github.com/apache/incubator-yunikorn-core/pkg/cache.(*QueueInfo).IsLeafQueue(...)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/cache/queue_info.go:198
github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*tagRule).placeApplication(0xc005d50050, 0xc000494100, 0xc0006bc210, 0xc00644a300, 0x2, 0x2, 0x10502b1)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/tag_rule.go:93 +0xb47
github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*AppPlacementManager).PlaceApplication(0xc005d50000, 0xc000494100, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/placement.go:141 +0x485
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*partitionSchedulingContext).addSchedulingApplication(0xc0002e20e0, 0xc005b36120, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_partition.go:108 +0x892
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterSchedulingContext).addSchedulingApplication(0xc000012000, 0xc005b36120, 0x0, 0x0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_context.go:114 +0x1d5
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).addNewApplication(0xc000390000, 0xc000494100, 0xc000738121, 0x9)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:209 +0x277
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).processApplicationUpdateEvent(0xc000390000, 0xc00a7541e0)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:447 +0x9ec
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleSchedulerEvent(0xc000390000)
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:596 +0x40a
created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:67 +0x9e
{noformat}

I also attach the placement rule, but note that I was working on YUNIKORN-368, so the code is not 100% the same:
{noformat}
partitions:
  - name: default
    placementrules:
      - name: tag
        value: namespace
        create: true
        parent:
          name: tag
          value: "namespace.parentqueue"
          create: true
    queues:
      - name: root
        submitacl: '*'
        queues:
          - name: default
            submitacl: '*'
{noformat}
where the {{namespace.parentqueue}} is set to "root.special".

My proposal is that even if the queue does not exist, it shouldn't crash. Let's make a double check before doing getting the {{QueueInfo}} object.


> SIGSEGV if parent queue does not exist for tag rule
> ---------------------------------------------------
>
>                 Key: YUNIKORN-390
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-390
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>    Affects Versions: 0.10
>            Reporter: Adam Antal
>            Priority: Blocker
>
> The scheduler crashes if the parent queue specified for the tag placement rule
> does not exist.
> The bug is in this line ({{core/pkg/scheduler/placement/tag_rule.go#93}}):
> {code:go}
> if info.GetQueue(parentName).IsLeafQueue() {
> 	return "", fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
> }
> {code}
> {{info.GetQueue(parentName)}} returns nil, which causes the crash.
> Full stack trace:
> {noformat}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x198b707]
>
> goroutine 116 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/cache.(*QueueInfo).IsLeafQueue(...)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/cache/queue_info.go:198
> github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*tagRule).placeApplication(0xc005d50050, 0xc000494100, 0xc0006bc210, 0xc00644a300, 0x2, 0x2, 0x10502b1)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/tag_rule.go:93 +0xb47
> github.com/apache/incubator-yunikorn-core/pkg/scheduler/placement.(*AppPlacementManager).PlaceApplication(0xc005d50000, 0xc000494100, 0x0, 0x0)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/placement/placement.go:141 +0x485
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*partitionSchedulingContext).addSchedulingApplication(0xc0002e20e0, 0xc005b36120, 0x0, 0x0)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_partition.go:108 +0x892
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterSchedulingContext).addSchedulingApplication(0xc000012000, 0xc005b36120, 0x0, 0x0)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduling_context.go:114 +0x1d5
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).addNewApplication(0xc000390000, 0xc000494100, 0xc000738121, 0x9)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:209 +0x277
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).processApplicationUpdateEvent(0xc000390000, 0xc00a7541e0)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:447 +0x9ec
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleSchedulerEvent(0xc000390000)
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:596 +0x40a
> created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
> 	/Users/adamantal/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20200827055746-57d663e73cb1/pkg/scheduler/scheduler.go:67 +0x9e
> {noformat}
> I also attach the placement rule, but note that I was working on
> YUNIKORN-368, so the code is not 100% the same:
> {noformat}
> partitions:
>   - name: default
>     placementrules:
>       - name: tag
>         value: namespace
>         create: true
>         parent:
>           name: tag
>           value: "namespace.parentqueue"
>           create: true
>     queues:
>       - name: root
>         submitacl: '*'
>         queues:
>           - name: default
>             submitacl: '*'
> {noformat}
> where {{namespace.parentqueue}} is set to "root.special".
> My proposal is that the scheduler should not crash even if the queue does not exist.
> Let's add a check before using the {{QueueInfo}} object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
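
For illustration only, the nil check proposed in the description could look roughly like the sketch below. It is not the actual patch: {{queueInfo}}, {{partition}} and {{checkParent}} are hypothetical stand-ins for the core types, and the handling of the {{create}} flag is an assumption about how a missing parent queue might be treated.

```go
package main

import "fmt"

// queueInfo is a hypothetical stand-in for the core's QueueInfo type.
type queueInfo struct {
	name string
	leaf bool
}

// IsLeafQueue dereferences the receiver, so calling it on a nil
// *queueInfo panics, which is the failure mode in the stack trace.
func (q *queueInfo) IsLeafQueue() bool { return q.leaf }

// partition is a hypothetical stand-in for the object GetQueue is called on.
type partition struct {
	queues map[string]*queueInfo
}

// GetQueue returns nil for an unknown queue name, as reported in the issue.
func (p *partition) GetQueue(name string) *queueInfo {
	return p.queues[name]
}

// checkParent validates the parent queue without dereferencing a nil result.
// Assumption: a missing parent is acceptable only when the rule may create it.
func checkParent(p *partition, parentName string, create bool) error {
	parent := p.GetQueue(parentName)
	if parent == nil {
		if !create {
			return fmt.Errorf("parent queue does not exist: %s", parentName)
		}
		// Nothing to validate yet; the queue would be created later.
		return nil
	}
	if parent.IsLeafQueue() {
		return fmt.Errorf("parent rule returned a leaf queue: %s", parentName)
	}
	return nil
}
```

With this guard, a rule whose parent resolves to a non-existent queue such as "root.special" yields either no error (when creation is allowed) or a regular error, instead of a SIGSEGV.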