[jira] [Resolved] (YUNIKORN-2869) Tagging for 1.6.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2869.
Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed
> Tagging for 1.6.0
>
> Key: YUNIKORN-2869
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2869
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Fix For: 1.6.0
--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2874) Cannot generate reproducible builds during release
Peter Bacsko created YUNIKORN-2874:
Summary: Cannot generate reproducible builds during release
Key: YUNIKORN-2874
URL: https://issues.apache.org/jira/browse/YUNIKORN-2874
Project: Apache YuniKorn
Issue Type: Bug
Components: release
Reporter: Peter Bacsko
When trying to release Yunikorn with REPRODUCIBLE_BUILDS=1 (the default), the following error occurs:
{noformat}
~/repos/yunikorn-release/staging/tmp/apache-yunikorn-1.6.0-src/k8shim$ make REPRODUCIBLE_BUILDS=1 scheduler
building binary for scheduler docker image
docker run -t --rm=true --volume "/home/bacskop/repos/yunikorn-release/staging/tmp/apache-yunikorn-1.6.0-src/k8shim/:/buildroot" "golang:1.22.1" sh -c "cd /buildroot && \
CGO_ENABLED=0 GOOS=linux GOARCH=\"amd64\" go build \
-a \
-o=build/bin/yunikorn-scheduler \
-trimpath \
-ldflags '-buildid= -extldflags \"-static\" -X github.com/apache/yunikorn-k8shim/pkg/conf.buildVersion=1.6.0 -X github.com/apache/yunikorn-k8shim/pkg/conf.buildDate=2024-09-10T08:26:23+00:00 -X github.com/apache/yunikorn-k8shim/pkg/conf.isPluginVersion=false -X github.com/apache/yunikorn-k8shim/pkg/conf.goVersion=1.22.1 -X github.com/apache/yunikorn-k8shim/pkg/conf.arch=amd64 -X github.com/apache/yunikorn-k8shim/pkg/conf.coreSHA=a2d40c81fee104356f9e33120fd557a928f74f2b -X github.com/apache/yunikorn-k8shim/pkg/conf.siSHA=68e8c6cca28a743d797e7908b1225392a3a2 -X github.com/apache/yunikorn-k8shim/pkg/conf.shimSHA=240aeb90951a30c677890b61b50f6bfcafb227b5' \
-tags netgo \
-installsuffix netgo \
./pkg/cmd/shim/"
go: downloading go.uber.org/zap v1.26.0
go: downloading k8s.io/api v0.31.0
...
go: downloading github.com/spf13/cobra v1.8.1
go: downloading golang.org/x/sync v0.8.0
go: downloading github.com/asaskevich/govalidator v0.0.0-20190424111038-f61b66f89f4a
pkg/cmd/shim/main.go:31:2: github.com/apache/yunikorn-core@v1.6.0-1: replacement directory ../core/ does not exist
pkg/common/constants/constants.go:22:2: github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory ../scheduler-interface/ does not exist
pkg/locking/locking.go:26:2: github.com/apache/yunikorn-core@v1.6.0-1: replacement directory ../core/ does not exist
pkg/common/test/schedulerapi_mock.go:25:2: github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory ../scheduler-interface/ does not exist
pkg/common/test/recoverable_apps_mock.go:25:2: github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory ../scheduler-interface/ does not exist
{noformat}
The problem is that we don't mount "../core" and "../scheduler-interface" into the build container during the release procedure. This differs slightly from a normal build: "go.mod" uses "replace" directives that point at these two directories, so they must be present.
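The failure comes from `replace` targets that resolve outside the mounted tree: with only `k8shim/` mounted at `/buildroot`, a target of `../core` resolves to `/core` inside the container. Below is a minimal sketch of a pre-flight check, under stated assumptions: `localReplaceTargets` and `missingReplaceDirs` are hypothetical helpers (not part of the YuniKorn release tooling), and the line-based parser is a simplification of go.mod syntax, not the Go toolchain's own parser.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// localReplaceTargets returns the filesystem targets of "replace" directives
// in go.mod content. Simplified parser (an assumption of this sketch): every
// line containing "=>" is inspected and only targets that look like local
// paths are kept; module-path targets such as "example.com/fork v1.0.0" are ignored.
func localReplaceTargets(gomod string) []string {
	var targets []string
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i] // strip trailing comments
		}
		_, rhs, found := strings.Cut(line, "=>")
		if !found {
			continue
		}
		fields := strings.Fields(rhs)
		if len(fields) == 0 {
			continue
		}
		t := fields[0]
		if strings.HasPrefix(t, "./") || strings.HasPrefix(t, "../") || filepath.IsAbs(t) {
			targets = append(targets, t)
		}
	}
	return targets
}

// missingReplaceDirs resolves each local target relative to moduleDir and
// returns the targets for which exists reports false. exists is injected so
// the check can be exercised without touching the filesystem.
func missingReplaceDirs(gomod, moduleDir string, exists func(string) bool) []string {
	var missing []string
	for _, t := range localReplaceTargets(gomod) {
		p := t
		if !filepath.IsAbs(p) {
			p = filepath.Join(moduleDir, t) // Join also cleans "..": /buildroot/../core -> /core
		}
		if !exists(p) {
			missing = append(missing, t)
		}
	}
	return missing
}

// dirExists reports whether p exists and is a directory.
func dirExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && info.IsDir()
}

func main() {
	// Run from the module root (e.g. /buildroot inside the build container).
	data, err := os.ReadFile("go.mod")
	if err != nil {
		fmt.Println("cannot read go.mod:", err)
		return
	}
	wd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	for _, t := range missingReplaceDirs(string(data), wd, dirExists) {
		fmt.Printf("replacement directory %s does not exist\n", t)
	}
}
```

Mounting the sibling `core` and `scheduler-interface` checkouts so that `../core` and `../scheduler-interface` exist relative to the module root would make such a check pass; how the release scripts should do that is left open here.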
[jira] [Resolved] (YUNIKORN-2493) Preemption Hardening Phase 1
[ https://issues.apache.org/jira/browse/YUNIKORN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2493.
Fix Version/s: 1.6.0
Resolution: Fixed
> Preemption Hardening Phase 1
>
> Key: YUNIKORN-2493
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2493
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
[jira] [Created] (YUNIKORN-2870) Release notes for 1.6.0
Peter Bacsko created YUNIKORN-2870:
Summary: Release notes for 1.6.0
Key: YUNIKORN-2870
URL: https://issues.apache.org/jira/browse/YUNIKORN-2870
Project: Apache YuniKorn
Issue Type: Sub-task
Reporter: Peter Bacsko
[jira] [Resolved] (YUNIKORN-2341) New Queue Web UI
[ https://issues.apache.org/jira/browse/YUNIKORN-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2341.
Resolution: Fixed
> New Queue Web UI
>
> Key: YUNIKORN-2341
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2341
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: webapp
> Reporter: Dong-Lin Hsieh
> Assignee: Dong-Lin Hsieh
> Priority: Major
> Fix For: 1.6.0
>
> A fresh new Web UI to visualize queues in YuniKorn.
> Subtasks 1 through 12 are the basic components of the new UI.
> These components will be used in at least two places:
> # Visualize any valid YuniKorn {{config.yaml}}.
> # Visualize the current queues that YuniKorn is using.
> Inspired by [tableau/query-graphs|https://github.com/tableau/query-graphs]
[jira] [Created] (YUNIKORN-2863) New Queue Web UI phase II
Peter Bacsko created YUNIKORN-2863:
Summary: New Queue Web UI phase II
Key: YUNIKORN-2863
URL: https://issues.apache.org/jira/browse/YUNIKORN-2863
Project: Apache YuniKorn
Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Dong-Lin Hsieh
[jira] [Resolved] (YUNIKORN-2323) Gang scheduling user experience issues
[ https://issues.apache.org/jira/browse/YUNIKORN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2323.
Fix Version/s: 1.6.0
Resolution: Fixed
> Gang scheduling user experience issues
>
> Key: YUNIKORN-2323
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2323
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.4.0
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> When something goes wrong, users find it a bit difficult to understand what is going on with the gang app.
> Issue 1:
> "driver pod is getting stuck"
> At times, when the driver pod cannot run successfully for some reason, users get the impression that the pod is stuck and the app is hung, not moving forward. Users wait for some time without getting a clear picture. How do we close this gap quickly and communicate the situation through events?
> Issue 2:
> ResumeApplication is fired when all placeholders have timed out. Do we need to inform users about this event, since otherwise they may have no clue about this significant change?
> Issue 3:
> When a gang app's placeholders are in progress (and allocated), and requests for the real asks arrive during a resource crunch, do we need to trigger autoscaling?
[jira] [Resolved] (YUNIKORN-2842) Improve metadata & gang_utils functions' test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2842.
Fix Version/s: 1.6.0
Resolution: Fixed
> Improve metadata & gang_utils functions' test coverage
>
> Key: YUNIKORN-2842
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2842
> Project: Apache YuniKorn
> Issue Type: Test
> Components: shim - kubernetes
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
>
> Improve test coverage for the following cases:
> * GetPlaceholderResourceRequests (empty resource key case)
> * getTaskMetadata (appID empty string case)
> * getAppMetadata (GetTaskGroupsFromAnnotation error case)
[jira] [Created] (YUNIKORN-2845) Remove SchedulerConf.TestMode
Peter Bacsko created YUNIKORN-2845:
Summary: Remove SchedulerConf.TestMode
Key: YUNIKORN-2845
URL: https://issues.apache.org/jira/browse/YUNIKORN-2845
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko
After YUNIKORN-2844, there will be no need to use {{SchedulerConf.IsTestMode()}} in the production code.
[jira] [Created] (YUNIKORN-2844) Inject event recorder externally
Peter Bacsko created YUNIKORN-2844:
Summary: Inject event recorder externally
Key: YUNIKORN-2844
URL: https://issues.apache.org/jira/browse/YUNIKORN-2844
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko
The current implementation creates an event recorder like this:
{noformat}
func GetRecorder() events.EventRecorder {
    lock.Lock()
    defer lock.Unlock()
    once.Do(func() {
        // note, the initiation of the event recorder relies on a working Kubernetes client,
        // in test mode we should skip this and just use a fake recorder instead.
        configs := conf.GetSchedulerConf()
        if !configs.IsTestMode() {
            k8sClient := client.NewKubeClient(configs.KubeConfig)
            eventBroadcaster := events.NewBroadcaster(&events.EventSinkImpl{
                Interface: k8sClient.GetClientSet().EventsV1()})
            eventBroadcaster.StartRecordingToSink(make(<-chan struct{}))
            eventRecorder = eventBroadcaster.NewRecorder(scheme.Scheme, constants.SchedulerName)
        }
    })
    return eventRecorder
}
{noformat}
The problem with this approach is that we need to indicate "test mode" in the config, which only complicates things. We can simplify this code by setting the recorder during Yunikorn initialization in {{NewScheduler()}}. The plugin code already does this: {{NewSchedulerPlugin()}} calls {{events.SetRecorder(handle.EventRecorder())}}.
We should also get rid of the default fake recorder, which uses a buffered channel with a capacity of 1024. This isn't a problem now, but if a new test ends up generating more events than the buffer holds while nothing reads from the channel, sending will block. It might not be obvious to a developer why a unit test suddenly starts to hang.
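A sketch of the injection approach described above, under stated assumptions: `EventRecorder` here is a hypothetical, simplified interface (the real `k8s.io/client-go/tools/events` interface has a different `Eventf` signature), and `sliceRecorder` is a hypothetical test double. The caller installs a recorder once, and because the test double appends to a slice rather than sending on a fixed-size buffered channel, emitting many events cannot block.

```go
package main

import (
	"fmt"
	"sync"
)

// EventRecorder is a hypothetical, trimmed-down stand-in for the Kubernetes
// events.EventRecorder interface; only what this sketch needs.
type EventRecorder interface {
	Eventf(object interface{}, eventType, reason, note string, args ...interface{})
}

var (
	lock     sync.RWMutex
	recorder EventRecorder
)

// SetRecorder installs the recorder chosen by the caller during start-up.
func SetRecorder(r EventRecorder) {
	lock.Lock()
	defer lock.Unlock()
	recorder = r
}

// GetRecorder returns whatever was injected: no test-mode check, no
// sync.Once and no default fake recorder.
func GetRecorder() EventRecorder {
	lock.RLock()
	defer lock.RUnlock()
	return recorder
}

// sliceRecorder is a test double that appends events to a slice instead of
// sending them to a fixed-size buffered channel, so it never blocks.
type sliceRecorder struct {
	mu     sync.Mutex
	events []string
}

func (r *sliceRecorder) Eventf(_ interface{}, eventType, reason, note string, args ...interface{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events = append(r.events, eventType+" "+reason+" "+fmt.Sprintf(note, args...))
}

func main() {
	rec := &sliceRecorder{}
	SetRecorder(rec) // a unit test would do this in its setup
	for i := 0; i < 2000; i++ { // more events than a 1024-slot channel could buffer
		GetRecorder().Eventf(nil, "Normal", "Scheduled", "event %d", i)
	}
	fmt.Println(len(rec.events)) // 2000
}
```

In production, `NewScheduler()` would call `SetRecorder` with a broadcaster-backed recorder, mirroring what `NewSchedulerPlugin()` already does with `handle.EventRecorder()`.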
[jira] [Created] (YUNIKORN-2833) [SI] Track non-Yunikorn allocations
Peter Bacsko created YUNIKORN-2833:
Summary: [SI] Track non-Yunikorn allocations
Key: YUNIKORN-2833
URL: https://issues.apache.org/jira/browse/YUNIKORN-2833
Project: Apache YuniKorn
Issue Type: Sub-task
Components: scheduler-interface
Reporter: Peter Bacsko
Assignee: Peter Bacsko
[jira] [Created] (YUNIKORN-2834) [shim] Track non-Yunikorn allocations
Peter Bacsko created YUNIKORN-2834:
Summary: [shim] Track non-Yunikorn allocations
Key: YUNIKORN-2834
URL: https://issues.apache.org/jira/browse/YUNIKORN-2834
Project: Apache YuniKorn
Issue Type: Sub-task
Reporter: Peter Bacsko
[jira] [Created] (YUNIKORN-2832) [core] Track non-Yunikorn allocations
Peter Bacsko created YUNIKORN-2832:
Summary: [core] Track non-Yunikorn allocations
Key: YUNIKORN-2832
URL: https://issues.apache.org/jira/browse/YUNIKORN-2832
Project: Apache YuniKorn
Issue Type: Sub-task
Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko
[jira] [Created] (YUNIKORN-2831) Update golangci-lint
Peter Bacsko created YUNIKORN-2831:
Summary: Update golangci-lint
Key: YUNIKORN-2831
URL: https://issues.apache.org/jira/browse/YUNIKORN-2831
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - scheduler, shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko
Go 1.23 has been released, but linter version 1.57.2 does not support it:
{noformat}
~/repos/yunikorn-core$ make lint
installing golangci-lint v1.57.2
running golangci-lint
Killed
make: *** [Makefile:133: lint] Error 137
{noformat}
According to the [release page|https://github.com/golangci/golangci-lint/releases], we need at least 1.60.1.
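The required bump amounts to a minimum-version check. The sketch below is a hypothetical Go helper (the project's Makefile does not necessarily work this way); it compares plain `major.minor.patch` strings with an optional `v` prefix and does not handle pre-release suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.57.2" or "1.60.1" into [major, minor, patch].
// Pre-release and build suffixes are not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether have >= want, comparing component by component.
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if h[i] != w[i] {
			return h[i] > w[i], nil
		}
	}
	return true, nil // all components equal
}

func main() {
	ok, err := atLeast("v1.57.2", "1.60.1")
	fmt.Println(ok, err) // false <nil>
}
```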
[jira] [Resolved] (YUNIKORN-2777) Improve TrackedResource type
[ https://issues.apache.org/jira/browse/YUNIKORN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2777.
Fix Version/s: 1.6.0
Resolution: Fixed
> Improve TrackedResource type
>
> Key: YUNIKORN-2777
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2777
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - common
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
>
> Currently, TrackedResource is defined as:
> {noformat}
> type TrackedResource struct {
>     TrackedResourceMap map[string]map[string]int64
>     locking.RWMutex
> }
> {noformat}
> As it turned out during the review of [YUNIKORN-2652|https://github.com/apache/yunikorn-core/pull/897], {{TrackedResourceMap}} is effectively a {{map[string]*Resource}}. If we change the definition, we'll be able to use the existing functions for {{Resource}}.
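A sketch of what the proposed definition enables, under stated assumptions: `Resource` is a minimal stand-in with a hypothetical `AddTo` method (the core's real `Resource` API may differ), `sync.RWMutex` replaces the project's `locking.RWMutex` to keep the example self-contained, and the outer map key is an arbitrary illustrative string.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a hypothetical minimal stand-in for the core's Resource type.
type Resource struct {
	Resources map[string]int64
}

// NewResource returns an empty Resource.
func NewResource() *Resource {
	return &Resource{Resources: make(map[string]int64)}
}

// AddTo adds every quantity of delta to r in place.
func (r *Resource) AddTo(delta *Resource) {
	for name, qty := range delta.Resources {
		r.Resources[name] += qty
	}
}

// TrackedResource with the proposed map[string]*Resource field.
type TrackedResource struct {
	TrackedResourceMap map[string]*Resource
	sync.RWMutex
}

// AddUsage accumulates delta under key, reusing Resource.AddTo instead of
// hand-written loops over nested maps.
func (tr *TrackedResource) AddUsage(key string, delta *Resource) {
	tr.Lock()
	defer tr.Unlock()
	cur, ok := tr.TrackedResourceMap[key]
	if !ok {
		cur = NewResource()
		tr.TrackedResourceMap[key] = cur
	}
	cur.AddTo(delta)
}

func main() {
	tr := &TrackedResource{TrackedResourceMap: make(map[string]*Resource)}
	tr.AddUsage("key-a", &Resource{Resources: map[string]int64{"vcore": 1000, "memory": 2048}})
	tr.AddUsage("key-a", &Resource{Resources: map[string]int64{"vcore": 500}})
	fmt.Println(tr.TrackedResourceMap["key-a"].Resources["vcore"]) // 1500
}
```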
[jira] [Resolved] (YUNIKORN-2528) Increase coverage for UGM code
[ https://issues.apache.org/jira/browse/YUNIKORN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2528.
Fix Version/s: 1.6.0
Resolution: Fixed
> Increase coverage for UGM code
>
> Key: YUNIKORN-2528
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2528
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
>
> The following branches are not properly covered by unit tests:
> # {{GroupTracker.decreaseTrackedResource()}}: when {{gt == nil}}
> # {{Manager.DecreaseTrackedResource()}}: when there's no UserTracker
> # {{Manager.DecreaseTrackedResource()}}: when there's no GroupTracker
> # {{Manager.DecreaseTrackedResource()}}: when the groupTracker decrement returns true
> # {{QueueTracker.decreaseTrackedResource()}}: when there's no child tracker
> See https://app.codecov.io/gh/apache/yunikorn-core/pull/810.
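The `gt == nil` branch relies on Go allowing a method to be called on a nil pointer receiver. A hypothetical, simplified tracker (not the core's `GroupTracker`) shows the pattern with a table-driven check that exercises the nil branch; the meaning of the boolean result here (the tracker is empty and could be removed) is an assumption of this sketch.

```go
package main

import "fmt"

// groupTracker is a hypothetical, simplified tracker, not the core's GroupTracker.
type groupTracker struct {
	usage map[string]int64
}

// decreaseTrackedResource reduces the usage of one resource. It is safe to
// call on a nil *groupTracker and, in this sketch, returns true when the
// tracker no longer holds any usage and could be removed.
func (gt *groupTracker) decreaseTrackedResource(name string, qty int64) bool {
	if gt == nil {
		return false // the branch the issue asks to cover
	}
	gt.usage[name] -= qty
	if gt.usage[name] <= 0 {
		delete(gt.usage, name)
	}
	return len(gt.usage) == 0
}

func main() {
	// Table-driven cases, including the nil receiver branch.
	cases := []struct {
		name string
		gt   *groupTracker
		want bool
	}{
		{"nil tracker", nil, false},
		{"usage left", &groupTracker{usage: map[string]int64{"vcore": 2000}}, false},
		{"usage drained", &groupTracker{usage: map[string]int64{"vcore": 1000}}, true},
	}
	for _, c := range cases {
		got := c.gt.decreaseTrackedResource("vcore", 1000)
		fmt.Printf("%s: got %v, want %v\n", c.name, got, c.want)
	}
}
```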
[jira] [Resolved] (YUNIKORN-2429) Enhance UGM Manager test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2429.
Fix Version/s: 1.6.0
Resolution: Fixed
> Enhance UGM Manager test coverage
>
> Key: YUNIKORN-2429
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2429
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
>
> During the review of YUNIKORN-2116, we noticed a mistake in {{clearEarlierSetUserWildCardLimits()}} that the unit tests did not catch.
> Ensure proper coverage to verify configuration updates.
[jira] [Resolved] (YUNIKORN-2756) Consider moving event_system#defaultEventChannelSize to configs#const
[ https://issues.apache.org/jira/browse/YUNIKORN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2756.
Fix Version/s: 1.6.0
Resolution: Fixed
> Consider moving event_system#defaultEventChannelSize to configs#const
>
> Key: YUNIKORN-2756
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2756
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Chenchen Lai
> Priority: Trivial
> Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
> All other event-related configs are in configs#const [1], so we should move `defaultEventChannelSize` [0] there to keep them together.
> BTW, `defaultRingBufferSize` [2] will be removed by https://github.com/apache/yunikorn-core/pull/915, since its replacement is already in configs#const [3].
> [0] https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/events/event_system.go#L37
> [1] https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/common/configs/configs.go#L43
> [2] https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/events/event_system.go#L38
> [3] https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/common/configs/configs.go#L47
[jira] [Resolved] (YUNIKORN-2681) Data race in TestCheckHealthStatusNotFound
[ https://issues.apache.org/jira/browse/YUNIKORN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2681.
Fix Version/s: 1.6.0
Resolution: Fixed
> Data race in TestCheckHealthStatusNotFound
>
> Key: YUNIKORN-2681
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2681
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler, test - unit
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> A data race was detected during a unit test:
> {noformat}
> ==================
> WARNING: DATA RACE
> Write at 0x0170c220 by goroutine 2575:
>   github.com/apache/yunikorn-core/pkg/webservice.NewWebApp()
>       /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/webservice.go:82 +0x11c
>   github.com/apache/yunikorn-core/pkg/webservice.TestCheckHealthStatusNotFound()
>       /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2574 +0x2f
>   testing.tRunner()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
>
> Previous read at 0x0170c220 by goroutine 2542:
>   github.com/apache/yunikorn-core/pkg/webservice.getStream()
>       /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers.go:1225 +0xbd3
>   github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit.gowrap4()
>       /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0x4f
>
> Goroutine 2575 (running) created at:
>   testing.(*T).Run()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x825
>   testing.runTests.func1()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2161 +0x85
>   testing.tRunner()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.runTests()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2159 +0x8be
>   testing.(*M).Run()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2027 +0xf17
>   main.main()
>       _testmain.go:163 +0x2e4
>
> Goroutine 2542 (running) created at:
>   github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit()
>       /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0xbb7
>   testing.tRunner()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>       /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
> ==================
> 2024-06-18T13:40:54.182Z  INFO  core.events  events/event_streaming.go:164  Removing event stream consumer  {"name": "host-1", "creation time": "2024-06-18T13:40:54.181Z"}
> 2024-06-18T13:40:54.182Z  INFO  core.scheduler.health  webservice/handlers.go:623  Health check is not available
> --- FAIL: TestCheckHealthStatusNotFound (0.00s)
>     testing.go:1398: race detected during execution of test
> {noformat}
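The trace shows an unsynchronized write to a package-level variable in `NewWebApp()` racing with a read in `getStream()`, which runs in a goroutine started by the earlier `TestGetStream_Limit`. One common remedy, sketched here with hypothetical names (`clusterState`, `currentState`, `newWebApp`, `handlerName`), is to publish the value through `atomic.Pointer` from `sync/atomic` (Go 1.19+). This is an illustration of the technique, not necessarily how the issue was fixed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// clusterState is a hypothetical stand-in for the value NewWebApp() stores
// in a package-level variable.
type clusterState struct {
	name string
}

// currentState is published atomically, so the constructor's write and the
// handlers' reads no longer race.
var currentState atomic.Pointer[clusterState]

// newWebApp stands in for NewWebApp(): a plain assignment here was the racy write.
func newWebApp(s *clusterState) {
	currentState.Store(s)
}

// handlerName stands in for a handler such as getStream(): a plain read here
// was the racy read.
func handlerName() string {
	if s := currentState.Load(); s != nil {
		return s.name
	}
	return ""
}

func main() {
	newWebApp(&clusterState{name: "first"})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // like the streaming goroutine left over from an earlier test
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = handlerName()
		}
	}()
	newWebApp(&clusterState{name: "second"}) // like the next test creating a new web app
	wg.Wait()
	fmt.Println(handlerName()) // second
}
```

Running it with `go run -race` should report no race. An alternative is to make sure the streaming goroutine from the previous test has exited before the next test calls the constructor.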
[jira] [Created] (YUNIKORN-2792) Create design doc
Peter Bacsko created YUNIKORN-2792:
Summary: Create design doc
Key: YUNIKORN-2792
URL: https://issues.apache.org/jira/browse/YUNIKORN-2792
Project: Apache YuniKorn
Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko
[jira] [Created] (YUNIKORN-2791) Track non-Yunikorn allocations in the core
Peter Bacsko created YUNIKORN-2791:
Summary: Track non-Yunikorn allocations in the core
Key: YUNIKORN-2791
URL: https://issues.apache.org/jira/browse/YUNIKORN-2791
Project: Apache YuniKorn
Issue Type: New Feature
Reporter: Peter Bacsko
Assignee: Peter Bacsko
Currently, the core does not know which non-YK pods are assigned to a particular node. We only track their total as the {{occupiedResources}} object inside the {{objects.Node}} type. If this tracking gets out of sync with the actual cluster state, it is very difficult to find out what went wrong, because these allocations are not shown in the state dump.
To improve supportability, we want to track all non-YK pods per node on the core side.
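A sketch of per-node tracking under stated assumptions: `node` and its methods are hypothetical (not `objects.Node`), allocations are keyed by an illustrative pod key, and resources are plain name-to-quantity maps. The total is derived from the individual entries, so a discrepancy can be traced to a specific pod, and the keys can be listed in a state dump.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// node is a hypothetical, simplified node record that keeps every
// non-YuniKorn allocation individually instead of only a total.
type node struct {
	sync.RWMutex
	nodeID  string
	foreign map[string]map[string]int64 // allocation key -> resource name -> quantity
}

func newNode(id string) *node {
	return &node{nodeID: id, foreign: make(map[string]map[string]int64)}
}

// addForeignAllocation records (or replaces) the resources of one non-YK pod.
func (n *node) addForeignAllocation(key string, res map[string]int64) {
	cp := make(map[string]int64, len(res)) // copy so callers cannot mutate our state
	for k, v := range res {
		cp[k] = v
	}
	n.Lock()
	defer n.Unlock()
	n.foreign[key] = cp
}

// removeForeignAllocation forgets a non-YK pod, e.g. when it terminates.
func (n *node) removeForeignAllocation(key string) {
	n.Lock()
	defer n.Unlock()
	delete(n.foreign, key)
}

// occupied derives the total from the individual entries.
func (n *node) occupied() map[string]int64 {
	n.RLock()
	defer n.RUnlock()
	total := make(map[string]int64)
	for _, res := range n.foreign {
		for k, v := range res {
			total[k] += v
		}
	}
	return total
}

// foreignKeys lists the tracked pods in sorted order, e.g. for a state dump.
func (n *node) foreignKeys() []string {
	n.RLock()
	defer n.RUnlock()
	keys := make([]string, 0, len(n.foreign))
	for k := range n.foreign {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	n := newNode("node-1")
	n.addForeignAllocation("kube-system/coredns-1", map[string]int64{"vcore": 100, "memory": 70})
	n.addForeignAllocation("default/nginx-1", map[string]int64{"vcore": 500})
	n.removeForeignAllocation("default/nginx-1")
	fmt.Println(n.nodeID, n.foreignKeys(), n.occupied()["vcore"]) // node-1 [kube-system/coredns-1] 100
}
```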
[jira] [Resolved] (YUNIKORN-2760) `make tools` should check the version of tools
[ https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2760.
Resolution: Fixed
Merged to master in both repos.
> `make tools` should check the version of tools
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> By default, Make only checks whether a file exists. Hence, after we update the version of a tool, developers need to remove the tools folder manually (or run `make distclean`) to trigger the installation.
> However, how can developers become aware of tool updates? Personally, I smell something fishy from the errors or warnings, but that signal can be implicit and noisy :(
> To fix that, I'd like to introduce a new structure for the tools folder:
> {code:java}
> /tools/{tool_name}-{version}
> {code}
> That offers a unique path for each version of a tool, so developers will not miss updates anymore.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
> That offers a unique path for each version of a tool, so developers will not miss updates anymore.
> NOTE: we need to remove the existing tool binary if there is a naming conflict when creating the new path. For example, creating /tools/golangci-lint/1.57.2 will fail if /tools/golangci-lint is an existing file.
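The accepted layout can be illustrated with a small Go sketch. `toolPath` and `needsInstall` are hypothetical helpers that model the Makefile's existence-only check; they are not the actual Makefile rules. The point is that a version bump changes the path, so the existence check alone is enough to trigger a reinstall.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// toolPath builds the accepted {tool_name}-{version} location under toolsDir.
func toolPath(toolsDir, name, version string) string {
	return filepath.Join(toolsDir, name+"-"+version)
}

// needsInstall mimics Make's existence-only check. Because the version is
// part of the path, bumping the version yields a path that does not exist
// yet, so the new version gets installed without a manual cleanup.
func needsInstall(toolsDir, name, version string, exists func(string) bool) bool {
	return !exists(toolPath(toolsDir, name, version))
}

// pathExists reports whether anything exists at p.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	fmt.Println(toolPath("tools", "golangci-lint", "1.57.2")) // tools/golangci-lint-1.57.2
	fmt.Println(needsInstall("tools", "golangci-lint", "1.60.1", pathExists))
}
```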
[jira] [Resolved] (YUNIKORN-2787) Eliminate gosec lint warnings
[ https://issues.apache.org/jira/browse/YUNIKORN-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2787.
Resolution: Not A Problem
> Eliminate gosec lint warnings
>
> Key: YUNIKORN-2787
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2787
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - common
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
>
> Use the nolint directive to get rid of the following warnings from the test code:
> {noformat}
> pkg/scheduler/objects/node_collection_test.go:315:18: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, nodes[0].NodeID, "node-2", "wrong node 0")
>                     ^
> pkg/scheduler/objects/node_collection_test.go:316:18: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, nodes[1].NodeID, "node-4", "wrong node 1")
>                     ^
> pkg/scheduler/objects/node_collection_test.go:317:18: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, nodes[2].NodeID, "node-1", "wrong node 2")
>                     ^
> pkg/scheduler/objects/node_collection_test.go:318:18: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, nodes[3].NodeID, "node-3", "wrong node 3")
>                     ^
> pkg/scheduler/objects/nodesorting_test.go:200:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:201:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (fair)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:214:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node (binpacking)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:215:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node (binpacking)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:244:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (binpacking, empty allocation)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:245:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (binpacking, empty allocation)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:256:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair, empty allocation)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:257:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (fair, empty allocation)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:274:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node (binpacking, node2 half-filled)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:275:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node (binpacking, node2 half-filled")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:287:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair, node2 half-filled)")
>                                   ^
> pkg/scheduler/objects/nodesorting_test.go:288:32: G602: Potentially accessing slice out of bounds (gosec)
>     assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (binpacking, node2 half-filled")
>                                   ^
> make: *** [Makefile:131: lint] Error 1
> {noformat}
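Although the issue was closed as Not A Problem, the directive it refers to is golangci-lint's `//nolint:<linter>` comment. A self-contained sketch using a hypothetical helper `firstTwoIDs` (the real tests index `nodes` inside `assert.Equal` calls): an explicit length guard keeps the indexing safe, and a per-line `//nolint:gosec` with a reason documents why the G602 warning is suppressed.

```go
package main

import "fmt"

// firstTwoIDs is a hypothetical helper mirroring the flagged test pattern:
// indexing a slice whose length gosec cannot prove statically.
func firstTwoIDs(nodes []string) (string, string, error) {
	if len(nodes) < 2 {
		return "", "", fmt.Errorf("expected at least 2 nodes, got %d", len(nodes))
	}
	// The guard above makes these accesses safe; the directives record that
	// G602 is a false positive here and silence only the gosec linter.
	first := nodes[0]  //nolint:gosec // G602: length checked above
	second := nodes[1] //nolint:gosec // G602: length checked above
	return first, second, nil
}

func main() {
	a, b, err := firstTwoIDs([]string{"node-2", "node-4"})
	fmt.Println(a, b, err) // node-2 node-4 <nil>
}
```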
[jira] [Created] (YUNIKORN-2787) Eliminate gosec lint warnings
Peter Bacsko created YUNIKORN-2787:
Summary: Eliminate gosec lint warnings
Key: YUNIKORN-2787
URL: https://issues.apache.org/jira/browse/YUNIKORN-2787
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - common
Reporter: Peter Bacsko
Assignee: Peter Bacsko
Use the nolint directive to get rid of the following warnings from the test code:
{noformat}
running golangci-lint
pkg/scheduler/objects/node_collection_test.go:315:18: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, nodes[0].NodeID, "node-2", "wrong node 0")
                    ^
pkg/scheduler/objects/node_collection_test.go:316:18: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, nodes[1].NodeID, "node-4", "wrong node 1")
                    ^
pkg/scheduler/objects/node_collection_test.go:317:18: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, nodes[2].NodeID, "node-1", "wrong node 2")
                    ^
pkg/scheduler/objects/node_collection_test.go:318:18: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, nodes[3].NodeID, "node-3", "wrong node 3")
                    ^
pkg/scheduler/objects/nodesorting_test.go:200:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:201:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (fair)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:214:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node (binpacking)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:215:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node (binpacking)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:244:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (binpacking, empty allocation)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:245:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (binpacking, empty allocation)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:256:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair, empty allocation)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:257:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (fair, empty allocation)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:274:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node (binpacking, node2 half-filled)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:275:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node (binpacking, node2 half-filled")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:287:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node (fair, node2 half-filled)")
                                  ^
pkg/scheduler/objects/nodesorting_test.go:288:32: G602: Potentially accessing slice out of bounds (gosec)
    assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node (binpacking, node2 half-filled")
                                  ^
make: *** [Makefile:131: lint] Error 1
{noformat}
[jira] [Resolved] (YUNIKORN-2706) [UMBRELLA] YuniKorn 1.5.2 release efforts
[ https://issues.apache.org/jira/browse/YUNIKORN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2706. Fix Version/s: 1.5.2 Resolution: Fixed > [UMBRELLA] YuniKorn 1.5.2 release efforts > - > > Key: YUNIKORN-2706 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2706 > Project: Apache YuniKorn > Issue Type: Task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.2 > > > This umbrella is to track the work items needed for the 1.5.2 release. > Release manager: Peter Bacsko. > This release only contains bug fixes.
[jira] [Resolved] (YUNIKORN-2629) Adding a node can result in a deadlock
[ https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2629. Fix Version/s: 1.6.0 Target Version: 1.5.2, 1.6.0 (was: 1.6.0, 1.5.2) Resolution: Fixed > Adding a node can result in a deadlock > -- > > Key: YUNIKORN-2629 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2629 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Affects Versions: 1.5.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: updateNode_deadlock_trace.txt, > yunikorn-scheduler-20240627.log, yunikorn_stuck_stack_20240708.txt > > > Adding a new node after Yunikorn state initialization can result in a > deadlock. > The problem is that {{Context.addNode()}} holds a lock while we're waiting > for the {{NodeAccepted}} event: > {noformat} >dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, > func(event interface{}) { > nodeEvent, ok := event.(CachedSchedulerNodeEvent) > if !ok { > return > } > [...] 
removed for clarity > wg.Done() > }) > defer dispatcher.UnregisterEventHandler(handlerID, > dispatcher.EventTypeNode) > if err := > ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode(&si.NodeRequest{ > Nodes: nodesToRegister, > RmID: schedulerconf.GetSchedulerConf().ClusterID, > }); err != nil { > log.Log(log.ShimContext).Error("Failed to register nodes", > zap.Error(err)) > return nil, err > } > // wait for all responses to accumulate > wg.Wait() <--- shim gets stuck here > {noformat} > If tasks are being processed, then the dispatcher will try to retrieve the > event handler, which is returned from Context: > {noformat} > go func() { > for { > select { > case event := <-getDispatcher().eventChan: > switch v := event.(type) { > case events.TaskEvent: > getEventHandler(EventTypeTask)(v) <--- > eventually calls Context.getTask() > case events.ApplicationEvent: > getEventHandler(EventTypeApp)(v) > case events.SchedulerNodeEvent: > getEventHandler(EventTypeNode)(v) > {noformat} > Since {{addNode()}} is holding a write lock, the event processing loop gets > stuck, so {{registerNodes()}} will never progress.
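The lock-ordering problem described above can be reproduced in a few lines. This is a simplified, hypothetical model, not the shim code and not necessarily the fix that was merged: `shimContext`, `getNode` and the `events` channel stand in for `Context`, `Context.getTask()` and the dispatcher. It shows one way to avoid the deadlock: release the write lock before waiting for the event loop.

```go
package main

import (
	"fmt"
	"sync"
)

// shimContext is a hypothetical stand-in for the shim Context: one RWMutex
// guards its state, and the event loop needs a read lock to handle events.
type shimContext struct {
	sync.RWMutex
	nodes map[string]string
}

func (c *shimContext) getNode(id string) string {
	c.RLock() // blocks while another goroutine holds the write lock
	defer c.RUnlock()
	return c.nodes[id]
}

// addNode updates state under the write lock, then waits for the event loop
// to handle the node. The lock is released BEFORE wg.Wait(); keeping it
// would block the event loop in getNode(), and neither side could progress.
func (c *shimContext) addNode(events chan<- func(), nodeID string) {
	c.Lock()
	c.nodes[nodeID] = "accepted"
	c.Unlock()

	var wg sync.WaitGroup
	wg.Add(1)
	events <- func() { // executed by the dispatcher goroutine
		_ = c.getNode(nodeID) // needs the read lock
		wg.Done()
	}
	wg.Wait()
}

func demo() string {
	c := &shimContext{nodes: map[string]string{}}
	events := make(chan func())
	go func() { // simplified dispatcher loop
		for handle := range events {
			handle()
		}
	}()
	c.addNode(events, "node-1")
	close(events)
	return c.getNode("node-1")
}

func main() {
	fmt.Println(demo()) // accepted
}
```

Moving `c.Unlock()` after `wg.Wait()` reproduces the hang: the dispatcher receives the event, blocks in `RLock()`, and `wg.Done()` is never reached.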
[jira] [Resolved] (YUNIKORN-2319) cache.Task: reference to old pod object is kept after update
[ https://issues.apache.org/jira/browse/YUNIKORN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2319. Fix Version/s: 1.6.0 Resolution: Fixed > cache.Task: reference to old pod object is kept after update > > > Key: YUNIKORN-2319 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2319 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > Attachments: 2024-01-09 134112.png, 2024-01-09 134130.png > > > There is a kind of memory leak in the shim: when the pod is updated, the old > pod object is still referenced from Task, so the GC has no chance to remove > it (only when the pod terminates). > See screenshot: task points to version 80199, scheduler cache already has a > newer version 81216. > We have two solutions: > 1. Update the object in the Task together with the scheduler cache > 2. Don't store the pointer to the pod, instead, always retrieve it from the > scheduler cache
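A minimal sketch of option 2 from the list above, under the assumption that the task only needs to hold a key (here the pod UID) to find its pod again. The `pod`, `schedulerCache` and `task` types are hypothetical, heavily reduced stand-ins for `*v1.Pod`, the scheduler cache and `cache.Task`.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical, minimal stand-in for *v1.Pod.
type pod struct {
	UID             string
	ResourceVersion string
}

// schedulerCache is a hypothetical cache that always holds the latest pods.
type schedulerCache struct {
	sync.RWMutex
	pods map[string]*pod
}

// updatePod replaces the stored object; the old one becomes collectable
// as long as nothing else still points to it.
func (c *schedulerCache) updatePod(p *pod) {
	c.Lock()
	defer c.Unlock()
	c.pods[p.UID] = p
}

func (c *schedulerCache) getPod(uid string) *pod {
	c.RLock()
	defer c.RUnlock()
	return c.pods[uid]
}

// task stores only the pod UID, so it never pins an outdated pod object;
// the current object is fetched from the cache on demand.
type task struct {
	podUID string
	cache  *schedulerCache
}

func (t *task) getPod() *pod {
	return t.cache.getPod(t.podUID)
}

func demo() string {
	cache := &schedulerCache{pods: map[string]*pod{}}
	cache.updatePod(&pod{UID: "pod-1", ResourceVersion: "80199"})
	t := &task{podUID: "pod-1", cache: cache}
	cache.updatePod(&pod{UID: "pod-1", ResourceVersion: "81216"})
	return t.getPod().ResourceVersion
}

func main() {
	fmt.Println(demo()) // 81216
}
```

The trade-off against option 1 is an extra cache lookup on every access, in exchange for having a single place (the cache) that owns the pod object.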
[jira] [Resolved] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2717. Fix Version/s: 1.6.0 Resolution: Fixed [~chiahsuan] > Assert invalid queue name in get queue applications handler > --- > > Key: YUNIKORN-2717 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2717 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Manikandan R >Assignee: Chia Hsuan Chang >Priority: Minor > Labels: newbie > Fix For: 1.6.0 > > > Assert invalid queue name in TestGetQueueApplicationsHandler test method > using > assertQueueInvalid(). Also clean up the method.
[jira] [Reopened] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened YUNIKORN-2717: > Assert invalid queue name in get queue applications handler > --- > > Key: YUNIKORN-2717 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2717 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Manikandan R >Assignee: Chia Hsuan Chang >Priority: Minor > Labels: newbie > > Assert invalid queue name in TestGetQueueApplicationsHandler test method > using > assertQueueInvalid(). Also clean up the method.
[jira] [Resolved] (YUNIKORN-2766) Only generate event if all predicates failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2766. Fix Version/s: 1.6.0 Resolution: Fixed > Only generate event if all predicates failed > > > Key: YUNIKORN-2766 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2766 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Right now, we send an event to the pod if a predicate failed: > {noformat} >if err := plugin.Predicates(&si.PredicatesArgs{ > AllocationKey: allocationKey, > NodeID:sn.NodeID, > Allocate: allocate, > }); err != nil { > log.Log(log.SchedNode).Debug("running predicates > failed", > zap.String("allocationKey", allocationKey), > zap.String("nodeID", sn.NodeID), > zap.Bool("allocateFlag", allocate), > zap.Error(err)) > // running predicates failed > msg := err.Error() > ask.LogAllocationFailure(msg, allocate) > ask.SendPredicateFailedEvent(msg) > return false > } > {noformat} > This is, however, not correct. We should only generate an event if *all* > predicates have failed, which means that the pod cannot be scheduled. A > failing predicate for a given node can be perfectly normal in many cases. > Instead, we should aggregate the failed predicates and send an event like: > {noformat} > All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': > node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints > that the pod didn't tolerate (5x) > {noformat} > where 20x and 5x tell how many times a certain predicate failed.
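The aggregation proposed above can be sketched as a counter keyed by failure message that is turned into a single event text only after every node has been tried. This is a hypothetical helper, not the code that was merged: `predicateFailures` and `eventMessage` are invented names, and ordering by descending count is an assumption chosen to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// predicateFailures counts how often each predicate failure message was
// seen while trying every node for one request (hypothetical helper).
type predicateFailures map[string]int

func (p predicateFailures) add(msg string) { p[msg]++ }

// eventMessage builds one summary line, most frequent failure first; ties
// are broken alphabetically so the output does not depend on map order.
func (p predicateFailures) eventMessage(allocationKey string) string {
	msgs := make([]string, 0, len(p))
	for m := range p {
		msgs = append(msgs, m)
	}
	sort.Slice(msgs, func(i, j int) bool {
		if p[msgs[i]] != p[msgs[j]] {
			return p[msgs[i]] > p[msgs[j]]
		}
		return msgs[i] < msgs[j]
	})
	parts := make([]string, len(msgs))
	for i, m := range msgs {
		parts[i] = fmt.Sprintf("%s (%dx)", m, p[m])
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, "; "))
}

func demo() string {
	failures := predicateFailures{}
	for i := 0; i < 20; i++ {
		failures.add("node(s) didn't match Pod's node affinity/selector")
	}
	for i := 0; i < 5; i++ {
		failures.add("node(s) had taints that the pod didn't tolerate")
	}
	// Only called once no node has passed the predicates.
	return failures.eventMessage("345d70d7-243a-4077-a9f8-0bb76c3532d7")
}

func main() {
	fmt.Println(demo())
}
```

A single failing node then only increments a counter; the pod receives one event, and only when no node at all was suitable.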
[jira] [Resolved] (YUNIKORN-2652) Expand getApplication() endpoint handler to return resource usage
[ https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2652. Fix Version/s: 1.6.0 Resolution: Fixed > Expand getApplication() endpoint handler to return resource usage > - > > Key: YUNIKORN-2652 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2652 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Rich Scott >Assignee: Rich Scott >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Some users would like to be able to see resource usage (preempted, > placeholder resource, etc) for applications that have been completed. The > `getApplication()` endpoint handler should be enhanced to take an optional > parameter specifying that the user would like details about resources > included in the response, and a new `ApplicationXXXDAOInfo` object that is a > slight superset of `ApplicationDAOInfo` should be introduced, and can be used > in the response.
[jira] [Created] (YUNIKORN-2777) Improve TrackedResource type
Peter Bacsko created YUNIKORN-2777: -- Summary: Improve TrackedResource type Key: YUNIKORN-2777 URL: https://issues.apache.org/jira/browse/YUNIKORN-2777 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Peter Bacsko Currently, TrackedResource is defined as:
{noformat}
type TrackedResource struct {
    TrackedResourceMap map[string]map[string]int64
    locking.RWMutex
}
{noformat}
As it turned out during the review of [YUNIKORN-2652|https://github.com/apache/yunikorn-core/pull/897], {{TrackedResourceMap}} is actually {{map[string]*Resource}}. If we change the definition, we'll be able to reuse the functions that already exist for {{Resource}}.
[jira] [Resolved] (YUNIKORN-2707) Tagging for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2707. Fix Version/s: 1.5.2 Resolution: Fixed > Tagging for 1.5.2 > - > > Key: YUNIKORN-2707 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2707 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.2 > >
[jira] [Resolved] (YUNIKORN-2759) Replace %w by Errors.join
[ https://issues.apache.org/jira/browse/YUNIKORN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2759. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Replace %w by Errors.join > - > > Key: YUNIKORN-2759 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2759 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > original discussion: https://issues.apache.org/jira/browse/YUNIKORN-2262 > errors.Join can make the code more performant and readable
[jira] [Resolved] (YUNIKORN-2770) Simplify Application.GetTask()
[ https://issues.apache.org/jira/browse/YUNIKORN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2770. Fix Version/s: 1.6.0 Resolution: Fixed > Simplify Application.GetTask() > -- > > Key: YUNIKORN-2770 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2770 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > {{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the > {{error}} is completely unnecessary. We either have the task for the given > taskID or we don't.
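A sketch of the simplified signature, assuming the tasks are kept in a map guarded by a read-write lock. `Task`, `Application` and the `taskMap` field are hypothetical, trimmed-down stand-ins for the shim types; only what `GetTask()` needs is modelled.

```go
package main

import (
	"fmt"
	"sync"
)

// Task and Application are hypothetical, trimmed-down stand-ins.
type Task struct {
	taskID string
}

type Application struct {
	lock    sync.RWMutex
	taskMap map[string]*Task
}

// GetTask returns the task registered under taskID, or nil if there is
// none. "Not found" is already expressed by the nil result, so no separate
// error value is returned.
func (app *Application) GetTask(taskID string) *Task {
	app.lock.RLock()
	defer app.lock.RUnlock()
	return app.taskMap[taskID]
}

func main() {
	app := &Application{taskMap: map[string]*Task{"task-1": {taskID: "task-1"}}}
	if task := app.GetTask("task-1"); task != nil {
		fmt.Println("found", task.taskID) // found task-1
	}
	fmt.Println(app.GetTask("missing") == nil) // true
}
```

Callers then switch from `if err != nil` to a `nil` check on the returned pointer.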
[jira] [Resolved] (YUNIKORN-2765) Improve si_helper & resource funtion's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2765. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Improve si_helper & resource funtion's test coverage > > > Key: YUNIKORN-2765 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2765 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Improve the test coverage of the following functions: > * GetTerminationTypeFromString (unknown termination type) > * getMaxResource (requested resource types are fewer than allocated types) > * GetResource > * GetTGResource
[jira] [Created] (YUNIKORN-2770) Simplify Application.GetTask()
Peter Bacsko created YUNIKORN-2770: -- Summary: Simplify Application.GetTask() Key: YUNIKORN-2770 URL: https://issues.apache.org/jira/browse/YUNIKORN-2770 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Peter Bacsko Assignee: Peter Bacsko {{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the {{error}} is completely unnecessary. We either have the task for the given taskID or we don't.
[jira] [Resolved] (YUNIKORN-2707) Tagging for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2707. Fix Version/s: 1.5.2 Resolution: Fixed > Tagging for 1.5.2 > - > > Key: YUNIKORN-2707 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2707 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.2 > >
[jira] [Reopened] (YUNIKORN-2707) Tagging for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened YUNIKORN-2707: > Tagging for 1.5.2 > - > > Key: YUNIKORN-2707 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2707 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.2 > >
[jira] [Created] (YUNIKORN-2766) Only generate event if all predicates failed
Peter Bacsko created YUNIKORN-2766: -- Summary: Only generate event if all predicates failed Key: YUNIKORN-2766 URL: https://issues.apache.org/jira/browse/YUNIKORN-2766 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko Right now, we send an event to the pod if a predicate failed:
{noformat}
if err := plugin.Predicates(&si.PredicatesArgs{
    AllocationKey: allocationKey,
    NodeID:        sn.NodeID,
    Allocate:      allocate,
}); err != nil {
    log.Log(log.SchedNode).Debug("running predicates failed",
        zap.String("allocationKey", allocationKey),
        zap.String("nodeID", sn.NodeID),
        zap.Bool("allocateFlag", allocate),
        zap.Error(err))
    // running predicates failed
    msg := err.Error()
    ask.LogAllocationFailure(msg, allocate)
    ask.SendPredicateFailedEvent(msg)
    return false
}
{noformat}
This is, however, not correct. We should only generate an event if *all* predicates have failed, which means that the pod cannot be scheduled. A failing predicate for a given node can be perfectly normal in many cases. Instead, we should aggregate the failed predicates and send an event like:
{noformat}
All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': node(s) didn't match Pod's node affinity/selector (20x), node(s) had taints that the pod didn't tolerate (5x)
{noformat}
where 20x and 5x tell how many times a certain predicate failed.
[jira] [Resolved] (YUNIKORN-2725) Temporarily disable failing e2e preemption tests
[ https://issues.apache.org/jira/browse/YUNIKORN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2725. Fix Version/s: 1.6.0 Resolution: Fixed > Temporarily disable failing e2e preemption tests > > > Key: YUNIKORN-2725 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2725 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes, test - e2e >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Disable the following tests to have green builds: > Verify_preemption_on_priority_queue > Verify_basic_preemption > Verify_allow_preemption_tag
[jira] [Created] (YUNIKORN-2725) Temporarily disable failing e2e tests
Peter Bacsko created YUNIKORN-2725: -- Summary: Temporarily disable failing e2e tests Key: YUNIKORN-2725 URL: https://issues.apache.org/jira/browse/YUNIKORN-2725 Project: Apache YuniKorn Issue Type: Test Components: shim - kubernetes, test - e2e Reporter: Peter Bacsko Assignee: Peter Bacsko Disable the following tests to have green builds: Verify_preemption_on_priority_queue Verify_basic_preemption
[jira] [Created] (YUNIKORN-2724) Improve the signature of methods notifyTaskComplete() and ensureAppAndTaskCreated()
Peter Bacsko created YUNIKORN-2724: -- Summary: Improve the signature of methods notifyTaskComplete() and ensureAppAndTaskCreated() Key: YUNIKORN-2724 URL: https://issues.apache.org/jira/browse/YUNIKORN-2724 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Peter Bacsko From the review [https://github.com/apache/yunikorn-k8shim/pull/864]: "I also think we need to change the signature for {{notifyTaskComplete(string, string)}} to {{notifyTaskComplete(*Application, string)}} Probably better to use a separate jira for that as it flows through into {{NotifyTaskComplete()}} and some tests. The 2 tests have the application pointer already. It removes a number of extra getApplication() calls we really do not need. Similar for {{ensureAppAndTaskCreated()}} which is only ever called from this function. Add a parameter to it to make it: {{ensureAppAndTaskCreated(*v1.Pod, *Application)}} and only execute application creation {{if app == nil}}. This can be either in this jira or in a separate one." That is, optimize the methods so that we avoid unnecessary {{GetApplication()}} calls.
[jira] [Resolved] (YUNIKORN-2182) Set ReadHeaderTimeout in http server
[ https://issues.apache.org/jira/browse/YUNIKORN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2182. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Set ReadHeaderTimeout in http server > > > Key: YUNIKORN-2182 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2182 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common, webapp >Reporter: Wilfred Spiegelenburg >Assignee: Chenchen Lai >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > Potential Slowloris Attack because ReadHeaderTimeout is not configured in the > http.Server (gosec) > We do not set ReadTimeout or ReadHeaderTimeout so we do not have a timeout at > all at the moment. > BTW: this is not important for the webtest servers we build as they are just > for our tests.
[jira] [Resolved] (YUNIKORN-2568) Move all xxxEvents types to objects/events
[ https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2568. Fix Version/s: 1.6.0 Resolution: Fixed > Move all xxxEvents types to objects/events > -- > > Key: YUNIKORN-2568 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2568 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2564) [Umbrella] Move xxxEvents types to a different package
[ https://issues.apache.org/jira/browse/YUNIKORN-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2564. Fix Version/s: 1.6.0 Resolution: Fixed > [Umbrella] Move xxxEvents types to a different package > -- > > Key: YUNIKORN-2564 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2564 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.6.0 > > > There are several event types that can be moved to a different package: > * queueEvents > * applicationEvents > * askEvents > * nodeEvents > There are numerous files in {{pkg/scheduler/objects}}. This is an opportunity > to clean it up a bit and move these under e.g. > {{pkg/scheduler/objects/events}}.
[jira] [Created] (YUNIKORN-2708) Release notes for 1.5.2
Peter Bacsko created YUNIKORN-2708: -- Summary: Release notes for 1.5.2 Key: YUNIKORN-2708 URL: https://issues.apache.org/jira/browse/YUNIKORN-2708 Project: Apache YuniKorn Issue Type: Sub-task Reporter: Peter Bacsko
[jira] [Created] (YUNIKORN-2709) Update website for 1.5.2
Peter Bacsko created YUNIKORN-2709: -- Summary: Update website for 1.5.2 Key: YUNIKORN-2709 URL: https://issues.apache.org/jira/browse/YUNIKORN-2709 Project: Apache YuniKorn Issue Type: Sub-task Components: release Reporter: Peter Bacsko Assignee: Peter Bacsko
[jira] [Created] (YUNIKORN-2706) [UMBRELLA] YuniKorn 1.5.2 release efforts
Peter Bacsko created YUNIKORN-2706: -- Summary: [UMBRELLA] YuniKorn 1.5.2 release efforts Key: YUNIKORN-2706 URL: https://issues.apache.org/jira/browse/YUNIKORN-2706 Project: Apache YuniKorn Issue Type: Task Components: release Reporter: Peter Bacsko Assignee: Peter Bacsko This umbrella is to track the work items needed for the 1.5.2 release. Release manager: Peter Bacsko. This release only contains bug fixes.
[jira] [Created] (YUNIKORN-2707) Tagging for 1.5.2
Peter Bacsko created YUNIKORN-2707: -- Summary: Tagging for 1.5.2 Key: YUNIKORN-2707 URL: https://issues.apache.org/jira/browse/YUNIKORN-2707 Project: Apache YuniKorn Issue Type: Sub-task Reporter: Peter Bacsko Assignee: Peter Bacsko
[jira] [Resolved] (YUNIKORN-2704) Event publish errors out when predicates fail
[ https://issues.apache.org/jira/browse/YUNIKORN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2704. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed Merged to master & branch-1.5 > Event publish errors out when predicates fail > - > > Key: YUNIKORN-2704 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2704 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Mit Desai >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.6.0, 1.5.2 > > > I consistently see this error in the logs when events are published. > I did put some debug logs and found that I only get it when the events for > untolerated taints are published. > E0618 17:43:17.858946 1 event_broadcaster.go:270] "Server rejected > event (will not retry!)" err="Event \"<>.17da2a31072bb32f\" is > invalid: [action: Required value, reason: Required value]" > event="&Event\{ObjectMeta:{<>.17da2a31072bb32f dpi-dev 0 > 0001-01-01 00:00:00 + UTC map[] map[] [] [] > []},EventTime:2024-06-18 17:43:17.857332069 + UTC > m=+84279.014490005,Series:nil,ReportingController:yunikorn,ReportingInstance:yunikorn-yunikorn-scheduler-59bdc88fdc-7h5bt,Action:,Reason:,Regarding:\{Pod > <> <> 5c90315c-a07d-4801-9ecc-baf61ee45f11 v1 > 4323324038 },Related:nil,Note:Predicate failed for request > '5c90315c-a07d-4801-9ecc-baf61ee45f11' with message: 'node(s) had untolerated > taint \{<>: <>}',Type:Normal,DeprecatedSource:\{ > },DeprecatedFirstTimestamp:0001-01-01 00:00:00 + > UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 + UTC,DeprecatedCount:0,}"
[jira] [Resolved] (YUNIKORN-2694) Improve placement rule funtion's test coverage - 2
[ https://issues.apache.org/jira/browse/YUNIKORN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2694. Fix Version/s: 1.6.0 Resolution: Fixed > Improve placement rule funtion's test coverage - 2 > -- > > Key: YUNIKORN-2694 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2694 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased
[ https://issues.apache.org/jira/browse/YUNIKORN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2683. Fix Version/s: 1.6.0 Resolution: Fixed > Unnecessary error is logged when resource usage is increased > > > Key: YUNIKORN-2683 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2683 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0 > > > The refactored code in YUNIKORN-2542 logs an unnecessary error message: > {noformat} > appGroup := userTracker.getGroupForApp(applicationID) > log.Log(log.SchedUGM).Debug("Increasing resource usage for user", > zap.String("user", user.User), > zap.String("queue path", queuePath), > zap.String("application", applicationID), > zap.String("group", appGroup), > zap.Stringer("resource", usage)) > groupTracker := m.GetGroupTracker(appGroup) > if groupTracker == nil { > log.Log(log.SchedUGM).Error("group tracker should be available > in groupTrackers map", > zap.String("application", applicationID), > zap.String("group", appGroup)) > return > } > ... > {noformat} > We don't always have a {{groupTracker}}.
The previous code simply called > {{increaseTrackedResource()}} on an empty tracker: > {noformat} > func (ut *UserTracker) increaseTrackedResource(queuePath string, > applicationID string, usage *resources.Resource) { > ut.Lock() > defer ut.Unlock() > ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage) > hierarchy := strings.Split(queuePath, configs.DOT) > ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, > usage) > gt := ut.appGroupTrackers[applicationID] > log.Log(log.SchedUGM).Debug("Increasing resource usage for group", > zap.String("group", gt.getName()), > zap.Strings("queue path", hierarchy), > zap.String("application", applicationID), > zap.Stringer("resource", usage)) > gt.increaseTrackedResource(queuePath, applicationID, usage, > ut.userName) <- can be nil > } > {noformat}
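One way to express "we don't always have a group tracker" in Go is a method that tolerates a nil receiver, so the missing tracker is a quiet no-op instead of an error log. This is a hypothetical sketch, not necessarily the fix that was merged: `groupTracker`, its `usage` field, the boolean return value and the `"dev"` group name are all illustrative.

```go
package main

import "fmt"

// groupTracker is a hypothetical stand-in for the UGM GroupTracker.
type groupTracker struct {
	usage map[string]int64
}

// increaseTrackedResource tolerates a nil receiver: an application that is
// not linked to any group simply has no group tracker, so the call is a
// no-op rather than an error. It reports whether usage was changed.
func (gt *groupTracker) increaseTrackedResource(queuePath string, amount int64) bool {
	if gt == nil {
		return false
	}
	gt.usage[queuePath] += amount
	return true
}

func demo() (bool, bool) {
	trackers := map[string]*groupTracker{"dev": {usage: map[string]int64{}}}
	withGroup := trackers["dev"].increaseTrackedResource("root.a", 10)
	withoutGroup := trackers[""].increaseTrackedResource("root.a", 10) // nil tracker
	return withGroup, withoutGroup
}

func main() {
	fmt.Println(demo()) // true false
}
```

Note that in the quoted `UserTracker` code, `gt.getName()` in the debug log is also called on a possibly nil `gt`, so any such method would need the same nil guard.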
[jira] [Resolved] (YUNIKORN-2661) Fix hard-coded boolean in setLimit
[ https://issues.apache.org/jira/browse/YUNIKORN-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2661. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed Merged to master & branch-1.5 > Fix hard-coded boolean in setLimit > -- > > Key: YUNIKORN-2661 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2661 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > Inside the UGM code {{setLimit()}}, we don't pass down {{doWildCardCheck}}, > so this variable never reaches the leaves: > {noformat} > // Note: Lock free call. The Lock of the linked tracker (UserTracker and > GroupTracker) should be held before calling this function. > func (qt *QueueTracker) setLimit(hierarchy []string, maxResource > *resources.Resource, maxApps uint64, useWildCard bool, trackType > trackingType, doWildCardCheck bool) { > log.Log(log.SchedUGM).Debug("Setting limits", > zap.String("queue path", qt.queuePath), > zap.Strings("hierarchy", hierarchy), > zap.Uint64("max applications", maxApps), > zap.Stringer("max resources", maxResource), > zap.Bool("use wild card", useWildCard)) > // depth first: all the way to the leaf, create if not exists > // more than 1 in the slice means we need to recurse down > if len(hierarchy) > 1 { > childName := hierarchy[1] > if qt.childQueueTrackers[childName] == nil { > qt.childQueueTrackers[childName] = > newQueueTracker(qt.queuePath, childName, trackType) > } > qt.childQueueTrackers[childName].setLimit(hierarchy[1:], > maxResource, maxApps, useWildCard, trackType, false) <-- should be > "doWildCardCheck" not "false" > ... > {noformat} > Fix this and create a unit test for {{setLimit()}}.
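Here is a reduced sketch of the corrected recursion. It assumes a toy `queueTracker` that stores only `maxApps` and the `doWildCardCheck` flag at the leaf. The resource limits, wildcard handling and locking of the real `QueueTracker.setLimit()` are left out. The point is that the flag the caller passes in is forwarded unchanged at every level, instead of being replaced by a hard-coded `false`.

```go
package main

import "fmt"

// queueTracker is a hypothetical, heavily simplified version of the UGM
// QueueTracker that only records the limit-related values it receives.
type queueTracker struct {
	name            string
	children        map[string]*queueTracker
	maxApps         uint64
	doWildCardCheck bool
}

func newQueueTracker(name string) *queueTracker {
	return &queueTracker{name: name, children: make(map[string]*queueTracker)}
}

// setLimit walks the hierarchy depth first, creating child trackers as
// needed, and forwards doWildCardCheck unchanged to every level.
func (qt *queueTracker) setLimit(hierarchy []string, maxApps uint64, doWildCardCheck bool) {
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		if qt.children[childName] == nil {
			qt.children[childName] = newQueueTracker(childName)
		}
		// The reported bug passed a literal false here.
		qt.children[childName].setLimit(hierarchy[1:], maxApps, doWildCardCheck)
		return
	}
	// Leaf reached: store the limit and the flag.
	qt.maxApps = maxApps
	qt.doWildCardCheck = doWildCardCheck
}

func main() {
	root := newQueueTracker("root")
	root.setLimit([]string{"root", "parent", "leaf"}, 5, true)
	leaf := root.children["parent"].children["leaf"]
	fmt.Println(leaf.maxApps, leaf.doWildCardCheck) // 5 true
}
```

A unit test for the real function can follow the same shape: set a limit on a three-level hierarchy and assert on the leaf tracker.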
[jira] [Resolved] (YUNIKORN-2516) Update documentation about event.RESTResponseSize
[ https://issues.apache.org/jira/browse/YUNIKORN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2516. Fix Version/s: 1.6.0 Resolution: Fixed > Update documentation about event.RESTResponseSize > - > > Key: YUNIKORN-2516 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2516 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2512) Event system properties are not used
[ https://issues.apache.org/jira/browse/YUNIKORN-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2512. Fix Version/s: 1.6.0 Resolution: Fixed > Event system properties are not used > > > Key: YUNIKORN-2512 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2512 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.6.0 > > > There are two properties which are not used by the event system:
# The property "event.requestCapacity" is supposed to determine the size of the slice used to transfer events between the core and the shim every 2 seconds. However, right now it's not used at all; we use the default (1000) every time.
# The property "event.RESTResponseSize" is not referenced in the code at all. It is supposed to set the maximum number of entries returned by the batch API. Currently, the hard coded value is 1.
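Below is a minimal sketch of reading such integer properties with a fallback, assuming they arrive as a `map[string]string` of configuration values. The helper name `intProperty` and the `defaultRESTResponseSize` value are placeholders for illustration. Only the 1000 default for `event.requestCapacity` comes from the report above.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Default taken from the report above.
	defaultRequestCapacity = 1000
	// Placeholder value for illustration only; not YuniKorn's real default.
	defaultRESTResponseSize = 100
)

// intProperty returns the value of key in props as a positive int, or def
// if the key is missing, not a number, or not positive.
func intProperty(props map[string]string, key string, def int) int {
	raw, ok := props[key]
	if !ok {
		return def
	}
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

func main() {
	props := map[string]string{"event.requestCapacity": "500"}
	fmt.Println(intProperty(props, "event.requestCapacity", defaultRequestCapacity))   // 500
	fmt.Println(intProperty(props, "event.RESTResponseSize", defaultRESTResponseSize)) // falls back to the placeholder
}
```

Falling back to the default on invalid input, rather than failing, is a design choice of this sketch.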
[jira] [Resolved] (YUNIKORN-2245) Application sorting: improve pending resource filtering
[ https://issues.apache.org/jira/browse/YUNIKORN-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2245. Resolution: Won't Do > Application sorting: improve pending resource filtering > --- > > Key: YUNIKORN-2245 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2245 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > > When sorting applications, we filter them on pending resources:
{noformat}
func filterOnPendingResources(apps map[string]*Application) []*Application {
	filteredApps := make([]*Application, 0)
	for _, app := range apps {
		// Only look at app when pending-res > 0
		if resources.StrictlyGreaterThanZero(app.GetPendingResource()) {
			filteredApps = append(filteredApps, app)
		}
	}
	return filteredApps
}
{noformat}
This filtering is relatively expensive but necessary: during the lifecycle of an application, {{sa.pending}} can drop to 0, and in that case we don't want to schedule anything from the app. The suggested approach is to track the total pendingAskRepeats inside the app. That way we don't need to call {{resources.StrictlyGreaterThanZero()}} and can perform a simple integer comparison instead.
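The proposal was closed as Won't Do, but it can be sketched as follows. The sketch assumes a hypothetical `pendingAskRepeats` counter that is kept in sync whenever asks are added or allocated. The type and method names are illustrative, not YuniKorn's. In the real scheduler the counter would have to be updated under the application's lock, and keeping it consistent with the pending resources is the main burden this approach adds.

```go
package main

import "fmt"

// Application is a hypothetical simplification: instead of a pending
// resource value, it keeps a running count of outstanding ask repeats.
type Application struct {
	ID                string
	pendingAskRepeats int
}

// addAsk and allocateOne keep the counter in sync with the asks.
func (a *Application) addAsk(repeats int) { a.pendingAskRepeats += repeats }

func (a *Application) allocateOne() {
	if a.pendingAskRepeats > 0 {
		a.pendingAskRepeats--
	}
}

// filterOnPendingAsks replaces the resource comparison with an integer check.
func filterOnPendingAsks(apps map[string]*Application) []*Application {
	filtered := make([]*Application, 0, len(apps))
	for _, app := range apps {
		if app.pendingAskRepeats > 0 {
			filtered = append(filtered, app)
		}
	}
	return filtered
}

func main() {
	a := &Application{ID: "app-1"}
	b := &Application{ID: "app-2"}
	a.addAsk(2)
	b.addAsk(1)
	b.allocateOne() // app-2 has nothing left to schedule
	apps := map[string]*Application{a.ID: a, b.ID: b}
	for _, app := range filterOnPendingAsks(apps) {
		fmt.Println(app.ID) // app-1
	}
}
```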
[jira] [Closed] (YUNIKORN-2221) Performance improvements phase II
[ https://issues.apache.org/jira/browse/YUNIKORN-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko closed YUNIKORN-2221. -- > Performance improvements phase II > - > > Key: YUNIKORN-2221 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2221 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler, shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Fix For: 1.5.0 > > > Umbrella JIRA for further performance improvements in YuniKorn. The main issues have been addressed in YUNIKORN-1715. However, it's still possible to reduce memory and CPU usage further through smaller changes.
[jira] [Resolved] (YUNIKORN-2221) Performance improvements phase II
[ https://issues.apache.org/jira/browse/YUNIKORN-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2221. Fix Version/s: 1.5.0 Resolution: Fixed > Performance improvements phase II > - > > Key: YUNIKORN-2221 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2221 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler, shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Fix For: 1.5.0 > > > Umbrella JIRA for further performance improvements in YuniKorn. The main issues have been addressed in YUNIKORN-1715. However, it's still possible to reduce memory and CPU usage further through smaller changes.
[jira] [Resolved] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance
[ https://issues.apache.org/jira/browse/YUNIKORN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2653. Fix Version/s: 1.6.0 Resolution: Fixed > Gang scheduling K8s event formatting compliance > --- > > Key: YUNIKORN-2653 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2653 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > K8s events come with definitions and rules for the content of their fields. Adjust the content of gang scheduling related events to comply with these rules. The focus is on the reason and action fields only:
* 'reason' is the reason this event is generated. 'reason' should be short and unique; it should be in UpperCamelCase format (starting with a capital letter).
* 'action' explains what happened to the regarding object, or what action the ReportingController took in the object's name; it should be in UpperCamelCase format (starting with a capital letter).
No spaces or long text.
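The UpperCamelCase rule above can be checked with a small function, shown here as a sketch. It accepts strings that start with an upper-case letter and contain only letters and digits. The function name and the example strings are hypothetical. This is not a validator that Kubernetes or YuniKorn provides.

```go
package main

import (
	"fmt"
	"unicode"
)

// isUpperCamelCase reports whether s is non-empty, starts with an upper-case
// letter and contains only letters and digits (so no spaces or punctuation).
func isUpperCamelCase(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 && !unicode.IsUpper(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical candidate values for an event's reason/action fields.
	for _, s := range []string{"GangScheduling", "PlaceholderCreated", "gang scheduling", "Placeholder timed out"} {
		fmt.Printf("%q -> %v\n", s, isUpperCamelCase(s))
	}
}
```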
[jira] [Created] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased
Peter Bacsko created YUNIKORN-2683: -- Summary: Unnecessary error is logged when resource usage is increased Key: YUNIKORN-2683 URL: https://issues.apache.org/jira/browse/YUNIKORN-2683 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Peter Bacsko The refactored code in YUNIKORN-2542 logs an unnecessary error message:
{noformat}
appGroup := userTracker.getGroupForApp(applicationID)
log.Log(log.SchedUGM).Debug("Increasing resource usage for user",
	zap.String("user", user.User),
	zap.String("queue path", queuePath),
	zap.String("application", applicationID),
	zap.String("group", appGroup),
	zap.Stringer("resource", usage))
groupTracker := m.GetGroupTracker(appGroup)
if groupTracker == nil {
	log.Log(log.SchedUGM).Error("group tracker should be available in groupTrackers map",
		zap.String("application", applicationID),
		zap.String("group", appGroup))
	return
}
...
{noformat}
We don't always have a {{groupTracker}}. The previous code simply called {{increaseTrackedResource()}} on an empty (nil) tracker:
{noformat}
func (ut *UserTracker) increaseTrackedResource(queuePath string, applicationID string, usage *resources.Resource) {
	ut.Lock()
	defer ut.Unlock()
	ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage)
	hierarchy := strings.Split(queuePath, configs.DOT)
	ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, usage)
	gt := ut.appGroupTrackers[applicationID]
	log.Log(log.SchedUGM).Debug("Increasing resource usage for group",
		zap.String("group", gt.getName()),
		zap.Strings("queue path", hierarchy),
		zap.String("application", applicationID),
		zap.Stringer("resource", usage))
	gt.increaseTrackedResource(queuePath, applicationID, usage, ut.userName) // <- gt can be nil
}
{noformat}
[jira] [Resolved] (YUNIKORN-2680) Improve placement rule functions' test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2680. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Improve placement rule functions' test coverage > -- > > Key: YUNIKORN-2680 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2680 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Created] (YUNIKORN-2681) Data race in TestGetStream_Limit
Peter Bacsko created YUNIKORN-2681: -- Summary: Data race in TestGetStream_Limit Key: YUNIKORN-2681 URL: https://issues.apache.org/jira/browse/YUNIKORN-2681 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler, test - unit Reporter: Peter Bacsko Assignee: Peter Bacsko A data race was detected during a unit test:
{noformat}
==
WARNING: DATA RACE
Write at 0x0170c220 by goroutine 2575:
  github.com/apache/yunikorn-core/pkg/webservice.NewWebApp()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/webservice.go:82 +0x11c
  github.com/apache/yunikorn-core/pkg/webservice.TestCheckHealthStatusNotFound()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2574 +0x2f
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44

Previous read at 0x0170c220 by goroutine 2542:
  github.com/apache/yunikorn-core/pkg/webservice.getStream()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers.go:1225 +0xbd3
  github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit.gowrap4()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0x4f

Goroutine 2575 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:163 +0x2e4

Goroutine 2542 (running) created at:
  github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0xbb7
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
==
2024-06-18T13:40:54.182Z INFO core.events events/event_streaming.go:164 Removing event stream consumer {"name": "host-1", "creation time": "2024-06-18T13:40:54.181Z"}
2024-06-18T13:40:54.182Z INFO core.scheduler.health webservice/handlers.go:623 Health check is not available
--- FAIL: TestCheckHealthStatusNotFound (0.00s)
    testing.go:1398: race detected during execution of test
{noformat}
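The report shows `NewWebApp()` writing a package-level value that `getStream()` reads. The read happens in a goroutine started by `TestGetStream_Limit`, which is still running when `TestCheckHealthStatusNotFound` starts. The variable itself is not named. One common way to make a shared setting like this race-free is `sync/atomic`. The sketch below assumes a hypothetical `int64` setting called `streamLimit`. Another option is to make the test wait for its goroutines before returning. Which fix was actually applied is not stated here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// streamLimit is a hypothetical package-level setting: one goroutine may
// store it (e.g. while a web app is being constructed) while another loads
// it (e.g. inside a streaming handler). atomic.Int64 (Go 1.19+) makes these
// concurrent accesses race-free.
var streamLimit atomic.Int64

func setStreamLimit(n int64) { streamLimit.Store(n) }

func currentStreamLimit() int64 { return streamLimit.Load() }

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); setStreamLimit(100) }()
	go func() { defer wg.Done(); _ = currentStreamLimit() }() // reads 0 or 100, never a data race
	wg.Wait()
	fmt.Println(currentStreamLimit()) // 100
}
```

Running the tests with `go test -race` is what surfaces this class of problem in the first place.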
[jira] [Resolved] (YUNIKORN-2673) Improve newFilter function's test coverage in filter.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2673. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Improve newFilter function's test coverage in filter.go > -- > > Key: YUNIKORN-2673 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2673 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2515) Add property event.RESTResponseSize to the batch event handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2515. Fix Version/s: 1.6.0 Resolution: Fixed > Add property event.RESTResponseSize to the batch event handler > -- > > Key: YUNIKORN-2515 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2515 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2670) Improve util functions' test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2670. Fix Version/s: 1.6.0 Resolution: Fixed > Improve util functions' test coverage > > > Key: YUNIKORN-2670 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2670 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Improve the test coverage of the following functions in util.go:
* ZeroTimeInUnixNano
* GetNewUUID
* IsRecoveryQueue
[jira] [Resolved] (YUNIKORN-2669) nil pointer dereference error
[ https://issues.apache.org/jira/browse/YUNIKORN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2669. Resolution: Duplicate This looks like a dup of YUNIKORN-2562. The solution for this has been delivered in 1.5.1. It's also on master. > nil pointer dereference error > - > > Key: YUNIKORN-2669 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2669 > Project: Apache YuniKorn > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Junyoung Park >Assignee: Peter Bacsko >Priority: Major > > Environment: AWS EKS 1.26. yunikorn-scheduler logs:
{code:java}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x179b2f5]

goroutine 50 [running]:
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc000661000, {0xc008ad14a0, 0x24})
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/objects/application.go:1739 +0x615
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc00046a100?, 0xc01436c880)
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/partition.go:1281 +0x27f
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc000502680?, {0xc02014da60, 0x1, 0xc0112f5ee8?}, {0xc0060f8980, 0xb})
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/context.go:868 +0x9e
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc00046a100?, 0xc0145e8eb0?)
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/context.go:750 +0xa5
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc000120990)
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/scheduler.go:111 +0x16e
created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
	github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/scheduler.go:55 +0x9c
{code}
[jira] [Resolved] (YUNIKORN-2637) finalizePods should ignore pods like registerPods does
[ https://issues.apache.org/jira/browse/YUNIKORN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2637. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed Merged to master & branch-1.5. > finalizePods should ignore pods like registerPods does > -- > > Key: YUNIKORN-2637 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2637 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > The initialisation code is a two-step process for pods. First, all pods are listed and added to the system in registerPods(), which returns the list of processed pods. The second step happens after the event handlers are turned on and nodes have been cleaned up, etc. During the second step, the pods from the first step are checked and removed. However, pods that were already in a terminated state in step 1 get removed again. Although the step should be idempotent, this is unnecessary. When iterating over the existing pods, any pod in a terminal state should be skipped.
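The proposed skip can be sketched with a local `Pod` type whose phase strings mirror the Kubernetes pod phases. Real shim code would use `v1.PodSucceeded` and `v1.PodFailed` from `k8s.io/api/core/v1` instead. The helper names `isTerminated` and `podsToFinalize` are hypothetical.

```go
package main

import "fmt"

// PodPhase mirrors the string values of the Kubernetes pod phases; this
// local type avoids a dependency on k8s.io/api for the sketch.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, minimal pod representation.
type Pod struct {
	Name  string
	Phase PodPhase
}

// isTerminated reports whether the pod has reached a terminal phase.
func isTerminated(p Pod) bool {
	return p.Phase == PodSucceeded || p.Phase == PodFailed
}

// podsToFinalize returns the names of the pods from the first step that
// still need to be checked; pods already terminated are skipped.
func podsToFinalize(existing []Pod) []string {
	names := make([]string, 0, len(existing))
	for _, p := range existing {
		if isTerminated(p) {
			continue // terminal in step 1: removing it again is unnecessary
		}
		names = append(names, p.Name)
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "web-1", Phase: PodRunning},
		{Name: "job-1", Phase: PodSucceeded},
		{Name: "job-2", Phase: PodFailed},
		{Name: "web-2", Phase: PodPending},
	}
	fmt.Println(podsToFinalize(pods)) // [web-1 web-2]
}
```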
[jira] [Resolved] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails
[ https://issues.apache.org/jira/browse/YUNIKORN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2668. Fix Version/s: 1.6.0 Resolution: Fixed > Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails > > > Key: YUNIKORN-2668 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2668 > Project: Apache YuniKorn > Issue Type: Task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails due to a deadlock problem described in YUNIKORN-2629. Until that ticket is resolved, let's disable this test so that upstream tests don't fail.
[jira] [Created] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails
Peter Bacsko created YUNIKORN-2668: -- Summary: Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails Key: YUNIKORN-2668 URL: https://issues.apache.org/jira/browse/YUNIKORN-2668 Project: Apache YuniKorn Issue Type: Task Reporter: Peter Bacsko Assignee: Peter Bacsko The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails due to a deadlock problem described in YUNIKORN-2629. Until that ticket is resolved, let's disable this test so that upstream tests don't fail.
[jira] [Resolved] (YUNIKORN-2561) Support topology spread constraints on placeholder pods
[ https://issues.apache.org/jira/browse/YUNIKORN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2561. Fix Version/s: 1.6.0 Resolution: Fixed > Support topology spread constraints on placeholder pods > --- > > Key: YUNIKORN-2561 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2561 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Jacob Salway >Assignee: Jacob Salway >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > If a pod has a topology spread constraint with `whenUnsatisfiable: DoNotSchedule` and is used as part of a task group, there is no way to pass the constraint on to the placeholder pods created by YuniKorn. As a result, placeholder pods can be placed on a node that would violate the original pod's topology spread constraint.
[jira] [Resolved] (YUNIKORN-2643) utils.go WaitForCondition improvement
[ https://issues.apache.org/jira/browse/YUNIKORN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2643. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. Thanks [~mean-world] for the contribution. > utils.go WaitForCondition improvement > -- > > Key: YUNIKORN-2643 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2643 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: HUAN-IU LIOU >Assignee: HUAN-IU LIOU >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2663) Improve ACL struct functions' test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2663. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Improve ACL struct functions' test coverage > -- > > Key: YUNIKORN-2663 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2663 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Remove unreachable code in the NewACL function. Improve the test coverage of the following in acl.go:
* TestSetUsers
* TestSetGroups
[jira] [Resolved] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO
[ https://issues.apache.org/jira/browse/YUNIKORN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2666. Fix Version/s: 1.6.0 Resolution: Fixed > Fix DeepEqual comparison in Test_fixedRule_ruleDAO > --- > > Key: YUNIKORN-2666 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2666 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler, test - unit >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the non-deterministic order of map key iteration:
{noformat}
fixed_rule_test.go:285: assertion failed:
--- tt.want
+++ ruleDAO
  &dao.RuleDAO{
  	Name:       "fixed",
  	Parameters: {"create": "true", "qualified": "false", "queue": "default"},
  	Filter: &dao.FilterDAO{
  		Type:     "allow",
  		UserList: nil,
  		GroupList: []string{
- 			"group1",
+ 			"group2",
- 			"group2",
+ 			"group1",
  		},
  		UserExp:  "",
  		GroupExp: "",
  	},
  	ParentRule: nil,
  }
{noformat}
We use {{maps.Keys()}} when we create the user list and group list in {{FilterDAO}}, so the order of the entries in these lists is not stable.
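Go deliberately randomizes map iteration order, so a slice built from map keys needs an explicit order before it is compared with `DeepEqual`. Below is a minimal sketch of one fix, assuming the filter stores its groups as a `map[string]bool` (an assumption). An alternative is to make the test comparison order-insensitive. Which of the two was chosen is not stated here.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys collects the keys of m and sorts them, so the resulting slice
// has a deterministic order independent of Go's randomized map iteration.
func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	groups := map[string]bool{"group2": true, "group1": true}
	fmt.Println(sortedKeys(groups)) // [group1 group2]
}
```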
[jira] [Created] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO
Peter Bacsko created YUNIKORN-2666: -- Summary: Fix DeepEqual comparison in Test_fixedRule_ruleDAO Key: YUNIKORN-2666 URL: https://issues.apache.org/jira/browse/YUNIKORN-2666 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler, test - unit Reporter: Peter Bacsko The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the non-deterministic order of map key iteration:
{noformat}
fixed_rule_test.go:285: assertion failed:
--- tt.want
+++ ruleDAO
  &dao.RuleDAO{
  	Name:       "fixed",
  	Parameters: {"create": "true", "qualified": "false", "queue": "default"},
  	Filter: &dao.FilterDAO{
  		Type:     "allow",
  		UserList: nil,
  		GroupList: []string{
- 			"group1",
+ 			"group2",
- 			"group2",
+ 			"group1",
  		},
  		UserExp:  "",
  		GroupExp: "",
  	},
  	ParentRule: nil,
  }
{noformat}
We use {{maps.Keys()}} when we create the user list and group list in {{FilterDAO}}, so the order of the entries in these lists is not stable.
[jira] [Resolved] (YUNIKORN-2650) Complete or remove web_server_test#TestProxy
[ https://issues.apache.org/jira/browse/YUNIKORN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2650. Fix Version/s: 1.6.0 Resolution: Fixed > Complete or remove web_server_test#TestProxy > > > Key: YUNIKORN-2650 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2650 > Project: Apache YuniKorn > Issue Type: Test >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > web_server_test has an empty test case: TestProxy [0]. It seems to me there is already a proxy-related test [1].
[0] https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L82
[1] https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L73
[jira] [Resolved] (YUNIKORN-2514) Update documentation about event.requestCapacity
[ https://issues.apache.org/jira/browse/YUNIKORN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2514. Fix Version/s: 1.6.0 Resolution: Fixed > Update documentation about event.requestCapacity > > > Key: YUNIKORN-2514 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2514 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2654) Remove unused code in k8shim context
[ https://issues.apache.org/jira/browse/YUNIKORN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2654. Fix Version/s: 1.6.0 Resolution: Fixed > Remove unused code in k8shim context > > > Key: YUNIKORN-2654 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2654 > Project: Apache YuniKorn > Issue Type: Task > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Chenchen Lai >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > The NotifyApplicationComplete and NotifyApplicationFail functions are not called anywhere and are dead code. The K8shim does not trigger application completion or failure; this is triggered by the core when the application no longer has any activity registered.
[jira] [Resolved] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity
[ https://issues.apache.org/jira/browse/YUNIKORN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2647. Fix Version/s: 1.6.0 Resolution: Fixed > Flaky test TestUpdateNodeCapacity > - > > Key: YUNIKORN-2647 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2647 > Project: Apache YuniKorn > Issue Type: Bug > Components: test - unit >Reporter: Wilfred Spiegelenburg >Assignee: Tseng Hsi-Huang >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > As we saw in YUNIKORN-2573, the single-node update test might fail:
{code:java}
--- FAIL: TestUpdateNodeCapacity (0.03s)
    operation_test.go:446: Expected partition resource map[memory:1 vcore:2], doesn't match with actual partition resource map[memory:1 vcore:2]
{code}
When updating node capacity, we calculate the delta resources and use that delta to update the resources in the partition. As with multiple nodes, the test fails when the calls happen in the following order: node.SetCapacity() -> waitForAvailableNodeResource() -> partitionInfo.GetTotalPartitionResource() -> partition.updatePartitionResource()
[jira] [Resolved] (YUNIKORN-2659) Improve config validator functions' test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2659. Fix Version/s: 1.6.0 Resolution: Fixed > Improve config validator functions' test coverage > > > Key: YUNIKORN-2659 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2659 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Improve the test coverage of the following functions in configvalidator.go:
* checkPlacementRule
* checkLimitResource
* checkLimit
[jira] [Created] (YUNIKORN-2661) Fix hard-coded boolean in setLimit
Peter Bacsko created YUNIKORN-2661: -- Summary: Fix hard-coded boolean in setLimit Key: YUNIKORN-2661 URL: https://issues.apache.org/jira/browse/YUNIKORN-2661 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko Inside the UGM function {{setLimit()}}, we don't pass down {{doWildCardCheck}}, so this variable never reaches the leaves:
{noformat}
// Note: Lock free call. The Lock of the linked tracker (UserTracker and GroupTracker) should be held before calling this function.
func (qt *QueueTracker) setLimit(hierarchy []string, maxResource *resources.Resource, maxApps uint64, useWildCard bool, trackType trackingType, doWildCardCheck bool) {
	log.Log(log.SchedUGM).Debug("Setting limits",
		zap.String("queue path", qt.queuePath),
		zap.Strings("hierarchy", hierarchy),
		zap.Uint64("max applications", maxApps),
		zap.Stringer("max resources", maxResource),
		zap.Bool("use wild card", useWildCard))
	// depth first: all the way to the leaf, create if not exists
	// more than 1 in the slice means we need to recurse down
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		if qt.childQueueTrackers[childName] == nil {
			qt.childQueueTrackers[childName] = newQueueTracker(qt.queuePath, childName, trackType)
		}
		qt.childQueueTrackers[childName].setLimit(hierarchy[1:], maxResource, maxApps, useWildCard, trackType, false)
	...
{noformat}
Fix this and create a unit test for {{setLimit()}}.
[jira] [Resolved] (YUNIKORN-2649) Improve CalculateAbsUsedCapacity & CompUsageRatio funtion's test coverage in resources.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2649. Fix Version/s: 1.6.0 Resolution: Fixed > Improve CalculateAbsUsedCapacity & CompUsageRatio funtion's test coverage in > resources.go > - > > Key: YUNIKORN-2649 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2649 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2581) Expose running placement rules in REST
[ https://issues.apache.org/jira/browse/YUNIKORN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2581. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Expose running placement rules in REST > -- > > Key: YUNIKORN-2581 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2581 > Project: Apache YuniKorn > Issue Type: New Feature > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Since placement rules are now always used and the recovery rule was added, the > queue config no longer shows the running rules correctly. > Also, if a config update has been rejected for any reason, the rules shown would > not be correct. > Exposing the configured rules from the placement manager works around all > these issues.
[jira] [Resolved] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2646. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed > Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: yunikorn-logs-lock.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached
[jira] [Resolved] (YUNIKORN-2542) Consistent logging and tracker handling for increment/decrement
[ https://issues.apache.org/jira/browse/YUNIKORN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2542. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. Thanks [~Tseng Hsi-Huang] for the contribution. > Consistent logging and tracker handling for increment/decrement > --- > > Key: YUNIKORN-2542 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2542 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Tseng Hsi-Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > We log DEBUG output and use {{GroupTracker}} inconsistently in {{Manager}} > and in {{UserTracker}}. > E.g. > {{Manager.IncreaseTrackedResource()}}: only a single log output with DEBUG > level > {{Manager.DecreaseTrackedResource()}}: multiple log statements; it also handles > the group tracker, which is not the case with increments > This also affects {{UserTracker}}: logging and GroupTracker handling differ > between {{increaseTrackedResource()}} and {{decreaseTrackedResource()}}
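A hypothetical Go sketch of what "consistent" could look like: both directions go through one helper that emits a single DEBUG-style log line and updates the user and the group tracker the same way. The {{manager}} and {{tracker}} types and the integer usage counter are simplifications assumed for illustration; the real trackers presumably hold resource objects rather than an integer.

```go
package main

import (
	"fmt"
	"log"
)

// tracker is a hypothetical usage counter standing in for UserTracker/GroupTracker.
type tracker struct {
	name  string
	usage int64
}

// manager is a hypothetical stand-in for the UGM Manager.
type manager struct {
	users  map[string]*tracker
	groups map[string]*tracker
}

// get returns the named tracker, creating it on first use.
func (m *manager) get(trackers map[string]*tracker, name string) *tracker {
	if trackers[name] == nil {
		trackers[name] = &tracker{name: name}
	}
	return trackers[name]
}

// updateTracked is the single code path shared by increase and decrease:
// one log line and identical user and group handling in both directions.
func (m *manager) updateTracked(user, group string, delta int64) {
	log.Printf("DEBUG updating tracked resource user=%s group=%s delta=%d", user, group, delta)
	for _, t := range []*tracker{m.get(m.users, user), m.get(m.groups, group)} {
		t.usage += delta
	}
}

func (m *manager) increaseTrackedResource(user, group string, amount int64) {
	m.updateTracked(user, group, amount)
}

func (m *manager) decreaseTrackedResource(user, group string, amount int64) {
	m.updateTracked(user, group, -amount)
}

func main() {
	m := &manager{users: map[string]*tracker{}, groups: map[string]*tracker{}}
	m.increaseTrackedResource("alice", "dev", 5)
	m.decreaseTrackedResource("alice", "dev", 2)
	fmt.Println(m.users["alice"].usage, m.groups["dev"].usage)
}
```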
[jira] [Resolved] (YUNIKORN-2567) Remove Application reference from applicationEvents
[ https://issues.apache.org/jira/browse/YUNIKORN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2567. Fix Version/s: 1.6.0 Resolution: Fixed > Remove Application reference from applicationEvents > --- > > Key: YUNIKORN-2567 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2567 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2642) Don't set resources on the recovery queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2642. Resolution: Fixed > Don't set resources on the recovery queue > - > > Key: YUNIKORN-2642 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2642 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > Resource constraints can be set on dynamic queues based on application > tags. We should not set them on the recovery queue, because it has no quota.
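A minimal Go sketch of the guard, assuming hypothetical types: app-tag resources are applied only to unmanaged (dynamic) queues and are explicitly skipped for the recovery queue. The queue name {{root.@recovery@}} and the presumption that the recovery queue is itself created as a dynamic queue (which is why a plain "unmanaged" check would not be enough) are assumptions for illustration.

```go
package main

import "fmt"

// recoveryQueueName is an assumed name used only for this sketch.
const recoveryQueueName = "root.@recovery@"

// queue is a hypothetical stand-in holding only what the check needs.
type queue struct {
	name       string
	isManaged  bool
	maxRes     map[string]int64
	guaranteed map[string]int64
}

// applyTagResources sets app-tag derived limits on dynamic (unmanaged) queues
// only, and never on the recovery queue, which carries no quota. It reports
// whether the limits were applied.
func applyTagResources(q *queue, maxRes, guaranteed map[string]int64) bool {
	if q.isManaged || q.name == recoveryQueueName {
		return false
	}
	q.maxRes = maxRes
	q.guaranteed = guaranteed
	return true
}

func main() {
	limits := map[string]int64{"memory": 1024}
	dyn := &queue{name: "root.dynamic"}
	rec := &queue{name: recoveryQueueName}
	fmt.Println(applyTagResources(dyn, limits, nil), applyTagResources(rec, limits, nil))
}
```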
[jira] [Resolved] (YUNIKORN-2635) test coverage improvement: same priority case in sorter
[ https://issues.apache.org/jira/browse/YUNIKORN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2635. Fix Version/s: 1.6.0 Resolution: Fixed > test coverage improvement: same priority case in sorter > > > Key: YUNIKORN-2635 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2635 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Chen Yu Teng >Assignee: Chen Yu Teng >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application
[ https://issues.apache.org/jira/browse/YUNIKORN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2633. Fix Version/s: 1.6.0 Resolution: Fixed > Unnecessary warning from Partition when adding an application > - > > Key: YUNIKORN-2633 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2633 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The following is printed when adding an application: > {noformat} > 2024-05-17T21:53:04.716+0200 WARN core.scheduler.queue > scheduler/partition.go:344 Trying to set resources on a queue that is > not an unmanaged leaf {"queueName": "root.default"} > {noformat} > This message is supposed to be printed when the application defines a > guaranteed or max resource. After YUNIKORN-2547, it is always printed if the > queue is managed.
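A sketch of the intended condition in Go, using a hypothetical {{applyAppResources}} helper and {{queueInfo}} type (not the partition.go code): the warning is only logged when the application actually requests a guaranteed or max resource and the target queue is not an unmanaged leaf. The function returns whether it warned, so the behaviour is easy to test.

```go
package main

import (
	"fmt"
	"log"
)

// queueInfo is a hypothetical view of the target queue.
type queueInfo struct {
	name      string
	isManaged bool
	isLeaf    bool
}

// applyAppResources returns true when the warning is logged.
func applyAppResources(q queueInfo, guaranteed, maxRes map[string]int64) bool {
	if len(guaranteed) == 0 && len(maxRes) == 0 {
		// the application asked for nothing: nothing to set, nothing to warn about
		return false
	}
	if q.isManaged || !q.isLeaf {
		log.Printf("WARN Trying to set resources on a queue that is not an unmanaged leaf queueName=%s", q.name)
		return true
	}
	// unmanaged leaf: the resources would be applied here (omitted in this sketch)
	return false
}

func main() {
	def := queueInfo{name: "root.default", isManaged: true, isLeaf: true}
	fmt.Println(applyAppResources(def, nil, nil))                             // no resources requested
	fmt.Println(applyAppResources(def, nil, map[string]int64{"vcore": 2000})) // max resource requested
}
```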
[jira] [Created] (YUNIKORN-2642) Don't set resources on the recovery queue
Peter Bacsko created YUNIKORN-2642: -- Summary: Don't set resources on the recovery queue Key: YUNIKORN-2642 URL: https://issues.apache.org/jira/browse/YUNIKORN-2642 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko Resource constraints can be set on dynamic queues based on application tags. We should not set them on the recovery queue, because it has no quota.
[jira] [Resolved] (YUNIKORN-2566) Remove AllocationAsk reference from askEvents
[ https://issues.apache.org/jira/browse/YUNIKORN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2566. Fix Version/s: 1.6.0 Resolution: Fixed > Remove AllocationAsk reference from askEvents > - > > Key: YUNIKORN-2566 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2566 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2565) Remove Node reference from nodeEvents
[ https://issues.apache.org/jira/browse/YUNIKORN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2565. Fix Version/s: 1.6.0 Resolution: Fixed > Remove Node reference from nodeEvents > - > > Key: YUNIKORN-2565 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2565 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > >
[jira] [Resolved] (YUNIKORN-2618) Streamline AsyncRMCallback UpdateAllocation
[ https://issues.apache.org/jira/browse/YUNIKORN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2618. Fix Version/s: 1.6.0 Resolution: Fixed > Streamline AsyncRMCallback UpdateAllocation > --- > > Key: YUNIKORN-2618 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2618 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Yun Sun >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > If a task is not found, {{context.getTask}} returns nil. For {{response.New}} > processing we should just log that fact and proceed to the next allocation. This > simplifies the flow, as we never need to check for a nil task. We should never have > a pod in the cache that does not exist as a task on an application. > We retrieve the application using the application ID from the response but > never use the object. We only use the application ID to pass into an event. > The context event handler then does the exact same lookup again to process > the event on the app. > We need to become much smarter in this area: we do double or triple lookups and > generate async events that just change the state of the app or task or kick > off another event.
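The flow proposed above, sketched in Go with hypothetical, trimmed-down types ({{shimContext}}, {{task}} and {{allocation}} are not the shim's real structs): an unknown task is logged and skipped with {{continue}}, so no later nil check is needed, and the application object is never looked up because only its ID is used.

```go
package main

import (
	"fmt"
	"log"
)

// allocation is a hypothetical, trimmed-down "new allocation" from the core.
type allocation struct {
	applicationID string
	allocationKey string
	nodeID        string
}

// task is a hypothetical shim task; nodeID is set once it is bound.
type task struct {
	id     string
	nodeID string
}

// shimContext is a hypothetical stand-in for the shim context.
type shimContext struct {
	tasks map[string]map[string]*task // appID -> taskID -> task
}

// getTask returns nil when the application or task is unknown.
func (c *shimContext) getTask(appID, taskID string) *task {
	return c.tasks[appID][taskID]
}

// processNew handles new allocations and returns how many were bound.
func (c *shimContext) processNew(allocs []allocation) int {
	bound := 0
	for _, a := range allocs {
		t := c.getTask(a.applicationID, a.allocationKey)
		if t == nil {
			log.Printf("task not found, skipping allocation appID=%s key=%s", a.applicationID, a.allocationKey)
			continue
		}
		// from here on t is known to be non-nil
		t.nodeID = a.nodeID
		bound++
	}
	return bound
}

func main() {
	c := &shimContext{tasks: map[string]map[string]*task{"app-1": {"task-1": {id: "task-1"}}}}
	n := c.processNew([]allocation{
		{applicationID: "app-1", allocationKey: "task-1", nodeID: "node-a"},
		{applicationID: "app-1", allocationKey: "missing", nodeID: "node-b"},
	})
	fmt.Println(n, c.tasks["app-1"]["task-1"].nodeID)
}
```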
[jira] [Resolved] (YUNIKORN-2611) [UMBRELLA] YuniKorn 1.5.1 release efforts
[ https://issues.apache.org/jira/browse/YUNIKORN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2611. Fix Version/s: 1.5.1 Resolution: Fixed > [UMBRELLA] YuniKorn 1.5.1 release efforts > - > > Key: YUNIKORN-2611 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2611 > Project: Apache YuniKorn > Issue Type: Task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Fix For: 1.5.1 > > > This umbrella tracks the work items needed for the 1.5.1 release. > Release manager: Peter Bacsko. > This release only consists of bug fixes. Use the filter > [https://issues.apache.org/jira/issues/?filter=12353383] to see the list of > deliverables.
[jira] [Resolved] (YUNIKORN-2614) Update website for 1.5.1
[ https://issues.apache.org/jira/browse/YUNIKORN-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2614. Fix Version/s: 1.5.1 Target Version: 1.5.1 Resolution: Fixed > Update website for 1.5.1 > > > Key: YUNIKORN-2614 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2614 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.1 > >
[jira] [Created] (YUNIKORN-2639) Clarify release procedure for minor releases
Peter Bacsko created YUNIKORN-2639: -- Summary: Clarify release procedure for minor releases Key: YUNIKORN-2639 URL: https://issues.apache.org/jira/browse/YUNIKORN-2639 Project: Apache YuniKorn Issue Type: Task Components: release Reporter: Peter Bacsko After the release of 1.5.1, we realized that we need to properly define the release process for a minor release. This needs to be properly documented. The clarification should cover things like: # What it can and can't include (no features, bug fixes only) # How to publish docs? Shall we keep the current "a.b.c" version on the website or remove it and publish "a.b.c+1"? # Communication: possible differences in release notes, announcement, etc.
[jira] [Created] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application
Peter Bacsko created YUNIKORN-2633: -- Summary: Unnecessary warning from Partition when adding an application Key: YUNIKORN-2633 URL: https://issues.apache.org/jira/browse/YUNIKORN-2633 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko The following is printed when adding an application: {noformat} 2024-05-17T21:53:04.716+0200 WARN core.scheduler.queue scheduler/partition.go:344 Trying to set resources on a queue that is not an unmanaged leaf {"queueName": "root.default"} {noformat} This message is supposed to be printed when the application defines a guaranteed or max resource.
[jira] [Resolved] (YUNIKORN-2613) Release notes for 1.5.1
[ https://issues.apache.org/jira/browse/YUNIKORN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2613. Fix Version/s: 1.5.1 Resolution: Fixed > Release notes for 1.5.1 > --- > > Key: YUNIKORN-2613 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2613 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.1 > >
[jira] [Resolved] (YUNIKORN-2632) Data race in IncAllocatedResource
[ https://issues.apache.org/jira/browse/YUNIKORN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YUNIKORN-2632. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed > Data race in IncAllocatedResource > - > > Key: YUNIKORN-2632 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2632 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > After YUNIKORN-2548, we accidentally access > {{Queue.allocatedResource}} without holding the lock. > {noformat} > WARNING: DATA RACE > Read at 0x00c000578a00 by goroutine 52: > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).IncAllocatedResource() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1032 > +0x6b > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNode() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1495 > +0x184 > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNodes.func1() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1402 > +0x144 > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode.func1() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/node_iterator.go:42 > +0x95 > github.com/google/btree.(*node[go.shape.interface { > Less(github.com/google/btree.Item) bool }]).iterate() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:522 > +0x6f1 > github.com/google/btree.(*node[go.shape.interface { > Less(github.com/google/btree.Item) bool }]).iterate() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 > +0x448 > github.com/google/btree.(*node[go.shape.interface { > Less(github.com/google/btree.Item) bool }]).iterate() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 > +0x448 > github.com/google/btree.(*node[go.shape.interface { >
Less(github.com/google/btree.Item) bool }]).iterate() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 > +0x448 > github.com/google/btree.(*node[go.shape.interface { > Less(github.com/google/btree.Item) bool }]).iterate() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 > +0x448 > github.com/google/btree.(*BTreeG[go.shape.interface { > Less(github.com/google/btree.Item) bool }]).Ascend() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:779 > +0x108 > github.com/google/btree.(*BTree).Ascend() > > /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:1029 > +0x108 > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode() > ... > Previous write at 0x00c000578a00 by goroutine 49: > > github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).DecAllocatedResource() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1101 > +0x212 > > github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/partition.go:1357 > +0x17b4 > > github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:870 > +0xba > > github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:750 > +0x1e4 > github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent() > > /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:133 > +0x28d > > github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.gowrap1() > > 
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:60 > +0x33 > {noformat}
[jira] [Created] (YUNIKORN-2632) Data race in IncAllocatedResource
Peter Bacsko created YUNIKORN-2632: -- Summary: Data race in IncAllocatedResource Key: YUNIKORN-2632 URL: https://issues.apache.org/jira/browse/YUNIKORN-2632 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko After YUNIKORN-2548, we accidentally access {{Queue.allocatedResource}} without holding the lock. {noformat} WARNING: DATA RACE Read at 0x00c000578a00 by goroutine 52: github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).IncAllocatedResource() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1032 +0x6b github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNode() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1495 +0x184 github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNodes.func1() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1402 +0x144 github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode.func1() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/node_iterator.go:42 +0x95 github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:522 +0x6f1 github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448 github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448 github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
github.com/google/btree.(*BTreeG[go.shape.interface { Less(github.com/google/btree.Item) bool }]).Ascend() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:779 +0x108 github.com/google/btree.(*BTree).Ascend() /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:1029 +0x108 github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode() ... Previous write at 0x00c000578a00 by goroutine 49: github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).DecAllocatedResource() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1101 +0x212 github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/partition.go:1357 +0x17b4 github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:870 +0xba github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:750 +0x1e4 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:133 +0x28d github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.gowrap1() /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:60 +0x33 {noformat}