[jira] [Resolved] (YUNIKORN-2869) Tagging for 1.6.0

2024-09-17 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2869.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

> Tagging for 1.6.0
> -
>
> Key: YUNIKORN-2869
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2869
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2874) Cannot generate reproducible builds during release

2024-09-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2874:
--

 Summary: Cannot generate reproducible builds during release
 Key: YUNIKORN-2874
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2874
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: release
Reporter: Peter Bacsko


When releasing YuniKorn with REPRODUCIBLE_BUILDS=1 (the default), the 
following error occurs:
{noformat}
~/repos/yunikorn-release/staging/tmp/apache-yunikorn-1.6.0-src/k8shim$ make 
REPRODUCIBLE_BUILDS=1 scheduler
building binary for scheduler docker image
docker run -t --rm=true --volume 
"/home/bacskop/repos/yunikorn-release/staging/tmp/apache-yunikorn-1.6.0-src/k8shim/:/buildroot"
 "golang:1.22.1" sh -c "cd /buildroot && \
CGO_ENABLED=0 GOOS=linux GOARCH=\"amd64\" go build \
-a \
-o=build/bin/yunikorn-scheduler \
-trimpath \
-ldflags '-buildid= -extldflags \"-static\" -X 
github.com/apache/yunikorn-k8shim/pkg/conf.buildVersion=1.6.0 -X 
github.com/apache/yunikorn-k8shim/pkg/conf.buildDate=2024-09-10T08:26:23+00:00 
-X github.com/apache/yunikorn-k8shim/pkg/conf.isPluginVersion=false -X 
github.com/apache/yunikorn-k8shim/pkg/conf.goVersion=1.22.1 -X 
github.com/apache/yunikorn-k8shim/pkg/conf.arch=amd64 -X 
github.com/apache/yunikorn-k8shim/pkg/conf.coreSHA=a2d40c81fee104356f9e33120fd557a928f74f2b
 -X 
github.com/apache/yunikorn-k8shim/pkg/conf.siSHA=68e8c6cca28a743d797e7908b1225392a3a2
 -X 
github.com/apache/yunikorn-k8shim/pkg/conf.shimSHA=240aeb90951a30c677890b61b50f6bfcafb227b5'
 \
-tags netgo \
-installsuffix netgo \
./pkg/cmd/shim/"
go: downloading go.uber.org/zap v1.26.0
go: downloading k8s.io/api v0.31.0
...
go: downloading github.com/spf13/cobra v1.8.1
go: downloading golang.org/x/sync v0.8.0
go: downloading github.com/asaskevich/govalidator 
v0.0.0-20190424111038-f61b66f89f4a
pkg/cmd/shim/main.go:31:2: github.com/apache/yunikorn-core@v1.6.0-1: 
replacement directory ../core/ does not exist
pkg/common/constants/constants.go:22:2: 
github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory 
../scheduler-interface/ does not exist
pkg/locking/locking.go:26:2: github.com/apache/yunikorn-core@v1.6.0-1: 
replacement directory ../core/ does not exist
pkg/common/test/schedulerapi_mock.go:25:2: 
github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory 
../scheduler-interface/ does not exist
pkg/common/test/recoverable_apps_mock.go:25:2: 
github.com/apache/yunikorn-scheduler-interface@v1.6.0-1: replacement directory 
../scheduler-interface/ does not exist
{noformat}
The problem is that the release procedure does not mount "../core" and 
"../scheduler-interface" into the build container. This differs from a normal 
build: the release build uses the "replace" directive in "go.mod", which points 
at these two directories, so both must be present.
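
A release script could detect this situation before starting the container. The following is a minimal sketch, not code from the release tooling: `localReplaceDirs` and `sampleGoMod` are hypothetical names, the sample "go.mod" content is assumed from the error messages above, and the parser only handles the common forms of the "replace" directive.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleGoMod is an assumed go.mod fragment, modelled on the error output.
const sampleGoMod = `module github.com/apache/yunikorn-k8shim

replace (
	github.com/apache/yunikorn-core => ../core/
	github.com/apache/yunikorn-scheduler-interface => ../scheduler-interface/
)
`

// localReplaceDirs returns the targets of "replace" directives that are
// local paths ("./" or "../"). These directories must exist relative to the
// module root at build time, so they also have to be mounted in the container.
func localReplaceDirs(goMod string) []string {
	var dirs []string
	inBlock := false
	for _, line := range strings.Split(goMod, "\n") {
		line = strings.TrimSpace(line)
		// Drop trailing comments.
		if i := strings.Index(line, "//"); i >= 0 {
			line = strings.TrimSpace(line[:i])
		}
		switch {
		case line == "replace (":
			inBlock = true
			continue
		case inBlock && line == ")":
			inBlock = false
			continue
		case strings.HasPrefix(line, "replace "):
			line = strings.TrimPrefix(line, "replace ")
		case !inBlock:
			continue
		}
		// A directive has the form "old [version] => new [version]".
		parts := strings.SplitN(line, "=>", 2)
		if len(parts) != 2 {
			continue
		}
		fields := strings.Fields(parts[1])
		if len(fields) == 0 {
			continue
		}
		target := fields[0]
		if strings.HasPrefix(target, "./") || strings.HasPrefix(target, "../") {
			dirs = append(dirs, target)
		}
	}
	return dirs
}

func main() {
	for _, dir := range localReplaceDirs(sampleGoMod) {
		fmt.Println("must be mounted:", dir)
	}
}
```

Each returned path would need its own volume in the "docker run" command, next to the existing "/buildroot" mount.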






[jira] [Resolved] (YUNIKORN-2493) Preemption Hardening Phase 1

2024-09-10 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2493.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Preemption Hardening Phase 1
> 
>
> Key: YUNIKORN-2493
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2493
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2870) Release notes for 1.6.0

2024-09-10 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2870:
--

 Summary: Release notes for 1.6.0
 Key: YUNIKORN-2870
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2870
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Peter Bacsko









[jira] [Resolved] (YUNIKORN-2341) New Queue Web UI

2024-09-09 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2341.

Resolution: Fixed

> New Queue Web UI 
> -
>
> Key: YUNIKORN-2341
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2341
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Dong-Lin Hsieh
>Assignee: Dong-Lin Hsieh
>Priority: Major
> Fix For: 1.6.0
>
>
> Fresh new Web UI to visualize queues in YuniKorn.
> Subtasks 1 through 12 are the basic components of the new UI. 
> These components will be used in at least two places.
>  # Visualize any valid YuniKorn {{{}config.yaml{}}}.
>  # Visualize the current queues that YuniKorn is using.
> Inspired by [tableau/query-graphs|https://github.com/tableau/query-graphs]






[jira] [Created] (YUNIKORN-2863) New Queue Web UI phase II

2024-09-09 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2863:
--

 Summary: New Queue Web UI phase II
 Key: YUNIKORN-2863
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2863
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Dong-Lin Hsieh









[jira] [Resolved] (YUNIKORN-2323) Gang scheduling user experience issues

2024-09-09 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2323.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Gang scheduling user experience issues
> --
>
> Key: YUNIKORN-2323
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2323
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.4.0
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> When something goes wrong, users find it difficult to understand what is 
> happening with the gang app. 
> Issue 1:
> "driver pod is getting stuck"
> At times, when the driver pod cannot run successfully for some reason, users 
> get the impression that the pod is stuck and the app is hung and not making 
> progress. Users wait for some time without getting a clear picture. How do we 
> close the gap quickly and communicate accordingly through events?
> Issue 2:
> ResumeApplication is fired when all placeholders (ph's) have timed out. Do we 
> need to inform the users about this event, as they may not have any clue about 
> this significant change?
> Issue 3: 
> When gang app placeholders are in progress (and allocated) and a request for 
> real asks arrives during a resource crunch, do we need to trigger auto scaling?






[jira] [Resolved] (YUNIKORN-2842) Improve metadata & gang_utils function's test coverage

2024-08-28 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2842.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve metadata & gang_utils function's test coverage
> -
>
> Key: YUNIKORN-2842
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2842
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Improve the following test coverage:
>  * GetPlaceholderResourceRequests (empty resource key case)
>  * getTaskMetadata (appID empty string case)
>  * getAppMetadata (GetTaskGroupsFromAnnotation error case)






[jira] [Created] (YUNIKORN-2845) Remove SchedulerConf.TestMode

2024-08-27 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2845:
--

 Summary: Remove SchedulerConf.TestMode
 Key: YUNIKORN-2845
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2845
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


After YUNIKORN-2844, there will be no need to use 
{{SchedulerConf.IsTestMode()}} in the production code.






[jira] [Created] (YUNIKORN-2844) Inject event recorder externally

2024-08-27 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2844:
--

 Summary: Inject event recorder externally
 Key: YUNIKORN-2844
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2844
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The current implementation creates the event recorder like this:

{noformat}
func GetRecorder() events.EventRecorder {
    lock.Lock()
    defer lock.Unlock()
    once.Do(func() {
        // note, the initiation of the event recorder requires on a workable Kubernetes client,
        // in test mode we should skip this and just use a fake recorder instead.
        configs := conf.GetSchedulerConf()
        if !configs.IsTestMode() {
            k8sClient := client.NewKubeClient(configs.KubeConfig)
            eventBroadcaster := events.NewBroadcaster(&events.EventSinkImpl{
                Interface: k8sClient.GetClientSet().EventsV1()})
            eventBroadcaster.StartRecordingToSink(make(<-chan struct{}))
            eventRecorder = eventBroadcaster.NewRecorder(scheme.Scheme, constants.SchedulerName)
        }
    })

    return eventRecorder
}
{noformat}

The problem with this approach is that "test mode" has to be indicated in the 
config, which complicates things. 

We can simplify this code if the recorder is set during YuniKorn initialization 
in {{NewScheduler()}}. The plugin code already does this in 
{{NewSchedulerPlugin()}}, which calls 
{{events.SetRecorder(handle.EventRecorder())}}.
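
The injection idea can be sketched as follows. This is an illustration under assumptions, not the shim's code: `Recorder` is a hypothetical stand-in for the Kubernetes {{events.EventRecorder}} interface, and `noopRecorder` and `sliceRecorder` are made-up helpers. Only the {{SetRecorder}}/{{GetRecorder}} pairing mirrors what the description mentions.

```go
package main

import (
	"fmt"
	"sync"
)

// Recorder is a hypothetical, minimal stand-in for the Kubernetes
// events.EventRecorder interface; it is not the real API.
type Recorder interface {
	Event(reason, message string)
}

// noopRecorder discards events (an assumed default, chosen for this sketch).
type noopRecorder struct{}

func (noopRecorder) Event(_, _ string) {}

var (
	lock     sync.RWMutex
	recorder Recorder = noopRecorder{}
)

// SetRecorder installs the recorder; the caller decides which one to use,
// so no "test mode" flag is needed here.
func SetRecorder(r Recorder) {
	lock.Lock()
	defer lock.Unlock()
	recorder = r
}

// GetRecorder returns whatever was injected, without consulting any config.
func GetRecorder() Recorder {
	lock.RLock()
	defer lock.RUnlock()
	return recorder
}

// sliceRecorder collects events in memory, which is handy in unit tests.
type sliceRecorder struct {
	mu     sync.Mutex
	events []string
}

func (s *sliceRecorder) Event(reason, message string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, reason+": "+message)
}

func main() {
	rec := &sliceRecorder{}
	SetRecorder(rec) // production code would do this once during start-up
	GetRecorder().Event("Scheduled", "pod-1 bound to node-1")
	fmt.Println(rec.events)
}
```

With this shape, a test injects its own recorder explicitly, and production code injects the real one at start-up.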

We should also get rid of the default fake recorder. It uses a buffered 
channel of size 1024. This isn't a problem now, but if a new test ends up 
generating more events than the buffer can hold, message sending will block, 
and it will not be obvious to anyone why a unit test suddenly hangs.
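
The blocking behaviour is easy to reproduce with a plain buffered channel. The sketch below assumes the fake recorder stores events in a `chan string` and uses a buffer of 4 instead of 1024 to keep it short. `trySend` and `fillUntilFull` are hypothetical helpers.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// A plain "ch <- msg" would block forever in the failing case.
func trySend(ch chan string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false // buffer is full and nobody is receiving
	}
}

// fillUntilFull sends events until the buffer is full and returns how many
// were accepted.
func fillUntilFull(ch chan string) int {
	n := 0
	for trySend(ch, fmt.Sprintf("event-%d", n)) {
		n++
	}
	return n
}

func main() {
	events := make(chan string, 4) // stands in for the 1024-slot buffer
	accepted := fillUntilFull(events)
	fmt.Println("events accepted before a plain send would block:", accepted)
}
```

A test that never drains the channel hangs on the first send after the buffer fills, which is the failure mode described above.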






[jira] [Created] (YUNIKORN-2833) [SI] Track non-Yunikorn allocations

2024-08-24 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2833:
--

 Summary: [SI] Track non-Yunikorn allocations
 Key: YUNIKORN-2833
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2833
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: scheduler-interface
Reporter: Peter Bacsko
Assignee: Peter Bacsko









[jira] [Created] (YUNIKORN-2834) [shim] Track non-Yunikorn allocations

2024-08-24 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2834:
--

 Summary: [shim] Track non-Yunikorn allocations
 Key: YUNIKORN-2834
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2834
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Peter Bacsko









[jira] [Created] (YUNIKORN-2832) [core] Track non-Yunikorn allocations

2024-08-24 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2832:
--

 Summary: [core] Track non-Yunikorn allocations
 Key: YUNIKORN-2832
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2832
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko









[jira] [Created] (YUNIKORN-2831) Update golangci-lint

2024-08-23 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2831:
--

 Summary: Update golangci-lint
 Key: YUNIKORN-2831
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2831
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler, shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Go 1.23 has been released, but the linter version 1.57.2 does not support it:

{noformat}
~/repos/yunikorn-core$ make lint
installing golangci-lint v1.57.2
running golangci-lint
Killed
make: *** [Makefile:133: lint] Error 137
{noformat}

According to the [release 
page|https://github.com/golangci/golangci-lint/releases], we need at least 
1.60.1.






[jira] [Resolved] (YUNIKORN-2777) Improve TrackedResource type

2024-08-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2777.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve TrackedResource type
> 
>
> Key: YUNIKORN-2777
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2777
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Currently, TrackedResource is defined as:
> {noformat}
> type TrackedResource struct {
>   TrackedResourceMap map[string]map[string]int64
>   locking.RWMutex
> }
> {noformat}
> As it turned out during the review of 
> [YUNIKORN-2652|https://github.com/apache/yunikorn-core/pull/897], 
> {{TrackedResourceMap}} is effectively a {{map[string]*Resource}}. If we change 
> the definition, we'll be able to use the existing functions for {{Resource}}.
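
A minimal sketch of the proposed shape, under assumptions: `Resource` here is a simplified stand-in for the core type, `AddTo` and `Aggregate` are hypothetical method names, and `sync.RWMutex` replaces `locking.RWMutex` so the example is self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a simplified, hypothetical stand-in for the core Resource type.
type Resource struct {
	Resources map[string]int64
}

// AddTo adds every quantity in other to r; a nil other is ignored.
func (r *Resource) AddTo(other *Resource) {
	if other == nil {
		return
	}
	for name, qty := range other.Resources {
		r.Resources[name] += qty
	}
}

// TrackedResource maps a key to a Resource instead of a nested map.
type TrackedResource struct {
	TrackedResourceMap map[string]*Resource
	sync.RWMutex
}

// NewTrackedResource returns an empty, ready-to-use TrackedResource.
func NewTrackedResource() *TrackedResource {
	return &TrackedResource{TrackedResourceMap: make(map[string]*Resource)}
}

// Aggregate adds usage under key by delegating to the Resource helper,
// so no hand-written loop over the inner map is needed here.
func (tr *TrackedResource) Aggregate(key string, usage *Resource) {
	tr.Lock()
	defer tr.Unlock()
	res, ok := tr.TrackedResourceMap[key]
	if !ok {
		res = &Resource{Resources: make(map[string]int64)}
		tr.TrackedResourceMap[key] = res
	}
	res.AddTo(usage)
}

func main() {
	tr := NewTrackedResource()
	tr.Aggregate("instance-a", &Resource{Resources: map[string]int64{"vcore": 2}})
	tr.Aggregate("instance-a", &Resource{Resources: map[string]int64{"vcore": 3, "memory": 10}})
	fmt.Println(tr.TrackedResourceMap["instance-a"].Resources)
}
```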






[jira] [Resolved] (YUNIKORN-2528) Increase coverage for UGM code

2024-08-12 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2528.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Increase coverage for UGM code
> --
>
> Key: YUNIKORN-2528
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2528
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The following branches are not covered properly by unit tests:
>  # {{GroupTracker.decreaseTrackedResource()}}: when {{gt == nil}}
>  # {{Manager.DecreaseTrackedResource()}}: when there's no UserTracker
>  # {{Manager.DecreaseTrackedResource()}}: when there's no GroupTracker
>  # {{Manager.DecreaseTrackedResource()}}: when groupTracker decrement returns 
> true
>  # {{QueueTracker.decreaseTrackedResource()}}: when there's no child tracker
> See https://app.codecov.io/gh/apache/yunikorn-core/pull/810.
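
Branches like "{{gt == nil}}" are typically covered by calling the method on a nil pointer. The sketch below shows the pattern on a hypothetical `groupTracker` type. Its fields, return value and semantics are assumptions and do not reproduce the real UGM code.

```go
package main

import "fmt"

// groupTracker is a hypothetical, simplified tracker used only to show how
// a nil-receiver branch can be exercised.
type groupTracker struct {
	usage map[string]int64
}

// decrease lowers the tracked quantity for name and reports whether the
// tracker is now empty. A nil tracker is treated as "nothing to remove".
func (gt *groupTracker) decrease(name string, qty int64) bool {
	if gt == nil {
		return false // the branch that is easy to leave uncovered
	}
	gt.usage[name] -= qty
	if gt.usage[name] <= 0 {
		delete(gt.usage, name)
	}
	return len(gt.usage) == 0
}

func main() {
	var missing *groupTracker
	fmt.Println("nil tracker:", missing.decrease("vcore", 1))

	gt := &groupTracker{usage: map[string]int64{"vcore": 2}}
	fmt.Println("after first decrease:", gt.decrease("vcore", 1))
	fmt.Println("after second decrease:", gt.decrease("vcore", 1))
}
```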






[jira] [Resolved] (YUNIKORN-2429) Enhance UGM Manager test coverage

2024-08-12 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2429.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Enhance UGM Manager test coverage
> -
>
> Key: YUNIKORN-2429
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2429
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> During the review of YUNIKORN-2116, we noticed that a certain mistake was 
> made in {{clearEarlierSetUserWildCardLimits()}}, but it was not caught by the 
> unit tests.
> Ensure proper coverage to verify configuration update.






[jira] [Resolved] (YUNIKORN-2756) Consider moving event_system#defaultEventChannelSize to configs#const

2024-08-12 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2756.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Consider moving event_system#defaultEventChannelSize to configs#const
> -
>
> Key: YUNIKORN-2756
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2756
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Trivial
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> All other event-related configs are in configs#const[1], so we should move 
> `defaultEventChannelSize`[0] there to keep them together.
> BTW, `defaultRingBufferSize`[2] will be removed by 
> https://github.com/apache/yunikorn-core/pull/915, since its replacement is 
> already in configs#const [3].
> [0] 
> https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/events/event_system.go#L37
> [1] 
> https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/common/configs/configs.go#L43
> [2] 
> https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/events/event_system.go#L38
> [3] 
> https://github.com/apache/yunikorn-core/blob/f25bee90c2abd2c6682912dfdd0013ef2f4bc0ba/pkg/common/configs/configs.go#L47






[jira] [Resolved] (YUNIKORN-2681) Data race in TestCheckHealthStatusNotFound

2024-08-08 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2681.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Data race in TestCheckHealthStatusNotFound 
> ---
>
> Key: YUNIKORN-2681
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2681
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, test - unit
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> A data race was detected during a unit test:
> {noformat}
> ==
> WARNING: DATA RACE
> Write at 0x0170c220 by goroutine 2575:
>   github.com/apache/yunikorn-core/pkg/webservice.NewWebApp()
>   
> /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/webservice.go:82 
> +0x11c
>   
> github.com/apache/yunikorn-core/pkg/webservice.TestCheckHealthStatusNotFound()
>   
> /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2574
>  +0x2f
>   testing.tRunner()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
> Previous read at 0x0170c220 by goroutine 2542:
>   github.com/apache/yunikorn-core/pkg/webservice.getStream()
>   
> /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers.go:1225 
> +0xbd3
>   github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit.gowrap4()
>   
> /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308
>  +0x4f
> Goroutine 2575 (running) created at:
>   testing.(*T).Run()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x825
>   testing.runTests.func1()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2161 +0x85
>   testing.tRunner()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.runTests()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2159 +0x8be
>   testing.(*M).Run()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2027 +0xf17
>   main.main()
>   _testmain.go:163 +0x2e4
> Goroutine 2542 (running) created at:
>   github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit()
>   
> /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308
>  +0xbb7
>   testing.tRunner()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
>   testing.(*T).Run.gowrap1()
>   /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
> ==
> 2024-06-18T13:40:54.182Z  INFO  core.events 
> events/event_streaming.go:164   Removing event stream consumer  {"name": 
> "host-1", "creation time": "2024-06-18T13:40:54.181Z"}
> 2024-06-18T13:40:54.182Z  INFO  core.scheduler.health   
> webservice/handlers.go:623  Health check is not available
> --- FAIL: TestCheckHealthStatusNotFound (0.00s)
> testing.go:1398: race detected during execution of test
> {noformat}
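
The trace shows one test writing a package-level variable in {{NewWebApp()}} while a goroutine started by another test still reads it in {{getStream()}}. The issue does not say how this was fixed. One common way to make such access safe is to guard the shared variable with a mutex, sketched here with hypothetical names (`schedulerContext`, `setContext`, `getContext`).

```go
package main

import (
	"fmt"
	"sync"
)

// schedulerContext is a hypothetical placeholder for the shared state.
type schedulerContext struct {
	name string
}

var (
	ctxLock sync.RWMutex
	ctx     *schedulerContext
)

// setContext replaces the shared value under the write lock.
func setContext(c *schedulerContext) {
	ctxLock.Lock()
	defer ctxLock.Unlock()
	ctx = c
}

// getContext reads the shared value under the read lock.
func getContext() *schedulerContext {
	ctxLock.RLock()
	defer ctxLock.RUnlock()
	return ctx
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = getContext() // concurrent reader, like the streaming goroutine
		}
	}()
	for i := 0; i < 1000; i++ {
		setContext(&schedulerContext{name: fmt.Sprintf("ctx-%d", i)})
	}
	wg.Wait() // waiting for the goroutine also avoids leaking it into later tests
	fmt.Println(getContext().name)
}
```

Running this with "go run -race" should not report a race, whereas unguarded reads and writes of `ctx` would.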






[jira] [Created] (YUNIKORN-2792) Create design doc

2024-08-07 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2792:
--

 Summary: Create design doc
 Key: YUNIKORN-2792
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2792
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko









[jira] [Created] (YUNIKORN-2791) Track non-Yunikorn allocations in the core

2024-08-07 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2791:
--

 Summary: Track non-Yunikorn allocations in the core
 Key: YUNIKORN-2791
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2791
 Project: Apache YuniKorn
  Issue Type: New Feature
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Currently, the core does not know which non-YK pods are assigned to a 
particular node. We only track the total amount of their allocations as the 
{{occupiedResources}} object inside the {{objects.Node}} type. If the tracking 
somehow gets out of sync with the actual cluster state, it's very difficult 
to find out what went wrong, because these allocations are not shown in the 
state dump.

In order to enhance supportability, we want to track all non-YK pods per node 
on the core side.
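
One possible shape for per-pod tracking is sketched below. Everything in it is an assumption made for illustration: `nodeTracker`, its methods and the resource names are hypothetical and do not describe the eventual design.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// nodeTracker is a hypothetical, simplified node that remembers every
// non-YuniKorn pod individually instead of only a running total.
type nodeTracker struct {
	mu      sync.RWMutex
	foreign map[string]map[string]int64 // pod ID -> resource name -> quantity
}

func newNodeTracker() *nodeTracker {
	return &nodeTracker{foreign: make(map[string]map[string]int64)}
}

// addForeign records (or replaces) the resources used by a non-YuniKorn pod.
func (n *nodeTracker) addForeign(podID string, res map[string]int64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.foreign[podID] = res
}

// removeForeign forgets a pod; removing an unknown pod is a no-op.
func (n *nodeTracker) removeForeign(podID string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.foreign, podID)
}

// occupied derives the total from the per-pod records, so the total and the
// list of pods are always computed from the same data.
func (n *nodeTracker) occupied() map[string]int64 {
	n.mu.RLock()
	defer n.mu.RUnlock()
	total := make(map[string]int64)
	for _, res := range n.foreign {
		for name, qty := range res {
			total[name] += qty
		}
	}
	return total
}

// foreignPods lists the tracked pod IDs in sorted order, e.g. for a state dump.
func (n *nodeTracker) foreignPods() []string {
	n.mu.RLock()
	defer n.mu.RUnlock()
	ids := make([]string, 0, len(n.foreign))
	for id := range n.foreign {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	node := newNodeTracker()
	node.addForeign("kube-system/coredns", map[string]int64{"vcore": 100, "memory": 70})
	node.addForeign("kube-system/kube-proxy", map[string]int64{"vcore": 50})
	fmt.Println(node.foreignPods(), node.occupied()["vcore"])
}
```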






[jira] [Resolved] (YUNIKORN-2760) `make tools` should check the version of tools

2024-08-07 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2760.

Resolution: Fixed

Merged to master in both repos.

> `make tools` should check the version of tools
> --
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> By default, the Makefile checks only whether the tool file exists. Hence, 
> developers need to remove the tools folder (or call `make distclean`) manually 
> to trigger the installation after we update the version of a tool.
> However, how can developers become aware of tool updates? Personally, I 
> get suspicious from the errors or warnings, but that signal is implicit and 
> noisy :cry
> To fix that, I'd like to introduce a new folder structure for the tools 
> folder: 
> {code:java}
> /tools/{tool_name}-{version}
> {code}
>  That gives each version of a tool a unique path, so developers will no 
> longer miss updates.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
>  That also gives each version of a tool a unique path.
> NOTE: we need to remove the existing tool binary if there is a naming conflict 
> when creating the new path. For example, creating /tools/golangci-lint/1.57.2 
> will fail if /tools/golangci-lint is an existing file.






[jira] [Resolved] (YUNIKORN-2787) Eliminate gosec lint warnings

2024-08-07 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2787.

Resolution: Not A Problem

> Eliminate gosec lint warnings 
> --
>
> Key: YUNIKORN-2787
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2787
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
>
> Use the nolint directive to get rid of the following warnings from the test 
> code:
> {noformat}
> pkg/scheduler/objects/node_collection_test.go:315:18: G602: Potentially 
> accessing slice out of bounds (gosec)
>   assert.Equal(t, nodes[0].NodeID, "node-2", "wrong node 0")
>   ^
> pkg/scheduler/objects/node_collection_test.go:316:18: G602: Potentially 
> accessing slice out of bounds (gosec)
>   assert.Equal(t, nodes[1].NodeID, "node-4", "wrong node 1")
>   ^
> pkg/scheduler/objects/node_collection_test.go:317:18: G602: Potentially 
> accessing slice out of bounds (gosec)
>   assert.Equal(t, nodes[2].NodeID, "node-1", "wrong node 2")
>   ^
> pkg/scheduler/objects/node_collection_test.go:318:18: G602: Potentially 
> accessing slice out of bounds (gosec)
>   assert.Equal(t, nodes[3].NodeID, "node-3", "wrong node 3")
>   ^
> pkg/scheduler/objects/nodesorting_test.go:200:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
> (fair)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:201:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
> (fair)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:214:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node 
> (binpacking)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:215:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node 
> (binpacking)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:244:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
> (binpacking, empty allocation)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:245:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
> (binpacking, empty allocation)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:256:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
> (fair, empty allocation)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:257:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
> (fair, empty allocation)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:274:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node 
> (binpacking, node2 half-filled)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:275:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node 
> (binpacking, node2 half-filled")
> ^
> pkg/scheduler/objects/nodesorting_test.go:287:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
> (fair, node2 half-filled)")
> ^
> pkg/scheduler/objects/nodesorting_test.go:288:32: G602: Potentially accessing 
> slice out of bounds (gosec)
>   assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
> (binpacking, node2 half-filled")
> ^
> make: *** [Makefile:131: lint] Error 1
> {noformat}
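
For reference, there are two usual ways of dealing with a G602 report. The sketch below uses a hypothetical `node` type rather than the real test code: `firstNodeID` suppresses the finding with golangci-lint's `//nolint` comment, while `nodeIDAt` checks the bounds explicitly.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the scheduler's node object.
type node struct {
	NodeID string
}

// firstNodeID indexes directly and silences the linter for this one line.
// It panics on an empty slice, so the caller must guarantee at least one node.
func firstNodeID(nodes []node) string {
	return nodes[0].NodeID //nolint:gosec // caller guarantees a non-empty slice
}

// nodeIDAt checks the bounds explicitly, so there is nothing to suppress.
func nodeIDAt(nodes []node, idx int) (string, bool) {
	if idx < 0 || idx >= len(nodes) {
		return "", false
	}
	return nodes[idx].NodeID, true
}

func main() {
	nodes := []node{{NodeID: "node-2"}, {NodeID: "node-4"}}
	fmt.Println(firstNodeID(nodes))
	if id, ok := nodeIDAt(nodes, 1); ok {
		fmt.Println(id)
	}
	_, ok := nodeIDAt(nodes, 5)
	fmt.Println("index 5 in range:", ok)
}
```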






[jira] [Created] (YUNIKORN-2787) Eliminate gosec lint warnings

2024-08-05 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2787:
--

 Summary: Eliminate gosec lint warnings 
 Key: YUNIKORN-2787
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2787
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Use the nolint directive to get rid of the following warnings from the test code:

{noformat}
running golangci-lint
pkg/scheduler/objects/node_collection_test.go:315:18: G602: Potentially 
accessing slice out of bounds (gosec)
assert.Equal(t, nodes[0].NodeID, "node-2", "wrong node 0")
^
pkg/scheduler/objects/node_collection_test.go:316:18: G602: Potentially 
accessing slice out of bounds (gosec)
assert.Equal(t, nodes[1].NodeID, "node-4", "wrong node 1")
^
pkg/scheduler/objects/node_collection_test.go:317:18: G602: Potentially 
accessing slice out of bounds (gosec)
assert.Equal(t, nodes[2].NodeID, "node-1", "wrong node 2")
^
pkg/scheduler/objects/node_collection_test.go:318:18: G602: Potentially 
accessing slice out of bounds (gosec)
assert.Equal(t, nodes[3].NodeID, "node-3", "wrong node 3")
^
pkg/scheduler/objects/nodesorting_test.go:200:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
(fair)")
  ^
pkg/scheduler/objects/nodesorting_test.go:201:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
(fair)")
  ^
pkg/scheduler/objects/nodesorting_test.go:214:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node 
(binpacking)")
  ^
pkg/scheduler/objects/nodesorting_test.go:215:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node 
(binpacking)")
  ^
pkg/scheduler/objects/nodesorting_test.go:244:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
(binpacking, empty allocation)")
  ^
pkg/scheduler/objects/nodesorting_test.go:245:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
(binpacking, empty allocation)")
  ^
pkg/scheduler/objects/nodesorting_test.go:256:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
(fair, empty allocation)")
  ^
pkg/scheduler/objects/nodesorting_test.go:257:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
(fair, empty allocation)")
  ^
pkg/scheduler/objects/nodesorting_test.go:274:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[0].NodeID, "wrong initial node 
(binpacking, node2 half-filled)")
  ^
pkg/scheduler/objects/nodesorting_test.go:275:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[1].NodeID, "wrong second node 
(binpacking, node2 half-filled")
  ^
pkg/scheduler/objects/nodesorting_test.go:287:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node1.NodeID, nodes[0].NodeID, "wrong initial node 
(fair, node2 half-filled)")
  ^
pkg/scheduler/objects/nodesorting_test.go:288:32: G602: Potentially accessing 
slice out of bounds (gosec)
assert.Equal(t, node2.NodeID, nodes[1].NodeID, "wrong second node 
(binpacking, node2 half-filled")
  ^
make: *** [Makefile:131: lint] Error 1
{noformat}






[jira] [Resolved] (YUNIKORN-2706) [UMBRELLA] YuniKorn 1.5.2 release efforts

2024-08-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2706.

Fix Version/s: 1.5.2
   Resolution: Fixed

> [UMBRELLA] YuniKorn 1.5.2 release efforts
> -
>
> Key: YUNIKORN-2706
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2706
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.2
>
>
> This umbrella is to track the work items needed for the 1.5.2 release.
> Release manager: Peter Bacsko.
> This release only contains bug fixes.






[jira] [Resolved] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-08-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2629.

 Fix Version/s: 1.6.0
Target Version: 1.5.2, 1.6.0  (was: 1.6.0, 1.5.2)
Resolution: Fixed

> Adding a node can result in a deadlock
> --
>
> Key: YUNIKORN-2629
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2629
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.5.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: updateNode_deadlock_trace.txt, 
> yunikorn-scheduler-20240627.log, yunikorn_stuck_stack_20240708.txt
>
>
> Adding a new node after Yunikorn state initialization can result in a 
> deadlock.
> The problem is that {{Context.addNode()}} holds a lock while we're waiting 
> for the {{NodeAccepted}} event:
> {noformat}
> dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, func(event interface{}) {
>     nodeEvent, ok := event.(CachedSchedulerNodeEvent)
>     if !ok {
>         return
>     }
>     [...] removed for clarity
>     wg.Done()
> })
> defer dispatcher.UnregisterEventHandler(handlerID, dispatcher.EventTypeNode)
> if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode(&si.NodeRequest{
>     Nodes: nodesToRegister,
>     RmID:  schedulerconf.GetSchedulerConf().ClusterID,
> }); err != nil {
>     log.Log(log.ShimContext).Error("Failed to register nodes", zap.Error(err))
>     return nil, err
> }
> // wait for all responses to accumulate
> wg.Wait()  <--- shim gets stuck here
>  {noformat}
> If tasks are being processed, then the dispatcher will try to retrieve the 
> event handler, which is returned from Context:
> {noformat}
> go func() {
>     for {
>         select {
>         case event := <-getDispatcher().eventChan:
>             switch v := event.(type) {
>             case events.TaskEvent:
>                 getEventHandler(EventTypeTask)(v)  <--- eventually calls Context.getTask()
>             case events.ApplicationEvent:
>                 getEventHandler(EventTypeApp)(v)
>             case events.SchedulerNodeEvent:
>                 getEventHandler(EventTypeNode)(v)
> {noformat}
> Since {{addNode()}} is holding a write lock, the event processing loop gets 
> stuck, so {{registerNodes()}} will never progress.
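
The lock ordering described above can be illustrated with a minimal sketch. It does not reproduce the actual fix: `shimContext`, `hasNode`, `addNode` and `runEventLoop` are hypothetical names, and `sync.RWMutex` stands in for the shim's own locking type. The idea shown is one possible way out: release the write lock before blocking on the acknowledgement, so that the event loop can still take the read lock.

```go
package main

import "sync"

// shimContext is a hypothetical stand-in for the shim's Context type.
type shimContext struct {
	lock  sync.RWMutex
	nodes map[string]bool
}

func newShimContext() *shimContext {
	return &shimContext{nodes: make(map[string]bool)}
}

// hasNode is the kind of call an event handler might make; it needs the read lock.
func (c *shimContext) hasNode(name string) bool {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return c.nodes[name]
}

// addNode updates the state under the write lock, releases the lock, and only
// then waits for the acknowledgement delivered through the event loop.
func (c *shimContext) addNode(name string, events chan<- func()) {
	c.lock.Lock()
	c.nodes[name] = true
	c.lock.Unlock() // released before blocking on the WaitGroup

	var wg sync.WaitGroup
	wg.Add(1)
	events <- func() {
		// This read would block forever if addNode still held the write lock.
		_ = c.hasNode(name)
		wg.Done()
	}
	wg.Wait()
}

// runEventLoop processes queued handlers one at a time, like a dispatcher.
func runEventLoop(events <-chan func()) {
	for handler := range events {
		handler()
	}
}
```

If `addNode` kept the write lock until after `wg.Wait()`, the handler would block in `hasNode` and `wg.Done()` would never run, which is the same shape as the hang reported here.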






[jira] [Resolved] (YUNIKORN-2319) cache.Task: reference to old pod object is kept after update

2024-08-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2319.

Fix Version/s: 1.6.0
   Resolution: Fixed

> cache.Task: reference to old pod object is kept after update
> 
>
> Key: YUNIKORN-2319
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2319
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
> Attachments: 2024-01-09 134112.png, 2024-01-09 134130.png
>
>
> There is a kind of memory leak in the shim: when the pod is updated, the old 
> pod object is still referenced from Task, so the GC cannot reclaim it until 
> the pod terminates.
> See screenshot: task points to version 80199, scheduler cache already has a 
> newer version 81216.
> We have two solutions:
> 1. Update the object in the Task together with the scheduler cache
> 2. Don't store the pointer to the pod, instead, always retrieve it from the 
> scheduler cache
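
A minimal sketch of the second option, assuming simplified stand-in types: `pod`, `podCache` and `task` are hypothetical and far smaller than the real Kubernetes and shim types. The task keeps only the pod UID and resolves the current object through the cache on every access, so it never pins an outdated version.

```go
package main

import "sync"

// pod is a simplified, hypothetical stand-in for a Kubernetes pod object.
type pod struct {
	UID             string
	ResourceVersion string
}

// podCache is a hypothetical scheduler cache keyed by pod UID.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]*pod
}

func newPodCache() *podCache {
	return &podCache{pods: make(map[string]*pod)}
}

// update stores the latest version of a pod, replacing any older one.
func (c *podCache) update(p *pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.UID] = p
}

// get returns the cached pod for the UID, or nil if it is unknown.
func (c *podCache) get(uid string) *pod {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.pods[uid]
}

// task stores only the pod UID, never a pointer to a specific pod version.
type task struct {
	podUID string
	cache  *podCache
}

// currentPod resolves the pod through the cache on each call.
func (t *task) currentPod() *pod {
	return t.cache.get(t.podUID)
}
```

The trade-off is an extra cache lookup (and a possible nil result) on each access, in exchange for the old pod object becoming collectable as soon as the cache replaces it.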






[jira] [Resolved] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-08-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2717.

Fix Version/s: 1.6.0
   Resolution: Fixed

[~chiahsuan] 

> Assert invalid queue name in get queue applications handler
> ---
>
> Key: YUNIKORN-2717
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Chia Hsuan Chang
>Priority: Minor
>  Labels: newbie
> Fix For: 1.6.0
>
>
> Assert invalid queue name in the TestGetQueueApplicationsHandler test method 
> using assertQueueInvalid(). Also clean up the method.






[jira] [Reopened] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-08-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reopened YUNIKORN-2717:


> Assert invalid queue name in get queue applications handler
> ---
>
> Key: YUNIKORN-2717
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Chia Hsuan Chang
>Priority: Minor
>  Labels: newbie
>
> Assert invalid queue name in the TestGetQueueApplicationsHandler test method 
> using assertQueueInvalid(). Also clean up the method.






[jira] [Resolved] (YUNIKORN-2766) Only generate event if all predicates failed

2024-07-31 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2766.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Only generate event if all predicates failed
> 
>
> Key: YUNIKORN-2766
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Right now, we send an event to the pod if a predicate failed:
> {noformat}
> if err := plugin.Predicates(&si.PredicatesArgs{
>     AllocationKey: allocationKey,
>     NodeID:        sn.NodeID,
>     Allocate:      allocate,
> }); err != nil {
>     log.Log(log.SchedNode).Debug("running predicates failed",
>         zap.String("allocationKey", allocationKey),
>         zap.String("nodeID", sn.NodeID),
>         zap.Bool("allocateFlag", allocate),
>         zap.Error(err))
>     // running predicates failed
>     msg := err.Error()
>     ask.LogAllocationFailure(msg, allocate)
>     ask.SendPredicateFailedEvent(msg)
>     return false
> }
> {noformat}
> This is, however, not correct. We should only generate an event if *all* 
> predicates have failed, which means that the pod cannot be scheduled. A 
> failing predicate for a given node can be perfectly normal in many cases.
> Instead, we should aggregate the failed predicates and send an event like:
> {noformat}
> All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': 
> node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints 
> that the pod didn't tolerate (5x)
> {noformat}
> where 20x and 5x tell how many times a certain predicate failed.






[jira] [Resolved] (YUNIKORN-2652) Expand getApplication() endpoint handler to return resource usage

2024-07-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2652.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Expand getApplication() endpoint handler to return resource usage
> -
>
> Key: YUNIKORN-2652
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2652
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Rich Scott
>Assignee: Rich Scott
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Some users would like to be able to see resource usage (preempted, 
> placeholder resource, etc) for applications that have been completed. The 
> `getApplication()` endpoint handler should be enhanced to take an optional 
> parameter specifying that the user would like details about resources 
> included in the response, and a new `ApplicationXXXDAOInfo` object that is a 
> slight superset of `ApplicationDAOInfo` should be introduced, and can be used 
> in the response.






[jira] [Created] (YUNIKORN-2777) Improve TrackedResource type

2024-07-30 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2777:
--

 Summary: Improve TrackedResource type
 Key: YUNIKORN-2777
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2777
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Peter Bacsko


Currently, TrackedResource is defined as:
{noformat}
type TrackedResource struct {
TrackedResourceMap map[string]map[string]int64
locking.RWMutex
}
{noformat}

As it turned out during the review of 
[YUNIKORN-2652|https://github.com/apache/yunikorn-core/pull/897], 
{{TrackedResourceMap}} is actually {{map[string]*Resource}}. If we change the 
definition, we can reuse the functions that already exist for 
{{Resource}}.
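
A minimal sketch of what the changed definition could look like. The `Resource` type here is a simplified, hypothetical stand-in for the core type, `AddTo` and `Aggregate` are illustrative names, and `sync.RWMutex` is used in place of `locking.RWMutex` to keep the example self-contained.

```go
package main

import "sync"

// Resource is a simplified, hypothetical stand-in for the core Resource type.
type Resource struct {
	Resources map[string]int64
}

func NewResource() *Resource {
	return &Resource{Resources: make(map[string]int64)}
}

// AddTo adds every quantity of other to r in place.
func (r *Resource) AddTo(other *Resource) {
	if other == nil {
		return
	}
	for name, qty := range other.Resources {
		r.Resources[name] += qty
	}
}

// TrackedResource holds one Resource per key instead of a nested map.
type TrackedResource struct {
	sync.RWMutex
	TrackedResourceMap map[string]*Resource
}

func NewTrackedResource() *TrackedResource {
	return &TrackedResource{TrackedResourceMap: make(map[string]*Resource)}
}

// Aggregate adds usage to the Resource tracked under key, creating it if needed.
func (tr *TrackedResource) Aggregate(key string, usage *Resource) {
	tr.Lock()
	defer tr.Unlock()
	res, ok := tr.TrackedResourceMap[key]
	if !ok {
		res = NewResource()
		tr.TrackedResourceMap[key] = res
	}
	res.AddTo(usage)
}
```

With this shape, the inner map handling lives in one place (the `Resource` methods) rather than being repeated wherever the nested map is touched.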






[jira] [Resolved] (YUNIKORN-2707) Tagging for 1.5.2

2024-07-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2707.

Fix Version/s: 1.5.2
   Resolution: Fixed

> Tagging for 1.5.2
> -
>
> Key: YUNIKORN-2707
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2707
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.2
>
>







[jira] [Resolved] (YUNIKORN-2759) Replace %w by Errors.join

2024-07-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2759.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Replace %w by Errors.join
> -
>
> Key: YUNIKORN-2759
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2759
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> original discussion: https://issues.apache.org/jira/browse/YUNIKORN-2262
> {{errors.Join}} can make the code more performant and readable
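
For reference, a small sketch contrasting the two approaches using only the standard library (`errors.Join` requires Go 1.20 or later). The sentinel errors and helper names are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, used only for this illustration.
var (
	ErrQueueNotFound = errors.New("queue not found")
	ErrInvalidQuota  = errors.New("invalid quota")
)

// wrapWithFormat wraps a cause using the %w verb of fmt.Errorf.
func wrapWithFormat(cause error) error {
	return fmt.Errorf("config rejected: %w", cause)
}

// joinErrors combines several errors into one; nil values are discarded and
// the result is nil when every argument is nil.
func joinErrors(errs ...error) error {
	return errors.Join(errs...)
}

// isQuotaProblem reports whether ErrInvalidQuota is anywhere in the chain,
// regardless of whether it was wrapped with %w or joined.
func isQuotaProblem(err error) bool {
	return errors.Is(err, ErrInvalidQuota)
}
```

Both forms keep the original errors reachable through `errors.Is`; `errors.Join` avoids building a format string and returns nil when there is nothing to report.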






[jira] [Resolved] (YUNIKORN-2770) Simplify Application.GetTask()

2024-07-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2770.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Simplify Application.GetTask()
> --
>
> Key: YUNIKORN-2770
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2770
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> {{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the 
> {{error}} is completely unnecessary. We either have the task for the given 
> taskID or we don't. 






[jira] [Resolved] (YUNIKORN-2765) Improve si_helper & resource funtion's test coverage

2024-07-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2765.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Improve si_helper & resource funtion's test coverage
> 
>
> Key: YUNIKORN-2765
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2765
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Improve the test coverage of the following functions:
>  * GetTerminationTypeFromString (unknown termination type)
>  * getMaxResource (requested resource types are fewer than allocated types)
>  * GetResource
>  * GetTGResource






[jira] [Created] (YUNIKORN-2770) Simplify Application.GetTask()

2024-07-25 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2770:
--

 Summary: Simplify Application.GetTask()
 Key: YUNIKORN-2770
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2770
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


{{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the 
{{error}} is completely unnecessary. We either have the task for the given 
taskID or we don't. 
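
A minimal sketch of the simplified signature, assuming stripped-down stand-ins for the shim types: the `Application` and `Task` structs below are hypothetical and carry only what the example needs. A nil return replaces the error as the "not found" signal.

```go
package main

import "sync"

// Task is a stripped-down, hypothetical stand-in for the shim's Task type.
type Task struct {
	ID string
}

// Application is a stripped-down, hypothetical stand-in for the shim's Application type.
type Application struct {
	lock    sync.RWMutex
	taskMap map[string]*Task
}

func NewApplication() *Application {
	return &Application{taskMap: make(map[string]*Task)}
}

// AddTask registers a task under its ID.
func (app *Application) AddTask(task *Task) {
	app.lock.Lock()
	defer app.lock.Unlock()
	app.taskMap[task.ID] = task
}

// GetTask returns the task for the given ID, or nil if the application has no
// such task. There is no separate error value to check.
func (app *Application) GetTask(taskID string) *Task {
	app.lock.RLock()
	defer app.lock.RUnlock()
	return app.taskMap[taskID]
}
```

Callers then write `if task := app.GetTask(id); task != nil { ... }` instead of handling an error that can only ever mean "not found".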






[jira] [Resolved] (YUNIKORN-2707) Tagging for 1.5.2

2024-07-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2707.

Fix Version/s: 1.5.2
   Resolution: Fixed

> Tagging for 1.5.2
> -
>
> Key: YUNIKORN-2707
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2707
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.2
>
>







[jira] [Reopened] (YUNIKORN-2707) Tagging for 1.5.2

2024-07-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reopened YUNIKORN-2707:


> Tagging for 1.5.2
> -
>
> Key: YUNIKORN-2707
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2707
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.2
>
>







[jira] [Created] (YUNIKORN-2766) Only generate event if all predicates failed

2024-07-22 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2766:
--

 Summary: Only generate event if all predicates failed
 Key: YUNIKORN-2766
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Right now, we send an event to the pod if a predicate failed:

{noformat}
if err := plugin.Predicates(&si.PredicatesArgs{
    AllocationKey: allocationKey,
    NodeID:        sn.NodeID,
    Allocate:      allocate,
}); err != nil {
    log.Log(log.SchedNode).Debug("running predicates failed",
        zap.String("allocationKey", allocationKey),
        zap.String("nodeID", sn.NodeID),
        zap.Bool("allocateFlag", allocate),
        zap.Error(err))
    // running predicates failed
    msg := err.Error()
    ask.LogAllocationFailure(msg, allocate)
    ask.SendPredicateFailedEvent(msg)
    return false
}
{noformat}

This is, however, not correct. We should only generate an event if *all* 
predicates have failed, which means that the pod cannot be scheduled. A failing 
predicate for a given node can be perfectly normal in many cases.

Instead, we should aggregate the failed predicates and send an event like:

{noformat}
All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': 
node(s) didn't match Pod's node affinity/selector (20x), node(s) had taints 
that the pod didn't tolerate (5x)
{noformat}

where 20x and 5x tell how many times a certain predicate failed.
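
The aggregation could be sketched roughly as follows. The `predicateFailures` type and its methods are hypothetical and not taken from the scheduler code; the ordering (most frequent first, ties broken alphabetically) is an assumption made here so that the message is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// predicateFailures counts how often each predicate failure message was seen
// while trying the nodes for a single request. Hypothetical helper type.
type predicateFailures map[string]int

// record notes one failed predicate check with the given message.
func (p predicateFailures) record(msg string) {
	p[msg]++
}

// summary builds a single event message listing each distinct failure with
// its count, most frequent first and ties ordered alphabetically.
func (p predicateFailures) summary(allocationKey string) string {
	msgs := make([]string, 0, len(p))
	for msg := range p {
		msgs = append(msgs, msg)
	}
	sort.Slice(msgs, func(i, j int) bool {
		if p[msgs[i]] != p[msgs[j]] {
			return p[msgs[i]] > p[msgs[j]]
		}
		return msgs[i] < msgs[j]
	})
	parts := make([]string, 0, len(msgs))
	for _, msg := range msgs {
		parts = append(parts, fmt.Sprintf("%s (%dx)", msg, p[msg]))
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, ", "))
}
```

In this sketch the caller would call `record` for every node whose predicates fail and emit the `summary` only when no node was left that passed.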






[jira] [Resolved] (YUNIKORN-2725) Temporarily disable failing e2e preemption tests

2024-07-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2725.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Temporarily disable failing e2e preemption tests
> 
>
> Key: YUNIKORN-2725
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2725
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes, test - e2e
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Disable the following tests to have green builds:
> Verify_preemption_on_priority_queue
> Verify_basic_preemption
> Verify_allow_preemption_tag






[jira] [Created] (YUNIKORN-2725) Temporarily disable failing e2e tests

2024-07-04 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2725:
--

 Summary: Temporarily disable failing e2e tests
 Key: YUNIKORN-2725
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2725
 Project: Apache YuniKorn
  Issue Type: Test
  Components: shim - kubernetes, test - e2e
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Disable the following tests to have green builds:

Verify_preemption_on_priority_queue
Verify_basic_preemption






[jira] [Created] (YUNIKORN-2724) Improve the signature of methods notifyTaskComplete() and ensureAppAndTaskCreated()

2024-07-04 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2724:
--

 Summary: Improve the signature of methods notifyTaskComplete() and 
ensureAppAndTaskCreated()
 Key: YUNIKORN-2724
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2724
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Peter Bacsko


From the review [https://github.com/apache/yunikorn-k8shim/pull/864]

"I also think we need to change the signature for {{notifyTaskComplete(string, 
string)}} to {{notifyTaskComplete(*Application, string)}} Probably better to 
use a separate jira for that as it flows through into {{NotifyTaskComplete()}} 
and some tests. The 2 tests have the application pointer already. It removes a 
number of extra getApplication() calls we really do not need.
Similar for {{ensureAppAndTaskCreated()}} which is only ever called from this 
function. Add a parameter to it to make it: {{ensureAppAndTaskCreated(*v1.Pod, 
*Application)}} and only execute application creation {{{}if app == nil{}}}. 
This can be either in this jira or in a separate one."

That is, optimize the methods so that we avoid unnecessary {{GetApplication()}} 
calls.






[jira] [Resolved] (YUNIKORN-2182) Set ReadHeaderTimeout in http server

2024-07-03 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2182.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Set ReadHeaderTimeout in http server
> 
>
> Key: YUNIKORN-2182
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2182
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common, webapp
>Reporter: Wilfred Spiegelenburg
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> Potential Slowloris Attack because ReadHeaderTimeout is not configured in the 
> http.Server (gosec)
> We do not set ReadTimeout or ReadHeaderTimeout so we do not have a timeout at 
> all at the moment.
> BTW: this is not important for the webtest servers we build as they are just 
> for our tests.
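
For illustration, a minimal sketch of a server with both timeouts set, using only standard `net/http` fields. The helper name `newServer` and the concrete durations are assumptions chosen for the example, not values taken from the merged change.

```go
package main

import (
	"net/http"
	"time"
)

// Illustrative timeout values; the real settings are not stated in the issue.
const (
	readHeaderTimeout = 5 * time.Second
	readTimeout       = 10 * time.Second
)

// newServer builds an http.Server that bounds how long a client may take to
// send the request headers (ReadHeaderTimeout) and the whole request (ReadTimeout).
func newServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		ReadHeaderTimeout: readHeaderTimeout,
		ReadTimeout:       readTimeout,
	}
}
```

A client that trickles header bytes indefinitely, as in a Slowloris-style attack, has its connection closed once `ReadHeaderTimeout` elapses instead of holding it open without limit.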






[jira] [Resolved] (YUNIKORN-2568) Move all xxxEvents types to objects/events

2024-07-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2568.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Move all xxxEvents types to objects/events
> --
>
> Key: YUNIKORN-2568
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2568
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2564) [Umbrella] Move xxxEvents types to a different package

2024-07-02 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2564.

Fix Version/s: 1.6.0
   Resolution: Fixed

> [Umbrella] Move xxxEvents types to a different package
> --
>
> Key: YUNIKORN-2564
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2564
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.6.0
>
>
> There are several Events that can be moved to a different package:
> * queueEvents
> * applicationEvents
> * askEvents
> * nodeEvents
> There are numerous files in {{pkg/scheduler/objects}}. This is an opportunity 
> to clean it up a bit and move these under e.g. 
> {{pkg/scheduler/objects/events}}.






[jira] [Created] (YUNIKORN-2708) Release notes for 1.5.2

2024-06-28 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2708:
--

 Summary: Release notes for 1.5.2
 Key: YUNIKORN-2708
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2708
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Peter Bacsko









[jira] [Created] (YUNIKORN-2709) Update website for 1.5.2

2024-06-28 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2709:
--

 Summary: Update website for 1.5.2
 Key: YUNIKORN-2709
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2709
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: release
Reporter: Peter Bacsko
Assignee: Peter Bacsko









[jira] [Created] (YUNIKORN-2706) [UMBRELLA] YuniKorn 1.5.2 release efforts

2024-06-28 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2706:
--

 Summary: [UMBRELLA] YuniKorn 1.5.2 release efforts
 Key: YUNIKORN-2706
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2706
 Project: Apache YuniKorn
  Issue Type: Task
  Components: release
Reporter: Peter Bacsko
Assignee: Peter Bacsko


This umbrella is to track the work items needed for the 1.5.2 release.

Release manager: Peter Bacsko.

This release only contains bug fixes.






[jira] [Created] (YUNIKORN-2707) Tagging for 1.5.2

2024-06-28 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2707:
--

 Summary: Tagging for 1.5.2
 Key: YUNIKORN-2707
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2707
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko









[jira] [Resolved] (YUNIKORN-2704) Event publish errors out when predicates fail

2024-06-28 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2704.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

Merged to master & branch-1.5

> Event publish errors out when predicates fail
> -
>
> Key: YUNIKORN-2704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2704
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Mit Desai
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.6.0, 1.5.2
>
>
> I consistently see this error in the logs when events are published.
> I added some debug logs and found that it only occurs when the events for 
> untolerated taints are published.
> E0618 17:43:17.858946       1 event_broadcaster.go:270] "Server rejected 
> event (will not retry!)" err="Event \"<>.17da2a31072bb32f\" is 
> invalid: [action: Required value, reason: Required value]" 
> event="&Event\{ObjectMeta:{<>.17da2a31072bb32f  dpi-dev    0 
> 0001-01-01 00:00:00 + UTC   map[] map[] [] [] 
> []},EventTime:2024-06-18 17:43:17.857332069 + UTC 
> m=+84279.014490005,Series:nil,ReportingController:yunikorn,ReportingInstance:yunikorn-yunikorn-scheduler-59bdc88fdc-7h5bt,Action:,Reason:,Regarding:\{Pod
>  <> <> 5c90315c-a07d-4801-9ecc-baf61ee45f11 v1 
> 4323324038 },Related:nil,Note:Predicate failed for request 
> '5c90315c-a07d-4801-9ecc-baf61ee45f11' with message: 'node(s) had untolerated 
> taint \{<>: <>}',Type:Normal,DeprecatedSource:\{ 
> },DeprecatedFirstTimestamp:0001-01-01 00:00:00 + 
> UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 + UTC,DeprecatedCount:0,}"
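
The rejection in the log above is caused by the empty Action and Reason fields of the published event. A minimal sketch of that check follows; the event struct and the validate helper are hypothetical stand-ins for illustration, not the Kubernetes or YuniKorn types, and the field values in main are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical, trimmed-down stand-in for a Kubernetes event.
type event struct {
	Action string
	Reason string
	Note   string
}

// validate mirrors the two "Required value" checks reported in the log.
func validate(e event) error {
	var errs []error
	if e.Action == "" {
		errs = append(errs, errors.New("action: Required value"))
	}
	if e.Reason == "" {
		errs = append(errs, errors.New("reason: Required value"))
	}
	return errors.Join(errs...) // nil when both fields are set
}

func main() {
	// Action and Reason are left empty, as in the rejected event.
	rejected := event{Note: "Predicate failed for request"}
	fmt.Println(validate(rejected) != nil)

	// The Action and Reason values here are invented for illustration only.
	accepted := event{Action: "Scheduling", Reason: "PredicateFailed", Note: "Predicate failed for request"}
	fmt.Println(validate(accepted) == nil)
}
```

Setting both fields before publishing is the obvious way to satisfy such a check; whether the actual fix does exactly that is not stated in this message.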






[jira] [Resolved] (YUNIKORN-2694) Improve placement rule function's test coverage - 2

2024-06-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2694.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve placement rule function's test coverage - 2
> --
>
> Key: YUNIKORN-2694
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2694
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased

2024-06-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2683.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Unnecessary error is logged when resource usage is increased
> 
>
> Key: YUNIKORN-2683
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2683
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The refactored code in YUNIKORN-2542 logs an unnecessary error message:
> {noformat}
>   appGroup := userTracker.getGroupForApp(applicationID)
>   log.Log(log.SchedUGM).Debug("Increasing resource usage for user",
>       zap.String("user", user.User),
>       zap.String("queue path", queuePath),
>       zap.String("application", applicationID),
>       zap.String("group", appGroup),
>       zap.Stringer("resource", usage))
>   groupTracker := m.GetGroupTracker(appGroup)
>   if groupTracker == nil {
>       log.Log(log.SchedUGM).Error("group tracker should be available in groupTrackers map",
>           zap.String("application", applicationID),
>           zap.String("group", appGroup))
>       return
>   }
> ...
> {noformat}
> We don't always have a {{groupTracker}}. The previous code simply called 
> {{increaseTrackedResource()}} on an empty tracker:
> {noformat}
> func (ut *UserTracker) increaseTrackedResource(queuePath string, applicationID string, usage *resources.Resource) {
>   ut.Lock()
>   defer ut.Unlock()
>   ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage)
>   hierarchy := strings.Split(queuePath, configs.DOT)
>   ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, usage)
>   gt := ut.appGroupTrackers[applicationID]
>   log.Log(log.SchedUGM).Debug("Increasing resource usage for group",
>       zap.String("group", gt.getName()),
>       zap.Strings("queue path", hierarchy),
>       zap.String("application", applicationID),
>       zap.Stringer("resource", usage))
>   gt.increaseTrackedResource(queuePath, applicationID, usage, ut.userName) // <- gt can be nil
> }
> {noformat}






[jira] [Resolved] (YUNIKORN-2661) Fix hard-coded boolean in setLimit

2024-06-24 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2661.

Fix Version/s: 1.6.0, 1.5.2
   Resolution: Fixed

Merged to master & branch-1.5

> Fix hard-coded boolean in setLimit
> --
>
> Key: YUNIKORN-2661
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2661
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> Inside the UGM code {{setLimit()}}, we don't pass down {{doWildCardCheck}}, 
> so this variable never reaches the leaves:
> {noformat}
> // Note: Lock free call. The Lock of the linked tracker (UserTracker and GroupTracker) should be held before calling this function.
> func (qt *QueueTracker) setLimit(hierarchy []string, maxResource *resources.Resource, maxApps uint64, useWildCard bool, trackType trackingType, doWildCardCheck bool) {
>   log.Log(log.SchedUGM).Debug("Setting limits",
>       zap.String("queue path", qt.queuePath),
>       zap.Strings("hierarchy", hierarchy),
>       zap.Uint64("max applications", maxApps),
>       zap.Stringer("max resources", maxResource),
>       zap.Bool("use wild card", useWildCard))
>   // depth first: all the way to the leaf, create if not exists
>   // more than 1 in the slice means we need to recurse down
>   if len(hierarchy) > 1 {
>       childName := hierarchy[1]
>       if qt.childQueueTrackers[childName] == nil {
>           qt.childQueueTrackers[childName] = newQueueTracker(qt.queuePath, childName, trackType)
>       }
>       qt.childQueueTrackers[childName].setLimit(hierarchy[1:], maxResource, maxApps, useWildCard, trackType, false) // <-- should be "doWildCardCheck", not "false"
> ...
> {noformat}
> Fix this and create a unit test for {{setLimit()}}.
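
The intended behaviour can be sketched with a much smaller tracker. This is an illustration under assumptions, not the YuniKorn code: the tracker type, its fields and the leaf helper are hypothetical, and only the idea of forwarding the flag unchanged on every recursive call is taken from the description above.

```go
package main

import "fmt"

// tracker is a hypothetical, simplified stand-in for a queue tracker.
type tracker struct {
	name          string
	wildCardCheck bool
	children      map[string]*tracker
}

func newTracker(name string) *tracker {
	return &tracker{name: name, children: make(map[string]*tracker)}
}

// setLimit walks down the hierarchy, creating children as needed, and
// records the flag on the leaf. The flag is forwarded unchanged on every
// recursive call instead of being replaced by a hard-coded value.
func (t *tracker) setLimit(hierarchy []string, doWildCardCheck bool) {
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		child, ok := t.children[childName]
		if !ok {
			child = newTracker(childName)
			t.children[childName] = child
		}
		child.setLimit(hierarchy[1:], doWildCardCheck)
		return
	}
	t.wildCardCheck = doWildCardCheck
}

// leaf returns the tracker at the end of the hierarchy, or nil if missing.
func (t *tracker) leaf(hierarchy []string) *tracker {
	cur := t
	for _, name := range hierarchy[1:] {
		cur = cur.children[name]
		if cur == nil {
			return nil
		}
	}
	return cur
}

func main() {
	root := newTracker("root")
	path := []string{"root", "parent", "leaf"}
	root.setLimit(path, true)
	fmt.Println(root.leaf(path).wildCardCheck)
}
```

A unit test for the real function could follow the same pattern: set a limit on a multi-level path and assert on the state of the leaf.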






[jira] [Resolved] (YUNIKORN-2516) Update documentation about event.RESTResponseSize

2024-06-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2516.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Update documentation about event.RESTResponseSize
> -
>
> Key: YUNIKORN-2516
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2516
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2512) Event system properties are not used

2024-06-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2512.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Event system properties are not used
> 
>
> Key: YUNIKORN-2512
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2512
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.6.0
>
>
> There are two properties which are not used by the event system:
> # The property "event.requestCapacity" is supposed to determine the size of a 
> slice which is used to transfer events between the core and the shim every 2 
> seconds. However, right now it's not used at all; we use the default (1000) 
> every time.
> # The property "RESTResponseSize" is not in the code at all. It is meant to 
> limit the maximum number of entries returned by the batch API. 
> Currently, the hard coded value is 1.
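
Honouring such a property usually means reading it from the configuration and falling back to the default when it is missing or invalid. The sketch below assumes the configuration is available as a plain string map; the intProperty helper is hypothetical, and only the key name and the default of 1000 come from the description above.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultRequestCapacity is the default named in the description above.
const defaultRequestCapacity = 1000

// intProperty returns the positive integer stored under key, or def when
// the key is missing, not a number, or not positive.
func intProperty(conf map[string]string, key string, def int) int {
	raw, ok := conf[key]
	if !ok {
		return def
	}
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

func main() {
	conf := map[string]string{"event.requestCapacity": "250"}
	fmt.Println(intProperty(conf, "event.requestCapacity", defaultRequestCapacity))
	fmt.Println(intProperty(map[string]string{}, "event.requestCapacity", defaultRequestCapacity))
}
```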






[jira] [Resolved] (YUNIKORN-2245) Application sorting: improve pending resource filtering

2024-06-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2245.

Resolution: Won't Do

> Application sorting: improve pending resource filtering
> ---
>
> Key: YUNIKORN-2245
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2245
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>
> When sorting applications, we do a filtering on pending resources:
> {noformat}
> func filterOnPendingResources(apps map[string]*Application) []*Application {
>   filteredApps := make([]*Application, 0)
>   for _, app := range apps {
>       // Only look at app when pending-res > 0
>       if resources.StrictlyGreaterThanZero(app.GetPendingResource()) {
>           filteredApps = append(filteredApps, app)
>       }
>   }
>   return filteredApps
> }
> {noformat}
> This filtering is relatively expensive, but necessary, because during the 
> lifecycle of an application, {{sa.pending}} can become 0 and in this case, we 
> don't want to schedule anything from the app.
> The suggested approach is to track the total pendingAskRepeats inside the app. That 
> way we don't need to call {{resources.StrictlyGreaterThanZero()}} and can 
> perform a simple integer comparison instead.
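
A rough sketch of the suggested approach, under assumptions: the app type, its pendingRepeats counter and the addAsk/allocate helpers are hypothetical simplifications, not the YuniKorn Application. The point is only that the filter becomes an integer comparison once the counter is kept up to date.

```go
package main

import "fmt"

// app is a hypothetical, simplified application that tracks how many
// ask repeats are still pending.
type app struct {
	id             string
	pendingRepeats int
}

// addAsk registers a new ask with the given number of repeats.
func (a *app) addAsk(repeats int) {
	a.pendingRepeats += repeats
}

// allocate consumes one pending repeat, if any is left.
func (a *app) allocate() {
	if a.pendingRepeats > 0 {
		a.pendingRepeats--
	}
}

// filterOnPending keeps only the apps that still have pending repeats.
// The check is a plain integer comparison, no resource arithmetic.
func filterOnPending(apps map[string]*app) []*app {
	filtered := make([]*app, 0, len(apps))
	for _, a := range apps {
		if a.pendingRepeats > 0 {
			filtered = append(filtered, a)
		}
	}
	return filtered
}

func main() {
	busy := &app{id: "app-1"}
	busy.addAsk(2)
	idle := &app{id: "app-2"}
	idle.addAsk(1)
	idle.allocate()

	apps := map[string]*app{busy.id: busy, idle.id: idle}
	for _, a := range filterOnPending(apps) {
		fmt.Println(a.id, a.pendingRepeats)
	}
}
```

The cost moves from the sort path to every place that changes the pending state, which must keep the counter consistent.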






[jira] [Closed] (YUNIKORN-2221) Performance improvements phase II

2024-06-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko closed YUNIKORN-2221.
--

> Performance improvements phase II
> -
>
> Key: YUNIKORN-2221
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2221
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
> Fix For: 1.5.0
>
>
> Umbrella JIRA for further performance improvements in Yunikorn.
> The main issues have been addressed in YUNIKORN-1715. However, it's still 
> possible to reduce memory and CPU usage further by doing smaller things.






[jira] [Resolved] (YUNIKORN-2221) Performance improvements phase II

2024-06-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2221.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Performance improvements phase II
> -
>
> Key: YUNIKORN-2221
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2221
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
> Fix For: 1.5.0
>
>
> Umbrella JIRA for further performance improvements in Yunikorn.
> The main issues have been addressed in YUNIKORN-1715. However, it's still 
> possible to reduce memory and CPU usage further by doing smaller things.






[jira] [Resolved] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance

2024-06-19 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2653.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Gang scheduling K8s event formatting compliance
> ---
>
> Key: YUNIKORN-2653
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2653
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The K8s event API defines rules for the content of the fields 
> within an event. Adjust the content of gang scheduling related events to 
> comply with these rules.
> This change focuses on the reason and action fields only:
>  * 'reason' is the reason this event is generated. 'reason' should be short 
> and unique; it should be in UpperCamelCase format (starting with a capital 
> letter). 
>  * 'action' explains what happened with the regarding object / what action the 
> ReportingController took in the object's name; it should be in UpperCamelCase 
> format (starting with a capital letter). 
> Neither field should contain spaces or long text.
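
The formatting rule for both fields can be expressed as a small check. The helper below is a hypothetical illustration, not part of YuniKorn or Kubernetes, and the sample values in main are invented; it only encodes "starts with a capital letter, letters and digits only, no spaces".

```go
package main

import (
	"fmt"
	"unicode"
)

// isUpperCamelCase reports whether s starts with an upper-case letter and
// consists of letters and digits only (so no spaces or punctuation).
func isUpperCamelCase(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 && !unicode.IsUpper(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	// Sample values are made up for illustration.
	for _, v := range []string{"GangScheduling", "gangScheduling", "Gang Scheduling", ""} {
		fmt.Printf("%q: %v\n", v, isUpperCamelCase(v))
	}
}
```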






[jira] [Created] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased

2024-06-19 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2683:
--

 Summary: Unnecessary error is logged when resource usage is 
increased
 Key: YUNIKORN-2683
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2683
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Peter Bacsko


The refactored code in YUNIKORN-2542 logs an unnecessary error message:

{noformat}
appGroup := userTracker.getGroupForApp(applicationID)
log.Log(log.SchedUGM).Debug("Increasing resource usage for user",
    zap.String("user", user.User),
    zap.String("queue path", queuePath),
    zap.String("application", applicationID),
    zap.String("group", appGroup),
    zap.Stringer("resource", usage))
groupTracker := m.GetGroupTracker(appGroup)
if groupTracker == nil {
    log.Log(log.SchedUGM).Error("group tracker should be available in groupTrackers map",
        zap.String("application", applicationID),
        zap.String("group", appGroup))
    return
}
...
{noformat}

We don't always have a {{groupTracker}}. The previous code simply called 
{{increaseTrackedResource()}} on an empty tracker:

{noformat}
func (ut *UserTracker) increaseTrackedResource(queuePath string, applicationID string, usage *resources.Resource) {
    ut.Lock()
    defer ut.Unlock()
    ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage)
    hierarchy := strings.Split(queuePath, configs.DOT)
    ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, usage)
    gt := ut.appGroupTrackers[applicationID]
    log.Log(log.SchedUGM).Debug("Increasing resource usage for group",
        zap.String("group", gt.getName()),
        zap.Strings("queue path", hierarchy),
        zap.String("application", applicationID),
        zap.Stringer("resource", usage))
    gt.increaseTrackedResource(queuePath, applicationID, usage, ut.userName) // <- gt can be nil
}
{noformat}
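
One way to treat a missing group tracker as a normal case, rather than an error, is sketched below. This is an assumption about the shape of a fix, not the change that was merged; the manager and groupTracker types are hypothetical and heavily simplified.

```go
package main

import "fmt"

// groupTracker is a hypothetical, simplified tracker holding one counter.
type groupTracker struct {
	name  string
	usage int
}

func (gt *groupTracker) increase(amount int) {
	gt.usage += amount
}

// manager is a hypothetical stand-in that maps group names to trackers.
type manager struct {
	groupTrackers map[string]*groupTracker
}

// increaseGroupUsage updates the group tracker when one exists and reports
// whether it did. A missing tracker is a normal case here, so nothing is
// logged at error level.
func (m *manager) increaseGroupUsage(group string, amount int) bool {
	gt := m.groupTrackers[group]
	if gt == nil {
		return false
	}
	gt.increase(amount)
	return true
}

func main() {
	m := &manager{groupTrackers: map[string]*groupTracker{
		"dev": {name: "dev"},
	}}
	fmt.Println(m.increaseGroupUsage("dev", 5), m.groupTrackers["dev"].usage)
	fmt.Println(m.increaseGroupUsage("unknown", 5))
}
```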








[jira] [Resolved] (YUNIKORN-2680) Improve placement rule function's test coverage

2024-06-18 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2680.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Improve placement rule function's test coverage
> --
>
> Key: YUNIKORN-2680
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2680
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2681) Data race in TestGetStream_Limit

2024-06-18 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2681:
--

 Summary: Data race in TestGetStream_Limit
 Key: YUNIKORN-2681
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2681
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler, test - unit
Reporter: Peter Bacsko
Assignee: Peter Bacsko


A data race was detected during a unit test:

{noformat}
==
WARNING: DATA RACE
Write at 0x0170c220 by goroutine 2575:
  github.com/apache/yunikorn-core/pkg/webservice.NewWebApp()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/webservice.go:82 +0x11c
  github.com/apache/yunikorn-core/pkg/webservice.TestCheckHealthStatusNotFound()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2574 +0x2f
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44

Previous read at 0x0170c220 by goroutine 2542:
  github.com/apache/yunikorn-core/pkg/webservice.getStream()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers.go:1225 +0xbd3
  github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit.gowrap4()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0x4f

Goroutine 2575 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:163 +0x2e4

Goroutine 2542 (running) created at:
  github.com/apache/yunikorn-core/pkg/webservice.TestGetStream_Limit()
      /home/runner/work/yunikorn-core/yunikorn-core/pkg/webservice/handlers_test.go:2308 +0xbb7
  testing.tRunner()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x44
==
2024-06-18T13:40:54.182Z    INFO    core.events    events/event_streaming.go:164    Removing event stream consumer    {"name": "host-1", "creation time": "2024-06-18T13:40:54.181Z"}
2024-06-18T13:40:54.182Z    INFO    core.scheduler.health    webservice/handlers.go:623    Health check is not available
--- FAIL: TestCheckHealthStatusNotFound (0.00s)
    testing.go:1398: race detected during execution of test
{noformat}
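
The report shows a write in NewWebApp() and a read in getStream() touching the same address from two goroutines without synchronization. The trace does not show which variable is involved, so the sketch below uses a hypothetical package-level value guarded by a sync.RWMutex purely to illustrate the general remedy; it is not the fix applied to yunikorn-core.

```go
package main

import (
	"fmt"
	"sync"
)

// streamLimit is a hypothetical package-level setting shared by several
// goroutines. All access goes through the mutex-protected helpers below.
var (
	mu          sync.RWMutex
	streamLimit int
)

// setStreamLimit writes the shared value under the write lock.
func setStreamLimit(n int) {
	mu.Lock()
	defer mu.Unlock()
	streamLimit = n
}

// getStreamLimit reads the shared value under the read lock.
func getStreamLimit() int {
	mu.RLock()
	defer mu.RUnlock()
	return streamLimit
}

func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			setStreamLimit(n) // concurrent writers
			_ = getStreamLimit() // concurrent readers
		}(i)
	}
	wg.Wait()
	fmt.Println(getStreamLimit() >= 1)
}
```

In tests, an alternative is to avoid the shared global altogether, for example by not letting one test mutate state that a goroutine started by another test still reads.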






[jira] [Resolved] (YUNIKORN-2673) Improve newFilter function's test coverage in filter.go

2024-06-18 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2673.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Improve newFilter function's test coverage in filter.go
> --
>
> Key: YUNIKORN-2673
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2673
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2515) Add property event.RESTResponseSize to the batch event handler

2024-06-12 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2515.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Add property event.RESTResponseSize to the batch event handler
> --
>
> Key: YUNIKORN-2515
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2515
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2670) Improve util function's test coverage

2024-06-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2670.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve util function's test coverage
> 
>
> Key: YUNIKORN-2670
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2670
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Improve the following functions' test coverage in util.go:
>  * ZeroTimeInUnixNano
>  * GetNewUUID
>  * IsRecoveryQueue






[jira] [Resolved] (YUNIKORN-2669) nil pointer dereference error

2024-06-09 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2669.

Resolution: Duplicate

This looks like a dup of YUNIKORN-2562. The solution for this has been 
delivered in 1.5.1. It's also on master.

> nil pointer dereference error
> -
>
> Key: YUNIKORN-2669
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2669
> Project: Apache YuniKorn
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Junyoung Park
>Assignee: Peter Bacsko
>Priority: Major
>
> Environment: AWS EKS 1.26
> yunikorn-scheduler logs
> {code:java}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x179b2f5]
> goroutine 50 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc000661000, {0xc008ad14a0, 0x24})
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/objects/application.go:1739 +0x615
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc00046a100?, 0xc01436c880)
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/partition.go:1281 +0x27f
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc000502680?, {0xc02014da60, 0x1, 0xc0112f5ee8?}, {0xc0060f8980, 0xb})
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/context.go:868 +0x9e
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc00046a100?, 0xc0145e8eb0?)
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/context.go:750 +0xa5
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc000120990)
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/scheduler.go:111 +0x16e
> created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
>   github.com/apache/yunikorn-core@v1.4.0-1/pkg/scheduler/scheduler.go:55 +0x9c
> {code}






[jira] [Resolved] (YUNIKORN-2637) finalizePods should ignore pods like registerPods does

2024-06-07 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2637.

Fix Version/s: 1.6.0, 1.5.2
   Resolution: Fixed

Merged to master & branch-1.5.

> finalizePods should ignore pods like registerPods does
> --
>
> Key: YUNIKORN-2637
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2637
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> The initialisation code is a two-step process for pods. The first step lists all pods 
> and adds them to the system in registerPods(), which returns a list of the pods 
> processed.
> The second step happens after event handlers are turned on and nodes have 
> been cleaned up etc. During the second step, pods from the first step are 
> checked and removed. However, pods that were already in a terminated state in 
> step 1 get removed again. Although the step should be idempotent, this is 
> unneeded. When iterating over the existing pods, any pod in a terminal state 
> should be skipped.
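
Skipping terminated pods during the second pass can be sketched as a simple filter. The pod struct and helper names below are hypothetical simplifications; only the phase names "Succeeded" and "Failed" are the terminal pod phases defined by Kubernetes.

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down pod with just a name and a phase.
type pod struct {
	name  string
	phase string
}

// isTerminated reports whether the pod is in a terminal phase.
func isTerminated(p pod) bool {
	return p.phase == "Succeeded" || p.phase == "Failed"
}

// podsToFinalize returns the pods that still need the second-step check,
// leaving out everything that is already in a terminal state.
func podsToFinalize(pods []pod) []pod {
	result := make([]pod, 0, len(pods))
	for _, p := range pods {
		if isTerminated(p) {
			continue
		}
		result = append(result, p)
	}
	return result
}

func main() {
	pods := []pod{
		{name: "running-pod", phase: "Running"},
		{name: "done-pod", phase: "Succeeded"},
		{name: "failed-pod", phase: "Failed"},
		{name: "pending-pod", phase: "Pending"},
	}
	for _, p := range podsToFinalize(pods) {
		fmt.Println(p.name)
	}
}
```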






[jira] [Resolved] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails

2024-06-07 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2668.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails 
> 
>
> Key: YUNIKORN-2668
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2668
> Project: Apache YuniKorn
>  Issue Type: Task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails 
> due to a deadlock problem described in YUNIKORN-2629. Until that ticket is 
> resolved, let's disable this test for the time being, so upstream tests don't 
> fail.






[jira] [Created] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails

2024-06-07 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2668:
--

 Summary: Temporarily disable 
TestUpdateAllocation_NewTask_AssumePodFails 
 Key: YUNIKORN-2668
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2668
 Project: Apache YuniKorn
  Issue Type: Task
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails 
due to a deadlock problem described in YUNIKORN-2629. Until that ticket is 
resolved, let's disable this test for the time being, so upstream tests don't 
fail.






[jira] [Resolved] (YUNIKORN-2561) Support topology spread constraints on placeholder pods

2024-06-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2561.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Support topology spread constraints on placeholder pods
> ---
>
> Key: YUNIKORN-2561
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2561
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Jacob Salway
>Assignee: Jacob Salway
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> If a pod has a topology spread constraint with `whenUnsatisfiable: 
> DoNotSchedule` and is used as part of a task group, it is not 
> possible to pass the constraint to the placeholder pods created by YuniKorn.
> This can result in placeholder pods being placed on a node that would violate 
> the original pod's topology spread constraint.






[jira] [Resolved] (YUNIKORN-2643) utils.go WaitForCondition improvement

2024-06-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2643.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master. Thanks [~mean-world] for the contribution.

> utils.go WaitForCondition improvement 
> --
>
> Key: YUNIKORN-2643
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2643
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: HUAN-IU LIOU
>Assignee: HUAN-IU LIOU
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2663) Improve ACL struct function's test coverage

2024-06-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2663.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Improve ACL struct function's test coverage
> --
>
> Key: YUNIKORN-2663
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2663
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Remove unreachable code in the NewACL func.
> Improve the following functions' test coverage in acl.go:
>  * TestSetUsers
>  * TestSetGroups






[jira] [Resolved] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO

2024-06-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2666.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Fix DeepEqual comparison in Test_fixedRule_ruleDAO 
> ---
>
> Key: YUNIKORN-2666
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2666
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, test - unit
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the 
> non-deterministic nature of map key iteration:
> {noformat}
> fixed_rule_test.go:285: assertion failed: 
> --- tt.want
> +++ ruleDAO
>   &dao.RuleDAO{
>   Name:   "fixed",
>   Parameters: {"create": "true", "qualified": "false", 
> "queue": "default"},
>   Filter: &dao.FilterDAO{
>   Type: "allow",
>   UserList: nil,
>   GroupList: []string{
> - "group1",
> + "group2",
> - "group2",
> + "group1",
>   },
>   UserExp:  "",
>   GroupExp: "",
>   },
>   ParentRule: nil,
>   }
> {noformat}
> We use {{maps.Keys()}} when we create the user list and group list in 
> {{FilterDAO}}. 






[jira] [Created] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO

2024-06-06 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2666:
--

 Summary: Fix DeepEqual comparison in Test_fixedRule_ruleDAO 
 Key: YUNIKORN-2666
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2666
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler, test - unit
Reporter: Peter Bacsko


The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the 
non-deterministic nature of map key iteration:

{noformat}
fixed_rule_test.go:285: assertion failed: 
--- tt.want
+++ ruleDAO
  &dao.RuleDAO{
Name:   "fixed",
Parameters: {"create": "true", "qualified": "false", "queue": 
"default"},
Filter: &dao.FilterDAO{
Type: "allow",
UserList: nil,
GroupList: []string{
-   "group1",
+   "group2",
-   "group2",
+   "group1",
},
UserExp:  "",
GroupExp: "",
},
ParentRule: nil,
  }
{noformat}

We use {{maps.Keys()}} when we create the user list and group list in 
{{FilterDAO}}. 
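
One possible way to make such a comparison deterministic is to sort the keys before they are put into the list. The sketch below is only an illustration and not necessarily how the fix was implemented; {{sortedKeys}} and the map contents are hypothetical names made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// sortedKeys returns the keys of m in ascending order, so the result does
// not depend on Go's randomized map iteration order.
// Hypothetical helper, not part of the YuniKorn code base.
func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	groups := map[string]bool{"group2": true, "group1": true}
	got := sortedKeys(groups)
	want := []string{"group1", "group2"}
	// With sorted keys, DeepEqual no longer depends on iteration order.
	fmt.Println(reflect.DeepEqual(got, want)) // true
}
```

An alternative would be to keep the production code unchanged and sort both slices inside the test before comparing them.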






[jira] [Resolved] (YUNIKORN-2650) Complete or remove web_server_test#TestProxy

2024-06-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2650.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Complete or remove web_server_test#TestProxy
> 
>
> Key: YUNIKORN-2650
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2650
> Project: Apache YuniKorn
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> web_server_test has an empty test case: TestProxy [0]. It seems to me there is 
> a proxy-related test [1].
> [0] 
> https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L82
> [1] 
> https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L73






[jira] [Resolved] (YUNIKORN-2514) Update documentation about event.requestCapacity

2024-06-05 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2514.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Update documentation about event.requestCapacity
> 
>
> Key: YUNIKORN-2514
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2514
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2654) Remove unused code in k8shim context

2024-06-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2654.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Remove unused code in k8shim context
> 
>
> Key: YUNIKORN-2654
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2654
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Chenchen Lai
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> The NotifyApplicationComplete and NotifyApplicationFail functions are not 
> called by anything and are unused code.
> The K8shim does not trigger application completion or failure. This is 
> triggered by the core when the application no longer has any activity 
> registered.






[jira] [Resolved] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity

2024-06-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2647.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Flaky test TestUpdateNodeCapacity
> -
>
> Key: YUNIKORN-2647
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2647
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Assignee: Tseng Hsi-Huang
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> As we saw in YUNIKORN-2573, the single-node update test might fail:
> {code:java}
> --- FAIL: TestUpdateNodeCapacity (0.03s)
>     operation_test.go:446: Expected partition resource map[memory:1 
> vcore:2], doesn't match with actual partition resource 
> map[memory:1 vcore:2]{code}
> We calculate the delta resources when updating the node capacity; with that 
> delta we update the resources in the partition.
> The test fails with the following order, the same as for multiple nodes:
> node.SetCapacity() -> waitForAvailableNodeResource() ->  
> partitionInfo.GetTotalPartitionResource()  -> 
> partition.updatePartitionResource()






[jira] [Resolved] (YUNIKORN-2659) Improve config validator function's test coverage

2024-06-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2659.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve config validator function's test coverage
> 
>
> Key: YUNIKORN-2659
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2659
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Improve the following functions' test coverage in configvalidator.go:
>  * checkPlacementRule 
>  * checkLimitResource 
>  * checkLimit 






[jira] [Created] (YUNIKORN-2661) Fix hard-coded boolean in setLimit

2024-06-03 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2661:
--

 Summary: Fix hard-coded boolean in setLimit
 Key: YUNIKORN-2661
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2661
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Inside the UGM code {{setLimit()}}, we don't pass down {{doWildCardCheck}}, so 
this variable never reaches the leaves:

{noformat}
// Note: Lock free call. The Lock of the linked tracker (UserTracker and
// GroupTracker) should be held before calling this function.
func (qt *QueueTracker) setLimit(hierarchy []string, maxResource *resources.Resource, maxApps uint64, useWildCard bool, trackType trackingType, doWildCardCheck bool) {
    log.Log(log.SchedUGM).Debug("Setting limits",
        zap.String("queue path", qt.queuePath),
        zap.Strings("hierarchy", hierarchy),
        zap.Uint64("max applications", maxApps),
        zap.Stringer("max resources", maxResource),
        zap.Bool("use wild card", useWildCard))
    // depth first: all the way to the leaf, create if not exists
    // more than 1 in the slice means we need to recurse down
    if len(hierarchy) > 1 {
        childName := hierarchy[1]
        if qt.childQueueTrackers[childName] == nil {
            qt.childQueueTrackers[childName] = newQueueTracker(qt.queuePath, childName, trackType)
        }
        qt.childQueueTrackers[childName].setLimit(hierarchy[1:], maxResource, maxApps, useWildCard, trackType, false)
...
{noformat}

Fix this and create a unit test for {{setLimit()}}.
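
A minimal sketch of the intended behaviour, assuming a heavily simplified tracker: the recursive call forwards the caller's flag instead of a literal {{false}}. The {{tracker}} type and its fields are hypothetical stand-ins, not the real {{QueueTracker}}.

```go
package main

import "fmt"

// tracker is a hypothetical, stripped-down stand-in for QueueTracker.
type tracker struct {
	name          string
	children      map[string]*tracker
	wildCardCheck bool
}

func newTracker(name string) *tracker {
	return &tracker{name: name, children: make(map[string]*tracker)}
}

// setLimit walks down the hierarchy, creating children as needed, and
// forwards doWildCardCheck unchanged so the leaf sees the caller's value.
func (t *tracker) setLimit(hierarchy []string, doWildCardCheck bool) {
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		if t.children[childName] == nil {
			t.children[childName] = newTracker(childName)
		}
		// Pass the flag down instead of a hard-coded boolean.
		t.children[childName].setLimit(hierarchy[1:], doWildCardCheck)
		return
	}
	t.wildCardCheck = doWildCardCheck
}

func main() {
	root := newTracker("root")
	root.setLimit([]string{"root", "parent", "leaf"}, true)
	leaf := root.children["parent"].children["leaf"]
	fmt.Println(leaf.wildCardCheck) // true
}
```

A unit test for the real function could follow the same shape: set a limit on a multi-level hierarchy and assert on the state of the leaf.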






[jira] [Resolved] (YUNIKORN-2649) Improve CalculateAbsUsedCapacity & CompUsageRatio function's test coverage in resources.go

2024-05-31 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2649.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Improve CalculateAbsUsedCapacity & CompUsageRatio function's test coverage in 
> resources.go
> -
>
> Key: YUNIKORN-2649
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2649
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2581) Expose running placement rules in REST

2024-05-31 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2581.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Expose running placement rules in REST
> --
>
> Key: YUNIKORN-2581
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2581
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Since we always use placement rules and introduced the recovery rule, the 
> queue config does not correctly show the running rules.
> Also, if a config update has been rejected for any reason, the rules shown 
> would not be correct.
> Exposing the configured rules from the placement manager works around all 
> these issues.






[jira] [Resolved] (YUNIKORN-2646) Deadlock detected during preemption

2024-05-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2646.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: yunikorn-logs-lock.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached






[jira] [Resolved] (YUNIKORN-2542) Consistent logging and tracker handling for increment/decrement

2024-05-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2542.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master. Thanks [~Tseng Hsi-Huang] for the contribution.

> Consistent logging and tracker handling for increment/decrement
> ---
>
> Key: YUNIKORN-2542
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2542
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Tseng Hsi-Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We log DEBUG output and use {{GroupTracker}} inconsistently in {{Manager}} 
> and in {{UserTracker}}.
> E.g.:
> {{Manager.IncreaseTrackedResource()}}: only a single log statement at DEBUG 
> level
> {{Manager.DecreaseTrackedResource()}}: multiple log statements; it also 
> handles the group tracker, which is not the case with increments
> This also affects {{UserTracker}}: logging and {{GroupTracker}} handling differ 
> between {{increaseTrackedResource()}} and {{decreaseTrackedResource()}}.






[jira] [Resolved] (YUNIKORN-2567) Remove Application reference from applicationEvents

2024-05-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2567.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Remove Application reference from applicationEvents
> ---
>
> Key: YUNIKORN-2567
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2567
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2642) Don't set resources on the recovery queue

2024-05-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2642.

Resolution: Fixed

> Don't set resources on the recovery queue
> -
>
> Key: YUNIKORN-2642
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2642
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> The resource constraints can be set on dynamic queues based on application 
> tags. We should not set these on the recovery queue, because there's no quota 
> on it.






[jira] [Resolved] (YUNIKORN-2635) test coverage improvement: same priority case in sorter

2024-05-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2635.

Fix Version/s: 1.6.0
   Resolution: Fixed

> test coverage improvement: same priority case in sorter 
> 
>
> Key: YUNIKORN-2635
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2635
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Chen Yu Teng
>Assignee: Chen Yu Teng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application

2024-05-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2633.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Unnecessary warning from Partition when adding an application
> -
>
> Key: YUNIKORN-2633
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2633
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The following is printed when adding an application:
> {noformat}
> 2024-05-17T21:53:04.716+0200  WARN  core.scheduler.queue  scheduler/partition.go:344  Trying to set resources on a queue that is not an unmanaged leaf  {"queueName": "root.default"}
> {noformat}
> This message is supposed to be printed when the application defines a 
> guaranteed or max resource. After YUNIKORN-2547 it's always printed if the 
> queue is managed.






[jira] [Created] (YUNIKORN-2642) Don't set resources on the recovery queue

2024-05-24 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2642:
--

 Summary: Don't set resources on the recovery queue
 Key: YUNIKORN-2642
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2642
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The resource constraints can be set on dynamic queues based on application 
tags. We should not set these on the recovery queue, because there's no quota on 
it.






[jira] [Resolved] (YUNIKORN-2566) Remove AllocationAsk reference from askEvents

2024-05-23 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2566.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Remove AllocationAsk reference from askEvents
> -
>
> Key: YUNIKORN-2566
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2566
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2565) Remove Node reference from nodeEvents

2024-05-23 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2565.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Remove Node reference from nodeEvents
> -
>
> Key: YUNIKORN-2565
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2565
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2618) Streamline AsyncRMCallback UpdateAllocation

2024-05-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2618.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Streamline AsyncRMCallback UpdateAllocation
> ---
>
> Key: YUNIKORN-2618
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2618
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Yun Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> If a task is not found, nil is returned from {{context.getTask}}. For 
> {{response.New}} processing we should just log that fact and proceed to the 
> next alloc. This simplifies the flow, as we never need to check for a nil 
> task. We should never have a pod in the cache that does not exist as a task 
> on an application.
> We retrieve the application using the application ID from the response, but 
> never use the object. We only use the application ID to pass into an event. 
> The context event handler then does the exact same lookup again to process 
> the event on the app.
> We need to become much smarter in this area: we do double or triple lookups 
> and generate async events that just change the state of the app or task or 
> kick off another event.
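
The "log and proceed to the next alloc" flow could look roughly like the sketch below. It is only an illustration under simplified assumptions: {{taskCache}}, {{alloc}} and {{processNew}} are hypothetical stand-ins, not the shim's real types or functions.

```go
package main

import "fmt"

// task and alloc are hypothetical, minimal stand-ins for the shim's types.
type task struct{ id string }

type alloc struct{ taskID string }

// taskCache mimics a lookup that returns nil when the task is unknown.
type taskCache map[string]*task

func (c taskCache) getTask(id string) *task { return c[id] }

// processNew handles each new allocation; unknown tasks are logged and
// skipped, so later code never has to deal with a nil task.
func processNew(c taskCache, allocs []alloc) []string {
	var handled []string
	for _, a := range allocs {
		t := c.getTask(a.taskID)
		if t == nil {
			fmt.Printf("task %s not found, skipping allocation\n", a.taskID)
			continue
		}
		handled = append(handled, t.id)
	}
	return handled
}

func main() {
	cache := taskCache{"t1": &task{id: "t1"}, "t3": &task{id: "t3"}}
	allocs := []alloc{{taskID: "t1"}, {taskID: "t2"}, {taskID: "t3"}}
	fmt.Println(processNew(cache, allocs)) // [t1 t3]
}
```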






[jira] [Resolved] (YUNIKORN-2611) [UMBRELLA] YuniKorn 1.5.1 release efforts

2024-05-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2611.

Fix Version/s: 1.5.1
   Resolution: Fixed

> [UMBRELLA] YuniKorn 1.5.1 release efforts
> -
>
> Key: YUNIKORN-2611
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2611
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Fix For: 1.5.1
>
>
> This umbrella is to track the work items needed for the 1.5.1 release.
> Release manager: Peter Bacsko.
> This release only consists of bug fixes. Use the filter 
> [https://issues.apache.org/jira/issues/?filter=12353383] to see the list of 
> deliverables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2614) Update website for 1.5.1

2024-05-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2614.

 Fix Version/s: 1.5.1
Target Version: 1.5.1
Resolution: Fixed

> Update website for 1.5.1
> 
>
> Key: YUNIKORN-2614
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2614
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.1
>
>







[jira] [Created] (YUNIKORN-2639) Clarify release procedure for minor releases

2024-05-21 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2639:
--

 Summary: Clarify release procedure for minor releases
 Key: YUNIKORN-2639
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2639
 Project: Apache YuniKorn
  Issue Type: Task
  Components: release
Reporter: Peter Bacsko


After the release of 1.5.1, we realized that we need to properly define the 
release process for a minor release. This needs to be properly documented.

The clarification should cover things like:
# What it can and can't include (no features, bug fixes only)
# How to publish docs? Shall we keep the current "a.b.c" version on the website 
or remove it and publish "a.b.c+1"?
# Communication: possible difference in release notes, announcement, etc.






[jira] [Created] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application

2024-05-17 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2633:
--

 Summary: Unnecessary warning from Partition when adding an 
application
 Key: YUNIKORN-2633
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2633
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The following is printed when adding an application:

{noformat}
2024-05-17T21:53:04.716+0200  WARN  core.scheduler.queue  scheduler/partition.go:344  Trying to set resources on a queue that is not an unmanaged leaf  {"queueName": "root.default"}
{noformat}

This message is supposed to be printed when the application defines a 
guaranteed or max resource. 






[jira] [Resolved] (YUNIKORN-2613) Release notes for 1.5.1

2024-05-17 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2613.

Fix Version/s: 1.5.1
   Resolution: Fixed

> Release notes for 1.5.1
> ---
>
> Key: YUNIKORN-2613
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2613
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.1
>
>







[jira] [Resolved] (YUNIKORN-2632) Data race in IncAllocatedResource

2024-05-17 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2632.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

> Data race in IncAllocatedResource
> -
>
> Key: YUNIKORN-2632
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2632
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> After YUNIKORN-2548, we accidentally make an unlocked access to 
> {{Queue.allocatedResource}}.
> {noformat}
> WARNING: DATA RACE
> Read at 0x00c000578a00 by goroutine 52:
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).IncAllocatedResource()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1032 +0x6b
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNode()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1495 +0x184
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNodes.func1()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1402 +0x144
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode.func1()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/node_iterator.go:42 +0x95
>   github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:522 +0x6f1
>   github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
>   github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
>   github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
>   github.com/google/btree.(*BTreeG[go.shape.interface { Less(github.com/google/btree.Item) bool }]).Ascend()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:779 +0x108
>   github.com/google/btree.(*BTree).Ascend()
>       /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:1029 +0x108
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode()
> ...
> Previous write at 0x00c000578a00 by goroutine 49:
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).DecAllocatedResource()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1101 +0x212
>   github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/partition.go:1357 +0x17b4
>   github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:870 +0xba
>   github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:750 +0x1e4
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:133 +0x28d
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.gowrap1()
>       /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:60 +0x33
>  {noformat}






[jira] [Created] (YUNIKORN-2632) Data race in IncAllocatedResource

2024-05-17 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2632:
--

 Summary: Data race in IncAllocatedResource
 Key: YUNIKORN-2632
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2632
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


After YUNIKORN-2548, we accidentally make an unlocked access to 
{{Queue.allocatedResource}}.

{noformat}
WARNING: DATA RACE
Read at 0x00c000578a00 by goroutine 52:
  github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).IncAllocatedResource()
      /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1032 +0x6b
  github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNode()
      /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1495 +0x184
  github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNodes.func1()
      /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/application.go:1402 +0x144
  github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode.func1()
      /home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/node_iterator.go:42 +0x95
  github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:522 +0x6f1
  github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
  github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
  github.com/google/btree.(*node[go.shape.interface { Less(github.com/google/btree.Item) bool }]).iterate()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:510 +0x448
  github.com/google/btree.(*BTreeG[go.shape.interface { Less(github.com/google/btree.Item) bool }]).Ascend()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:779 +0x108
  github.com/google/btree.(*BTree).Ascend()
      /home/bacskop/go/pkg/mod/github.com/google/btree@v1.1.2/btree_generic.go:1029 +0x108
  github.com/apache/yunikorn-core/pkg/scheduler/objects.(*treeIterator).ForEachNode()
...
Previous write at 0x00c000578a00 by goroutine 49:
  
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).DecAllocatedResource()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/objects/queue.go:1101
 +0x212
  
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/partition.go:1357
 +0x17b4
  
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:870
 +0xba
  
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/context.go:750
 +0x1e4
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:133
 +0x28d
  
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.gowrap1()
  
/home/bacskop/go/pkg/mod/github.com/apache/yunikorn-core@v1.5.1-1/pkg/scheduler/scheduler.go:60
 +0x33
 {noformat}





