[jira] [Updated] (YUNIKORN-2709) Update website for 1.5.2

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2709:
-
Labels: pull-request-available  (was: )

> Update website for 1.5.2
> 
>
> Key: YUNIKORN-2709
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2709
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2708) Release notes for 1.5.2

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2708:
-
Labels: pull-request-available release  (was: release)

> Release notes for 1.5.2
> ---
>
> Key: YUNIKORN-2708
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2708
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available, release
>







[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API

2024-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2712:
-
Labels: newbie pull-request-available  (was: newbie)

> Missing specific param error for REST API
> -
>
> Key: YUNIKORN-2712
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> Some REST APIs return a specific "missing param" error, but not all do. 
> For example, "user name is missing". All mandatory parameters in the 
> other REST APIs can follow the same pattern. It is much clearer than a 
> generic "doesn't exist" error.
> The suggestion given in 
> [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can 
> be used as a reference for the implementation.
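
The pattern described above can be sketched in Go roughly as follows. This is only an illustration: `requireParams`, the parameter names, and the error text are hypothetical and are not taken from yunikorn-core or from the linked review suggestion.

```go
package main

import "fmt"

// requireParams is a hypothetical helper: it returns an error naming the
// first mandatory parameter that is absent or empty, or nil if all are set.
func requireParams(params map[string]string, mandatory ...string) error {
	for _, name := range mandatory {
		if params[name] == "" {
			return fmt.Errorf("missing mandatory parameter: %s", name)
		}
	}
	return nil
}

func main() {
	// "user" is present but empty, so it is reported by name.
	params := map[string]string{"partition": "default", "user": ""}
	if err := requireParams(params, "partition", "user"); err != nil {
		fmt.Println(err)
	}
}
```

A handler would call such a helper before doing any lookup, so the caller learns which parameter is missing rather than receiving a generic "doesn't exist" response.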






[jira] [Updated] (YUNIKORN-2524) add documentation for recovery queue (root.@recovery@)

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2524:
-
Labels: newbie pull-request-available  (was: newbie)

> add documentation for recovery queue (root.@recovery@)
> --
>
> Key: YUNIKORN-2524
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2524
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chia-Ping Tsai
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: newbie, pull-request-available
>
> The recovery queue cannot be queried directly, but its name can be observed 
> via the application REST API (`ws/v1/partition/%s/application/%s`).
> Hence, we should document the recovery queue. Otherwise, users will be 
> surprised when they see an incomprehensible queue name and find nothing 
> about it in our docs.
> Some discussion from a PR review: 
> https://github.com/apache/yunikorn-site/pull/426#discussion_r1588788027






[jira] [Updated] (YUNIKORN-2759) Replace %w by Errors.join

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2759:
-
Labels: pull-request-available  (was: )

> Replace %w by Errors.join
> -
>
> Key: YUNIKORN-2759
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2759
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: pull-request-available
>
> Original discussion: https://issues.apache.org/jira/browse/YUNIKORN-2262
> errors.Join can make the code more performant and readable.






[jira] [Updated] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2771:
-
Labels: pull-request-available  (was: )

> Optimization: Use termination grace period of 0 seconds for placeholder pods
> 
>
> Key: YUNIKORN-2771
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> When we create placeholder pods for gang scheduling, we do not specify a 
> termination grace period, and therefore inherit the Kubernetes default of 30 
> seconds. This is unnecessary as the placeholders do not perform any logic and 
> therefore require no graceful termination.
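
A minimal sketch of the idea, assuming a local stand-in type instead of the real Kubernetes `v1.PodSpec` so that the example stays self-contained. `podSpec`, `newPlaceholderSpec`, and `effectiveGracePeriod` are hypothetical names; only the `TerminationGracePeriodSeconds` field mirrors the real API.

```go
package main

import "fmt"

// podSpec is a local stand-in for the Kubernetes PodSpec; only the field
// relevant to this issue is modeled.
type podSpec struct {
	TerminationGracePeriodSeconds *int64
}

// defaultGracePeriodSeconds is the Kubernetes default applied when the
// field is left unset.
const defaultGracePeriodSeconds int64 = 30

// effectiveGracePeriod returns the grace period that would apply to the pod.
func effectiveGracePeriod(spec podSpec) int64 {
	if spec.TerminationGracePeriodSeconds == nil {
		return defaultGracePeriodSeconds
	}
	return *spec.TerminationGracePeriodSeconds
}

// newPlaceholderSpec sets the grace period explicitly to zero, because a
// placeholder runs no logic that needs a graceful shutdown.
func newPlaceholderSpec() podSpec {
	zero := int64(0)
	return podSpec{TerminationGracePeriodSeconds: &zero}
}

func main() {
	fmt.Println("unset:", effectiveGracePeriod(podSpec{}))
	fmt.Println("placeholder:", effectiveGracePeriod(newPlaceholderSpec()))
}
```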






[jira] [Updated] (YUNIKORN-2459) Core: Merge ask and allocation objects

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2459:
-
Labels: pull-request-available  (was: )

> Core: Merge ask and allocation objects
> --
>
> Key: YUNIKORN-2459
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2459
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> Merge the Ask and Allocation objects into a single Allocation object.






[jira] [Updated] (YUNIKORN-2770) Simplify Application.GetTask()

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2770:
-
Labels: pull-request-available  (was: )

> Simplify Application.GetTask()
> --
>
> Key: YUNIKORN-2770
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2770
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
>
> {{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the 
> {{error}} is completely unnecessary. We either have the task for the given 
> taskID or we don't. 
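
A sketch of what the simplified method could look like, assuming a heavily reduced `Application` type. The real type in the shim has more fields and locking, which are omitted here; everything except the `GetTask` name is hypothetical.

```go
package main

import "fmt"

// Task and Application are simplified stand-ins for the shim types.
type Task struct {
	ID string
}

type Application struct {
	tasks map[string]*Task
}

// GetTask returns the task for the given taskID, or nil if there is none.
// A missing map key yields the zero value, which is nil for a pointer,
// so no separate error value is needed.
func (a *Application) GetTask(taskID string) *Task {
	return a.tasks[taskID]
}

func main() {
	app := &Application{tasks: map[string]*Task{"task-1": {ID: "task-1"}}}
	if task := app.GetTask("task-1"); task != nil {
		fmt.Println("found", task.ID)
	}
	if app.GetTask("task-2") == nil {
		fmt.Println("task-2 not found")
	}
}
```

Callers then replace the `err != nil` check with a plain nil check on the returned task.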






[jira] [Updated] (YUNIKORN-2766) Only generate event if all predicates failed

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2766:
-
Labels: pull-request-available  (was: )

> Only generate event if all predicates failed
> 
>
> Key: YUNIKORN-2766
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> Right now, we send an event to the pod if a predicate failed:
> {noformat}
> if err := plugin.Predicates(&si.PredicatesArgs{
>     AllocationKey: allocationKey,
>     NodeID:        sn.NodeID,
>     Allocate:      allocate,
> }); err != nil {
>     log.Log(log.SchedNode).Debug("running predicates failed",
>         zap.String("allocationKey", allocationKey),
>         zap.String("nodeID", sn.NodeID),
>         zap.Bool("allocateFlag", allocate),
>         zap.Error(err))
>     // running predicates failed
>     msg := err.Error()
>     ask.LogAllocationFailure(msg, allocate)
>     ask.SendPredicateFailedEvent(msg)
>     return false
> }
> {noformat}
> This is, however, not correct. We should only generate an event if *all* 
> predicates have failed, which means that the pod cannot be scheduled. A 
> failing predicate for a given node can be perfectly normal in many cases.
> Instead, we should aggregate the failed predicates and send an event like:
> {noformat}
> All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': 
> node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints 
> that the pod didn't tolerate (5x)
> {noformat}
> where 20x and 5x tell how many times a certain predicate failed.
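
The aggregation described above could be sketched as follows. This is not the actual scheduler change: `summarizePredicateFailures` is a hypothetical helper, and sorting the reasons alphabetically is an assumption made here only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// summarizePredicateFailures counts identical failure reasons collected
// across all nodes and renders them as a single event message.
func summarizePredicateFailures(allocationKey string, reasons []string) string {
	counts := make(map[string]int)
	for _, reason := range reasons {
		counts[reason]++
	}
	// Sort the distinct reasons so the message is stable between runs.
	distinct := make([]string, 0, len(counts))
	for reason := range counts {
		distinct = append(distinct, reason)
	}
	sort.Strings(distinct)

	parts := make([]string, 0, len(distinct))
	for _, reason := range distinct {
		parts = append(parts, fmt.Sprintf("%s (%dx)", reason, counts[reason]))
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, "; "))
}

func main() {
	reasons := []string{
		"node(s) didn't match Pod's node affinity/selector",
		"node(s) had taints that the pod didn't tolerate",
		"node(s) didn't match Pod's node affinity/selector",
	}
	fmt.Println(summarizePredicateFailures("345d70d7-243a-4077-a9f8-0bb76c3532d7", reasons))
}
```

The event would be sent once, after every node has been tried without success, instead of once per failing node.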






[jira] [Updated] (YUNIKORN-2696) appoint specific version when installing yunikorn

2024-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2696:
-
Labels: newbie pull-request-available  (was: newbie)

> appoint specific version when installing yunikorn
> -
>
> Key: YUNIKORN-2696
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2696
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chen Yu Teng
>Assignee: Lyu Bo Cian
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> In the Get Started doc, the image tags are set to latest, which is not available on Docker 
> Hub.
> The Helm chart needs to be updated via helm upgrade, with the image tags pinned in custom.yml:
>   
> helm upgrade -f custom.yml  --install yunikorn yunikorn/yunikorn -n yunikorn 
> --create-namespace
> ```yml
> image:
>   tag: scheduler-1.5.1
> admissionController:
>   image:
>     tag: admission-1.5.1
> web:
>   image:
>     tag: web-1.5.1
> ```






[jira] [Updated] (YUNIKORN-2765) Improve si_helper & resource function's test coverage

2024-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2765:
-
Labels: pull-request-available  (was: )

> Improve si_helper & resource function's test coverage
> 
>
> Key: YUNIKORN-2765
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2765
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>
> Improve the test coverage of the following functions:
>  * GetTerminationTypeFromString (unknown termination type)
>  * getMaxResource (requested resource types are fewer than allocated types)
>  * GetResource
>  * GetTGResource






[jira] [Updated] (YUNIKORN-2354) Visualize the current queue that YuniKorn is using

2024-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2354:
-
Labels: pull-request-available  (was: )

> Visualize the current queue that YuniKorn is using
> --
>
> Key: YUNIKORN-2354
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2354
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Dong-Lin Hsieh
>Assignee: Dong-Lin Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> # another tab page
>  # additional queue info (running applications)






[jira] [Updated] (YUNIKORN-2763) add the documentation of REST API for specific queue

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2763:
-
Labels: pull-request-available  (was: )

> add the documentation of REST API for specific queue
> 
>
> Key: YUNIKORN-2763
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2763
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation, website
>Reporter: Chia-Ping Tsai
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> The new call will be used by e2e (see YUNIKORN-2713), and hence it is worth 
> having the documentation.






[jira] [Updated] (YUNIKORN-2262) propagate the error message when queue creation gets failed

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2262:
-
Labels: pull-request-available  (was: )

> propagate the error message when queue creation gets failed
> ---
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>  Labels: pull-request-available
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> The root-cause error message is swallowed, so the generic message 
> "failed to create rule based queue ..." gives little insight.
> BTW, the error I hit was that the parent queue "is already a leaf". That error 
> message is helpful and makes it easy to find the root cause.






[jira] [Updated] (YUNIKORN-2760) `make tools` should check the version of tools

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2760:
-
Labels: pull-request-available  (was: )

> `make tools` should check the version of tools
> --
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> The Makefile, by default, only checks whether a file exists. Hence, developers 
> need to remove the tools folder (or call `make distclean`) manually to trigger 
> the installation after we update the version of a tool.
> However, how can developers be aware of tool updates? Personally, I 
> can tell something is off from the errors or warnings, but that is implicit and noisy 
> :cry
> In order to fix that, I'd like to introduce a new folder structure for the tools 
> folder: 
> {code:java}
> /tools/{tool_name}-{version}
> {code}
>  That offers a unique path for each version of a tool. Developers will not miss 
> updates anymore.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
>  That offers a unique path for each version of a tool. Developers will not miss 
> updates anymore.
> NOTE: we need to remove the existing tool binary if there is a naming conflict 
> when creating the new path. For example, creating /tools/golangci-lint/1.57.2 
> will fail if /tools/golangci-lint is an existing file.






[jira] [Updated] (YUNIKORN-2761) Explain preemption storm in usage doc

2024-07-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2761:
-
Labels: pull-request-available  (was: )

> Explain preemption storm in usage doc
> -
>
> Key: YUNIKORN-2761
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2761
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: website
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2762) Improve util function's test coverage

2024-07-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2762:
-
Labels: pull-request-available  (was: )

> Improve util function's test coverage
> 
>
> Key: YUNIKORN-2762
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2762
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>
> Improve the unit tests for the following functions in util.go:
>  * IsPluginMode
>  * Convert2ConfigMap
>  * IsPodRunning
>  * GetNamespaceQuotaFromAnnotation (JSON Unmarshal error case)
>  * WaitForCondition
>  * GetCoreSchedulerConfigFromConfigMap (NotMapping file case)






[jira] [Updated] (YUNIKORN-2713) Use queue specific REST API directly

2024-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2713:
-
Labels: newbie pull-request-available  (was: newbie)

> Use queue specific REST API directly
> 
>
> Key: YUNIKORN-2713
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2713
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes, test - e2e
>Reporter: Manikandan R
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: newbie, pull-request-available
>
> Some places in the e2e tests still use the old approach of fetching all queues for 
> the given partition and then fetching the queue-specific info in the next call. Instead, 
> the queue info can be fetched directly in a single call. 






[jira] [Updated] (YUNIKORN-2719) Assert invalid group name in Get Group REST API

2024-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2719:
-
Labels: newbie pull-request-available  (was: newbie)

> Assert invalid group name in Get Group REST API
> ---
>
> Key: YUNIKORN-2719
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2719
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Yun Sun
>Priority: Major
>  Labels: newbie, pull-request-available
>
> Assert invalid group name in Get Group REST API






[jira] [Updated] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked

2024-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2755:
-
Labels: pull-request-available  (was: )

> yunikorn-web: pnpm version should be locked
> ---
>
> Key: YUNIKORN-2755
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> Now that we are using pnpm, we should lock the version that we are using to 
> prevent unexpected divergence of package.json and pnpm-lock.yaml.






[jira] [Updated] (YUNIKORN-2745) Log analysis adopting loki

2024-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2745:
-
Labels: pull-request-available  (was: )

> Log analysis adopting loki
> --
>
> Key: YUNIKORN-2745
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2745
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Chen Yu Teng
>Assignee: HUAN-IU LIOU
>Priority: Major
>  Labels: pull-request-available
>
> Add a tutorial on how to parse YuniKorn logs and show them in the Grafana UI.






[jira] [Updated] (YUNIKORN-2746) Adopting prometheus service monitor instead of modifying config

2024-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2746:
-
Labels: pull-request-available  (was: )

> Adopting prometheus service monitor instead of modifying config
> ---
>
> Key: YUNIKORN-2746
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2746
> Project: Apache YuniKorn
>  Issue Type: Task
>Reporter: Chen Yu Teng
>Assignee: JunHong Peng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2720:
-
Labels: newbie pull-request-available  (was: newbie)

> Use createRequest() in handlers_test.go
> ---
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: newbie, pull-request-available
>
> Use the createRequest() helper method wherever applicable in handlers_test.go. 
> handlers_test.go is huge.






[jira] [Updated] (YUNIKORN-2738) Only check failure reason once not for every pod

2024-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2738:
-
Labels: pull-request-available  (was: )

> Only check failure reason once not for every pod
> 
>
> Key: YUNIKORN-2738
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2738
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
>
> The reason for an application failure does not change and can be 
> pre-calculated for all pods when a failure is handled.






[jira] [Updated] (YUNIKORN-2732) Improve allocation & queue_events function's test coverage

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2732:
-
Labels: pull-request-available  (was: )

> Improve allocation & queue_events function's test coverage
> -
>
> Key: YUNIKORN-2732
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2732
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2711) Skip setting the queue name to default queue in the shim

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2711:
-
Labels: pull-request-available  (was: )

> Skip setting the queue name to default queue in the shim
> 
>
> Key: YUNIKORN-2711
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2711
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
>
> The admission controller and the scheduler currently check the pod for the 
> supplied queue name. If the queue name is not provided, they set the queue to 
> the default queue 'root.default'.
> After the changes from YUNIKORN-2703, we do not need to set the queue name in 
> the shim; the core should take care of setting the default queue.






[jira] [Updated] (YUNIKORN-2207) Update user group documentation

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2207:
-
Labels: pull-request-available  (was: )

> Update user group documentation
> ---
>
> Key: YUNIKORN-2207
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2207
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Wilfred Spiegelenburg
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: pull-request-available
>
> The order in the [User & Group 
> Resolution|https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/] 
> documentation should be reversed:
>  * current handling via the admission controller
>  * deprecated handling via the label
> We should also add a notice stating the specific YuniKorn version in which the 
> old label will be removed. From that release on, we only support the annotation.






[jira] [Updated] (YUNIKORN-2724) Improve the signature of methods notifyTaskComplete() and ensureAppAndTaskCreated()

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2724:
-
Labels: pull-request-available  (was: )

> Improve the signature of methods notifyTaskComplete() and 
> ensureAppAndTaskCreated()
> ---
>
> Key: YUNIKORN-2724
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2724
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: pull-request-available
>
> From the review [https://github.com/apache/yunikorn-k8shim/pull/864]
> Change {{notifyTaskComplete(string, string)}} to 
> {{notifyTaskComplete(*Application, string)}}. It removes a number of extra 
> getApplication() calls we really do not need.
> Similarly for {{ensureAppAndTaskCreated()}}, which is only ever called from this 
> function. Add a parameter to it to make it 
> {{ensureAppAndTaskCreated(*v1.Pod, *Application)}} and only execute 
> application creation {{if app == nil}}. 
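
The proposed signatures can be illustrated with a reduced sketch. All types here are local stand-ins rather than the shim's real `Context`, `Application`, or `v1.Pod`; `updatePod` is a hypothetical caller; and returning the application from `ensureAppAndTaskCreated` is an assumption made for this example, since the issue does not specify a return value.

```go
package main

import "fmt"

// Pod and Application are simplified stand-ins for the real types.
type Pod struct {
	AppID  string
	TaskID string
}

type Application struct {
	ID    string
	tasks map[string]bool // taskID -> completed
}

type Context struct {
	apps    map[string]*Application
	lookups int // counts getApplication calls, for illustration only
}

func (c *Context) getApplication(appID string) *Application {
	c.lookups++
	return c.apps[appID]
}

// ensureAppAndTaskCreated creates the application only if app == nil,
// then registers the task if it is not known yet.
func (c *Context) ensureAppAndTaskCreated(pod *Pod, app *Application) *Application {
	if app == nil {
		app = &Application{ID: pod.AppID, tasks: make(map[string]bool)}
		c.apps[app.ID] = app
	}
	if _, ok := app.tasks[pod.TaskID]; !ok {
		app.tasks[pod.TaskID] = false
	}
	return app
}

// notifyTaskComplete takes the application directly, so it needs no lookup.
func (c *Context) notifyTaskComplete(app *Application, taskID string) {
	if app == nil {
		return
	}
	if _, ok := app.tasks[taskID]; ok {
		app.tasks[taskID] = true
	}
}

// updatePod looks the application up once and passes it along.
func (c *Context) updatePod(pod *Pod, finished bool) {
	app := c.getApplication(pod.AppID)
	app = c.ensureAppAndTaskCreated(pod, app)
	if finished {
		c.notifyTaskComplete(app, pod.TaskID)
	}
}

func main() {
	ctx := &Context{apps: make(map[string]*Application)}
	ctx.updatePod(&Pod{AppID: "app-1", TaskID: "task-1"}, true)
	fmt.Println("lookups:", ctx.lookups, "task-1 completed:", ctx.apps["app-1"].tasks["task-1"])
}
```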






[jira] [Updated] (YUNIKORN-2493) Preemption Hardening

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2493:
-
Labels: pull-request-available  (was: )

> Preemption Hardening
> 
>
> Key: YUNIKORN-2493
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2493
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2728) Config event.RESTResponseSize should be placed under Event System Settings

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2728:
-
Labels: newbie pull-request-available  (was: newbie)

> Config event.RESTResponseSize should be placed under Event System Settings
> --
>
> Key: YUNIKORN-2728
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2728
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Kuan Po Tseng
>Assignee: Chenchen Lai
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> [https://yunikorn.apache.org/docs/next/user_guide/service_config/#eventrestresponsesize]
> event.RESTResponseSize is an event-related config and should be placed under
> [#event-system-settings|https://yunikorn.apache.org/docs/next/user_guide/service_config/#event-system-settings]
>  instead of 
> [#health-settings|https://yunikorn.apache.org/docs/next/user_guide/service_config/#health-settings]






[jira] [Updated] (YUNIKORN-2729) remove `--new-from-rev` from Makefile

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2729:
-
Labels: pull-request-available  (was: )

> remove `--new-from-rev` from Makefile
> -
>
> Key: YUNIKORN-2729
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2729
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Huang Guan Hao
>Priority: Minor
>  Labels: pull-request-available
>
> It is time to show the power of lint :)






[jira] [Updated] (YUNIKORN-2727) Fix Dead Links and Update readme for Docusaurus v3

2024-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2727:
-
Labels: pull-request-available  (was: )

> Fix Dead Links and Update readme for Docusaurus v3
> --
>
> Key: YUNIKORN-2727
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2727
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: documentation
>Reporter: Hsien-Cheng(Ryan) Huang
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Issue 1: Dead Link in "Deploy the Scheduler"
> Problem: Dead link at example in "Deploy the Scheduler" section.
> Current: 
> https://yunikorn.apache.org/docs/developer_guide/deployment/#deploy-the-admission-controller
> Solution: Replace with correct links:
> https://yunikorn.apache.org/docs/next/developer_guide/deployment/#deploy-the-scheduler
> https://yunikorn.apache.org/docs/next/developer_guide/deployment/#Deploy-the-Scheduler
> Cause: Migration to Docusaurus v3 with strict URL regulations.
> Issue 2: Outdated Docusaurus Version in README
> Problem: README mentions Docusaurus v2.
> Current: "The website is built based using docusaurus-v2."
> Solution: Update to v3.
> New: "The website is built using Docusaurus v3."






[jira] [Updated] (YUNIKORN-2726) Add "How to check E2E test logs?" to developer guide

2024-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2726:
-
Labels: newbie pull-request-available  (was: newbie)

> Add "How to check E2E test logs?" to developer guide
> 
>
> Key: YUNIKORN-2726
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2726
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Yu-Lin Chen
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: image-2024-07-06-16-39-54-365.png
>
>
> After YUNIKORN-2305, the logs of failed E2E tests are dumped locally and 
> uploaded as GitHub Actions artifacts. We should let new developers know how 
> to retrieve them.
> We should add an explanation to the developer guide 
> (https://yunikorn.apache.org/docs/next/developer_guide/e2e_test). It should 
> include the following:
>  # Where to find the local e2e test logs after `make e2e_test` fails (in 
> yunikorn-k8shim/build/e2e/{suite}/)
>  # Which log types we have:
> a. {specName}_k8sClusterInfo.txt
> b. {specName}_ykContainerLog.txt
> c. {specName}_ykFullStateDump.json
>  # How to download the logs in GitHub Actions (see the screenshot below, from 
> [the failed CI 
> Link|https://github.com/apache/yunikorn-k8shim/actions/runs/9807493804]) 
> !image-2024-07-06-16-39-54-365.png|width=573,height=307!
>  






[jira] [Updated] (YUNIKORN-2655) Cleanup REST API documentation

2024-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2655:
-
Labels: pull-request-available  (was: )

> Cleanup REST API documentation
> --
>
> Key: YUNIKORN-2655
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2655
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: Wilfred Spiegelenburg
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: pull-request-available
>
> The REST API documentation is not up to date with the current behaviour as it 
> does not show any 400 or 404 errors returned by a number of API calls.
> The error response only shows a 500 code with the same message for each call.
> We should move to a simple list for each call showing the applicable errors 
> like this:
> {code:java}
> ### Error responses
> **Code** : `400 Bad Request` (URL query is invalid, missing partition name)
> **Code** : `404 Not Found` (Partition not found)
> **Code** : `500 Internal Server Error` {code}
> Remove the error examples, as they do not add any required detail.






[jira] [Updated] (YUNIKORN-2699) Preemption e2e tests fail in latest master

2024-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2699:
-
Labels: pull-request-available  (was: )

> Preemption e2e tests fail in latest master
> --
>
> Key: YUNIKORN-2699
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2699
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
>
> Output:
>  
> {noformat}
> Preemption Verify_basic_preemption
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
>   STEP: Creating development namespace: dev-anvkm @ 06/25/24 18:08:14.291
>   STEP: A queue uses resource more than the guaranteed value even after 
> removing one of the pods. The cluster doesn't have enough resource to deploy 
> a pod in another queue which uses resource less than the guaranteed value. @ 
> 06/25/24 18:08:15.301
>   STEP: Update root.sandbox1 and root.sandbox2 with guaranteed memory 4677M @ 
> 06/25/24 18:08:15.301
>   STEP: Port-forward the scheduler pod @ 06/25/24 18:08:15.302
> port-forward is already running  STEP: Enabling new scheduling config @ 
> 06/25/24 18:08:15.302
>   STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 
> 06/25/24 18:08:18.313
>   STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 
> 06/25/24 18:08:22.518
>   STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 
> 06/25/24 18:08:26.517
>   STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 
> 06/25/24 18:08:30.518
>   STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:08:38.517
>   [FAILED] in [It] - 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
>  @ 06/25/24 18:08:38.718
>   Logging yk fullstatedump, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykFullStateDump.json
>   Logging k8s cluster info, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_k8sClusterInfo.txt
>   Logging yk container logs, spec: Verify_basic_preemption
>   Created log file: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykContainerLog.txt
>   STEP: Tear down namespace: dev-anvkm @ 06/25/24 18:08:39.235
>   STEP: Restoring YuniKorn configuration @ 06/25/24 18:08:40.118
>   STEP: Restoring the old config maps @ 06/25/24 18:08:40.119
> • [FAILED] [27.837 seconds]
> Preemption [It] Verify_basic_preemption
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
>   [FAILED] One of the pods in root.sandbox1 should be preempted
>   Expected
>       : 1
>   to equal
>       : 2
>   In [It] at: 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
>  @ 06/25/24 18:08:38.718-- Preemption 
> Verify_preemption_on_priority_queue
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:333
>   STEP: Creating development namespace: dev-u0kt7 @ 06/25/24 18:10:24.975
>   STEP: A task can only preempt a task with lower or equal priority @ 
> 06/25/24 18:10:25.982
>   STEP: Update root.sandbox1, root.low-priority, root.high-priority with 
> guaranteed memory 4677M @ 06/25/24 18:10:25.982
>   STEP: Port-forward the scheduler pod @ 06/25/24 18:10:25.983
> port-forward is already running  STEP: Enabling new scheduling config @ 
> 06/25/24 18:10:25.983
>   STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 
> 06/25/24 18:10:28.99
>   STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 
> 06/25/24 18:10:32.791
>   STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 
> 06/25/24 18:10:35.792
>   STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 
> 06/25/24 18:10:38.792
>   STEP: Deploy the sleep pod sleepjob5 to the development namespace @ 
> 06/25/24 18:10:38.995
>   STEP: The sleep pod sleepjob4 can't be scheduled @ 06/25/24 18:10:39.194
>   STEP: The sleep pod sleepjob5 can be scheduled @ 06/25/24 18:10:41.392
>   STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:10:46.392
>   [FAILED] in [It] - 
> /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:424
>  @ 06/25/24 18:10:46.592
>   Logging yk fullstatedump, spec: Verify_preemption_on_priority_queue
>   Created log file: 
> 

[jira] [Updated] (YUNIKORN-2725) Temporarily disable failing e2e preemption tests

2024-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2725:
-
Labels: pull-request-available  (was: )

> Temporarily disable failing e2e preemption tests
> 
>
> Key: YUNIKORN-2725
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2725
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes, test - e2e
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> Disable the following tests to have green builds:
> Verify_preemption_on_priority_queue
> Verify_basic_preemption






[jira] [Updated] (YUNIKORN-2319) cache.Task: reference to old pod object is kept after update

2024-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2319:
-
Labels: pull-request-available  (was: )

> cache.Task: reference to old pod object is kept after update
> 
>
> Key: YUNIKORN-2319
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2319
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Attachments: 2024-01-09 134112.png, 2024-01-09 134130.png
>
>
> There is a kind of memory leak in the shim: when the pod is updated, the old 
> pod object is still referenced from Task, so the GC has no chance to remove 
> it (only when the pod terminates).
> See screenshot: task points to version 80199, scheduler cache already has a 
> newer version 81216.
> We have two solutions:
> 1. Update the object in the Task together with the scheduler cache
> 2. Don't store the pointer to the pod, instead, always retrieve it from the 
> scheduler cache
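
Option 2 above could be sketched as follows. This is an illustration under stated assumptions, not the shim's actual code: `Pod`, `SchedulerCache`, `Task` and all of their methods are hypothetical, simplified stand-ins. The task keeps only the pod UID and asks the cache for the current object each time, so it never pins an outdated pod in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical stand-in for *v1.Pod.
type Pod struct {
	UID             string
	ResourceVersion string
}

// SchedulerCache is a hypothetical, minimal pod cache keyed by UID.
type SchedulerCache struct {
	mu   sync.RWMutex
	pods map[string]*Pod
}

func NewSchedulerCache() *SchedulerCache {
	return &SchedulerCache{pods: make(map[string]*Pod)}
}

// UpdatePod replaces the cached object; the old one becomes collectable
// as long as nothing else still references it.
func (c *SchedulerCache) UpdatePod(pod *Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[pod.UID] = pod
}

func (c *SchedulerCache) GetPod(uid string) *Pod {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.pods[uid]
}

// Task stores the pod UID instead of a pointer to the pod.
type Task struct {
	podUID string
	cache  *SchedulerCache
}

// Pod always returns the version currently held by the cache.
func (t *Task) Pod() *Pod {
	return t.cache.GetPod(t.podUID)
}

func main() {
	cache := NewSchedulerCache()
	cache.UpdatePod(&Pod{UID: "pod-1", ResourceVersion: "80199"})
	task := &Task{podUID: "pod-1", cache: cache}

	// A pod update only touches the cache; the task sees it immediately.
	cache.UpdatePod(&Pod{UID: "pod-1", ResourceVersion: "81216"})
	fmt.Println(task.Pod().ResourceVersion)
}
```

The trade-off is a lock acquisition and map lookup on every access, whereas option 1 keeps direct access but requires every update path to refresh the task as well.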






[jira] [Updated] (YUNIKORN-2697) Improve usergroup function's test coverage

2024-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2697:
-
Labels: pull-request-available  (was: )

> Improve usergroup function's test coverage
> -
>
> Key: YUNIKORN-2697
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2697
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2722) Expose the IsOriginator flag in REST

2024-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2722:
-
Labels: pull-request-available  (was: )

> Expose the IsOriginator flag in REST
> 
>
> Key: YUNIKORN-2722
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2722
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Yu-Lin Chen
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> The first real pod of each application is marked as the originator, and it is 
> typically considered the driver/owner pod. This flag is propagated to the core 
> and impacts the preemption decision flow.
>  
> However, the current REST API doesn’t expose the originator flag. Exposing 
> the flag will allow users to check which allocation is the originator and will 
> be beneficial for monitoring and troubleshooting.






[jira] [Updated] (YUNIKORN-2182) Set ReadHeaderTimeout in http server

2024-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2182:
-
Labels: newbie pull-request-available  (was: newbie)

> Set ReadHeaderTimeout in http server
> 
>
> Key: YUNIKORN-2182
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2182
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common, webapp
>Reporter: Wilfred Spiegelenburg
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: newbie, pull-request-available
>
> Potential Slowloris Attack because ReadHeaderTimeout is not configured in the 
> http.Server (gosec)
> We do not set ReadTimeout or ReadHeaderTimeout so we do not have a timeout at 
> all at the moment.
> BTW: this is not important for the webtest servers we build as they are just 
> for our tests.






[jira] [Updated] (YUNIKORN-2716) Doc changes to escape query params in REST API

2024-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2716:
-
Labels: pull-request-available  (was: )

> Doc changes to escape query params in REST API
> --
>
> Key: YUNIKORN-2716
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2716
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Need to make changes in REST API doc to escape the query params like queue 
> name, user name and group name.






[jira] [Updated] (YUNIKORN-2667) E2E test for Gang app originator pod changes after restart

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2667:
-
Labels: pull-request-available  (was: )

> E2E test for Gang app originator pod changes after restart
> --
>
> Key: YUNIKORN-2667
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2667
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: Manikandan R
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/YUNIKORN-2665 covered the changes with 
> unit tests. We need a test that covers the full cycle, before and after 
> restart, either by writing an e2e test or by using a mock scheduler setup.






[jira] [Updated] (YUNIKORN-2695) remove core dependency pkg/common

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2695:
-
Labels: pull-request-available  (was: )

> remove core dependency pkg/common
> -
>
> Key: YUNIKORN-2695
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2695
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: HUAN-IU LIOU
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2703) Scheduler does not honor default queue setting from the ConfigMap

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2703:
-
Labels: pull-request-available  (was: )

> Scheduler does not honor default queue setting from the ConfigMap
> -
>
> Key: YUNIKORN-2703
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2703
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
>
> YUNIKORN-1650 added an override for the default queue name in the ConfigMap 
> to solve the scenario where the provided placement rule is evaluated before 
> other rules.
> The scheduler also adds a default queue if the pod labels or annotations do 
> not define a queue name. Because this happens before the placement rules are 
> evaluated, we end up in the same situation: applications get placed in the 
> default queue and all other placement rules are ignored.






[jira] [Updated] (YUNIKORN-2721) Improve template function's test coverage

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2721:
-
Labels: pull-request-available  (was: )

> Improve template function's test coverage
> 
>
> Key: YUNIKORN-2721
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2721
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2693) An example doc of RayService management with YuniKorn

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2693:
-
Labels: pull-request-available  (was: )

> An example doc of RayService management with YuniKorn
> 
>
> Key: YUNIKORN-2693
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2693
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chen Yu Teng
>Assignee: JunHong Peng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2715:
-
Labels: pull-request-available  (was: )

> Handle special characters for params like queue, username & groupname
> -
>
> Key: YUNIKORN-2715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2715
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes, test - e2e
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> With more special characters being allowed in queue names, user names, etc., 
> we need to ensure those characters are handled on both sides. Clients need 
> to send those values using escaping methods. The receiver needs to parse 
> those values using unescaping methods to recover the actual values. We also 
> need to add tests for this.
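
A minimal sketch of the escape/unescape round trip, using Go's standard `net/url` package. The helper names `escapeParam` and `unescapeParam` and the sample queue name are made up for this example, and using `url.QueryEscape` rather than `url.PathEscape` is an assumption; the issue does not say which escaping method is used.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeParam is a hypothetical client-side helper: it escapes a raw
// value before it is placed into a request URL.
func escapeParam(raw string) string {
	return url.QueryEscape(raw)
}

// unescapeParam is a hypothetical receiver-side helper: it recovers the
// original value and reports malformed escape sequences as an error.
func unescapeParam(escaped string) (string, error) {
	return url.QueryUnescape(escaped)
}

func main() {
	queue := "root.team a&b@dev" // made-up queue name with special characters

	escaped := escapeParam(queue)
	decoded, err := unescapeParam(escaped)
	if err != nil {
		fmt.Println("invalid escape sequence:", err)
		return
	}
	fmt.Println("escaped:", escaped)
	fmt.Println("round trip ok:", decoded == queue)
}
```

A test for this behaviour would typically assert the round trip for each special character that is allowed in queue, user and group names, plus the error path for malformed input.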






[jira] [Updated] (YUNIKORN-2269) remove the USER_LABEL_KEY from docs

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2269:
-
Labels: pull-request-available  (was: )

> remove the USER_LABEL_KEY from docs
> ---
>
> Key: YUNIKORN-2269
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2269
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The core does not support USER_LABEL_KEY after YUNIKORN-1405 got merged, so 
> we should remove it from the docs.
> https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/#using-the-yunikornapacheorgusername-label
> {quote}
> The yunikorn.apache.org/username key can be customized by overriding the 
> default value using the USER_LABEL_KEY env variable in the K8s Deployment. 
> This is particularly useful in scenarios where the user label is already 
> being added or if the label has to be modified for some security reasons.
> {quote}






[jira] [Updated] (YUNIKORN-2704) Event publish errors out when predicates fail

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2704:
-
Labels: pull-request-available  (was: )

> Event publish errors out when predicates fail
> -
>
> Key: YUNIKORN-2704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2704
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Mit Desai
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> I consistently see this error in the logs when events are published.
> I did put some debug logs and found that I only get it when the events for 
> untolerated taints are published.
> E0618 17:43:17.858946       1 event_broadcaster.go:270] "Server rejected 
> event (will not retry!)" err="Event \"<>.17da2a31072bb32f\" is 
> invalid: [action: Required value, reason: Required value]" 
> event="\{ObjectMeta:{<>.17da2a31072bb32f  dpi-dev    0 
> 0001-01-01 00:00:00 + UTC   map[] map[] [] [] 
> []},EventTime:2024-06-18 17:43:17.857332069 + UTC 
> m=+84279.014490005,Series:nil,ReportingController:yunikorn,ReportingInstance:yunikorn-yunikorn-scheduler-59bdc88fdc-7h5bt,Action:,Reason:,Regarding:\{Pod
>  <> <> 5c90315c-a07d-4801-9ecc-baf61ee45f11 v1 
> 4323324038 },Related:nil,Note:Predicate failed for request 
> '5c90315c-a07d-4801-9ecc-baf61ee45f11' with message: 'node(s) had untolerated 
> taint \{<>: <>}',Type:Normal,DeprecatedSource:\{ 
> },DeprecatedFirstTimestamp:0001-01-01 00:00:00 + 
> UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 + UTC,DeprecatedCount:0,}"






[jira] [Updated] (YUNIKORN-2568) Move all xxxEvents types to objects/events

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2568:
-
Labels: pull-request-available  (was: )

> Move all xxxEvents types to objects/events
> --
>
> Key: YUNIKORN-2568
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2568
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core

2024-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2698:
-
Labels: pull-request-available  (was: )

> E2e tests for k8shim don't compile with latest core
> ---
>
> Key: YUNIKORN-2698
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Updated] (YUNIKORN-2304) add instruction docs of looping flaky test

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2304:
-
Labels: pull-request-available  (was: )

> add instruction docs of looping flaky test
> --
>
> Key: YUNIKORN-2304
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2304
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Tseng Hsi-Huang
>Priority: Major
>  Labels: pull-request-available
>
> Flaky tests are hard to dig into since they fail rarely. Hence, it would be 
> better to have an example of looping a specific flaky test in our docs. That 
> can be a one-line command. For instance:
> {code:java}
> I=0; while go test -run TestNoFillWithoutEventPluginRegistered ./pkg/... 
> -count=1; do (( I=$I+1 )); echo "Completed loop: $I"; sleep 1; done {code}






[jira] [Updated] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2683:
-
Labels: pull-request-available  (was: )

> Unnecessary error is logged when resource usage is increased
> 
>
> Key: YUNIKORN-2683
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2683
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>  Labels: pull-request-available
>
> The refactored code in YUNIKORN-2542 contains an unnecessary warning message:
> {noformat}
>   appGroup := userTracker.getGroupForApp(applicationID)
>   log.Log(log.SchedUGM).Debug("Increasing resource usage for user",
>       zap.String("user", user.User),
>       zap.String("queue path", queuePath),
>       zap.String("application", applicationID),
>       zap.String("group", appGroup),
>       zap.Stringer("resource", usage))
>   groupTracker := m.GetGroupTracker(appGroup)
>   if groupTracker == nil {
>       log.Log(log.SchedUGM).Error("group tracker should be available in groupTrackers map",
>           zap.String("application", applicationID),
>           zap.String("group", appGroup))
>       return
>   }
> ...
> {noformat}
> We don't always have a {{groupTracker}}. The previous code simply called 
> {{increaseTrackedResource()}} on an empty tracker:
> {noformat}
> func (ut *UserTracker) increaseTrackedResource(queuePath string, applicationID string, usage *resources.Resource) {
>   ut.Lock()
>   defer ut.Unlock()
>   ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage)
>   hierarchy := strings.Split(queuePath, configs.DOT)
>   ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, usage)
>   gt := ut.appGroupTrackers[applicationID]
>   log.Log(log.SchedUGM).Debug("Increasing resource usage for group",
>       zap.String("group", gt.getName()),
>       zap.Strings("queue path", hierarchy),
>       zap.String("application", applicationID),
>       zap.Stringer("resource", usage))
>   gt.increaseTrackedResource(queuePath, applicationID, usage, ut.userName) // <- gt can be nil
> }
> {noformat}
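
A minimal sketch of the idea described above, assuming a missing group tracker is a normal situation rather than an error. The types `groupTracker` and `manager` and the method names are hypothetical stand-ins, not the real YuniKorn UGM types; the point is only that a nil receiver check lets the caller skip the update without logging at error level.

```go
package main

import "fmt"

// groupTracker is a hypothetical stand-in for the scheduler's group tracker.
type groupTracker struct {
	name  string
	usage int
}

// increase is safe to call on a nil receiver: an application without a
// group tracker is treated as a normal case, so nothing is logged as an error.
func (gt *groupTracker) increase(delta int) bool {
	if gt == nil {
		return false
	}
	gt.usage += delta
	return true
}

// manager maps group names to trackers; a missing entry yields a nil pointer.
type manager struct {
	groupTrackers map[string]*groupTracker
}

// increaseGroupUsage reports whether a group tracker was actually updated.
func (m *manager) increaseGroupUsage(group string, delta int) bool {
	return m.groupTrackers[group].increase(delta)
}

func main() {
	m := &manager{groupTrackers: map[string]*groupTracker{"dev": {name: "dev"}}}
	fmt.Println(m.increaseGroupUsage("dev", 5)) // tracked group is updated
	fmt.Println(m.increaseGroupUsage("", 5))    // no group: silently skipped
	fmt.Println(m.groupTrackers["dev"].usage)
}
```

Whether the real code should use a nil-safe method or an explicit early return is a design choice for the fix; either way the missing tracker does not need an error-level log entry.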






[jira] [Updated] (YUNIKORN-2694) Improve placement rule function's test coverage - 2

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2694:
-
Labels: pull-request-available  (was: )

> Improve placement rule function's test coverage - 2
> --
>
> Key: YUNIKORN-2694
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2694
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2658) add nolint:funlen to long functions to suppress the lint warnings

2024-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2658:
-
Labels: pull-request-available  (was: )

> add nolint:funlen to long functions to suppress the lint warnings
> 
>
> Key: YUNIKORN-2658
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2658
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Huang Guan Hao
>Priority: Major
>  Labels: pull-request-available
>
> as title






[jira] [Updated] (YUNIKORN-2675) An example doc of RayCluster and RayJob management with YuniKorn

2024-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2675:
-
Labels: pull-request-available  (was: )

> An example doc of RayCluster and RayJob management with YuniKorn
> ---
>
> Key: YUNIKORN-2675
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2675
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Chen Yu Teng
>Assignee: HUAN-IU LIOU
>Priority: Major
>  Labels: pull-request-available
>
> Add labels and annotations to the RayCluster Helm chart.






[jira] [Updated] (YUNIKORN-2685) Use the newer WaitForCondition() in shim test

2024-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2685:
-
Labels: newbie pull-request-available  (was: newbie)

> Use the newer WaitForCondition() in shim test
> -
>
> Key: YUNIKORN-2685
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2685
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Yu-Lin Chen
>Assignee: HUAN-IU LIOU
>Priority: Major
>  Labels: newbie, pull-request-available
>
> In YUNIKORN-2643, WaitFor() and WaitForCondition() have been refactored.
> We should update to the latest core version and use the newer 
> WaitForCondition() in the shim. 
>  * 
> [https://github.com/apache/yunikorn-k8shim/blob/24efbeda6800fabec17cf9e0474cebee0314bd6e/pkg/client/clients_test.go#L61-L71]
>  * 
> [https://github.com/apache/yunikorn-k8shim/blob/24efbeda6800fabec17cf9e0474cebee0314bd6e/pkg/cache/task_test.go#L203-L205]
>  






[jira] [Updated] (YUNIKORN-2686) Validate user and group specified in filter config

2024-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2686:
-
Labels: pull-request-available  (was: )

> Validate user and group specified in filter config
> --
>
> Key: YUNIKORN-2686
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2686
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> A rule filter may list users or groups to be allowed or denied, and these 
> users and groups are validated. Since user validation has changed, the tests 
> need to be enhanced to verify the rule filter behaviour against the new set 
> of valid characters.






[jira] [Updated] (YUNIKORN-2680) Improve placement rule function's test coverage

2024-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2680:
-
Labels: pull-request-available  (was: )

> Improve placement rule function's test coverage
> --
>
> Key: YUNIKORN-2680
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2680
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2679) Add copy URL button on the allocations panel

2024-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2679:
-
Labels: pull-request-available  (was: )

> Add copy URL button on the allocations panel
> 
>
> Key: YUNIKORN-2679
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2679
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Denis Coric
>Assignee: Denis Coric
>Priority: Major
>  Labels: pull-request-available
>
> Add a copy URL button that generates and copies the hotlink to the 
> allocations screen. It leverages the YUNIKORN-2624 implementation.






[jira] [Updated] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType

2024-06-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2677:
-
Labels: pull-request-available  (was: )

> Rename AllocationResult to AllocationResultType
> ---
>
> Key: YUNIKORN-2677
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2677
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> In preparation for other refactorings, rename the AllocationResult object to 
> AllocationResultType.






[jira] [Updated] (YUNIKORN-2676) Get started with YuniKorn using a load balancer

2024-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2676:
-
Labels: pull-request-available  (was: )

> Get started with YuniKorn using a load balancer
> ---
>
> Key: YUNIKORN-2676
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2676
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chen Yu Teng
>Assignee: Chen Yu Teng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2674) specific helm chart link of Service Configuration doc update

2024-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2674:
-
Labels: pull-request-available  (was: )

> specific helm chart link of Service Configuration doc update
> 
>
> Key: YUNIKORN-2674
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2674
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: HUAN-IU LIOU
>Assignee: HUAN-IU LIOU
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2673) Improve newFilter function's test coverage in filter.go

2024-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2673:
-
Labels: pull-request-available  (was: )

> Improve newFilter function's test coverage in filter.go
> --
>
> Key: YUNIKORN-2673
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2673
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage

2024-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2652:
-
Labels: pull-request-available  (was: )

> Expand getApplication() endpoint handler to optionally return resource usage
> 
>
> Key: YUNIKORN-2652
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2652
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Rich Scott
>Assignee: Tseng Hsi-Huang
>Priority: Major
>  Labels: pull-request-available
>
> Some users would like to be able to see resource usage (preempted, 
> placeholder resource, etc) for applications that have been completed. The 
> `getApplication()` endpoint handler should be enhanced to take an optional 
> parameter specifying that the user would like details about resources 
> included in the response, and a new `ApplicationXXXDAOInfo` object that is a 
> slight superset of `ApplicationDAOInfo` should be introduced, and can be used 
> in the response.
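
The request above can be sketched as a handler that returns a superset object only when an optional query parameter is set. Everything named here is an assumption made for illustration: the parameter name `details`, the path `/app`, and the types `AppInfo` and `AppDetailInfo` are hypothetical and do not reflect the actual YuniKorn REST API or its DAO objects.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AppInfo is a hypothetical base response object.
type AppInfo struct {
	ApplicationID string `json:"applicationID"`
}

// AppDetailInfo is a hypothetical superset of AppInfo that adds resource usage.
type AppDetailInfo struct {
	AppInfo
	ResourceUsage map[string]int64 `json:"resourceUsage"`
}

// getApplication returns the base object by default and the superset object
// when the optional (hypothetical) query parameter "details=true" is present.
func getApplication(w http.ResponseWriter, r *http.Request) {
	base := AppInfo{ApplicationID: "app-1"}
	w.Header().Set("Content-Type", "application/json")
	var payload any = base
	if r.URL.Query().Get("details") == "true" {
		payload = AppDetailInfo{
			AppInfo:       base,
			ResourceUsage: map[string]int64{"placeholder": 2, "preempted": 1},
		}
	}
	if err := json.NewEncoder(w).Encode(payload); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// fetch calls the handler in-process and returns the response body.
func fetch(url string) string {
	rec := httptest.NewRecorder()
	getApplication(rec, httptest.NewRequest(http.MethodGet, url, nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(fetch("/app"))
	fmt.Print(fetch("/app?details=true"))
}
```

Keeping the default response unchanged means existing clients are unaffected, while clients that opt in receive the additional fields.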






[jira] [Updated] (YUNIKORN-2516) Update documentation about event.RESTResponseSize

2024-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2516:
-
Labels: pull-request-available  (was: )

> Update documentation about event.RESTResponseSize
> -
>
> Key: YUNIKORN-2516
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2516
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2672) Upgrade to K8s 1.29.6

2024-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2672:
-
Labels: pull-request-available  (was: )

> Upgrade to K8s 1.29.6
> -
>
> Key: YUNIKORN-2672
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2672
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
>
> A major performance regression was fixed in K8s that on analysis mainly 
> impacts the plugin implementation. The regression is part of the release 
> 1.29.4 we currently build against.
> See [https://github.com/kubernetes/kubernetes/pull/125197] for details






[jira] [Updated] (YUNIKORN-2657) Validate queue generated as part of the placement rules

2024-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2657:
-
Labels: pull-request-available  (was: )

> Validate queue generated as part of the placement rules
> ---
>
> Key: YUNIKORN-2657
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2657
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Currently, there is no validation or restriction on the characters used in 
> queue names generated by the placement rules. However, queues specified in 
> the configuration go through a validation process. Generated queue names 
> need similar validation checks.






[jira] [Updated] (YUNIKORN-2671) Convert Allocation releases field to singular

2024-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2671:
-
Labels: pull-request-available  (was: )

> Convert Allocation releases field to singular
> -
>
> Key: YUNIKORN-2671
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2671
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> Now that repeats are no longer allowed, we have no need to track multiple 
> releases for an allocation.






[jira] [Updated] (YUNIKORN-2656) Validate user name

2024-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2656:
-
Labels: pull-request-available  (was: )

>  Validate user name
> ---
>
> Key: YUNIKORN-2656
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2656
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Currently, there is no validation or restriction on the characters used in 
> the user name specified as part of app submission. However, users specified in 
> limit settings go through a validation process. User names from app 
> submission need similar validation checks.






[jira] [Updated] (YUNIKORN-2670) Improve util function's test coverage

2024-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2670:
-
Labels: pull-request-available  (was: )

> Improve util function's test coverage
> 
>
> Key: YUNIKORN-2670
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2670
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>
> Improve test coverage of the following functions in util.go:
>  * ZeroTimeInUnixNano
>  * GetNewUUID
>  * IsRecoveryQueue






[jira] [Updated] (YUNIKORN-2626) Add flag to helm chart to disable web container

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2626:
-
Labels: pull-request-available  (was: )

> Add flag to helm chart to disable web container
> ---
>
> Key: YUNIKORN-2626
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2626
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: deployment
>Reporter: Michael
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
>
> For our use case we only really need the admission controller and scheduler. 
> The Helm chart does not currently provide a way to disable deploying the web 
> container, and it would be great if that were possible.
> Is there any reason not to disable the web container?






[jira] [Updated] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails

2024-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2668:
-
Labels: pull-request-available  (was: )

> Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails 
> 
>
> Key: YUNIKORN-2668
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2668
> Project: Apache YuniKorn
>  Issue Type: Task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails 
> due to a deadlock problem described in YUNIKORN-2629. Until that ticket is 
> resolved, let's disable this test for the time being, so upstream tests don't 
> fail.






[jira] [Updated] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO

2024-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2666:
-
Labels: pull-request-available  (was: )

> Fix DeepEqual comparison in Test_fixedRule_ruleDAO 
> ---
>
> Key: YUNIKORN-2666
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2666
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, test - unit
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the 
> non-deterministic nature of map key iteration:
> {noformat}
> fixed_rule_test.go:285: assertion failed: 
> --- tt.want
> +++ ruleDAO
>   {
>   Name:   "fixed",
>   Parameters: {"create": "true", "qualified": "false", "queue": "default"},
>   Filter: {
>   Type: "allow",
>   UserList: nil,
>   GroupList: []string{
> - "group1",
> + "group2",
> - "group2",
> + "group1",
>   },
>   UserExp:  "",
>   GroupExp: "",
>   },
>   ParentRule: nil,
>   }
> {noformat}
> We use {{maps.Keys()}} when we create the user list and group list in 
> {{FilterDAO}}, so the order of the resulting slices is not stable.






[jira] [Updated] (YUNIKORN-2665) Gang app originator pod changes after restart

2024-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2665:
-
Labels: pull-request-available  (was: )

> Gang app originator pod changes after restart
> -
>
> Key: YUNIKORN-2665
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2665
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
>
> A gang app chooses the first pod (the one that created the app) as the 
> originator pod, which later becomes the real driver pod. If a restart happens 
> while a gang app is being processed, specifically after placeholder creation 
> and during replacement, it can lead to the incorrect behaviour described below.
> During restore, there is no guarantee on the ordering of pods coming from the 
> K8s lister, especially when all the pods were created with the same timestamp. 
> K8s uses second-granularity timestamps, which means all pods created within 
> the same second have the same timestamp. In this situation, YK designates 
> whichever pod comes first from the lister as the originator pod. So any 
> placeholder could become the originator pod, and the actual originator pod is 
> lost. This change could have ripple effects leading to unexpected behaviour 
> and needs to be fixed.






[jira] [Updated] (YUNIKORN-2515) Add property event.RESTResponseSize to the batch event handler

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2515:
-
Labels: pull-request-available  (was: )

> Add property event.RESTResponseSize to the batch event handler
> --
>
> Key: YUNIKORN-2515
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2515
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2514) Update documentation about event.requestCapacity

2024-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2514:
-
Labels: pull-request-available  (was: )

> Update documentation about event.requestCapacity
> 
>
> Key: YUNIKORN-2514
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2514
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2663) Improve ACL struct function's test coverage

2024-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2663:
-
Labels: pull-request-available  (was: )

> Improve ACL struct function's test coverage
> --
>
> Key: YUNIKORN-2663
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2663
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>
> Remove unreachable code in NewACL func
> Improve test coverage of the following functions in acl.go:
>  * TestSetUsers
>  * TestSetGroups






[jira] [Updated] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity

2024-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2647:
-
Labels: newbie pull-request-available  (was: newbie)

> Flaky test TestUpdateNodeCapacity
> -
>
> Key: YUNIKORN-2647
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2647
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Assignee: Tseng Hsi-Huang
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> As we saw in YUNIKORN-2573, the single-node update test might fail:
> {code:java}
> --- FAIL: TestUpdateNodeCapacity (0.03s)
>     operation_test.go:446: Expected partition resource map[memory:1 
> vcore:2], doesn't match with actual partition resource 
> map[memory:1 vcore:2]{code}
> We calculate the delta resources when updating node capacity, and with that 
> delta we update the resources in the partition.
> The test fails with the following order of calls, the same as for multiple nodes:
> node.SetCapacity() -> waitForAvailableNodeResource() -> 
> partitionInfo.GetTotalPartitionResource() -> 
> partition.updatePartitionResource()






[jira] [Updated] (YUNIKORN-2624) Enable hotlinking to YuniKorn

2024-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2624:
-
Labels: pull-request-available  (was: )

> Enable hotlinking to YuniKorn
> -
>
> Key: YUNIKORN-2624
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2624
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Denis Coric
>Assignee: Denis Coric
>Priority: Major
>  Labels: pull-request-available
>
> Enable third-party apps to set links to YuniKorn that will populate partition 
> and queue and application ID using the query parameters.
> Queue, Partition, and Application ID should be pre-selected and all details 
> shown on the page using the existing details view and stored in the 
> application storage using the existing functionality. 






[jira] [Updated] (YUNIKORN-2654) Remove unused code in k8shim context

2024-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2654:
-
Labels: newbie pull-request-available  (was: newbie)

> Remove unused code in k8shim context
> 
>
> Key: YUNIKORN-2654
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2654
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Chenchen Lai
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> The NotifyApplicationComplete and NotifyApplicationFail functions are not 
> called by anything and are unused code.
> The K8shim does not trigger the application completion or failure. This is 
> triggered by the core when the application no longer has any activity 
> registered.






[jira] [Updated] (YUNIKORN-2661) Fix hard-coded boolean in setLimit

2024-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2661:
-
Labels: pull-request-available  (was: )

> Fix hard-coded boolean in setLimit
> --
>
> Key: YUNIKORN-2661
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2661
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> Inside the UGM code {{setLimit()}}, we don't pass down {{doWildCardCheck}}, 
> so this variable never reaches the leaves:
> {noformat}
> // Note: Lock free call. The Lock of the linked tracker (UserTracker and GroupTracker) should be held before calling this function.
> func (qt *QueueTracker) setLimit(hierarchy []string, maxResource *resources.Resource, maxApps uint64, useWildCard bool, trackType trackingType, doWildCardCheck bool) {
>   log.Log(log.SchedUGM).Debug("Setting limits",
>       zap.String("queue path", qt.queuePath),
>       zap.Strings("hierarchy", hierarchy),
>       zap.Uint64("max applications", maxApps),
>       zap.Stringer("max resources", maxResource),
>       zap.Bool("use wild card", useWildCard))
>   // depth first: all the way to the leaf, create if not exists
>   // more than 1 in the slice means we need to recurse down
>   if len(hierarchy) > 1 {
>       childName := hierarchy[1]
>       if qt.childQueueTrackers[childName] == nil {
>           qt.childQueueTrackers[childName] = newQueueTracker(qt.queuePath, childName, trackType)
>       }
>       qt.childQueueTrackers[childName].setLimit(hierarchy[1:], maxResource, maxApps, useWildCard, trackType, false)  // <-- should be "doWildCardCheck" not "false"
> ...
> {noformat}
> Fix this and create a unit test for {{setLimit()}}.
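The recursion described above can be sketched in isolation. This is a minimal illustration, not the real UGM code: the `tracker` type, `newTracker`, `leafFlag`, and the `wildCardChecked` field are hypothetical stand-ins chosen so that a test can observe whether the flag reaches the leaf.

```go
package main

import "fmt"

// tracker is a hypothetical, stripped-down stand-in for QueueTracker.
type tracker struct {
	name            string
	children        map[string]*tracker
	maxApps         uint64
	wildCardChecked bool // records the flag value that arrived at this node
}

func newTracker(name string) *tracker {
	return &tracker{name: name, children: make(map[string]*tracker)}
}

// setLimit walks the hierarchy depth first and applies the limit at the leaf.
// The flag is forwarded unchanged on every recursive call.
func (t *tracker) setLimit(hierarchy []string, maxApps uint64, doWildCardCheck bool) {
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		if t.children[childName] == nil {
			t.children[childName] = newTracker(childName)
		}
		// Forward doWildCardCheck instead of a hard-coded false.
		t.children[childName].setLimit(hierarchy[1:], maxApps, doWildCardCheck)
		return
	}
	t.maxApps = maxApps
	t.wildCardChecked = doWildCardCheck
}

// leafFlag returns the flag stored at the end of the given path.
func leafFlag(root *tracker, hierarchy []string) bool {
	cur := root
	for _, name := range hierarchy[1:] {
		cur = cur.children[name]
	}
	return cur.wildCardChecked
}

func main() {
	path := []string{"root", "parent", "leaf"}
	root := newTracker("root")
	root.setLimit(path, 5, true)
	fmt.Println(leafFlag(root, path)) // true
}
```

A unit test for the real function could follow the same shape: set a limit on a multi-level path with the flag enabled, then assert on whatever leaf state the flag influences.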






[jira] [Updated] (YUNIKORN-2659) Improve config validator function's test coverage

2024-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2659:
-
Labels: pull-request-available  (was: )

> Improve config validator function's test coverage
> 
>
> Key: YUNIKORN-2659
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2659
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>
> Improve the test coverage of the following functions in configvalidator.go:
>  * checkPlacementRule 
>  * checkLimitResource 
>  * checkLimit 






[jira] [Updated] (YUNIKORN-2622) Some /debug/pprof/ API response tested is different from example response in docs

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2622:
-
Labels: pull-request-available  (was: )

> Some /debug/pprof/ API response tested is different from example response in 
> docs
> -
>
> Key: YUNIKORN-2622
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2622
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: documentation
>Reporter: Hsien-Cheng(Ryan) Huang
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: pull-request-available
>
> /debug/pprof/symbol
> Tested response on 1.5.1: num_symbols: 1
> Documented response: binary
> https://yunikorn.apache.org/docs/next/api/system/#success-response-9
> /debug/pprof/cmdline also differs:
> Tested response on 1.5.1: /yunikorn-scheduler
> Documented response: binary
> https://yunikorn.apache.org/docs/next/api/system/#cmdline
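The text responses reported above match what Go's standard `net/http/pprof` handlers return. The sketch below assumes the endpoints are backed by those handlers, which this thread does not state; `fetch`, `symbolBody`, and `cmdlineBody` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// fetch runs one GET request against the handler and returns the body.
func fetch(h http.HandlerFunc, path string) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

// symbolBody returns the plain-text reply of the standard symbol handler.
func symbolBody() string { return fetch(pprof.Symbol, "/debug/pprof/symbol") }

// cmdlineBody returns the command line of the running process,
// with arguments separated by NUL bytes.
func cmdlineBody() string { return fetch(pprof.Cmdline, "/debug/pprof/cmdline") }

func main() {
	fmt.Printf("symbol:  %q\n", symbolBody()) // "num_symbols: 1\n"
	fmt.Printf("cmdline: %q\n", cmdlineBody())
}
```

Both handlers write plain text, so documenting the success response as text rather than binary would be consistent with this behaviour.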






[jira] [Updated] (YUNIKORN-2605) Move the bottom allocations table on queues screen to the sidebar according to the design

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2605:
-
Labels: pull-request-available  (was: )

> Move the bottom allocations table on queues screen to the sidebar according 
> to the design
> -
>
> Key: YUNIKORN-2605
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2605
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Denis Coric
>Assignee: Denis Coric
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-05-07-18-21-59-564.png
>
>
> The sidebar has to be tweaked a little to be able to display a list of pods 
> with enough details:
>  * Adjust the dimensions
>  * Define a new view component that will render in the sidebar when the 
> Application is selected
>  * (optional) pagination could work as an infinite scroll
> !image-2024-05-07-18-21-59-564.png|width=1247,height=647!






[jira] [Updated] (YUNIKORN-2650) Complete or remove web_server_test#TestProxy

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2650:
-
Labels: pull-request-available  (was: )

> Complete or remove web_server_test#TestProxy
> 
>
> Key: YUNIKORN-2650
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2650
> Project: Apache YuniKorn
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
>
> web_server_test has an empty test case: TestProxy [0]. It seems to me there is 
> a proxy-related test [1].
> [0] 
> https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L82
> [1] 
> https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L73






[jira] [Updated] (YUNIKORN-2640) Consider removing config from Clients

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2640:
-
Labels: pull-request-available  (was: )

> Consider removing config from Clients
> 
>
> Key: YUNIKORN-2640
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2640
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>  Labels: pull-request-available
>
> The config (`conf.SchedulerConf`) [0] references a global singleton object 
> [1][2]. Also, in the code base `clients#GetConf()` is used 3 times [3] and 
> `conf.GetSchedulerConf()` is used 61 times [4].
> It seems to me `clients#conf` should be removed to avoid confusion.
> [0] 
> https://github.com/apache/yunikorn-k8shim/blob/master/pkg/client/clients.go#L42C8-L42C26
> [1] 
> https://github.com/apache/yunikorn-k8shim/blob/6f2800f689e9e341c736a6af8cbf178a711a9423/pkg/plugin/scheduler_plugin.go#L291
> [2] 
> https://github.com/apache/yunikorn-k8shim/blob/6f2800f689e9e341c736a6af8cbf178a711a9423/pkg/cmd/shim/main.go#L53
> [3] 
> https://github.com/search?q=repo%3Aapache%2Fyunikorn-k8shim+GetConf%28%29=code
> [4] 
> https://github.com/search?q=repo%3Aapache%2Fyunikorn-k8shim+conf.GetSchedulerConf%28%29=code
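The proposal can be illustrated with a small sketch, assuming a lazily initialised package-level singleton. All names here (`schedulerConf`, `getSchedulerConf`, `clients`, `clusterID`, and the `"mycluster"` value) are hypothetical stand-ins, not the k8shim types.

```go
package main

import (
	"fmt"
	"sync"
)

// schedulerConf is a hypothetical stand-in for conf.SchedulerConf.
type schedulerConf struct{ ClusterID string }

var (
	once sync.Once
	conf *schedulerConf
)

// getSchedulerConf returns the process-wide singleton, creating it once.
func getSchedulerConf() *schedulerConf {
	once.Do(func() { conf = &schedulerConf{ClusterID: "mycluster"} })
	return conf
}

// clients carries no conf field of its own; every reader goes through the
// singleton accessor, so there is only one path to the configuration.
type clients struct{}

func (c *clients) clusterID() string { return getSchedulerConf().ClusterID }

func main() {
	c := &clients{}
	fmt.Println(c.clusterID() == getSchedulerConf().ClusterID) // true
}
```

With a single accessor, a reader never has to wonder whether a struct holds a stale or different copy of the configuration.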






[jira] [Updated] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2653:
-
Labels: pull-request-available  (was: )

> Gang scheduling K8s event formatting compliance
> ---
>
> Key: YUNIKORN-2653
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2653
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
>
> The K8s events provide definitions and rules around the content of the fields 
> within the event. Adjust the content of gang scheduling related events to 
> comply with the rules.
> Focussed on the reason and action fields only.
>  * 'reason' is the reason this event is generated. 'reason' should be short 
> and unique; it should be in UpperCamelCase format (starting with a capital 
> letter). 
>  * 'action' explains what happened with the regarding object, i.e. what action 
> the ReportingController took in the object's name; it should be in 
> UpperCamelCase format (starting with a capital letter). 
> No spaces or long text.
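A rough way to check a candidate value against the rules above is a pattern match. This is a loose sketch: `isUpperCamelCase` is a hypothetical helper, the sample strings are made up, and the pattern only enforces a leading capital letter with no spaces or punctuation (it would also accept an all-caps word).

```go
package main

import (
	"fmt"
	"regexp"
)

// upperCamel matches a capital letter followed by letters or digits only,
// so spaces and punctuation are rejected.
var upperCamel = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)

// isUpperCamelCase reports whether s could serve as a reason or action value.
func isUpperCamelCase(s string) bool {
	return upperCamel.MatchString(s)
}

func main() {
	for _, s := range []string{"GangScheduling", "gangScheduling", "Gang scheduling"} {
		fmt.Printf("%q -> %v\n", s, isUpperCamelCase(s))
	}
}
```

A check like this could run in a unit test over all event constants, so a new event with a free-text reason is caught early.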






[jira] [Updated] (YUNIKORN-2567) Remove Application reference from applicationEvents

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2567:
-
Labels: pull-request-available  (was: )

> Remove Application reference from applicationEvents
> ---
>
> Key: YUNIKORN-2567
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2567
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2651) Update the unchecked error for make lint warnings

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2651:
-
Labels: pull-request-available  (was: )

> Update the unchecked error for make lint warnings
> -
>
> Key: YUNIKORN-2651
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2651
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Yun Sun
>Priority: Major
>  Labels: pull-request-available
>
> Fix the lint warnings about "unhandled error".






[jira] [Updated] (YUNIKORN-2649) Improve CalculateAbsUsedCapacity & CompUsageRatio functions' test coverage in resources.go

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2649:
-
Labels: pull-request-available  (was: )

> Improve CalculateAbsUsedCapacity & CompUsageRatio functions' test coverage in 
> resources.go
> -
>
> Key: YUNIKORN-2649
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2649
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2642) Don't set resources on the recovery queue

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2642:
-
Labels: pull-request-available  (was: )

> Don't set resources on the recovery queue
> -
>
> Key: YUNIKORN-2642
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2642
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> Resource constraints can be set on dynamic queues based on application 
> tags. We should not set them on the recovery queue, because there's no quota 
> on it.






[jira] [Updated] (YUNIKORN-2646) Deadlock detected during preemption

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2646:
-
Labels: pull-request-available  (was: )

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Attachments: yunikorn-logs-lock.txt.gz
>
>
> Hitting deadlocks in 1.5.1.
> The log is attached.






[jira] [Updated] (YUNIKORN-2542) Consistent logging and tracker handling for increment/decrement

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2542:
-
Labels: pull-request-available  (was: )

> Consistent logging and tracker handling for increment/decrement
> ---
>
> Key: YUNIKORN-2542
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2542
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Tseng Hsi-Huang
>Priority: Minor
>  Labels: pull-request-available
>
> We log DEBUG output and use {{GroupTracker}} inconsistently in {{Manager}} 
> and in {{UserTracker}}.
> E.g.
> {{Manager.IncreaseTrackedResource()}}: only a single log output with DEBUG 
> level
> {{Manager.DecreaseTrackedResource()}}: multiple log statements, also handles 
> the group tracker which is not the case with increments
> This also affects {{UserTracker}}: log handling is different 
> in {{increaseTrackedResource()}}/{{decreaseTrackedResource()}}.






[jira] [Updated] (YUNIKORN-182) fix lint issues

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-182:

Labels: pull-request-available  (was: )

> fix lint issues
> ---
>
> Key: YUNIKORN-182
> URL: https://issues.apache.org/jira/browse/YUNIKORN-182
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Yun Sun
>Priority: Minor
>  Labels: pull-request-available
>
> When we added the lint test most major issues were fixed. There are still a 
> lot of issues, especially in tests, that need to be fixed.
> This is a container Jira to track that work on both the k8shim and the core 
> repos.
> Work should be split into multiple parts (per linter?)






[jira] [Updated] (YUNIKORN-2643) utils.go WaitForCondition test coverage improvement

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2643:
-
Labels: pull-request-available  (was: )

> utils.go WaitForCondition test coverage improvement 
> 
>
> Key: YUNIKORN-2643
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2643
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: HUAN-IU LIOU
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2644) Improve FitInScore function's test coverage in resources.go

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2644:
-
Labels: pull-request-available  (was: )

> Improve FitInScore function's test coverage in resources.go
> --
>
> Key: YUNIKORN-2644
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2644
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2641:
-
Labels: pull-request-available  (was: )

> Ensure createTime has same semantics for ask and allocation
> ---
>
> Key: YUNIKORN-2641
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2641
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> The createTime field in Allocation and AllocationAsk is not used 
> consistently. Ensure that the field is always set, and that it is not 
> modified later.






[jira] [Updated] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2633:
-
Labels: pull-request-available  (was: )

> Unnecessary warning from Partition when adding an application
> -
>
> Key: YUNIKORN-2633
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2633
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>
> The following is printed when adding an application:
> {noformat}
> 2024-05-17T21:53:04.716+0200  WARN  core.scheduler.queue  scheduler/partition.go:344  Trying to set resources on a queue that is not an unmanaged leaf  {"queueName": "root.default"}
> {noformat}
> This message is supposed to be printed when the application defines a 
> guaranteed or max resource. After YUNIKORN-2547 it's always printed if the 
> queue is managed.
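The intended behaviour can be sketched as a guard: warn only when the application actually asks for a guaranteed or max resource and the queue is not an unmanaged leaf. The `queue` and `appResources` types and the `shouldWarn` helper are hypothetical; this is not the code in scheduler/partition.go.

```go
package main

import "fmt"

// queue and appResources are hypothetical stand-ins for the scheduler types.
type queue struct {
	name    string
	managed bool
	leaf    bool
}

type appResources struct {
	guaranteed map[string]int64
	max        map[string]int64
}

// shouldWarn reports whether the warning is warranted: the application asks
// for a guaranteed or max resource, and the queue is not an unmanaged leaf.
func shouldWarn(q queue, res appResources) bool {
	wantsLimits := len(res.guaranteed) > 0 || len(res.max) > 0
	unmanagedLeaf := q.leaf && !q.managed
	return wantsLimits && !unmanagedLeaf
}

func main() {
	q := queue{name: "root.default", managed: true, leaf: true}
	fmt.Println(shouldWarn(q, appResources{}))                                   // false
	fmt.Println(shouldWarn(q, appResources{max: map[string]int64{"vcore": 10}})) // true
}
```

Checking the application's request first keeps the log quiet for the common case of an application that sets no queue resources at all.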






[jira] [Updated] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-2629:
-
Labels: pull-request-available  (was: )

> Adding a node can result in a deadlock
> --
>
> Key: YUNIKORN-2629
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2629
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.5.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: updateNode_deadlock_trace.txt
>
>
> Adding a new node after Yunikorn state initialization can result in a 
> deadlock.
> The problem is that {{Context.addNode()}} holds a lock while we're waiting 
> for the {{NodeAccepted}} event:
> {noformat}
> dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, func(event interface{}) {
>   nodeEvent, ok := event.(CachedSchedulerNodeEvent)
>   if !ok {
>     return
>   }
>   [...] removed for clarity
>   wg.Done()
> })
> defer dispatcher.UnregisterEventHandler(handlerID, dispatcher.EventTypeNode)
> if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode({
>   Nodes: nodesToRegister,
>   RmID:  schedulerconf.GetSchedulerConf().ClusterID,
> }); err != nil {
>   log.Log(log.ShimContext).Error("Failed to register nodes", zap.Error(err))
>   return nil, err
> }
> // wait for all responses to accumulate
> wg.Wait()  <--- shim gets stuck here
>  {noformat}
> If tasks are being processed, then the dispatcher will try to retrieve the 
> event handler, which is returned from Context:
> {noformat}
> go func() {
>   for {
>     select {
>     case event := <-getDispatcher().eventChan:
>       switch v := event.(type) {
>       case events.TaskEvent:
>         getEventHandler(EventTypeTask)(v)  <--- eventually calls Context.getTask()
>       case events.ApplicationEvent:
>         getEventHandler(EventTypeApp)(v)
>       case events.SchedulerNodeEvent:
>         getEventHandler(EventTypeNode)(v)
> {noformat}
> Since {{addNode()}} is holding a write lock, the event processing loop gets 
> stuck, so {{registerNodes()}} will never progress.
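The cycle described above can be reproduced in miniature, along with one possible way out: update state under the write lock, then release it before blocking on the acceptance event. This is a sketch under assumptions and not necessarily the fix that was merged; the `context` type, the string events, and `run` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// context is a hypothetical stand-in for the shim Context.
type context struct {
	lock  sync.RWMutex
	nodes map[string]bool
}

// getTask mimics a task handler that needs the read lock.
func (c *context) getTask() {
	c.lock.RLock()
	defer c.lock.RUnlock()
}

// addNode updates state under the write lock, then releases the lock
// before blocking on the acceptance event.
func (c *context) addNode(name string, events chan<- string, accepted *sync.WaitGroup) {
	c.lock.Lock()
	c.nodes[name] = true
	c.lock.Unlock() // release before waiting

	accepted.Add(1)
	events <- "task"         // an unrelated task event is queued first
	events <- "nodeAccepted" // then the event addNode is waiting for
	accepted.Wait()
}

// dispatch processes events in order, like a single dispatcher goroutine.
func dispatch(c *context, events <-chan string, accepted *sync.WaitGroup) {
	for ev := range events {
		switch ev {
		case "task":
			c.getTask() // would block forever if addNode still held the lock
		case "nodeAccepted":
			accepted.Done()
		}
	}
}

// run wires the pieces together and reports whether the node was added.
func run() bool {
	c := &context{nodes: make(map[string]bool)}
	events := make(chan string, 2)
	var accepted sync.WaitGroup
	go dispatch(c, events, &accepted)
	c.addNode("node-1", events, &accepted)
	close(events)
	return c.nodes["node-1"]
}

func main() {
	fmt.Println(run()) // true
}
```

If `addNode` kept the write lock across `accepted.Wait()`, the dispatcher would block in `getTask()` on the task event and never reach the acceptance event, which is the cycle reported in this issue.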





