[jira] [Updated] (YUNIKORN-2709) Update website for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2709:
    Labels: pull-request-available  (was: )

> Update website for 1.5.2
>
> Key: YUNIKORN-2709
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2709
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: release
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2708) Release notes for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2708:
    Labels: pull-request-available release  (was: release)

> Release notes for 1.5.2
>
> Key: YUNIKORN-2708
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2708
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available, release
[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2712:
    Labels: newbie pull-request-available  (was: newbie)

> Missing specific param error for REST API
>
> Key: YUNIKORN-2712
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Minor
> Labels: newbie, pull-request-available
>
> Some REST APIs return a specific "missing param" kind of error (for
> example, "user name is missing"), but not all of them do. All mandatory
> parameters in the other REST APIs can follow the same pattern. Such a
> message is much clearer than a "doesn't exist" kind of error.
> The suggestion given in
> [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429]
> can be used as a reference for the implementation.
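The pattern requested in YUNIKORN-2712 can be sketched roughly as below. This is only an illustration, not the code from the referenced pull request: the helper name `requireParams`, the map used as the parameter source, and the exact message wording are all assumptions.

```go
package main

import "fmt"

// requireParams returns an error naming the first mandatory parameter that is
// absent or empty. The function name, the map-based parameter source and the
// message wording are assumptions made for this sketch.
func requireParams(params map[string]string, mandatory ...string) error {
	for _, name := range mandatory {
		if params[name] == "" {
			return fmt.Errorf("%s is missing", name)
		}
	}
	return nil
}

func main() {
	// "user" is mandatory here but was not supplied by the caller.
	params := map[string]string{"partition": "default"}
	if err := requireParams(params, "partition", "user"); err != nil {
		fmt.Println(err)
	}
}
```

A handler could call such a helper first and map a non-nil result to a `400 Bad Request`, so that every endpoint reports missing parameters in the same way.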
[jira] [Updated] (YUNIKORN-2524) add documentation for recovery queue (root.@recovery@)
[ https://issues.apache.org/jira/browse/YUNIKORN-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2524:
    Labels: newbie pull-request-available  (was: newbie)

> add documentation for recovery queue (root.@recovery@)
>
> Key: YUNIKORN-2524
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2524
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation
> Reporter: Chia-Ping Tsai
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Major
> Labels: newbie, pull-request-available
>
> The recovery queue cannot be queried directly, but its name can be observed
> via the application REST API (`ws/v1/partition/%s/application/%s`).
> Hence, we should document the recovery queue. Otherwise, users will be
> surprised when they see this unfamiliar queue and find nothing about it in
> our docs.
> Some discussion on a PR review:
> https://github.com/apache/yunikorn-site/pull/426#discussion_r1588788027
[jira] [Updated] (YUNIKORN-2759) Replace %w by Errors.join
[ https://issues.apache.org/jira/browse/YUNIKORN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2759:
    Labels: pull-request-available  (was: )

> Replace %w by Errors.join
>
> Key: YUNIKORN-2759
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2759
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Major
> Labels: pull-request-available
>
> Original discussion: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Go's errors.Join can make the code more performant and readable.
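To make the two options in YUNIKORN-2759 concrete, here is a minimal sketch that wraps the same cause once with `fmt.Errorf` and `%w` and once with `errors.Join` (standard library, Go 1.20 or later). The sentinel `errLeaf`, the message text, and the function names are hypothetical and only borrow wording from YUNIKORN-2262; the sketch does not measure performance.

```go
package main

import (
	"errors"
	"fmt"
)

// errLeaf is a hypothetical sentinel error used only for this illustration.
var errLeaf = errors.New("parent queue is already a leaf")

// wrapWithFormat wraps the cause using fmt.Errorf and the %w verb.
func wrapWithFormat(cause error) error {
	return fmt.Errorf("failed to create rule based queue: %w", cause)
}

// wrapWithJoin combines a context error and the cause using errors.Join.
func wrapWithJoin(cause error) error {
	return errors.Join(errors.New("failed to create rule based queue"), cause)
}

// keepsCause reports whether errLeaf is still reachable through errors.Is.
func keepsCause(err error) bool {
	return errors.Is(err, errLeaf)
}

func main() {
	a := wrapWithFormat(errLeaf)
	b := wrapWithJoin(errLeaf)
	fmt.Println(keepsCause(a), keepsCause(b))
	fmt.Println(a)
	fmt.Println(b)
}
```

Both variants keep the root cause visible to `errors.Is`. One visible difference is formatting: `errors.Join` separates the joined messages with a newline, while `%w` places the cause inline in the format string.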
[jira] [Updated] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods
[ https://issues.apache.org/jira/browse/YUNIKORN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2771:
    Labels: pull-request-available  (was: )

> Optimization: Use termination grace period of 0 seconds for placeholder pods
>
> Key: YUNIKORN-2771
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
>
> When we create placeholder pods for gang scheduling, we do not specify a
> termination grace period, and therefore inherit the Kubernetes default of 30
> seconds. This is unnecessary as the placeholders do not perform any logic and
> therefore require no graceful termination.
[jira] [Updated] (YUNIKORN-2459) Core: Merge ask and allocation objects
[ https://issues.apache.org/jira/browse/YUNIKORN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2459:
    Labels: pull-request-available  (was: )

> Core: Merge ask and allocation objects
>
> Key: YUNIKORN-2459
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2459
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
>
> Merge the Ask and Allocation objects into a single Allocation object.
[jira] [Updated] (YUNIKORN-2770) Simplify Application.GetTask()
[ https://issues.apache.org/jira/browse/YUNIKORN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2770:
    Labels: pull-request-available  (was: )

> Simplify Application.GetTask()
>
> Key: YUNIKORN-2770
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2770
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Minor
> Labels: pull-request-available
>
> {{Application.GetTask()}} returns a {{*Task}} and an {{error}}, but the
> {{error}} is completely unnecessary. We either have the task for the given
> taskID or we don't.
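The simplification described in YUNIKORN-2770 might look like the sketch below. The `Task` and `Application` types here are stripped-down stand-ins, not the real k8shim types; the field names, the error text, and the omission of locking are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// Task and Application are simplified stand-ins for illustration only.
type Task struct {
	taskID string
}

type Application struct {
	taskMap map[string]*Task
}

// GetTaskWithError mirrors the older shape: a task plus a redundant error.
func (app *Application) GetTaskWithError(taskID string) (*Task, error) {
	if task, ok := app.taskMap[taskID]; ok {
		return task, nil
	}
	return nil, fmt.Errorf("task %s doesn't exist in application", taskID)
}

// GetTask is the simplified shape: nil means the task is not known.
func (app *Application) GetTask(taskID string) *Task {
	return app.taskMap[taskID]
}

func main() {
	app := &Application{taskMap: map[string]*Task{"task-1": {taskID: "task-1"}}}

	// Callers replace the error check with a nil check.
	if task := app.GetTask("task-1"); task != nil {
		fmt.Println("found", task.taskID)
	}
	if app.GetTask("task-2") == nil {
		fmt.Println("task-2 not found")
	}
}
```

Since a missing map entry already yields a nil pointer, the nil return carries the same information the error did.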
[jira] [Updated] (YUNIKORN-2766) Only generate event if all predicates failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2766:
    Labels: pull-request-available  (was: )

> Only generate event if all predicates failed
>
> Key: YUNIKORN-2766
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
>
> Right now, we send an event to the pod if a predicate failed:
> {noformat}
> if err := plugin.Predicates(&si.PredicatesArgs{
>     AllocationKey: allocationKey,
>     NodeID:        sn.NodeID,
>     Allocate:      allocate,
> }); err != nil {
>     log.Log(log.SchedNode).Debug("running predicates failed",
>         zap.String("allocationKey", allocationKey),
>         zap.String("nodeID", sn.NodeID),
>         zap.Bool("allocateFlag", allocate),
>         zap.Error(err))
>     // running predicates failed
>     msg := err.Error()
>     ask.LogAllocationFailure(msg, allocate)
>     ask.SendPredicateFailedEvent(msg)
>     return false
> }
> {noformat}
> This is, however, not correct. We should only generate an event if *all*
> predicates have failed, which means that the pod cannot be scheduled. A
> failing predicate for a given node can be perfectly normal in many cases.
> Instead, we should aggregate the failed predicates and send an event like:
> {noformat}
> All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7':
> node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints
> that the pod didn't tolerate (5x)
> {noformat}
> where 20x and 5x tell how many times a certain predicate failed.
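The aggregation proposed in YUNIKORN-2766 can be sketched as a counter keyed by failure message. This is an assumption-laden illustration rather than the actual fix: the type `predicateFailures`, its methods, and the ordering of the reasons (most frequent first) are all hypothetical; only the output format follows the example in the issue.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// predicateFailures counts how often each failure message was seen while the
// nodes were tried for one request. Hypothetical helper type for this sketch.
type predicateFailures map[string]int

// add records one failed predicate check.
func (p predicateFailures) add(msg string) {
	p[msg]++
}

// summary builds a single aggregated event text for the request.
func (p predicateFailures) summary(allocationKey string) string {
	reasons := make([]string, 0, len(p))
	for msg := range p {
		reasons = append(reasons, msg)
	}
	// Most frequent reason first; ties broken alphabetically so the
	// output is deterministic.
	sort.Slice(reasons, func(i, j int) bool {
		if p[reasons[i]] != p[reasons[j]] {
			return p[reasons[i]] > p[reasons[j]]
		}
		return reasons[i] < reasons[j]
	})
	parts := make([]string, 0, len(reasons))
	for _, msg := range reasons {
		parts = append(parts, fmt.Sprintf("%s (%dx)", msg, p[msg]))
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, "; "))
}

func main() {
	failures := predicateFailures{}
	for i := 0; i < 20; i++ {
		failures.add("node(s) didn't match Pod's node affinity/selector")
	}
	for i := 0; i < 5; i++ {
		failures.add("node(s) had taints that the pod didn't tolerate")
	}
	fmt.Println(failures.summary("345d70d7-243a-4077-a9f8-0bb76c3532d7"))
}
```

In this sketch the event text is produced once, after every node has been tried, instead of once per failing node.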
[jira] [Updated] (YUNIKORN-2696) appoint specific version when installing yunikorn
[ https://issues.apache.org/jira/browse/YUNIKORN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2696:
    Labels: newbie pull-request-available  (was: newbie)

> appoint specific version when installing yunikorn
>
> Key: YUNIKORN-2696
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2696
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation
> Reporter: Chen Yu Teng
> Assignee: Lyu Bo Cian
> Priority: Minor
> Labels: newbie, pull-request-available
>
> In the Get Started doc, the image tags are `latest`, which is not available
> on Docker Hub.
> The Helm chart needs to be updated via helm upgrade with specific image tags:
>
> helm upgrade -f custom.yml --install yunikorn yunikorn/yunikorn -n yunikorn --create-namespace
> ```yml
> image:
>   tag: scheduler-1.5.1
> admissionController:
>   image:
>     tag: admission-1.5.1
> web:
>   image:
>     tag: web-1.5.1
> ```
[jira] [Updated] (YUNIKORN-2765) Improve si_helper & resource function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2765:
    Labels: pull-request-available  (was: )

> Improve si_helper & resource function's test coverage
>
> Key: YUNIKORN-2765
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2765
> Project: Apache YuniKorn
> Issue Type: Test
> Components: shim - kubernetes
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
>
> Improve the test coverage of the following functions:
> * GetTerminationTypeFromString (unknown termination type)
> * getMaxResource (requested resource types are fewer than allocated types)
> * GetResource
> * GetTGResource
[jira] [Updated] (YUNIKORN-2354) Visualize the current queue that YuniKorn is using
[ https://issues.apache.org/jira/browse/YUNIKORN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2354:
    Labels: pull-request-available  (was: )

> Visualize the current queue that YuniKorn is using
>
> Key: YUNIKORN-2354
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2354
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Reporter: Dong-Lin Hsieh
> Assignee: Dong-Lin Hsieh
> Priority: Major
> Labels: pull-request-available
>
> # another tab page
> # additional queue info (running applications)
[jira] [Updated] (YUNIKORN-2763) add the documentation of REST API for specific queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2763:
    Labels: pull-request-available  (was: )

> add the documentation of REST API for specific queue
>
> Key: YUNIKORN-2763
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2763
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation, website
> Reporter: Chia-Ping Tsai
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
>
> The new call will be used by the e2e tests (see YUNIKORN-2713), and hence it
> is worth documenting.
[jira] [Updated] (YUNIKORN-2262) propagate the error message when queue creation gets failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2262:
    Labels: pull-request-available  (was: )

> propagate the error message when queue creation gets failed
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Chenchen Lai
> Priority: Minor
> Labels: pull-request-available
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> The error message of the root cause is swallowed, so the generic message
> "failed to create rule based queue ..." gives little to go on.
> BTW, the error I hit was that the parent queue "is already a leaf". That
> message is helpful and makes it easy to track down the root cause.
[jira] [Updated] (YUNIKORN-2760) `make tools` should check the version of tools
[ https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2760:
    Labels: pull-request-available  (was: )

> `make tools` should check the version of tools
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
>
> The Makefile, by default, checks only whether a file exists. Hence,
> developers need to remove the tools folder (or call `make distclean`)
> manually to trigger the installation after we update the version of a tool.
> However, how can developers become aware of tool updates? Personally, I get
> suspicious when I see errors or warnings, but that signal is implicit and
> noisy :cry
> In order to fix that, I'd like to introduce a new folder structure for the
> tools folder:
> {code:java}
> /tools/{tool_name}-{version}
> {code}
> That offers a unique path to each version of a tool. Developers will not
> miss the updates anymore.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
> That offers a unique path to each version of a tool. Developers will not
> miss the updates anymore.
> NOTE: we need to remove the existing tool binary if there is a naming
> conflict when creating the new path. For example, creating
> /tools/golangci-lint/1.57.2 will fail if /tools/golangci-lint is an existing
> file.
[jira] [Updated] (YUNIKORN-2761) Explain preemption storm in usage doc
[ https://issues.apache.org/jira/browse/YUNIKORN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2761:
    Labels: pull-request-available  (was: )

> Explain preemption storm in usage doc
>
> Key: YUNIKORN-2761
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2761
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: website
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2762) Improve util function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2762:
    Labels: pull-request-available  (was: )

> Improve util function's test coverage
>
> Key: YUNIKORN-2762
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2762
> Project: Apache YuniKorn
> Issue Type: Test
> Components: shim - kubernetes
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
>
> Improve the unit tests for the following functions in util.go:
> * IsPluginMode
> * Convert2ConfigMap
> * IsPodRunning
> * GetNamespaceQuotaFromAnnotation (JSON Unmarshal error case)
> * WaitForCondition
> * GetCoreSchedulerConfigFromConfigMap (NotMapping file case)
[jira] [Updated] (YUNIKORN-2713) Use queue specific REST API directly
[ https://issues.apache.org/jira/browse/YUNIKORN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2713:
    Labels: newbie pull-request-available  (was: newbie)

> Use queue specific REST API directly
>
> Key: YUNIKORN-2713
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2713
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes, test - e2e
> Reporter: Manikandan R
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: newbie, pull-request-available
>
> Some places in the e2e tests still use the old approach: fetch all queues
> for the given partition, then fetch the queue specific info in a second
> call. Instead, the queue info can be fetched directly in a single call.
[jira] [Updated] (YUNIKORN-2719) Assert invalid group name in Get Group REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2719:
    Labels: newbie pull-request-available  (was: newbie)

> Assert invalid group name in Get Group REST API
>
> Key: YUNIKORN-2719
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2719
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Yun Sun
> Priority: Major
> Labels: newbie, pull-request-available
>
> Assert invalid group name in Get Group REST API
[jira] [Updated] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked
[ https://issues.apache.org/jira/browse/YUNIKORN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2755:
    Labels: pull-request-available  (was: )

> yunikorn-web: pnpm version should be locked
>
> Key: YUNIKORN-2755
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: webapp
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
>
> Now that we are using pnpm, we should lock the version that we are using to
> prevent unexpected divergence of package.json and pnpm-lock.yaml.
[jira] [Updated] (YUNIKORN-2745) Log analysis adopting loki
[ https://issues.apache.org/jira/browse/YUNIKORN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2745:
    Labels: pull-request-available  (was: )

> Log analysis adopting loki
>
> Key: YUNIKORN-2745
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2745
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: documentation
> Reporter: Chen Yu Teng
> Assignee: HUAN-IU LIOU
> Priority: Major
> Labels: pull-request-available
>
> Add a tutorial on how to parse the YuniKorn logs and show them in the
> Grafana UI.
[jira] [Updated] (YUNIKORN-2746) Adopting prometheus service monitor instead of modifying config
[ https://issues.apache.org/jira/browse/YUNIKORN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2746:
    Labels: pull-request-available  (was: )

> Adopting prometheus service monitor instead of modifying config
>
> Key: YUNIKORN-2746
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2746
> Project: Apache YuniKorn
> Issue Type: Task
> Reporter: Chen Yu Teng
> Assignee: JunHong Peng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2720) Use createRequest() in handlers_test.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2720:
    Labels: newbie pull-request-available  (was: newbie)

> Use createRequest() in handlers_test.go
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Major
> Labels: newbie, pull-request-available
>
> Use the createRequest() helper method wherever applicable in
> handlers_test.go. handlers_test.go is huge.
[jira] [Updated] (YUNIKORN-2738) Only check failure reason once not for every pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2738:
    Labels: pull-request-available  (was: )

> Only check failure reason once not for every pod
>
> Key: YUNIKORN-2738
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2738
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Minor
> Labels: pull-request-available
>
> The reason for an application failure does not change and can be
> pre-calculated for all pods when a failure is handled.
[jira] [Updated] (YUNIKORN-2732) Improve allocation & queue_events function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2732:
    Labels: pull-request-available  (was: )

> Improve allocation & queue_events function's test coverage
>
> Key: YUNIKORN-2732
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2732
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2711) Skip setting the queue name to default queue in the shim
[ https://issues.apache.org/jira/browse/YUNIKORN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2711:
    Labels: pull-request-available  (was: )

> Skip setting the queue name to default queue in the shim
>
> Key: YUNIKORN-2711
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2711
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
> Labels: pull-request-available
>
> The admission controller and the scheduler currently check the pod for the
> supplied queue name. If the queue name is not provided, they set the queue
> to the default queue 'root.default'.
> After the changes from YUNIKORN-2703, we do not need to set the queue name
> in the shim; the core should take care of setting the default queue.
[jira] [Updated] (YUNIKORN-2207) Update user group documentation
[ https://issues.apache.org/jira/browse/YUNIKORN-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2207:
    Labels: pull-request-available  (was: )

> Update user group documentation
>
> Key: YUNIKORN-2207
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2207
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation
> Reporter: Wilfred Spiegelenburg
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Major
> Labels: pull-request-available
>
> The order in the [User & Group
> Resolution|https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/]
> documentation should be reversed:
> * current handling via the admission controller
> * deprecated handling via the label
> We should also add a removal notice for a specific YuniKorn version of the
> old label. From that release we only support the annotation.
[jira] [Updated] (YUNIKORN-2724) Improve the signature of methods notifyTaskComplete() and ensureAppAndTaskCreated()
[ https://issues.apache.org/jira/browse/YUNIKORN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2724:
    Labels: pull-request-available  (was: )

> Improve the signature of methods notifyTaskComplete() and
> ensureAppAndTaskCreated()
>
> Key: YUNIKORN-2724
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2724
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Peter Bacsko
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Minor
> Labels: pull-request-available
>
> From the review [https://github.com/apache/yunikorn-k8shim/pull/864]
> Change {{notifyTaskComplete(string, string)}} to
> {{notifyTaskComplete(*Application, string)}}. It removes a number of extra
> getApplication() calls we really do not need.
> The same applies to {{ensureAppAndTaskCreated()}}, which is only ever called
> from this function. Add a parameter to it to make it
> {{ensureAppAndTaskCreated(*v1.Pod, *Application)}} and only execute
> application creation {{if app == nil}}.
[jira] [Updated] (YUNIKORN-2493) Preemption Hardening
[ https://issues.apache.org/jira/browse/YUNIKORN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2493:
    Labels: pull-request-available  (was: )

> Preemption Hardening
>
> Key: YUNIKORN-2493
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2493
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2728) Config event.RESTResponseSize should be placed under Event System Settings
[ https://issues.apache.org/jira/browse/YUNIKORN-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2728:
    Labels: newbie pull-request-available  (was: newbie)

> Config event.RESTResponseSize should be placed under Event System Settings
>
> Key: YUNIKORN-2728
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2728
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation
> Reporter: Kuan Po Tseng
> Assignee: Chenchen Lai
> Priority: Minor
> Labels: newbie, pull-request-available
>
> [https://yunikorn.apache.org/docs/next/user_guide/service_config/#eventrestresponsesize]
> event.RESTResponseSize is an event-related config and should be placed under
> [#event-system-settings|https://yunikorn.apache.org/docs/next/user_guide/service_config/#event-system-settings]
> instead of
> [#health-settings|https://yunikorn.apache.org/docs/next/user_guide/service_config/#health-settings]
[jira] [Updated] (YUNIKORN-2729) remove `--new-from-rev` from Makefile
[ https://issues.apache.org/jira/browse/YUNIKORN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2729:
    Labels: pull-request-available  (was: )

> remove `--new-from-rev` from Makefile
>
> Key: YUNIKORN-2729
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2729
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Reporter: Chia-Ping Tsai
> Assignee: Huang Guan Hao
> Priority: Minor
> Labels: pull-request-available
>
> It is time to show the power of lint :)
[jira] [Updated] (YUNIKORN-2727) Fix Dead Links and Update readme for Docusaurus v3
[ https://issues.apache.org/jira/browse/YUNIKORN-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2727:
    Labels: pull-request-available  (was: )

> Fix Dead Links and Update readme for Docusaurus v3
>
> Key: YUNIKORN-2727
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2727
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: documentation
> Reporter: Hsien-Cheng(Ryan) Huang
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Minor
> Labels: pull-request-available
>
> Issue 1: Dead Link in "Deploy the Scheduler"
> Problem: Dead link at example in "Deploy the Scheduler" section.
> Current:
> https://yunikorn.apache.org/docs/developer_guide/deployment/#deploy-the-admission-controller
> Solution: Replace with correct links:
> https://yunikorn.apache.org/docs/next/developer_guide/deployment/#deploy-the-scheduler
> https://yunikorn.apache.org/docs/next/developer_guide/deployment/#Deploy-the-Scheduler
> Cause: Migration to Docusaurus v3 with strict URL regulations.
>
> Issue 2: Outdated Docusaurus Version in README
> Problem: README mentions Docusaurus v2.
> Current: "The website is built based using docusaurus-v2."
> Solution: Update to v3.
> New: "The website is built using Docusaurus v3."
[jira] [Updated] (YUNIKORN-2726) Add "How to check E2E test logs?" to developer guide
[ https://issues.apache.org/jira/browse/YUNIKORN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2726: - Labels: newbie pull-request-available (was: newbie) > Add "How to check E2E test logs?" to developer guide > > > Key: YUNIKORN-2726 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2726 > Project: Apache YuniKorn > Issue Type: Improvement > Components: documentation >Reporter: Yu-Lin Chen >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Major > Labels: newbie, pull-request-available > Attachments: image-2024-07-06-16-39-54-365.png > > > After YUNIKORN-2305, the logs of failed E2E tests are dumped locally and uploaded as GitHub Actions artifacts. We should let new developers know how to retrieve them. > We should add an explanation to the developer guide (https://yunikorn.apache.org/docs/next/developer_guide/e2e_test) covering the following: > # Where to find the local E2E test logs after `make e2e_test` fails (in yunikorn-k8shim/build/e2e/\{suite}/) > # Which log types are produced: > a. \{specName}_k8sClusterInfo.txt > b. \{specName}_ykContainerLog.txt > c. \{specName}_ykFullStateDump.json > # How to download the logs from GitHub Actions (see the screenshot below, taken from [the failed CI run|https://github.com/apache/yunikorn-k8shim/actions/runs/9807493804]) > !image-2024-07-06-16-39-54-365.png|width=573,height=307!
[jira] [Updated] (YUNIKORN-2655) Cleanup REST API documentation
[ https://issues.apache.org/jira/browse/YUNIKORN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2655: - Labels: pull-request-available (was: ) > Cleanup REST API documentation > -- > > Key: YUNIKORN-2655 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2655 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: Wilfred Spiegelenburg >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Major > Labels: pull-request-available > > The REST API documentation is not up to date with the current behaviour as it > does not show any 400 or 404 errors returned by a number of API calls. > The error response only shows a 500 code with the same message for each call. > We should move to a simple list for each call showing the applicable errors > like this: > {code:java} > ### Error responses > **Code** : `400 Bad Request` (URL query is invalid, missing partition name) > **Code** : `404 Not Found` (Partition not found) > **Code** : `500 Internal Server Error` {code} > Remove the error examples as they do not add any detail required -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2699) Preemption e2e tests fail in latest master
[ https://issues.apache.org/jira/browse/YUNIKORN-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2699: - Labels: pull-request-available (was: ) > Preemption e2e tests fail in latest master > -- > > Key: YUNIKORN-2699 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2699 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Manikandan R >Priority: Critical > Labels: pull-request-available > > Output: > > {noformat} > Preemption Verify_basic_preemption > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139 > STEP: Creating development namespace: dev-anvkm @ 06/25/24 18:08:14.291 > STEP: A queue uses resource more than the guaranteed value even after > removing one of the pods. The cluster doesn't have enough resource to deploy > a pod in another queue which uses resource less than the guaranteed value. @ > 06/25/24 18:08:15.301 > STEP: Update root.sandbox1 and root.sandbox2 with guaranteed memory 4677M @ > 06/25/24 18:08:15.301 > STEP: Port-forward the scheduler pod @ 06/25/24 18:08:15.302 > port-forward is already running STEP: Enabling new scheduling config @ > 06/25/24 18:08:15.302 > STEP: Deploy the sleep pod sleepjob1 to the development namespace @ > 06/25/24 18:08:18.313 > STEP: Deploy the sleep pod sleepjob2 to the development namespace @ > 06/25/24 18:08:22.518 > STEP: Deploy the sleep pod sleepjob3 to the development namespace @ > 06/25/24 18:08:26.517 > STEP: Deploy the sleep pod sleepjob4 to the development namespace @ > 06/25/24 18:08:30.518 > STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:08:38.517 > [FAILED] in [It] - > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198 > @ 06/25/24 18:08:38.718 > Logging yk fullstatedump, spec: Verify_basic_preemption > Created log file: > 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykFullStateDump.json > Logging k8s cluster info, spec: Verify_basic_preemption > Created log file: > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_k8sClusterInfo.txt > Logging yk container logs, spec: Verify_basic_preemption > Created log file: > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykContainerLog.txt > STEP: Tear down namespace: dev-anvkm @ 06/25/24 18:08:39.235 > STEP: Restoring YuniKorn configuration @ 06/25/24 18:08:40.118 > STEP: Restoring the old config maps @ 06/25/24 18:08:40.119 > • [FAILED] [27.837 seconds] > Preemption [It] Verify_basic_preemption > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139 > [FAILED] One of the pods in root.sandbox1 should be preempted > Expected > : 1 > to equal > : 2 > In [It] at: > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198 > @ 06/25/24 18:08:38.718-- Preemption > Verify_preemption_on_priority_queue > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:333 > STEP: Creating development namespace: dev-u0kt7 @ 06/25/24 18:10:24.975 > STEP: A task can only preempt a task with lower or equal priority @ > 06/25/24 18:10:25.982 > STEP: Update root.sandbox1, root.low-priority, root.high-priority with > guaranteed memory 4677M @ 06/25/24 18:10:25.982 > STEP: Port-forward the scheduler pod @ 06/25/24 18:10:25.983 > port-forward is already running STEP: Enabling new scheduling config @ > 06/25/24 18:10:25.983 > STEP: Deploy the sleep pod sleepjob1 to the development namespace @ > 06/25/24 18:10:28.99 > STEP: Deploy the sleep pod sleepjob2 to the development namespace @ > 06/25/24 18:10:32.791 > STEP: Deploy the sleep pod sleepjob3 to the development namespace @ > 06/25/24 18:10:35.792 > STEP: Deploy the sleep 
pod sleepjob4 to the development namespace @ > 06/25/24 18:10:38.792 > STEP: Deploy the sleep pod sleepjob5 to the development namespace @ > 06/25/24 18:10:38.995 > STEP: The sleep pod sleepjob4 can't be scheduled @ 06/25/24 18:10:39.194 > STEP: The sleep pod sleepjob5 can be scheduled @ 06/25/24 18:10:41.392 > STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:10:46.392 > [FAILED] in [It] - > /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:424 > @ 06/25/24 18:10:46.592 > Logging yk fullstatedump, spec: Verify_preemption_on_priority_queue > Created log file: >
[jira] [Updated] (YUNIKORN-2725) Temporarily disable failing e2e preemption tests
[ https://issues.apache.org/jira/browse/YUNIKORN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2725: - Labels: pull-request-available (was: ) > Temporarily disable failing e2e preemption tests > > > Key: YUNIKORN-2725 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2725 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes, test - e2e >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > > Disable the following tests to have green builds: > Verify_preemption_on_priority_queue > Verify_basic_preemption -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2319) cache.Task: reference to old pod object is kept after update
[ https://issues.apache.org/jira/browse/YUNIKORN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2319: - Labels: pull-request-available (was: ) > cache.Task: reference to old pod object is kept after update > > > Key: YUNIKORN-2319 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2319 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Attachments: 2024-01-09 134112.png, 2024-01-09 134130.png > > > There is a kind of memory leak in the shim: when the pod is updated, the old > pod object is still referenced from Task, so the GC has no chance to remove > it (only when the pod terminates). > See screenshot: task points to version 80199, scheduler cache already has a > newer version 81216. > We have two solutions: > 1. Update the object in the Task together with the scheduler cache > 2. Don't store the pointer to the pod, instead, always retrieve it from the > scheduler cache -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
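The second option in YUNIKORN-2319 (look the pod up instead of keeping a pointer) could be sketched roughly as follows. This is only an illustration: `pod`, `podCache`, and `task` are hypothetical stand-ins, not the actual shim types, and the real fix may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical, minimal stand-in for a Kubernetes pod object.
type pod struct {
	uid             string
	resourceVersion string
}

// podCache is a hypothetical stand-in for the scheduler cache.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]*pod
}

func newPodCache() *podCache {
	return &podCache{pods: make(map[string]*pod)}
}

// update replaces the cached object, so the old one becomes unreachable
// as long as nobody else still holds a pointer to it.
func (c *podCache) update(p *pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.uid] = p
}

func (c *podCache) get(uid string) *pod {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.pods[uid]
}

// task keeps only the pod UID and resolves the object on demand.
type task struct {
	podUID string
	cache  *podCache
}

func (t *task) pod() *pod {
	return t.cache.get(t.podUID)
}

func main() {
	cache := newPodCache()
	cache.update(&pod{uid: "p1", resourceVersion: "80199"})
	t := &task{podUID: "p1", cache: cache}

	// A pod update arrives; the task sees the new version without being touched.
	cache.update(&pod{uid: "p1", resourceVersion: "81216"})
	fmt.Println("task sees version", t.pod().resourceVersion)
}
```

The trade-off in this sketch is an extra map lookup (and lock) on every access in exchange for never pinning a stale object in memory.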
[jira] [Updated] (YUNIKORN-2697) Improve usergroup function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2697: - Labels: pull-request-available (was: ) > Improve usergroup function's test coverage > - > > Key: YUNIKORN-2697 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2697 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2722) Expose the IsOriginator flag in REST
[ https://issues.apache.org/jira/browse/YUNIKORN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2722: - Labels: pull-request-available (was: ) > Expose the IsOriginator flag in REST > > > Key: YUNIKORN-2722 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2722 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Yu-Lin Chen >Assignee: Tzu-Hua Lan >Priority: Major > Labels: pull-request-available > > The first real pod of each application is marked as the originator and is typically considered the driver/owner pod. This flag is propagated to the core and affects the preemption decision flow. > > However, the current REST API doesn’t expose the originator flag. Exposing it will allow users to check which allocation is the originator, which helps with monitoring and troubleshooting.
[jira] [Updated] (YUNIKORN-2182) Set ReadHeaderTimeout in http server
[ https://issues.apache.org/jira/browse/YUNIKORN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2182: - Labels: newbie pull-request-available (was: newbie) > Set ReadHeaderTimeout in http server > > > Key: YUNIKORN-2182 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2182 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common, webapp >Reporter: Wilfred Spiegelenburg >Assignee: Chenchen Lai >Priority: Major > Labels: newbie, pull-request-available > > Potential Slowloris Attack because ReadHeaderTimeout is not configured in the > http.Server (gosec) > We do not set ReadTimeout or ReadHeaderTimeout so we do not have a timeout at > all at the moment. > BTW: this is not important for the webtest servers we build as they are just > for our tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2716) Doc changes to escape query params in REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2716: - Labels: pull-request-available (was: ) > Doc changes to escape query params in REST API > -- > > Key: YUNIKORN-2716 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2716 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > > Need to make changes in REST API doc to escape the query params like queue > name, user name and group name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2667) E2E test for Gang app originator pod changes after restart
[ https://issues.apache.org/jira/browse/YUNIKORN-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2667: - Labels: pull-request-available (was: ) > E2E test for Gang app originator pod changes after restart > -- > > Key: YUNIKORN-2667 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2667 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes >Reporter: Manikandan R >Assignee: Tzu-Hua Lan >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/YUNIKORN-2665 covered the changes with unit tests. We need a test that covers the full cycle, before and after restart, either by writing an e2e test or by using a mock-scheduler kind of setup.
[jira] [Updated] (YUNIKORN-2695) remove core dependency pkg/common
[ https://issues.apache.org/jira/browse/YUNIKORN-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2695: - Labels: pull-request-available (was: ) > remove core dependency pkg/common > - > > Key: YUNIKORN-2695 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2695 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: HUAN-IU LIOU >Assignee: Chenchen Lai >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2703) Scheduler does not honor default queue setting from the ConfigMap
[ https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2703: - Labels: pull-request-available (was: ) > Scheduler does not honor default queue setting from the ConfigMap > - > > Key: YUNIKORN-2703 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2703 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Major > Labels: pull-request-available > > YUNIKORN-1650 added an override for the default queue name in the config map to handle the scenario where the provided placement rule is evaluated before other rules. > The scheduler also adds a default queue if the pod labels or annotations do not define a queue name. Because this happens before the placement rules are evaluated, we end up in the same situation: applications are placed in the default queue and all other placement rules are ignored.
[jira] [Updated] (YUNIKORN-2721) Improve template function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2721: - Labels: pull-request-available (was: ) > Improve template function's test coverage > > > Key: YUNIKORN-2721 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2721 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2693) An example doc of RayService management with YuniKorn
[ https://issues.apache.org/jira/browse/YUNIKORN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2693: - Labels: pull-request-available (was: ) > An example doc of RayService management with YuniKorn > > > Key: YUNIKORN-2693 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2693 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Chen Yu Teng >Assignee: JunHong Peng >Priority: Major > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2715) Handle special characters for params like queue, username & groupname
[ https://issues.apache.org/jira/browse/YUNIKORN-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2715: - Labels: pull-request-available (was: ) > Handle special characters for params like queue, username & groupname > - > > Key: YUNIKORN-2715 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2715 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler, shim - kubernetes, test - e2e >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > > With more special characters now allowed in queue names, user names, etc., we need to ensure those characters are handled on both sides. Clients need to escape these values before sending them. Receivers need to unescape them to recover the actual values. Tests for this also need to be added.
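The escape-on-send, unescape-on-receive round trip described in YUNIKORN-2715 can be illustrated with Go's standard `net/url` package. This is a minimal sketch: `roundTrip` and the sample values are made up for illustration, and whether YuniKorn uses query escaping or path escaping for a given parameter is not stated in the issue.

```go
package main

import (
	"fmt"
	"net/url"
)

// roundTrip is a hypothetical helper: it escapes a value the way a client
// would before sending it, then unescapes it the way a receiver would.
func roundTrip(raw string) (escaped string, decoded string, err error) {
	escaped = url.QueryEscape(raw)
	decoded, err = url.QueryUnescape(escaped)
	return escaped, decoded, err
}

func main() {
	// Sample values containing characters that are unsafe in a URL.
	samples := []string{"root.team a", "user@example.com", "dev&ops"}
	for _, raw := range samples {
		escaped, decoded, err := roundTrip(raw)
		if err != nil {
			fmt.Println("unescape failed:", err)
			continue
		}
		fmt.Printf("%q -> %q -> %q\n", raw, escaped, decoded)
	}
}
```

A test for this behaviour would typically assert that the decoded value equals the original for every character the new validation rules allow.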
[jira] [Updated] (YUNIKORN-2269) remove the USER_LABEL_KEY from docs
[ https://issues.apache.org/jira/browse/YUNIKORN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2269: - Labels: pull-request-available (was: ) > remove the USER_LABEL_KEY from docs > --- > > Key: YUNIKORN-2269 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2269 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Trivial > Labels: pull-request-available > Fix For: 1.6.0 > > > The core no longer supports USER_LABEL_KEY since YUNIKORN-1405 was merged, so we should remove it from the docs. > https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/#using-the-yunikornapacheorgusername-label > {quote} > The yunikorn.apache.org/username key can be customized by overriding the default value using the USER_LABEL_KEY env variable in the K8s Deployment. This is particularly useful in scenarios where the user label is already being added or if the label has to be modified for some security reasons. > {quote}
[jira] [Updated] (YUNIKORN-2704) Event publish errors out when predicates fail
[ https://issues.apache.org/jira/browse/YUNIKORN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2704: - Labels: pull-request-available (was: ) > Event publish errors out when predicates fail > - > > Key: YUNIKORN-2704 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2704 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Mit Desai >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > I consistently see this error in the logs when events are published. > I did put some debug logs and found that I only get it when the events for > untolerated taints are published. > E0618 17:43:17.858946 1 event_broadcaster.go:270] "Server rejected > event (will not retry!)" err="Event \"<>.17da2a31072bb32f\" is > invalid: [action: Required value, reason: Required value]" > event="\{ObjectMeta:{<>.17da2a31072bb32f dpi-dev 0 > 0001-01-01 00:00:00 + UTC map[] map[] [] [] > []},EventTime:2024-06-18 17:43:17.857332069 + UTC > m=+84279.014490005,Series:nil,ReportingController:yunikorn,ReportingInstance:yunikorn-yunikorn-scheduler-59bdc88fdc-7h5bt,Action:,Reason:,Regarding:\{Pod > <> <> 5c90315c-a07d-4801-9ecc-baf61ee45f11 v1 > 4323324038 },Related:nil,Note:Predicate failed for request > '5c90315c-a07d-4801-9ecc-baf61ee45f11' with message: 'node(s) had untolerated > taint \{<>: <>}',Type:Normal,DeprecatedSource:\{ > },DeprecatedFirstTimestamp:0001-01-01 00:00:00 + > UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 + UTC,DeprecatedCount:0,}" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2568) Move all xxxEvents types to objects/events
[ https://issues.apache.org/jira/browse/YUNIKORN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2568: - Labels: pull-request-available (was: ) > Move all xxxEvents types to objects/events > -- > > Key: YUNIKORN-2568 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2568 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core
[ https://issues.apache.org/jira/browse/YUNIKORN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2698: - Labels: pull-request-available (was: ) > E2e tests for k8shim don't compile with latest core > --- > > Key: YUNIKORN-2698 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2698 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2304) add instruction docs of looping flaky test
[ https://issues.apache.org/jira/browse/YUNIKORN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2304: - Labels: pull-request-available (was: ) > add instruction docs of looping flaky test > -- > > Key: YUNIKORN-2304 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2304 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Tseng Hsi-Huang >Priority: Major > Labels: pull-request-available > > A flaky test is hard to dig into because it fails only rarely. It would therefore help to have an example in our docs of running a specific flaky test in a loop. That can be a one-line command. For instance: > {code:java} > I=0; while go test -run TestNoFillWithoutEventPluginRegistered ./pkg/... > -count=1; do (( I=$I+1 )); echo "Completed loop: $I"; sleep 1; done {code}
[jira] [Updated] (YUNIKORN-2683) Unnecessary error is logged when resource usage is increased
[ https://issues.apache.org/jira/browse/YUNIKORN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2683: - Labels: pull-request-available (was: ) > Unnecessary error is logged when resource usage is increased > > > Key: YUNIKORN-2683 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2683 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Labels: pull-request-available > > The refactored code in YUNIKORN-2542 contains an unnecessary error message: > {noformat} > appGroup := userTracker.getGroupForApp(applicationID) > log.Log(log.SchedUGM).Debug("Increasing resource usage for user", > zap.String("user", user.User), > zap.String("queue path", queuePath), > zap.String("application", applicationID), > zap.String("group", appGroup), > zap.Stringer("resource", usage)) > groupTracker := m.GetGroupTracker(appGroup) > if groupTracker == nil { > log.Log(log.SchedUGM).Error("group tracker should be available > in groupTrackers map", > zap.String("application", applicationID), > zap.String("group", appGroup)) > return > } > ... > {noformat} > We don't always have a {{groupTracker}}. 
The previous code simply called > {{increaseTrackedResource()}} on an empty tracker: > {noformat} > func (ut *UserTracker) increaseTrackedResource(queuePath string, > applicationID string, usage *resources.Resource) { > ut.Lock() > defer ut.Unlock() > ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage) > hierarchy := strings.Split(queuePath, configs.DOT) > ut.queueTracker.increaseTrackedResource(hierarchy, applicationID, user, > usage) > gt := ut.appGroupTrackers[applicationID] > log.Log(log.SchedUGM).Debug("Increasing resource usage for group", > zap.String("group", gt.getName()), > zap.Strings("queue path", hierarchy), > zap.String("application", applicationID), > zap.Stringer("resource", usage)) > gt.increaseTrackedResource(queuePath, applicationID, usage, > ut.userName) <- can be null > } > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
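One way to read YUNIKORN-2683 is that a missing group tracker is a normal condition and should simply be skipped rather than reported as an error. The sketch below illustrates that idea only; `groupTracker`, `manager`, and `increaseGroupUsage` are simplified, hypothetical stand-ins, and the actual change in yunikorn-core may be different.

```go
package main

import "fmt"

// groupTracker is a hypothetical, simplified stand-in for the real tracker.
type groupTracker struct {
	name  string
	usage int
}

// manager is a hypothetical stand-in that owns the group trackers.
type manager struct {
	groupTrackers map[string]*groupTracker
}

// increaseGroupUsage adds usage to the group's tracker if one exists.
// A missing tracker is treated as a normal case: nothing is logged at
// error level and the call reports false so the caller can move on.
func (m *manager) increaseGroupUsage(group string, amount int) bool {
	gt := m.groupTrackers[group]
	if gt == nil {
		return false
	}
	gt.usage += amount
	return true
}

func main() {
	m := &manager{groupTrackers: map[string]*groupTracker{
		"dev": {name: "dev"},
	}}

	fmt.Println("dev tracked:", m.increaseGroupUsage("dev", 5))
	fmt.Println("untracked group tracked:", m.increaseGroupUsage("ops", 5))
	fmt.Println("dev usage:", m.groupTrackers["dev"].usage)
}
```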
[jira] [Updated] (YUNIKORN-2694) Improve placement rule function's test coverage - 2
[ https://issues.apache.org/jira/browse/YUNIKORN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2694: - Labels: pull-request-available (was: ) > Improve placement rule function's test coverage - 2 > -- > > Key: YUNIKORN-2694 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2694 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-2658) add nolint:funlen to long functions to suppress the lint warnings
[ https://issues.apache.org/jira/browse/YUNIKORN-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2658: - Labels: pull-request-available (was: ) > add nolint:funlen to long functions to suppress the lint warnings > > > Key: YUNIKORN-2658 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2658 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: Huang Guan Hao >Priority: Major > Labels: pull-request-available > > as title
[jira] [Updated] (YUNIKORN-2675) An example doc of RayCluster and RayJob management with YuniKorn
[ https://issues.apache.org/jira/browse/YUNIKORN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2675: - Labels: pull-request-available (was: ) > An example doc of RayCluster and RayJob management with YuniKorn > --- > > Key: YUNIKORN-2675 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2675 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Chen Yu Teng >Assignee: HUAN-IU LIOU >Priority: Major > Labels: pull-request-available > > Adding labels and annotations to the RayCluster Helm chart.
[jira] [Updated] (YUNIKORN-2685) Use the newer WaitForCondition() in shim test
[ https://issues.apache.org/jira/browse/YUNIKORN-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2685: - Labels: newbie pull-request-available (was: newbie) > Use the newer WaitForCondition() in shim test > - > > Key: YUNIKORN-2685 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2685 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Yu-Lin Chen >Assignee: HUAN-IU LIOU >Priority: Major > Labels: newbie, pull-request-available > > In YUNIKORN-2643, WaitFor() and WaitForCondition() have been refactored. > We should update to the latest core version and use the newer > WaitForCondition() in the shim. > * > [https://github.com/apache/yunikorn-k8shim/blob/24efbeda6800fabec17cf9e0474cebee0314bd6e/pkg/client/clients_test.go#L61-L71] > * > [https://github.com/apache/yunikorn-k8shim/blob/24efbeda6800fabec17cf9e0474cebee0314bd6e/pkg/cache/task_test.go#L203-L205] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2686) Validate user and group specified in filter config
[ https://issues.apache.org/jira/browse/YUNIKORN-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2686: - Labels: pull-request-available (was: ) > Validate user and group specified in filter config > -- > > Key: YUNIKORN-2686 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2686 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > > Rule filter may have user or group to be allowed or denied. These users and > groups are being validated. Since user validation has been changed, need to > enhance the test to verify the Rule filter behaviour based on the new > validation characters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2680) Improve placement rule funtion's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2680: - Labels: pull-request-available (was: ) > Improve placement rule funtion's test coverage > -- > > Key: YUNIKORN-2680 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2680 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2679) Add copy URL button on the allocations panel
[ https://issues.apache.org/jira/browse/YUNIKORN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2679:
    Labels: pull-request-available (was: )

> Add copy URL button on the allocations panel
>
> Key: YUNIKORN-2679
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2679
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Reporter: Denis Coric
> Assignee: Denis Coric
> Priority: Major
> Labels: pull-request-available
>
> Add a copy URL button that generates and copies the hotlink to that
> allocations screen. It leverages the YUNIKORN-2624 implementation.
[jira] [Updated] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType
[ https://issues.apache.org/jira/browse/YUNIKORN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2677: - Labels: pull-request-available (was: ) > Rename AllocationResult to AllocationResultType > --- > > Key: YUNIKORN-2677 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2677 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > > In preparation for other refactorings, rename the AllocationResult object to > AllocationResultType. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2676) Get started yunikorn with load-balancer
[ https://issues.apache.org/jira/browse/YUNIKORN-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2676: - Labels: pull-request-available (was: ) > Get started yunikorn with load-balancer > --- > > Key: YUNIKORN-2676 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2676 > Project: Apache YuniKorn > Issue Type: Improvement > Components: documentation >Reporter: Chen Yu Teng >Assignee: Chen Yu Teng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2674) specific helm chart link of Service Configuration doc update
[ https://issues.apache.org/jira/browse/YUNIKORN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2674: - Labels: pull-request-available (was: ) > specific helm chart link of Service Configuration doc update > > > Key: YUNIKORN-2674 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2674 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: HUAN-IU LIOU >Assignee: HUAN-IU LIOU >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2673) Improve newFilter funtion's test coverage in filter.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2673: - Labels: pull-request-available (was: ) > Improve newFilter funtion's test coverage in filter.go > -- > > Key: YUNIKORN-2673 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2673 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage
[ https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2652:
    Labels: pull-request-available (was: )

> Expand getApplication() endpoint handler to optionally return resource usage
>
> Key: YUNIKORN-2652
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2652
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - common
> Reporter: Rich Scott
> Assignee: Tseng Hsi-Huang
> Priority: Major
> Labels: pull-request-available
>
> Some users would like to be able to see resource usage (preempted,
> placeholder resource, etc) for applications that have been completed. The
> `getApplication()` endpoint handler should be enhanced to take an optional
> parameter specifying that the user would like details about resources
> included in the response. A new `ApplicationXXXDAOInfo` object that is a
> slight superset of `ApplicationDAOInfo` should be introduced and can be used
> in the response.
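The request above could look roughly like the following sketch. Everything named here is an assumption made for illustration: the `details` query parameter, the `/app` path, and the `AppInfo` and `AppDetailInfo` types stand in for whatever the real handler and DAO objects end up using.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AppInfo stands in for ApplicationDAOInfo; the fields are illustrative only.
type AppInfo struct {
	ApplicationID string `json:"applicationID"`
	State         string `json:"state"`
}

// AppDetailInfo is a hypothetical superset that adds resource usage fields.
type AppDetailInfo struct {
	AppInfo
	PreemptedResource   map[string]int64 `json:"preemptedResource"`
	PlaceholderResource map[string]int64 `json:"placeholderResource"`
}

// getApplication returns the basic object by default and the superset only
// when the (assumed) query parameter details=true is present.
func getApplication(w http.ResponseWriter, r *http.Request) {
	base := AppInfo{ApplicationID: "app-1", State: "Completed"}
	w.Header().Set("Content-Type", "application/json")
	if r.URL.Query().Get("details") == "true" {
		detail := AppDetailInfo{
			AppInfo:             base,
			PreemptedResource:   map[string]int64{"vcore": 0},
			PlaceholderResource: map[string]int64{"vcore": 2000},
		}
		_ = json.NewEncoder(w).Encode(detail)
		return
	}
	_ = json.NewEncoder(w).Encode(base)
}

// call runs the handler in-process and returns the response body.
func call(target string) string {
	rec := httptest.NewRecorder()
	getApplication(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(call("/app"))
	fmt.Print(call("/app?details=true"))
}
```

Keeping the parameter optional means existing clients keep receiving the smaller object unchanged.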
[jira] [Updated] (YUNIKORN-2516) Update documentation about event.RESTResponseSize
[ https://issues.apache.org/jira/browse/YUNIKORN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2516: - Labels: pull-request-available (was: ) > Update documentation about event.RESTResponseSize > - > > Key: YUNIKORN-2516 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2516 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2672) Upgrade to K8s 1.29.6
[ https://issues.apache.org/jira/browse/YUNIKORN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2672:
    Labels: pull-request-available (was: )

> Upgrade to K8s 1.29.6
>
> Key: YUNIKORN-2672
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2672
> Project: Apache YuniKorn
> Issue Type: Task
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Craig Condit
> Priority: Critical
> Labels: pull-request-available
>
> A major performance regression was fixed in K8s that on analysis mainly
> impacts the plugin implementation. The regression is part of the release
> 1.29.4 we currently build against.
> See [https://github.com/kubernetes/kubernetes/pull/125197] for details
[jira] [Updated] (YUNIKORN-2657) Validate queue generated as part of the placement rules
[ https://issues.apache.org/jira/browse/YUNIKORN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2657:
    Labels: pull-request-available (was: )

> Validate queue generated as part of the placement rules
>
> Key: YUNIKORN-2657
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2657
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - common
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
>
> Currently, there is no validation or restriction on the characters used in
> queue names generated as part of the placement rules. However, queues
> specified in the configuration go through a validation process. Similar
> validation checks are needed for generated queue names.
[jira] [Updated] (YUNIKORN-2671) Convert Allocation releases field to singular
[ https://issues.apache.org/jira/browse/YUNIKORN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2671: - Labels: pull-request-available (was: ) > Convert Allocation releases field to singular > - > > Key: YUNIKORN-2671 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2671 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > > Now that repeats are no longer allowed, we have no need to track multiple > releases for an allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2656) Validate user name
[ https://issues.apache.org/jira/browse/YUNIKORN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2656:
    Labels: pull-request-available (was: )

> Validate user name
>
> Key: YUNIKORN-2656
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2656
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - common
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
>
> Currently, there is no validation or restriction on the characters used in
> the user name specified as part of app submission. However, users specified
> in limit settings go through a validation process. Similar validation checks
> are needed for submitted user names.
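A character check of the kind described could be sketched as below. The pattern is an assumption chosen purely for illustration; the ticket does not state which characters the project accepts, and `validateUserName` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// userNamePattern is an illustrative rule only: a letter, digit or underscore
// followed by letters, digits, underscores, dots, at signs or hyphens.
// The character set actually allowed by the project may differ.
var userNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_][a-zA-Z0-9_.@-]*$`)

// validateUserName is a hypothetical helper that rejects names containing
// characters outside the illustrative pattern above.
func validateUserName(name string) error {
	if !userNamePattern.MatchString(name) {
		return fmt.Errorf("invalid user name %q", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"alice", "svc-account@example", "bad name!", ""} {
		fmt.Printf("%q -> %v\n", name, validateUserName(name))
	}
}
```

Sharing one such check between the configuration path and the submission path is what keeps the two from drifting apart.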
[jira] [Updated] (YUNIKORN-2670) Improve util funtion's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2670:
    Labels: pull-request-available (was: )

> Improve util funtion's test coverage
>
> Key: YUNIKORN-2670
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2670
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
>
> Improve the test coverage of the following functions in util.go:
> * ZeroTimeInUnixNano
> * GetNewUUID
> * IsRecoveryQueue
[jira] [Updated] (YUNIKORN-2626) Add flag to helm chart to disable web container
[ https://issues.apache.org/jira/browse/YUNIKORN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2626:
    Labels: pull-request-available (was: )

> Add flag to helm chart to disable web container
>
> Key: YUNIKORN-2626
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2626
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: deployment
> Reporter: Michael
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
>
> For our use case we only really need the admission controller and scheduler.
> The helm chart does not currently provide a way to disable deploying the web
> container, and it would be great if that were possible.
> Is there any reason not to disable the web container?
[jira] [Updated] (YUNIKORN-2668) Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails
[ https://issues.apache.org/jira/browse/YUNIKORN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2668:
    Labels: pull-request-available (was: )

> Temporarily disable TestUpdateAllocation_NewTask_AssumePodFails
>
> Key: YUNIKORN-2668
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2668
> Project: Apache YuniKorn
> Issue Type: Task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
>
> The test case TestUpdateAllocation_NewTask_AssumePodFails occasionally fails
> due to a deadlock problem described in YUNIKORN-2629. Until that ticket is
> resolved, let's disable this test for the time being, so upstream tests don't
> fail.
[jira] [Updated] (YUNIKORN-2666) Fix DeepEqual comparison in Test_fixedRule_ruleDAO
[ https://issues.apache.org/jira/browse/YUNIKORN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2666:
    Labels: pull-request-available (was: )

> Fix DeepEqual comparison in Test_fixedRule_ruleDAO
>
> Key: YUNIKORN-2666
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2666
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler, test - unit
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
>
> The test case {{Test_fixedRule_ruleDAO/filter}} can randomly fail due to the
> non-deterministic nature of map key iteration:
> {noformat}
> fixed_rule_test.go:285: assertion failed:
> --- tt.want
> +++ ruleDAO
>   {
>       Name: "fixed",
>       Parameters: {"create": "true", "qualified": "false", "queue": "default"},
>       Filter: {
>           Type: "allow",
>           UserList: nil,
>           GroupList: []string{
> -             "group1",
> +             "group2",
> -             "group2",
> +             "group1",
>           },
>           UserExp: "",
>           GroupExp: "",
>       },
>       ParentRule: nil,
>   }
> {noformat}
> We use {{maps.Keys()}} when we create the user list and group list in
> {{FilterDAO}}.
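The failure mode comes from Go's unspecified map iteration order. One common remedy, shown here as a sketch rather than as the fix adopted for this ticket, is to sort the extracted keys before they are stored or compared. The helper name `sortedKeys` and the `map[string]bool` set type are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of a set-like map in ascending order, so that
// callers and tests always see the same slice regardless of map iteration order.
func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	groups := map[string]bool{"group2": true, "group1": true}
	fmt.Println(sortedKeys(groups)) // prints [group1 group2] on every run
}
```

An alternative is to leave the production code alone and make the test comparison order-insensitive, for example by sorting both slices in the test before comparing them.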
[jira] [Updated] (YUNIKORN-2665) Gang app originator pod changes after restart
[ https://issues.apache.org/jira/browse/YUNIKORN-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2665:
    Labels: pull-request-available (was: )

> Gang app originator pod changes after restart
>
> Key: YUNIKORN-2665
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2665
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Critical
> Labels: pull-request-available
>
> A gang app chooses the first pod (the one that created the app) as the
> originator pod, which later becomes the real driver pod. If a restart happens
> while a gang app is being processed, specifically after placeholder creation
> and during replacement, it can lead to the incorrect behaviour described
> below.
> During restore, there is no guarantee on the ordering of pods coming from the
> K8s lister, especially when all the pods were created with the same
> timestamp. K8s uses second-based timestamps, which means all pods created
> within the same second have the same timestamp. In this situation, YK
> designates whichever pod comes first from the lister as the originator pod.
> So any placeholder could become the originator pod, and the actual originator
> pod is lost. This change could cause ripple effects leading to unexpected
> behaviour and needs to be fixed.
[jira] [Updated] (YUNIKORN-2515) Add property event.RESTResponseSize to the batch event handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2515: - Labels: pull-request-available (was: ) > Add property event.RESTResponseSize to the batch event handler > -- > > Key: YUNIKORN-2515 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2515 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2514) Update documentation about event.requestCapacity
[ https://issues.apache.org/jira/browse/YUNIKORN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2514: - Labels: pull-request-available (was: ) > Update documentation about event.requestCapacity > > > Key: YUNIKORN-2514 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2514 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2663) Improve ACL struct funtion's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2663:
    Labels: pull-request-available (was: )

> Improve ACL struct funtion's test coverage
>
> Key: YUNIKORN-2663
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2663
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
>
> Remove unreachable code in the NewACL func.
> Improve the test coverage of the following functions in acl.go:
> * TestSetUsers
> * TestSetGroups
[jira] [Updated] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity
[ https://issues.apache.org/jira/browse/YUNIKORN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2647:
    Labels: newbie pull-request-available (was: newbie)

> Flaky test TestUpdateNodeCapacity
>
> Key: YUNIKORN-2647
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2647
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: test - unit
> Reporter: Wilfred Spiegelenburg
> Assignee: Tseng Hsi-Huang
> Priority: Minor
> Labels: newbie, pull-request-available
>
> As we saw in YUNIKORN-2573, the single node update test might fail:
> {code:java}
> --- FAIL: TestUpdateNodeCapacity (0.03s)
>     operation_test.go:446: Expected partition resource map[memory:1 vcore:2], doesn't match with actual partition resource map[memory:1 vcore:2]{code}
> When updating the node capacity we calculate the delta resources, and with
> that delta we update the resources in the partition.
> The test fails with the following order of calls, the same as for multiple
> nodes:
> node.SetCapacity() -> waitForAvailableNodeResource() ->
> partitionInfo.GetTotalPartitionResource() ->
> partition.updatePartitionResource()
[jira] [Updated] (YUNIKORN-2624) Enable hotlinking to YuniKorn
[ https://issues.apache.org/jira/browse/YUNIKORN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2624:
    Labels: pull-request-available (was: )

> Enable hotlinking to YuniKorn
>
> Key: YUNIKORN-2624
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2624
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: webapp
> Reporter: Denis Coric
> Assignee: Denis Coric
> Priority: Major
> Labels: pull-request-available
>
> Enable third-party apps to set links to YuniKorn that populate the
> partition, queue, and application ID using query parameters.
> The queue, partition, and application ID should be pre-selected, with all
> details shown on the page using the existing details view and stored in the
> application storage using the existing functionality.
[jira] [Updated] (YUNIKORN-2654) Remove unused code in k8shim context
[ https://issues.apache.org/jira/browse/YUNIKORN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2654:
    Labels: newbie pull-request-available (was: newbie)

> Remove unused code in k8shim context
>
> Key: YUNIKORN-2654
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2654
> Project: Apache YuniKorn
> Issue Type: Task
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Chenchen Lai
> Priority: Minor
> Labels: newbie, pull-request-available
>
> The NotifyApplicationComplete and NotifyApplicationFail functions are not
> called by anything and are unused code.
> The K8shim does not trigger the application completion or failure. This is
> triggered by the core when the application no longer has any activity
> registered.
[jira] [Updated] (YUNIKORN-2661) Fix hard-coded boolean in setLimit
[ https://issues.apache.org/jira/browse/YUNIKORN-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2661:
    Labels: pull-request-available (was: )

> Fix hard-coded boolean in setLimit
>
> Key: YUNIKORN-2661
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2661
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
>
> Inside the UGM code {{setLimit()}}, we don't pass down {{doWildCardCheck}},
> so this variable never reaches the leaves:
> {noformat}
> // Note: Lock free call. The Lock of the linked tracker (UserTracker and GroupTracker) should be held before calling this function.
> func (qt *QueueTracker) setLimit(hierarchy []string, maxResource *resources.Resource, maxApps uint64, useWildCard bool, trackType trackingType, doWildCardCheck bool) {
>     log.Log(log.SchedUGM).Debug("Setting limits",
>         zap.String("queue path", qt.queuePath),
>         zap.Strings("hierarchy", hierarchy),
>         zap.Uint64("max applications", maxApps),
>         zap.Stringer("max resources", maxResource),
>         zap.Bool("use wild card", useWildCard))
>     // depth first: all the way to the leaf, create if not exists
>     // more than 1 in the slice means we need to recurse down
>     if len(hierarchy) > 1 {
>         childName := hierarchy[1]
>         if qt.childQueueTrackers[childName] == nil {
>             qt.childQueueTrackers[childName] = newQueueTracker(qt.queuePath, childName, trackType)
>         }
>         qt.childQueueTrackers[childName].setLimit(hierarchy[1:], maxResource, maxApps, useWildCard, trackType, false)  <-- should be "doWildCardCheck" not "false"
>     ...
> {noformat}
> Fix this and create a unit test for {{setLimit()}}.
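The effect of the hard-coded argument can be seen in a stripped-down model of the recursion. This is a sketch under stated assumptions, not the UGM code: `queueTracker`, its fields, and the decision to simply record the flag at the leaf are all simplifications invented for illustration.

```go
package main

import "fmt"

// queueTracker is a hypothetical, heavily simplified stand-in for the real
// tracker type; it only records whether the flag arrived at each node.
type queueTracker struct {
	name            string
	children        map[string]*queueTracker
	doWildCardCheck bool
}

func newQueueTracker(name string) *queueTracker {
	return &queueTracker{name: name, children: map[string]*queueTracker{}}
}

// setLimit walks down the hierarchy depth first, creating children as needed,
// and passes the caller's flag on unchanged instead of a literal false.
func (qt *queueTracker) setLimit(hierarchy []string, doWildCardCheck bool) {
	if len(hierarchy) > 1 {
		childName := hierarchy[1]
		if qt.children[childName] == nil {
			qt.children[childName] = newQueueTracker(childName)
		}
		qt.children[childName].setLimit(hierarchy[1:], doWildCardCheck)
		return
	}
	// Leaf of the requested path: record the flag (illustrative only).
	qt.doWildCardCheck = doWildCardCheck
}

func main() {
	root := newQueueTracker("root")
	root.setLimit([]string{"root", "parent", "leaf"}, true)
	leaf := root.children["parent"].children["leaf"]
	fmt.Println(leaf.name, leaf.doWildCardCheck)
}
```

A unit test of the kind the ticket asks for can assert on the leaf exactly as `main` does here: with a literal `false` in the recursive call the leaf would never see `true`.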
[jira] [Updated] (YUNIKORN-2659) Improve config validator funtion's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2659:
    Labels: pull-request-available (was: )

> Improve config validator funtion's test coverage
>
> Key: YUNIKORN-2659
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2659
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
>
> Improve the test coverage of the following functions in configvalidator.go:
> * checkPlacementRule
> * checkLimitResource
> * checkLimit
[jira] [Updated] (YUNIKORN-2622) Some /debug/pprof/ API response tested is different from example response in docs
[ https://issues.apache.org/jira/browse/YUNIKORN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2622:
    Labels: pull-request-available (was: )

> Some /debug/pprof/ API response tested is different from example response in docs
>
> Key: YUNIKORN-2622
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2622
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: documentation
> Reporter: Hsien-Cheng(Ryan) Huang
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Minor
> Labels: pull-request-available
>
> /debug/pprof/symbol
> tested response on 1.5.1: num_symbols: 1
> while doc: binary
> https://yunikorn.apache.org/docs/next/api/system/#success-response-9
>
> /debug/pprof/cmdline also:
> tested response on 1.5.1: /yunikorn-scheduler
> while doc: binary
> https://yunikorn.apache.org/docs/next/api/system/#cmdline
[jira] [Updated] (YUNIKORN-2605) Move the bottom allocations table on queues screen to the sidebar according to the design
[ https://issues.apache.org/jira/browse/YUNIKORN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2605:
    Labels: pull-request-available (was: )

> Move the bottom allocations table on queues screen to the sidebar according to the design
>
> Key: YUNIKORN-2605
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2605
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: webapp
> Reporter: Denis Coric
> Assignee: Denis Coric
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-05-07-18-21-59-564.png
>
> The sidebar has to be tweaked a little to be able to display a list of pods
> with enough details:
> * Adjust the dimensions
> * Define a new view component that will render in the sidebar when the
> Application is selected
> * (optional) pagination could work as an infinite scroll
> !image-2024-05-07-18-21-59-564.png|width=1247,height=647!
[jira] [Updated] (YUNIKORN-2650) Complete or remove web_server_test#TestProxy
[ https://issues.apache.org/jira/browse/YUNIKORN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2650:
    Labels: pull-request-available (was: )

> Complete or remove web_server_test#TestProxy
>
> Key: YUNIKORN-2650
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2650
> Project: Apache YuniKorn
> Issue Type: Test
> Reporter: Chia-Ping Tsai
> Assignee: Chenchen Lai
> Priority: Major
> Labels: pull-request-available
>
> web_server_test has an empty test case: TestProxy [0]. It seems to me there
> is a proxy-related test [1].
> [0] https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L82
> [1] https://github.com/apache/yunikorn-k8shim/blob/58adfe941d2d8dae5544af8b49e435f304678807/pkg/webtest/web_server_test.go#L73
[jira] [Updated] (YUNIKORN-2640) Conside removing config from Clients
[ https://issues.apache.org/jira/browse/YUNIKORN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2640:
    Labels: pull-request-available (was: )

> Conside removing config from Clients
>
> Key: YUNIKORN-2640
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2640
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Chenchen Lai
> Priority: Minor
> Labels: pull-request-available
>
> The config (`conf.SchedulerConf`) [0] refers to a global singleton object
> [1][2]. Also, in the code base `clients#GetConf()` is used 3 times [3] and
> `conf.GetSchedulerConf()` is used 61 times [4].
> It seems to me `clients#conf` should be removed to avoid confusion.
> [0] https://github.com/apache/yunikorn-k8shim/blob/master/pkg/client/clients.go#L42C8-L42C26
> [1] https://github.com/apache/yunikorn-k8shim/blob/6f2800f689e9e341c736a6af8cbf178a711a9423/pkg/plugin/scheduler_plugin.go#L291
> [2] https://github.com/apache/yunikorn-k8shim/blob/6f2800f689e9e341c736a6af8cbf178a711a9423/pkg/cmd/shim/main.go#L53
> [3] https://github.com/search?q=repo%3Aapache%2Fyunikorn-k8shim+GetConf%28%29=code
> [4] https://github.com/search?q=repo%3Aapache%2Fyunikorn-k8shim+conf.GetSchedulerConf%28%29=code
[jira] [Updated] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance
[ https://issues.apache.org/jira/browse/YUNIKORN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-2653:
    Labels: pull-request-available (was: )

> Gang scheduling K8s event formatting compliance
>
> Key: YUNIKORN-2653
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2653
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Minor
> Labels: pull-request-available
>
> The K8s events provide definitions and rules around the content of the fields
> within the event. Adjust the content of gang scheduling related events to
> comply with the rules.
> Focussed on the reason and action fields only.
> * 'reason' is the reason this event is generated. 'reason' should be short
> and unique; it should be in UpperCamelCase format (starting with a capital
> letter).
> * 'action' explains what happened with regarding/ what action did the
> ReportingController take in objects name; it should be in UpperCamelCase
> format (starting with a capital letter).
> No space or long text.
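The formatting rule for the reason and action fields can be checked mechanically. The sketch below is an illustration only: `isUpperCamelCase` is a hypothetical helper, the pattern is one reasonable reading of "UpperCamelCase, no spaces", and the sample values are invented rather than taken from the shim.

```go
package main

import (
	"fmt"
	"regexp"
)

// upperCamel matches a single token that starts with a capital letter and
// contains only letters and digits (so no spaces or punctuation).
var upperCamel = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)

// isUpperCamelCase reports whether s is acceptable as an event reason or
// action under the illustrative pattern above.
func isUpperCamelCase(s string) bool {
	return upperCamel.MatchString(s)
}

func main() {
	// Sample values are made up for the example.
	for _, reason := range []string{"GangScheduling", "gang scheduling", "Placeholder timed out"} {
		fmt.Printf("%q -> %v\n", reason, isUpperCamelCase(reason))
	}
}
```

A check like this fits naturally in a unit test that iterates over all reason and action constants used when events are published.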
[jira] [Updated] (YUNIKORN-2567) Remove Application reference from applicationEvents
[ https://issues.apache.org/jira/browse/YUNIKORN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2567: - Labels: pull-request-available (was: ) > Remove Application reference from applicationEvents > --- > > Key: YUNIKORN-2567 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2567 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (YUNIKORN-2651) Update the unchecked error for make lint warnings
[ https://issues.apache.org/jira/browse/YUNIKORN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2651: - Labels: pull-request-available (was: ) > Update the unchecked error for make lint warnings > - > > Key: YUNIKORN-2651 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2651 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: Yun Sun >Priority: Major > Labels: pull-request-available > > Fix the lint warnings about "unhandled error".
[jira] [Updated] (YUNIKORN-2649) Improve CalculateAbsUsedCapacity & CompUsageRatio funtion's test coverage in resources.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2649: - Labels: pull-request-available (was: ) > Improve CalculateAbsUsedCapacity & CompUsageRatio funtion's test coverage in > resources.go > - > > Key: YUNIKORN-2649 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2649 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Assignee: JunHong Peng >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (YUNIKORN-2642) Don't set resources on the recovery queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2642: - Labels: pull-request-available (was: ) > Don't set resources on the recovery queue > - > > Key: YUNIKORN-2642 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2642 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > > Resource constraints can be set on dynamic queues based on application > tags. We should not set them on the recovery queue, because there's no quota > on it.
[jira] [Updated] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2646: - Labels: pull-request-available (was: ) > Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Attachments: yunikorn-logs-lock.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached
[jira] [Updated] (YUNIKORN-2542) Consistent logging and tracker handling for increment/decrement
[ https://issues.apache.org/jira/browse/YUNIKORN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2542: - Labels: pull-request-available (was: ) > Consistent logging and tracker handling for increment/decrement > --- > > Key: YUNIKORN-2542 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2542 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Tseng Hsi-Huang >Priority: Minor > Labels: pull-request-available > > We log DEBUG output and use {{GroupTracker}} inconsistently in {{Manager}} > and in {{UserTracker}}. > E.g. > {{Manager.IncreaseTrackedResource()}}: only a single log output with DEBUG > level > {{Manager.DecreaseTrackedResource()}}: multiple log statements, also handles > the group tracker, which is not the case with increments > This also affects {{UserTracker}}: log handling differs > between {{increaseTrackedResource()}} and {{decreaseTrackedResource()}}
[jira] [Updated] (YUNIKORN-182) fix lint issues
[ https://issues.apache.org/jira/browse/YUNIKORN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-182: Labels: pull-request-available (was: ) > fix lint issues > --- > > Key: YUNIKORN-182 > URL: https://issues.apache.org/jira/browse/YUNIKORN-182 > Project: Apache YuniKorn > Issue Type: Task > Components: build >Reporter: Wilfred Spiegelenburg >Assignee: Yun Sun >Priority: Minor > Labels: pull-request-available > > When we added the lint test most major issues were fixed. There are still a > lot of issues, especially in tests, that need to be fixed. > This is a container Jira to track that work in both the k8shim and the core > repos. > Work should be split into multiple parts (per linter?)
[jira] [Updated] (YUNIKORN-2643) utils.go WaitForCondition test coverage improvement
[ https://issues.apache.org/jira/browse/YUNIKORN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2643: - Labels: pull-request-available (was: ) > utils.go WaitForCondition test coverage improvement > > > Key: YUNIKORN-2643 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2643 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: HUAN-IU LIOU >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (YUNIKORN-2644) Improve FitInScore funtion's test coverage in resources.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2644: - Labels: pull-request-available (was: ) > Improve FitInScore funtion's test coverage in resources.go > -- > > Key: YUNIKORN-2644 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2644 > Project: Apache YuniKorn > Issue Type: Test > Components: core - common >Reporter: JunHong Peng >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation
[ https://issues.apache.org/jira/browse/YUNIKORN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2641: - Labels: pull-request-available (was: ) > Ensure createTime has same semantics for ask and allocation > --- > > Key: YUNIKORN-2641 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2641 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > > The createTime field in Allocation and AllocationAsk is not used > consistently. Ensure that the field is always set, and that it is not > modified later.
[jira] [Updated] (YUNIKORN-2633) Unnecessary warning from Partition when adding an application
[ https://issues.apache.org/jira/browse/YUNIKORN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2633: - Labels: pull-request-available (was: ) > Unnecessary warning from Partition when adding an application > - > > Key: YUNIKORN-2633 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2633 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > > The following is printed when adding an application: > {noformat} > 2024-05-17T21:53:04.716+0200 WARN core.scheduler.queue > scheduler/partition.go:344 Trying to set resources on a queue that is > not an unmanaged leaf {"queueName": "root.default"} > {noformat} > This message is supposed to be printed when the application defines a > guaranteed or max resource. After YUNIKORN-2547 it's always printed if the > queue is managed.
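The intended condition for the warning described above can be sketched as a small predicate: warn only when the application actually defines a guaranteed or max resource and the target queue is not an unmanaged leaf. The types `queue` and `appResources` and the function `shouldWarn` are hypothetical and do not mirror the code in scheduler/partition.go.

```go
package main

import "fmt"

// queue and appResources are hypothetical stand-ins, not yunikorn-core types.
type queue struct {
	name    string
	managed bool
	leaf    bool
}

type appResources struct {
	guaranteed map[string]int64
	max        map[string]int64
}

// shouldWarn reports whether the "not an unmanaged leaf" warning applies.
// It returns false when the application defines neither a guaranteed nor a
// max resource, because then nothing would be set on the queue.
func shouldWarn(q queue, r appResources) bool {
	if len(r.guaranteed) == 0 && len(r.max) == 0 {
		return false
	}
	return q.managed || !q.leaf
}

func main() {
	managedLeaf := queue{name: "root.default", managed: true, leaf: true}
	fmt.Println(shouldWarn(managedLeaf, appResources{}))
	fmt.Println(shouldWarn(managedLeaf, appResources{max: map[string]int64{"vcore": 1000}}))
}
```

With this ordering of checks, an application without resource tags never triggers the warning, regardless of the queue type.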
[jira] [Updated] (YUNIKORN-2629) Adding a node can result in a deadlock
[ https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-2629: - Labels: pull-request-available (was: ) > Adding a node can result in a deadlock > -- > > Key: YUNIKORN-2629 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2629 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Affects Versions: 1.5.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Labels: pull-request-available > Attachments: updateNode_deadlock_trace.txt
>
> Adding a new node after YuniKorn state initialization can result in a deadlock.
> The problem is that {{Context.addNode()}} holds a lock while we're waiting for the {{NodeAccepted}} event:
> {noformat}
> dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, func(event interface{}) {
>     nodeEvent, ok := event.(CachedSchedulerNodeEvent)
>     if !ok {
>         return
>     }
>     [...] removed for clarity
>     wg.Done()
> })
> defer dispatcher.UnregisterEventHandler(handlerID, dispatcher.EventTypeNode)
> if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode({
>     Nodes: nodesToRegister,
>     RmID:  schedulerconf.GetSchedulerConf().ClusterID,
> }); err != nil {
>     log.Log(log.ShimContext).Error("Failed to register nodes", zap.Error(err))
>     return nil, err
> }
> // wait for all responses to accumulate
> wg.Wait() <--- shim gets stuck here
> {noformat}
> If tasks are being processed, then the dispatcher will try to retrieve the event handler, which is returned from Context:
> {noformat}
> go func() {
>     for {
>         select {
>         case event := <-getDispatcher().eventChan:
>             switch v := event.(type) {
>             case events.TaskEvent:
>                 getEventHandler(EventTypeTask)(v) <--- eventually calls Context.getTask()
>             case events.ApplicationEvent:
>                 getEventHandler(EventTypeApp)(v)
>             case events.SchedulerNodeEvent:
>                 getEventHandler(EventTypeNode)(v)
> {noformat}
> Since {{addNode()}} is holding a write lock, the event processing loop gets stuck, so {{registerNodes()}} will never progress.