[jira] [Created] (YUNIKORN-2780) Remove unnecessary node ExistingAllocations handling
Craig Condit created YUNIKORN-2780:

Summary: Remove unnecessary node ExistingAllocations handling
Key: YUNIKORN-2780
URL: https://issues.apache.org/jira/browse/YUNIKORN-2780
Project: Apache YuniKorn
Issue Type: Task
Components: core - scheduler, scheduler-interface, shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit

As part of state initialization simplification, existing node allocations are no longer passed in the UpdateNode SI function. We should remove the field and the logic in the core as this is effectively now dead code.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2779) Shim: Use UpdateAllocation for both asks and allocations
Craig Condit created YUNIKORN-2779:

Summary: Shim: Use UpdateAllocation for both asks and allocations
Key: YUNIKORN-2779
URL: https://issues.apache.org/jira/browse/YUNIKORN-2779
Project: Apache YuniKorn
Issue Type: Sub-task
Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2778) Core: Use unified UpdateAllocation API for both asks and allocations
Craig Condit created YUNIKORN-2778:

Summary: Core: Use unified UpdateAllocation API for both asks and allocations
Key: YUNIKORN-2778
URL: https://issues.apache.org/jira/browse/YUNIKORN-2778
Project: Apache YuniKorn
Issue Type: Sub-task
Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit
[jira] [Resolved] (YUNIKORN-2760) `make tools` should check the version of tools
[ https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2760.
Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master. Thanks [~blue.tzuhua] for the contribution!

> `make tools` should check the version of tools
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> By default, the Makefile checks only whether each tool's file exists. Hence,
> developers need to remove the tools folder (or call `make distclean`) manually
> to trigger the installation after we update the version of a tool.
> However, how can developers become aware of tool updates? Personally, I
> suspect something is off when I see new errors or warnings, but that signal
> is implicit and noisy.
> To fix that, I'd like to introduce a new folder structure for the tools
> folder:
> {code:java}
> /tools/{tool_name}-{version}
> {code}
> That offers a unique path for each version of a tool. Developers will not
> miss the updates anymore.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
> That offers a unique path for each version of a tool. Developers will not
> miss the updates anymore.
> NOTE: we need to remove the existing tool binary if there is a naming
> conflict when creating the new path. For example, creating
> /tools/golangci-lint/1.57.2 will fail if /tools/golangci-lint is an existing
> file.
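The versioned layout described in the issue can be illustrated with a minimal Go sketch. This is not the Makefile change that was merged; `toolPath` and `needsInstall` are hypothetical helper names, and the tool name and version are simply the ones used as an example in the issue text. The point it shows is that once the version is part of the path, a plain existence check is enough to detect a version bump.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// toolPath builds the install location using the accepted layout from the
// issue: /tools/{tool_name}-{version}. (Hypothetical helper, for illustration.)
func toolPath(toolsDir, name, version string) string {
	return path.Join(toolsDir, name+"-"+version)
}

// needsInstall reports whether the versioned path is missing. Because the
// version is encoded in the path, bumping the version yields a path that does
// not exist yet, so the same existence check now triggers a reinstall.
func needsInstall(p string) bool {
	_, err := os.Stat(p)
	return os.IsNotExist(err)
}

func main() {
	p := toolPath("tools", "golangci-lint", "1.57.2")
	fmt.Println(p, "needs install:", needsInstall(p))
}
```

With the rejected `/tools/{tool_name}/{version}` layout the same check would work, but a stale binary already stored at `/tools/{tool_name}` would first have to be removed, as the issue notes.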
[jira] [Resolved] (YUNIKORN-2459) Core: Merge ask and allocation objects
[ https://issues.apache.org/jira/browse/YUNIKORN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2459.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> Core: Merge ask and allocation objects
>
> Key: YUNIKORN-2459
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2459
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> Merge the Ask and Allocation objects into a single Allocation object.
[jira] [Resolved] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods
[ https://issues.apache.org/jira/browse/YUNIKORN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2771.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> Optimization: Use termination grace period of 0 seconds for placeholder pods
>
> Key: YUNIKORN-2771
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> When we create placeholder pods for gang scheduling, we do not specify a
> termination grace period, and therefore inherit the Kubernetes default of 30
> seconds. This is unnecessary as the placeholders do not perform any logic and
> therefore require no graceful termination.
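The difference between an unset and an explicit grace period can be sketched as follows. To stay self-contained, the sketch uses a stand-in `PodSpec` struct rather than the real Kubernetes types; only the `TerminationGracePeriodSeconds` pointer field mirrors the actual pod spec. The helpers `effectiveGracePeriod` and `newPlaceholderSpec` are hypothetical names and do not come from the shim.

```go
package main

import "fmt"

// PodSpec is a stand-in for the Kubernetes pod spec, reduced to the single
// field discussed in the issue. A nil pointer means "not specified".
type PodSpec struct {
	TerminationGracePeriodSeconds *int64
}

// defaultGracePeriodSeconds is the Kubernetes default cited in the issue.
const defaultGracePeriodSeconds int64 = 30

// effectiveGracePeriod returns the explicit value if one is set, and the
// default otherwise. (Hypothetical helper, for illustration only.)
func effectiveGracePeriod(spec PodSpec) int64 {
	if spec.TerminationGracePeriodSeconds == nil {
		return defaultGracePeriodSeconds
	}
	return *spec.TerminationGracePeriodSeconds
}

// newPlaceholderSpec sketches the proposed behaviour: placeholders run no
// logic, so they ask for a grace period of 0 seconds explicitly.
func newPlaceholderSpec() PodSpec {
	zero := int64(0)
	return PodSpec{TerminationGracePeriodSeconds: &zero}
}

func main() {
	fmt.Println("unset:", effectiveGracePeriod(PodSpec{}))
	fmt.Println("placeholder:", effectiveGracePeriod(newPlaceholderSpec()))
}
```

The pointer matters here: a plain integer field could not distinguish "0 seconds" from "not specified", which is why an explicit zero has to be set.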
[jira] [Closed] (YUNIKORN-2739) Core: Discuss removal of TODO regarding reflection
[ https://issues.apache.org/jira/browse/YUNIKORN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2739.
Resolution: Won't Do

Closing as no further action is required.

> Core: Discuss removal of TODO regarding reflection
>
> Key: YUNIKORN-2739
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2739
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/common/security/usergroup.go#L74]
[jira] [Closed] (YUNIKORN-2740) Core: Discuss removal of TODO regarding configurable reservation delay
[ https://issues.apache.org/jira/browse/YUNIKORN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2740.
Resolution: Won't Do

Closing as no further work is required.

> Core: Discuss removal of TODO regarding configurable reservation delay
>
> Key: YUNIKORN-2740
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2740
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/objects/application.go#L1448]
[jira] [Closed] (YUNIKORN-2742) Core: Discuss TODO regarding getting resolver from the config
[ https://issues.apache.org/jira/browse/YUNIKORN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2742.
Resolution: Won't Do

Closing as no action is required.

> Core: Discuss TODO regarding getting resolver from the config
>
> Key: YUNIKORN-2742
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2742
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition.go#L130]
[jira] [Closed] (YUNIKORN-2741) Core: Discuss removal of TODO regarding add mock for plugin to extend tests
[ https://issues.apache.org/jira/browse/YUNIKORN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2741.
Resolution: Won't Do

Closing as no further action is required.

> Core: Discuss removal of TODO regarding add mock for plugin to extend tests
>
> Key: YUNIKORN-2741
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2741
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/objects/node_test.go#L111]
[jira] [Closed] (YUNIKORN-2743) Core: Consider time out waiting for draining and removal
[ https://issues.apache.org/jira/browse/YUNIKORN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2743.
Resolution: Won't Do

Closing as Won't Do since this isn't required.

> Core: Consider time out waiting for draining and removal
>
> Key: YUNIKORN-2743
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2743
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition_manager.go#L126]
[jira] [Closed] (YUNIKORN-2744) Core: Discuss making web server port configurable
[ https://issues.apache.org/jira/browse/YUNIKORN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2744.
Resolution: Won't Do

Closing as Won't Do since this is not needed.

> Core: Discuss making web server port configurable
>
> Key: YUNIKORN-2744
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2744
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Chenchen Lai
> Priority: Minor
> Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose
> of the Jira is to discuss whether the tasks described by these TODO comments
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/webservice/webservice.go#L65]
[jira] [Created] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods
Craig Condit created YUNIKORN-2771:

Summary: Optimization: Use termination grace period of 0 seconds for placeholder pods
Key: YUNIKORN-2771
URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit

When we create placeholder pods for gang scheduling, we do not specify a termination grace period, and therefore inherit the Kubernetes default of 30 seconds. This is unnecessary as the placeholders do not perform any logic and therefore require no graceful termination.
[jira] [Resolved] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked
[ https://issues.apache.org/jira/browse/YUNIKORN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2755.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> yunikorn-web: pnpm version should be locked
>
> Key: YUNIKORN-2755
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: webapp
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> Now that we are using pnpm, we should lock the version that we are using to
> prevent unexpected divergence of package.json and pnpm-lock.yaml.
[jira] [Created] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked
Craig Condit created YUNIKORN-2755:

Summary: yunikorn-web: pnpm version should be locked
Key: YUNIKORN-2755
URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
Project: Apache YuniKorn
Issue Type: Bug
Components: webapp
Reporter: Craig Condit
Assignee: Craig Condit

Now that we are using pnpm, we should lock the version that we are using to prevent unexpected divergence of package.json and pnpm-lock.yaml.
[jira] [Resolved] (YUNIKORN-2230) Placement rule does not behave as expected
[ https://issues.apache.org/jira/browse/YUNIKORN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2230.
Fix Version/s: 1.6.0
Target Version: 1.6.0, 1.5.2
Resolution: Delivered

Resolved via other issues.

> Placement rule does not behave as expected
>
> Key: YUNIKORN-2230
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2230
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Kuan Po Tseng
> Assignee: Kuan Po Tseng
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> yunikorn configmap
> {code:yaml}
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
>   namespace: yunikorn
> data:
>   log.level: "DEBUG"
>   admissionController.filtering.defaultQueue: ""
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: provided
>             create: false
>           - name: tag
>             value: namespace
>             create: true
>         queues:
>           - name: root
>             submitacl: "*"
>             queues:
>               - name: sandbox
>                 submitacl: "*"
> {code}
> test pod
> {code:yaml}
> apiVersion: v1
> kind: Pod
> metadata:
>   labels:
>     app: sleep
>     applicationId: "application-sleep-0001"
>   name: task0
> spec:
>   schedulerName: yunikorn
>   restartPolicy: Never
>   containers:
>     - name: sleep-30s
>       image: "alpine:latest"
>       command: ["sleep", "30"]
>       resources:
>         requests:
>           cpu: "100m"
>           memory: "500M"
> {code}
> Even though no queue name is specified for the sleep pod, it is still
> submitted to root.sandbox (the shim's default queue value). We expected the
> application to be placed through the 'tag' placement rule.
[jira] [Resolved] (YUNIKORN-2720) Use createRequest() in handlers_test.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2720.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> Use createRequest() in handlers_test.go
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Manikandan R
> Assignee: Hsien-Cheng(Ryan) Huang
> Priority: Major
> Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
> Use the createRequest() helper methods wherever applicable in
> handlers_test.go. handlers_test.go is huge.
[jira] [Resolved] (YUNIKORN-2721) Improve template function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2721.
Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master.

> Improve template function's test coverage
>
> Key: YUNIKORN-2721
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2721
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2722) Expose the IsOriginator flag in REST
[ https://issues.apache.org/jira/browse/YUNIKORN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2722.
Fix Version/s: 1.6.0
Resolution: Fixed

Resolving, as this appears to have been merged already.

> Expose the IsOriginator flag in REST
>
> Key: YUNIKORN-2722
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2722
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Yu-Lin Chen
> Assignee: Tzu-Hua Lan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> The first real pod for each application is marked as the originator. It is
> typically considered the driver/owner pod. This flag is propagated to the
> core and impacts the preemption decision flow.
>
> However, the current REST API doesn't expose the originator flag. Exposing
> the flag will allow users to check which allocation is the originator and
> will be beneficial for monitoring and troubleshooting.
[jira] [Resolved] (YUNIKORN-2732) Improve allocation & queue_events function's test coverage
[ https://issues.apache.org/jira/browse/YUNIKORN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2732.
Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master.

> Improve allocation & queue_events function's test coverage
>
> Key: YUNIKORN-2732
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2732
> Project: Apache YuniKorn
> Issue Type: Test
> Components: core - common
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2711) Skip setting the queue name to default queue in the shim
[ https://issues.apache.org/jira/browse/YUNIKORN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2711.
Fix Version/s: 1.6.0, 1.5.2
Resolution: Fixed

Merged to master and branch-1.5.

> Skip setting the queue name to default queue in the shim
>
> Key: YUNIKORN-2711
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2711
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> The admission controller and the scheduler currently check the pod for the
> supplied queue name. If the queue name is not provided, they set the queue to
> the default queue 'root.default'.
> After the changes from YUNIKORN-2703, we do not need to set the queue name in
> the shim, and the core should take care of setting the default queue.
[jira] [Resolved] (YUNIKORN-2703) Core: Fallback to default queue if no placement rules match
[ https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2703.
Fix Version/s: 1.6.0, 1.5.2
Resolution: Fixed

Merged to master and backported (manually) to branch-1.5.

> Core: Fallback to default queue if no placement rules match
>
> Key: YUNIKORN-2703
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2703
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> YUNIKORN-1650 added an override for the default queue name in the config map
> to solve the scenario where the provided placement rule is evaluated before
> other rules.
> The scheduler also adds a default queue if the pod labels or annotations do
> not define a queue name. Because this happens before the placement rules are
> evaluated, we end up in the same situation of applications getting placed in
> the default queue and ignoring all other placement rules.
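The ordering problem described above can be sketched in Go. This is a simplified model under stated assumptions, not the core's placement code: the `rule` type, `providedRule`, `tagRule`, `place`, and the `queue` label key are all hypothetical names. The default queue name `root.default` is the one mentioned in YUNIKORN-2711.

```go
package main

import "fmt"

// rule is a hypothetical placement rule: it returns a queue name and true if
// it matches the given pod labels.
type rule func(labels map[string]string) (string, bool)

// defaultQueue is the fallback queue name (as cited in YUNIKORN-2711).
const defaultQueue = "root.default"

// providedRule matches when the pod names a queue explicitly.
func providedRule(labels map[string]string) (string, bool) {
	q, ok := labels["queue"]
	return q, ok && q != ""
}

// tagRule builds a rule that places the application in root.<value of key>.
func tagRule(key string) rule {
	return func(labels map[string]string) (string, bool) {
		v, ok := labels[key]
		if !ok || v == "" {
			return "", false
		}
		return "root." + v, true
	}
}

// place evaluates the rules in order and falls back to the default queue only
// after every rule has failed to match.
func place(rules []rule, labels map[string]string) string {
	for _, r := range rules {
		if q, ok := r(labels); ok {
			return q
		}
	}
	return defaultQueue
}

func main() {
	rules := []rule{providedRule, tagRule("namespace")}

	// No queue set: the tag rule gets its chance and wins.
	fmt.Println(place(rules, map[string]string{"namespace": "dev"}))
	// Nothing matches: fall back to the default queue.
	fmt.Println(place(rules, map[string]string{}))
	// Old behaviour: the default queue was filled in before rule evaluation,
	// so the provided rule matched and the tag rule was never consulted.
	fmt.Println(place(rules, map[string]string{"queue": defaultQueue, "namespace": "dev"}))
}
```

The last call reproduces the reported symptom: pre-populating the queue name makes the first rule match unconditionally, which is why the fallback has to happen after rule evaluation rather than before it.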
[jira] [Resolved] (YUNIKORN-2700) Use AllocationResult instead of Allocation in scheduler routines
[ https://issues.apache.org/jira/browse/YUNIKORN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2700.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> Use AllocationResult instead of Allocation in scheduler routines
>
> Key: YUNIKORN-2700
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2700
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Fix For: 1.6.0
>
> The Allocation object is currently abused as a generic return type in various
> scheduler routines. This is most notable when reserving / unreserving.
> Instead of returning an Allocation object, wrap in an AllocationResult object
> instead.
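A rough Go sketch of the wrapping idea follows. The struct fields, the enum values, and the `tryNode` function are assumptions made for illustration and are not taken from the yunikorn-core source; only the type names `Allocation`, `AllocationResult`, and `AllocationResultType` appear in the related issues.

```go
package main

import "fmt"

// AllocationResultType says what a scheduling attempt produced.
// The concrete values below are illustrative assumptions.
type AllocationResultType int

const (
	None AllocationResultType = iota
	Allocated
	Reserved
	Unreserved
)

// Allocation models a real allocation only (fields are hypothetical).
type Allocation struct {
	AllocationKey string
	NodeID        string
}

// AllocationResult wraps the outcome of a scheduler routine, so that a
// reservation no longer needs to be disguised as an Allocation.
type AllocationResult struct {
	ResultType AllocationResultType
	Allocation *Allocation // nil unless ResultType is Allocated
	NodeID     string
}

// tryNode is a toy scheduler routine: it allocates when the ask fits into the
// free capacity and reserves the node otherwise.
func tryNode(key, nodeID string, free, ask int) *AllocationResult {
	if ask <= free {
		alloc := &Allocation{AllocationKey: key, NodeID: nodeID}
		return &AllocationResult{ResultType: Allocated, Allocation: alloc, NodeID: nodeID}
	}
	return &AllocationResult{ResultType: Reserved, NodeID: nodeID}
}

func main() {
	for _, ask := range []int{2, 8} {
		res := tryNode("alloc-1", "node-1", 4, ask)
		switch res.ResultType {
		case Allocated:
			fmt.Println("allocated", res.Allocation.AllocationKey, "on", res.NodeID)
		case Reserved:
			fmt.Println("reserved", res.NodeID, "without creating an allocation")
		}
	}
}
```

Callers switch on the result type instead of inspecting a partially filled Allocation, which keeps the Allocation type reserved for things that were actually allocated.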
[jira] [Created] (YUNIKORN-2700) Use AllocationResult instead of Allocation in scheduler routines
Craig Condit created YUNIKORN-2700:

Summary: Use AllocationResult instead of Allocation in scheduler routines
Key: YUNIKORN-2700
URL: https://issues.apache.org/jira/browse/YUNIKORN-2700
Project: Apache YuniKorn
Issue Type: Sub-task
Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit

The Allocation object is currently abused as a generic return type in various scheduler routines. This is most notable when reserving / unreserving. Instead of returning an Allocation object, wrap in an AllocationResult object instead.
[jira] [Resolved] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core
[ https://issues.apache.org/jira/browse/YUNIKORN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2698.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master. Also opened YUNIKORN-2699 to address e2e test failures in preemption.

> E2e tests for k8shim don't compile with latest core
>
> Key: YUNIKORN-2698
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Fix For: 1.6.0
[jira] [Created] (YUNIKORN-2699) Preemption e2e tests fail in latest master
Craig Condit created YUNIKORN-2699:

Summary: Preemption e2e tests fail in latest master
Key: YUNIKORN-2699
URL: https://issues.apache.org/jira/browse/YUNIKORN-2699
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Manikandan R

Output:
{noformat}
Preemption Verify_basic_preemption
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
STEP: Creating development namespace: dev-anvkm @ 06/25/24 18:08:14.291
STEP: A queue uses resource more than the guaranteed value even after removing one of the pods. The cluster doesn't have enough resource to deploy a pod in another queue which uses resource less than the guaranteed value. @ 06/25/24 18:08:15.301
STEP: Update root.sandbox1 and root.sandbox2 with guaranteed memory 4677M @ 06/25/24 18:08:15.301
STEP: Port-forward the scheduler pod @ 06/25/24 18:08:15.302
port-forward is already running
STEP: Enabling new scheduling config @ 06/25/24 18:08:15.302
STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 06/25/24 18:08:18.313
STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 06/25/24 18:08:22.518
STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 06/25/24 18:08:26.517
STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 06/25/24 18:08:30.518
STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:08:38.517
[FAILED] in [It] - /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198 @ 06/25/24 18:08:38.718
Logging yk fullstatedump, spec: Verify_basic_preemption
Created log file: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykFullStateDump.json
Logging k8s cluster info, spec: Verify_basic_preemption
Created log file: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_k8sClusterInfo.txt
Logging yk container logs, spec: Verify_basic_preemption
Created log file: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykContainerLog.txt
STEP: Tear down namespace: dev-anvkm @ 06/25/24 18:08:39.235
STEP: Restoring YuniKorn configuration @ 06/25/24 18:08:40.118
STEP: Restoring the old config maps @ 06/25/24 18:08:40.119
• [FAILED] [27.837 seconds]
Preemption [It] Verify_basic_preemption
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
[FAILED] One of the pods in root.sandbox1 should be preempted
Expected : 1 to equal : 2
In [It] at: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198 @ 06/25/24 18:08:38.718
--
Preemption Verify_preemption_on_priority_queue
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:333
STEP: Creating development namespace: dev-u0kt7 @ 06/25/24 18:10:24.975
STEP: A task can only preempt a task with lower or equal priority @ 06/25/24 18:10:25.982
STEP: Update root.sandbox1, root.low-priority, root.high-priority with guaranteed memory 4677M @ 06/25/24 18:10:25.982
STEP: Port-forward the scheduler pod @ 06/25/24 18:10:25.983
port-forward is already running
STEP: Enabling new scheduling config @ 06/25/24 18:10:25.983
STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 06/25/24 18:10:28.99
STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 06/25/24 18:10:32.791
STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 06/25/24 18:10:35.792
STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 06/25/24 18:10:38.792
STEP: Deploy the sleep pod sleepjob5 to the development namespace @ 06/25/24 18:10:38.995
STEP: The sleep pod sleepjob4 can't be scheduled @ 06/25/24 18:10:39.194
STEP: The sleep pod sleepjob5 can be scheduled @ 06/25/24 18:10:41.392
STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:10:46.392
[FAILED] in [It] - /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:424 @ 06/25/24 18:10:46.592
Logging yk fullstatedump, spec: Verify_preemption_on_priority_queue
Created log file: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_preemption_on_priority_queue_ykFullStateDump.json
Logging k8s cluster info, spec: Verify_preemption_on_priority_queue
Created log file: /home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_preemption_on_priority_queue_k8sClusterInfo.txt
Logging yk container logs, spec: Verify_preemption_on_priority_queue
Created log file:
[jira] [Created] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core
Craig Condit created YUNIKORN-2698:

Summary: E2e tests for k8shim don't compile with latest core
Key: YUNIKORN-2698
URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit
[jira] [Resolved] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType
[ https://issues.apache.org/jira/browse/YUNIKORN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2677.
Fix Version/s: 1.6.0
Resolution: Fixed

Merged to master.

> Rename AllocationResult to AllocationResultType
>
> Key: YUNIKORN-2677
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2677
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
> In preparation for other refactoring, rename the AllocationResult enum to
> AllocationResultType.
[jira] [Closed] (YUNIKORN-2682) YuniKorn Gang Scheduling Issue: Executors Failing to Start When Running Multiple Applications
[ https://issues.apache.org/jira/browse/YUNIKORN-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit closed YUNIKORN-2682. -- Assignee: Craig Condit Resolution: Workaround > YuniKorn Gang Scheduling Issue: Executors Failing to Start When Running > Multiple Applications > - > > Key: YUNIKORN-2682 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2682 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Affects Versions: 1.3.0 >Reporter: huangzhir >Assignee: Craig Condit >Priority: Major > Attachments: image-2024-06-19-00-02-53-178.png, > image-2024-06-19-00-03-09-703.png > > > h2. Description: > While using YuniKorn's gang scheduling, we encountered a situation where the > scheduling process appears to succeed, but in reality, there is a problem. > When submitting two applications simultaneously, only the driver pods are > successfully running, and the executor pods fail to start due to insufficient > resources. The following error is observed in the scheduler logs: > {code:java} > 2024-06-18T15:15:27.933Z ERROR cache/placeholder_manager.go:99 failed to > create placeholder pod {"error": "pods > \"tg-spark-driver-spark-8e410a4c5ce44da2aa85ba-0\" is forbidden: failed > quota: spark-quota: must specify limits.cpu,limits.memory"} > github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders > github.com/apache/yunikorn-k8shim/pkg/cache/placeholder_manager.go:99 > github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1 > github.com/apache/yunikorn-k8shim/pkg/cache/application.go:542 {code} > h2. Environment: > * YuniKorn version: 1.3.0 > * Kubernetes version: 1.21.3 > * Spark version: 3.2.2 > h2. *resource-quota.yaml* > {code:java} > apiVersion: v1 > kind: ResourceQuota > metadata: > name: spark-quota > namespace: spark > spec: > hard: > requests.cpu: "5" > requests.memory: "5Gi" > limits.cpu: "5" > limits.memory: "5Gi" {code} > h2. 
yunikorn-configs.yaml > {code:java} > apiVersion: v1 > kind: ConfigMap > metadata: > name: yunikorn-configs > namespace: yunikorn > data: > log.level: "-1" > log.admission.level: "-1" > log.core.config.level: "-1" > queues.yaml: | > partitions: > - name: default > placementrules: > - name: tag > value: namespace > create: true > queues: > - name: root > submitacl: '*' > properties: > application.sort.policy: fifo > placeholderTimeoutInSeconds: 60 > schedulingStyle: Hard > queues: > - name: spark > properties: > application.sort.policy: fifo > placeholderTimeoutInSeconds: 60 > schedulingStyle: Hard > resources: > guaranteed: > vcore: 5 > memory: 5Gi > max: > vcore: 5 > memory: 5Gi {code} > h2. Spark-submit command > {code:java} > ./bin/spark-submit \ > --master k8s://https://10.10.10.10:6443 \ > --deploy-mode cluster \ > --name spark-pi \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \ > --conf spark.kubernetes.namespace=spark \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=1 \ > --conf spark.executor.cores=1 \ > --conf spark.executor.memory=1500m \ > --conf spark.driver.cores=1 \ > --conf spark.driver.memory=1500m \ > --conf spark.kubernetes.driver.limit.cores=1 \ > --conf spark.kubernetes.driver.limit.memory=2G \ > --conf spark.kubernetes.executor.limit.cores=1 \ > --conf spark.kubernetes.executor.limit.memory=2G \ >--conf spark.kubernetes.driver.label.app=spark \ > --conf spark.kubernetes.executor.label.app=spark \ > --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \ > --conf spark.kubernetes.scheduler.name=yunikorn \ > --conf spark.kubernetes.driver.label.queue=root.spark \ > --conf spark.kubernetes.executor.label.queue=root.spark \ > --conf > spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \ > --conf > spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \ > --conf > 
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver > \ > --conf > spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": > "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": > "2Gi"},"nodeSelector": {"app": "spark"} }, {"name": "spark-executor", > "minMember": 1, "minResource": {"cpu": "1", "memory":
[jira] [Created] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType
Craig Condit created YUNIKORN-2677: -- Summary: Rename AllocationResult to AllocationResultType Key: YUNIKORN-2677 URL: https://issues.apache.org/jira/browse/YUNIKORN-2677 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit In preparation for other refactorings, rename the AllocationResult object to AllocationResultType.
[jira] [Resolved] (YUNIKORN-2672) Upgrade to K8s 1.29.6
[ https://issues.apache.org/jira/browse/YUNIKORN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2672. Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed Merged to master and cherry-picked to branch-1.5. > Upgrade to K8s 1.29.6 > - > > Key: YUNIKORN-2672 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2672 > Project: Apache YuniKorn > Issue Type: Task > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Craig Condit >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > A major performance regression was fixed in K8s that on analysis mainly > impacts the plugin implementation. The regression is part of the release > 1.29.4 we currently build against. > See [https://github.com/kubernetes/kubernetes/pull/125197] for details -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2671) Convert Allocation releases field to singular
[ https://issues.apache.org/jira/browse/YUNIKORN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2671. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Convert Allocation releases field to singular > - > > Key: YUNIKORN-2671 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2671 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Now that repeats are no longer allowed, we have no need to track multiple > releases for an allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2671) Convert Allocation releases field to single release field
Craig Condit created YUNIKORN-2671: -- Summary: Convert Allocation releases field to single release field Key: YUNIKORN-2671 URL: https://issues.apache.org/jira/browse/YUNIKORN-2671 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit Now that repeats are no longer allowed, we have no need to track multiple releases for an allocation.
[jira] [Closed] (YUNIKORN-2664) Running YuniKorn as leader-elected controller with multiple replicas
[ https://issues.apache.org/jira/browse/YUNIKORN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit closed YUNIKORN-2664. -- Resolution: Won't Do > Running YuniKorn as leader-elected controller with multiple replicas > > > Key: YUNIKORN-2664 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2664 > Project: Apache YuniKorn > Issue Type: Wish > Components: shim - kubernetes >Reporter: Volodymyr Kot >Priority: Major > > Hey, I noticed that by default YuniKorn is run as a Deployment with a single > replica: > [https://github.com/apache/yunikorn-release/blob/aa9a2939eed81fc74fbbf7afbc0fe60c5aa0acd0/helm-charts/yunikorn/templates/deployment.yaml#L31] > and leader election is disabled in scheduler configuration: > [https://github.com/apache/yunikorn-k8shim/blob/36111c41d97658e168e640c284fe8d71921883b4/conf/scheduler-config.yaml#L20] > > Is there anything about the architecture of YuniKorn that makes this > hard/impossible to do? Or would you be open to a PR that adds ability to run > with multiple replicas and leader election? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation
[ https://issues.apache.org/jira/browse/YUNIKORN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2641. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Ensure createTime has same semantics for ask and allocation > --- > > Key: YUNIKORN-2641 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2641 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The createTime field in Allocation and AllocationAsk is not used > consistently. Ensure that the field is always set, and that it is not > modified later.
[jira] [Created] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation
Craig Condit created YUNIKORN-2641: -- Summary: Ensure createTime has same semantics for ask and allocation Key: YUNIKORN-2641 URL: https://issues.apache.org/jira/browse/YUNIKORN-2641 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit The createTime field in Allocation and AllocationAsk is not used consistently. Ensure that the field is always set, and that it is not modified later.
[jira] [Closed] (YUNIKORN-802) Supports to assign nodes to non-default partition
[ https://issues.apache.org/jira/browse/YUNIKORN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit closed YUNIKORN-802. - > Supports to assign nodes to non-default partition > - > > Key: YUNIKORN-802 > URL: https://issues.apache.org/jira/browse/YUNIKORN-802 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > Labels: pull-request-available > > see comment > (https://issues.apache.org/jira/browse/YUNIKORN-22?focusedCommentId=17398860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17398860) > Currently, all nodes are hardcoded to be assigned to the "default" partition. That > brings two disadvantages. > # we can't select specific nodes, which are used to execute Spark jobs only, > from a cluster > # multiple partitions do not work since a non-default partition can't get nodes > Future work: > # support changing the partition assignment of an existing node (in this PR, the > update request will be skipped) > # support removing an existing node which has been reassigned (in this PR, > removing such a node causes the error message "Failed to update non existing node > ...")
[jira] [Closed] (YUNIKORN-22) k8shim is hardcoded to the default partition
[ https://issues.apache.org/jira/browse/YUNIKORN-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit closed YUNIKORN-22. > k8shim is hardcoded to the default partition > > > Key: YUNIKORN-22 > URL: https://issues.apache.org/jira/browse/YUNIKORN-22 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Rainie Li >Priority: Major > > In the application and node code the partition is hardcoded to use the > DefaultPartition constant when creating new objects: > * application.NewApplication > * schedulerNode.addExistingAllocation > This means that in the configuration for the core we must have that same > partition and that we currently would not be able to create a second shim for > the same core as they would interfere with each other. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2593) Remove partition from Allocation/AllocationAsk
[ https://issues.apache.org/jira/browse/YUNIKORN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2593. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Remove partition from Allocation/AllocationAsk > -- > > Key: YUNIKORN-2593 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2593 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Remove the partitionName field from the Allocation and AllocationAsk objects. > Its use was inconsistent, and can be retrieved from other contexts where > needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2610) Announce deprecation of plugin mode
[ https://issues.apache.org/jira/browse/YUNIKORN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2610. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Announce deprecation of plugin mode > --- > > Key: YUNIKORN-2610 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2610 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available, release-notes > Fix For: 1.6.0 > > > As discussed on the mailing lists and community meetings, the plan is to > deprecate the yunikorn plugin mode along the following schedule: > * {*}YuniKorn 1.6{*}: Deprecation announced > * {*}YuniKorn 1.7{*}: Scheduler will emit warnings if plugin mode is in use > * {*}YuniKorn 1.8{*}: YuniKorn will no longer ship plugin mode binaries > * {*}YuniKorn 1.9{*}: Implementation removed from codebase > As a first step, for 1.6 we need to update the documentation to give notice > of the deprecation timeline. This will ensure that users have adequate notice > to move away from plugin mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2610) Announce deprecation of plugin mode
Craig Condit created YUNIKORN-2610: -- Summary: Announce deprecation of plugin mode Key: YUNIKORN-2610 URL: https://issues.apache.org/jira/browse/YUNIKORN-2610 Project: Apache YuniKorn Issue Type: Task Components: documentation Reporter: Craig Condit Assignee: Craig Condit As discussed on the mailing lists and community meetings, the plan is to deprecate the yunikorn plugin mode along the following schedule: * {*}YuniKorn 1.6{*}: Deprecation announced * {*}YuniKorn 1.7{*}: Scheduler will emit warnings if plugin mode is in use * {*}YuniKorn 1.8{*}: YuniKorn will no longer ship plugin mode binaries * {*}YuniKorn 1.9{*}: Implementation removed from codebase As a first step, for 1.6 we need to update the documentation to give notice of the deprecation timeline. This will ensure that users have adequate notice to move away from plugin mode.
[jira] [Resolved] (YUNIKORN-2588) Shim: Convert AllocationID to AllocationKey
[ https://issues.apache.org/jira/browse/YUNIKORN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2588. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Shim: Convert AllocationID to AllocationKey > --- > > Key: YUNIKORN-2588 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2588 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2594) Remove unused field AllocationAsk.execTimeout
[ https://issues.apache.org/jira/browse/YUNIKORN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2594. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Remove unused field AllocationAsk.execTimeout > - > > Key: YUNIKORN-2594 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2594 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The AllocationAsk object contains an unused execTimeout field (it is set but > never used logically). It should be removed in preparation for merging > AllocationAsk and Allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2594) Core: Remove unused field AllocationAsk.execTimeout
Craig Condit created YUNIKORN-2594: -- Summary: Core: Remove unused field AllocationAsk.execTimeout Key: YUNIKORN-2594 URL: https://issues.apache.org/jira/browse/YUNIKORN-2594 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit The AllocationAsk object contains an unused execTimeout field (it is set but never used logically). It should be removed in preparation for merging AllocationAsk and Allocation.
[jira] [Created] (YUNIKORN-2593) Simplify partition name
Craig Condit created YUNIKORN-2593: -- Summary: Simplify partition name Key: YUNIKORN-2593 URL: https://issues.apache.org/jira/browse/YUNIKORN-2593 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit Currently, partition names are treated differently in different places within the core. Specifically, sometimes they are bare (e.g. "default") and in other places they are composite (e.g. "[rm:123]default"). This is confusing and unnecessary. It also hampers efforts to merge the AllocationAsk and Allocation objects, as the semantics differ between them. Switch to using the bare form ("default") everywhere instead.
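To make the two forms in YUNIKORN-2593 concrete, here is a minimal sketch that reduces a composite name such as "[rm:123]default" to the bare form "default". The helper name barePartitionName is hypothetical and is not claimed to match any function in yunikorn-core. It only assumes the composite form is a bracketed prefix followed by the bare name, as in the example from the issue.

```go
package main

import (
	"fmt"
	"strings"
)

// barePartitionName strips a leading "[...]" prefix if one is present and
// returns the input unchanged otherwise. Hypothetical helper for illustration.
func barePartitionName(name string) string {
	if strings.HasPrefix(name, "[") {
		if end := strings.Index(name, "]"); end >= 0 {
			return name[end+1:]
		}
	}
	return name
}

func main() {
	fmt.Println(barePartitionName("[rm:123]default")) // default
	fmt.Println(barePartitionName("default"))         // default
}
```

Normalizing once at the boundary, rather than at each use, is what allows the rest of the code to assume the bare form everywhere.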
[jira] [Resolved] (YUNIKORN-2589) Web: Convert AllocationID to AllocationKey
[ https://issues.apache.org/jira/browse/YUNIKORN-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2589. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Web: Convert AllocationID to AllocationKey > -- > > Key: YUNIKORN-2589 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2589 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: webapp >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2587) Core: Convert AllocationID to AllocationKey
[ https://issues.apache.org/jira/browse/YUNIKORN-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2587. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Core: Convert AllocationID to AllocationKey > --- > > Key: YUNIKORN-2587 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2587 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2586) 3rd party license failure when GOROOT not set
[ https://issues.apache.org/jira/browse/YUNIKORN-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2586. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > 3rd party license failure when GOROOT not set > - > > Key: YUNIKORN-2586 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2586 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Running go-license fails (and therefore the rest of the build) when running > on go1.22.2 and GOROOT is not set. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2585) SI: Convert AllocationID to AllocationKey
[ https://issues.apache.org/jira/browse/YUNIKORN-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2585. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > SI: Convert AllocationID to AllocationKey > - > > Key: YUNIKORN-2585 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2585 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: scheduler-interface >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Convert all usage of AllocationID to AllocationKey since they are the same > thing now. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2588) Shim: Convert AllocationID to AllocationKey
Craig Condit created YUNIKORN-2588: -- Summary: Shim: Convert AllocationID to AllocationKey Key: YUNIKORN-2588 URL: https://issues.apache.org/jira/browse/YUNIKORN-2588 Project: Apache YuniKorn Issue Type: Sub-task Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2587) Core: Convert AllocationID to AllocationKey
Craig Condit created YUNIKORN-2587: -- Summary: Core: Convert AllocationID to AllocationKey Key: YUNIKORN-2587 URL: https://issues.apache.org/jira/browse/YUNIKORN-2587 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2589) Web: Convert AllocationID to AllocationKey
Craig Condit created YUNIKORN-2589: -- Summary: Web: Convert AllocationID to AllocationKey Key: YUNIKORN-2589 URL: https://issues.apache.org/jira/browse/YUNIKORN-2589 Project: Apache YuniKorn Issue Type: Sub-task Components: webapp Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2586) 3rd party license failure when GOROOT not set
Craig Condit created YUNIKORN-2586: -- Summary: 3rd party license failure when GOROOT not set Key: YUNIKORN-2586 URL: https://issues.apache.org/jira/browse/YUNIKORN-2586 Project: Apache YuniKorn Issue Type: Bug Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit Running go-license fails (and therefore the rest of the build) when running on go1.22.2 and GOROOT is not set.
[jira] [Created] (YUNIKORN-2585) SI: Convert AllocationID to AllocationKey
Craig Condit created YUNIKORN-2585: -- Summary: SI: Convert AllocationID to AllocationKey Key: YUNIKORN-2585 URL: https://issues.apache.org/jira/browse/YUNIKORN-2585 Project: Apache YuniKorn Issue Type: Sub-task Components: scheduler-interface Reporter: Craig Condit Assignee: Craig Condit Convert all usage of AllocationID to AllocationKey since they are the same thing now.
[jira] [Resolved] (YUNIKORN-2584) Shim: Remove references to MaxAllocations
[ https://issues.apache.org/jira/browse/YUNIKORN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2584. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Shim: Remove references to MaxAllocations > - > > Key: YUNIKORN-2584 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2584 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2584) Shim: Remove references to MaxAllocations
Craig Condit created YUNIKORN-2584: -- Summary: Shim: Remove references to MaxAllocations Key: YUNIKORN-2584 URL: https://issues.apache.org/jira/browse/YUNIKORN-2584 Project: Apache YuniKorn Issue Type: Sub-task Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit
[jira] [Resolved] (YUNIKORN-2458) Remove ask repeats from AllocationAsk
[ https://issues.apache.org/jira/browse/YUNIKORN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2458. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Remove ask repeats from AllocationAsk > - > > Key: YUNIKORN-2458 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2458 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Simplify ask and allocation handling by removing support for repeated > requests in a single ask. This is functionality that is not used by the shim. > By removing support for repeated asks, we also ensure that there is a 1:1 > relationship between ask and allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2574) totalPartitionResource should not be mutated with AddTo/SubFrom
[ https://issues.apache.org/jira/browse/YUNIKORN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit resolved YUNIKORN-2574.
Fix Version/s: 1.6.0, 1.5.1
Resolution: Fixed

Merged to master and branch-1.5.

> totalPartitionResource should not be mutated with AddTo/SubFrom
> ---
>
> Key: YUNIKORN-2574
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0, 1.5.1
>
> There is a potential data race in {{PartitionContext}}: the field {{totalPartitionResource}} is mutated in place. The problem is that the method {{GetTotalPartitionResource()}} does not clone it.
> {noformat}
> func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
>     pc.RLock()
>     defer pc.RUnlock()
>     return pc.totalPartitionResource
> }
> {noformat}
> In general, we should prefer the immutable approach for variables like this, just like in {{objects.Queue}}:
> {noformat}
> func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
>     // check this queue: failure stops checks if the allocation is not part of a node addition
>     newAllocated := resources.Add(sq.allocatedResource, alloc) // <-- New object
>     [ ... removed ... ]
>     sq.Lock()
>     defer sq.Unlock()
>     // all OK update this queue
>     sq.allocatedResource = newAllocated
>     sq.updateAllocatedResourceMetrics()
>     return nil
> }
>
> // incPendingResource increments pending resource of this queue and its parents.
> func (sq *Queue) incPendingResource(delta *resources.Resource) {
>     // update the parent
>     if sq.parent != nil {
>         sq.parent.incPendingResource(delta)
>     }
>     // update this queue
>     sq.Lock()
>     defer sq.Unlock()
>     sq.pending = resources.Add(sq.pending, delta) // <-- New object
>     sq.updatePendingResourceMetrics()
> }
> {noformat}
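The immutable approach described in YUNIKORN-2574 can be sketched in isolation as follows. This is a simplified illustration, not the actual fix: Resource, Add and Partition are hypothetical stand-ins for resources.Resource, resources.Add and PartitionContext, and the real types carry far more state. The point is that the writer swaps in a freshly built object under the lock instead of mutating the shared one, so a pointer handed out earlier never changes underneath its reader.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a simplified, hypothetical stand-in for resources.Resource.
type Resource struct {
	Quantities map[string]int64
}

// Clone returns a deep copy so callers never share the underlying map.
func (r *Resource) Clone() *Resource {
	if r == nil {
		return nil
	}
	out := &Resource{Quantities: make(map[string]int64, len(r.Quantities))}
	for k, v := range r.Quantities {
		out.Quantities[k] = v
	}
	return out
}

// Add returns a new Resource holding the sum; neither input is modified.
func Add(a, b *Resource) *Resource {
	out := a.Clone()
	if out == nil {
		out = &Resource{Quantities: map[string]int64{}}
	}
	if b != nil {
		for k, v := range b.Quantities {
			out.Quantities[k] += v
		}
	}
	return out
}

// Partition is a hypothetical stand-in for PartitionContext.
type Partition struct {
	sync.RWMutex
	total *Resource
}

// AddNodeResource replaces the total with a new object instead of mutating it in place.
func (p *Partition) AddNodeResource(delta *Resource) {
	p.Lock()
	defer p.Unlock()
	p.total = Add(p.total, delta)
}

// GetTotal returns a copy of the current total, taken under the read lock.
func (p *Partition) GetTotal() *Resource {
	p.RLock()
	defer p.RUnlock()
	return p.total.Clone()
}

func main() {
	p := &Partition{}
	p.AddNodeResource(&Resource{Quantities: map[string]int64{"vcore": 4}})
	snapshot := p.GetTotal()
	p.AddNodeResource(&Resource{Quantities: map[string]int64{"vcore": 2}})
	// The earlier snapshot is unaffected by the later update.
	fmt.Println(snapshot.Quantities["vcore"], p.GetTotal().Quantities["vcore"]) // 4 6
}
```

In this sketch the getter also clones, which is stricter than necessary once every writer replaces rather than mutates. Either measure alone removes the race described in the issue, and using both trades some allocation for defensiveness.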
[jira] [Resolved] (YUNIKORN-2521) Scheduler deadlock
[ https://issues.apache.org/jira/browse/YUNIKORN-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2521. Fix Version/s: 1.6.0 1.5.1 Target Version: 1.6.0, 1.5.1 Resolution: Fixed This was delivered as part of YUNIKORN-2544. > Scheduler deadlock > -- > > Key: YUNIKORN-2521 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2521 > Project: Apache YuniKorn > Issue Type: Bug >Affects Versions: 1.5.0 > Environment: Yunikorn: 1.5 > AWS EKS: v1.28.6-eks-508b6b3 >Reporter: Noah Yoshida >Assignee: Craig Condit >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > Attachments: 0001-YUNIKORN-2539-core.patch, > 0002-YUNIKORN-2539-k8shim.patch, 4_4_goroutine-1.txt, 4_4_goroutine-2.txt, > 4_4_goroutine-3.txt, 4_4_goroutine-4.txt, 4_4_goroutine-5-state-dump.txt, > 4_4_profile001.png, 4_4_profile002.png, 4_4_profile003.png, > 4_4_scheduler-logs.txt, deadlock_2024-04-18.log, goroutine-4-3-1.out, > goroutine-4-3-2.out, goroutine-4-3-3.out, goroutine-4-3.out, > goroutine-4-5.out, goroutine-dump.txt, goroutine-while-blocking-2.out, > goroutine-while-blocking.out, logs-potential-deadlock-2.txt, > logs-potential-deadlock.txt, logs-splunk-ordered.txt, logs-splunk.txt, > profile001-4-5.gif, profile012.gif, profile013.gif, running-logs-2.txt, > running-logs.txt > > > Discussion on Yunikorn slack: > [https://yunikornworkspace.slack.com/archives/CLNUW68MU/p1711048995187179] > Occasionally, Yunikorn will deadlock and prevent any new pods from starting. > All pods stay in Pending. There are no error logs inside of the Yunikorn > scheduler indicating any issue. > Additionally, the pods all have the correct annotations / labels from the > admission service, so they are at least getting put into k8s correctly. > The issue was seen intermittently on Yunikorn version 1.5 in EKS, using > version `v1.28.6-eks-508b6b3`. > At least for me, we run about 25-50 nodes and 200-400 pods. 
Pods and nodes > are added and removed pretty frequently as we do ML workloads. > Attached is the goroutine dump. We were not able to get a statedump as the > endpoint kept timing out. > You can fix it by restarting the Yunikorn scheduler pod. Sometimes you also > have to delete any "Pending" pods that got stuck while the scheduler was > deadlocked as well, for them to get picked up by the new scheduler pod.
[jira] [Resolved] (YUNIKORN-2579) SI: Remove maxAllocations field from AllocationAsk
[ https://issues.apache.org/jira/browse/YUNIKORN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2579. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > SI: Remove maxAllocations field from AllocationAsk > -- > > Key: YUNIKORN-2579 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2579 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: scheduler-interface >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Now that maxAllocations != 1 is no longer supported, we need to remove the > maxAllocations field from the AllocationAsk in the scheduler interface.
[jira] [Created] (YUNIKORN-2579) SI: Remove maxAllocations field from AllocationAsk
Craig Condit created YUNIKORN-2579: -- Summary: SI: Remove maxAllocations field from AllocationAsk Key: YUNIKORN-2579 URL: https://issues.apache.org/jira/browse/YUNIKORN-2579 Project: Apache YuniKorn Issue Type: Sub-task Components: scheduler-interface Reporter: Craig Condit Assignee: Craig Condit Now that maxAllocations != 1 is no longer supported, we need to remove the maxAllocations field from the AllocationAsk in the scheduler interface.
[jira] [Resolved] (YUNIKORN-2539) Add optional deadlock detection
[ https://issues.apache.org/jira/browse/YUNIKORN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2539. Resolution: Fixed > Add optional deadlock detection > --- > > Key: YUNIKORN-2539 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2539 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler, shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.1 > > > We make heavy use of sync.Mutex and sync.RWMutex in our code. Unfortunately, > while these are very performant, they can lead to difficult-to-diagnose > deadlocks. > If we substitute our own locking routines, we can optionally enable deadlock > detection. See [https://github.com/sasha-s/go-deadlock] for a possible > solution.
[jira] [Created] (YUNIKORN-2539) Add optional deadlock detection
Craig Condit created YUNIKORN-2539: -- Summary: Add optional deadlock detection Key: YUNIKORN-2539 URL: https://issues.apache.org/jira/browse/YUNIKORN-2539 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler, shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit We make heavy use of sync.Mutex and sync.RWMutex in our code. Unfortunately, while these are very performant, they can lead to difficult-to-diagnose deadlocks. If we substitute our own locking routines, we can optionally enable deadlock detection. See [https://github.com/sasha-s/go-deadlock] for a possible solution.
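The idea of substituting our own locking routines could look roughly like the sketch below. It is an illustration only, not the YuniKorn implementation and not the go-deadlock API: the Mutex wrapper, the detectionEnabled switch, the OnStall callback and the contendedLockReports helper are all hypothetical names, and the check is a simple "Lock blocked longer than a timeout" watchdog rather than real lock-order analysis.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// detectionEnabled is a hypothetical process-wide switch; a real build might
// set it from an environment variable or a build tag instead.
var detectionEnabled atomic.Bool

// Mutex wraps sync.Mutex. With detection off, Lock costs one extra atomic load.
type Mutex struct {
	mu sync.Mutex
	// Timeout is how long Lock may block before OnStall is called.
	Timeout time.Duration
	// OnStall runs on a watchdog goroutine when Lock has blocked too long.
	OnStall func()
}

// Lock acquires the mutex, optionally watched by a timeout-based watchdog.
func (m *Mutex) Lock() {
	if !detectionEnabled.Load() || m.OnStall == nil {
		m.mu.Lock()
		return
	}
	acquired := make(chan struct{})
	go func() {
		timer := time.NewTimer(m.Timeout)
		defer timer.Stop()
		select {
		case <-acquired: // lock obtained in time, nothing to report
		case <-timer.C:
			m.OnStall() // possible deadlock: still waiting after Timeout
		}
	}()
	m.mu.Lock()
	close(acquired)
}

// Unlock releases the mutex.
func (m *Mutex) Unlock() { m.mu.Unlock() }

// contendedLockReports holds the lock for holdMs milliseconds while a second
// goroutine waits with a timeoutMs limit, and reports whether a stall was seen.
func contendedLockReports(holdMs, timeoutMs int) bool {
	detectionEnabled.Store(true)
	var stalled atomic.Bool
	m := &Mutex{
		Timeout: time.Duration(timeoutMs) * time.Millisecond,
		OnStall: func() { stalled.Store(true) },
	}
	m.Lock()
	done := make(chan struct{})
	go func() {
		defer close(done)
		m.Lock() // blocks until the holder below releases the lock
		m.Unlock()
	}()
	time.Sleep(time.Duration(holdMs) * time.Millisecond)
	m.Unlock()
	<-done
	return stalled.Load()
}

func main() {
	fmt.Println("stall reported:", contendedLockReports(200, 20))
}
```

Because callers only see Lock and Unlock, detection can stay off in production and be switched on when chasing a hang. A library such as go-deadlock goes further than this sketch, for example by tracking lock ordering.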
[jira] [Closed] (YUNIKORN-2534) [Yunikorn] Quota enforcement checks are failing when we have max-application set to 0
[ https://issues.apache.org/jira/browse/YUNIKORN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2534.

> [Yunikorn] Quota enforcement checks are failing when we have max-application set to 0
>
> Key: YUNIKORN-2534
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2534
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Rajesh Kanhaiya Lal
> Priority: Major
> Attachments: yunikorn-configs-fresh.yaml
>
> The max-application checks do not work when max-application is set to 0 in the yunikorn-config file.
> Config validation is also skipped when max-application is set to 0. For example, the check that a child queue's max-application is less than or equal to the parent queue's is not applied.
> The YuniKorn config file is attached.
> The user and group tracking API also does not report max-application in the response.
>
> {code:java}
> curl --location 'http://127.0.0.1:9080/ws/v1/partition/default/usage/users'
> [
>   {
>     "userName": "nobody",
>     "groups": {
>       "ts333w3": "*",
>       "ts433": "*",
>       "ts544": "*",
>       "ts633": "*"
>     },
>     "queues": {
>       "queuePath": "root",
>       "resourceUsage": {
>         "Resources": {
>           "memory": 3,
>           "pods": 3,
>           "vcore": 300
>         }
>       },
>       "runningApplications": [
>         "ts333w3",
>         "ts433",
>         "ts544"
>       ],
>       "children": [
>         {
>           "queuePath": "root.default",
>           "resourceUsage": {
>             "Resources": {
>               "memory": 3,
>               "pods": 3,
>               "vcore": 300
>             }
>           },
>           "runningApplications": [
>             "ts333w3",
>             "ts433",
>             "ts544"
>           ]
>         }
>       ]
>     }
>   }
> ] {code}
> Could you please take a look?
[jira] [Resolved] (YUNIKORN-2534) [Yunikorn] Quota enforcement checks are failing when we have max-application set to 0
[ https://issues.apache.org/jira/browse/YUNIKORN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2534. Assignee: (was: Manikandan R) Resolution: Not A Bug This is not a bug. A value of zero is indistinguishable from unset, and we explicitly treat it the same. > [Yunikorn] Quota enforcement checks are failing when we have max-application > set to 0 > - > > Key: YUNIKORN-2534 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2534 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Rajesh Kanhaiya Lal >Priority: Major > Attachments: yunikorn-configs-fresh.yaml
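The resolution comment ("a value of zero is indistinguishable from unset") can be illustrated with a small Go sketch. The limits struct, the parseMaxApps and limitApplies helpers, and the use of JSON (chosen only to stay within the standard library) are assumptions made for this example; they are not YuniKorn's actual configuration types or parsing code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// limits is a hypothetical stand-in for a queue limit config entry.
type limits struct {
	MaxApplications uint64 `json:"maxapplications"`
}

// parseMaxApps decodes doc and returns the resulting field value.
func parseMaxApps(doc string) uint64 {
	var l limits
	if err := json.Unmarshal([]byte(doc), &l); err != nil {
		panic(err)
	}
	return l.MaxApplications
}

// limitApplies treats zero as "no limit configured", mirroring the behaviour
// described in the resolution comment.
func limitApplies(maxApps uint64) bool {
	return maxApps != 0
}

func main() {
	unset := parseMaxApps(`{}`)
	zero := parseMaxApps(`{"maxapplications": 0}`)
	// A plain integer field decodes to 0 in both cases, so the two
	// configurations cannot be told apart after parsing.
	fmt.Println(unset == zero, limitApplies(zero)) // prints "true false"
}
```

Telling the two cases apart would need something like a pointer field (*uint64); with a plain integer, treating zero as "unset" is the only consistent reading.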
[jira] [Closed] (YUNIKORN-2532) Resource usage report has an incompatible format change
[ https://issues.apache.org/jira/browse/YUNIKORN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit closed YUNIKORN-2532.

> Resource usage report has an incompatible format change
>
> Key: YUNIKORN-2532
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2532
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Priority: Major
>
> A recent change caused the application resource usage report to have a new format.
> Prior to the change, the format was:
> {code:java}
> YK_APP_SUMMARY: {"appID": "adf53ee0-experiment-organicad-94520240-1-1", "submissionTime": 1712169262131, "startTime": 1712169264134, "finishTime": 1712173619983, "user": "system:serviceaccount:spark-operator-02:spark-operator", "queue": "root.queue-large", "state": "Completed", "rmID": "test-cluster", "resourceUsage": {"abc":{"memory":139178200478515200,"pods":1729129,"vcore":5183062000},"def":{"memory":113789789798400,"pods":1413,"vcore":4239000}}, "preemptedResource": {}}
> {code}
> With the change, the new format is:
> {code:java}
> 2024-04-04T00:33:08.532Z INFO core.scheduler.application.usage objects/application_summary.go:60 YK_APP_SUMMARY: {ApplicationID: afa303d0-test-trino-sparksql--20240404-2-1, SubmissionTime: 1712190615461, StartTime: 1712190617496, FinishTime: 1712190788532, User: system:serviceaccount:spark-operator-01:spark-operator, Queue: root.queue-large, State: Completed, RmID: test-cluster, ResourceUsage: TrackedResource{UNKNOWN:pods=177,UNKNOWN:vcore=354000,UNKNOWN:memory=1431454089216}, PreemptedResource: TrackedResource{}, PlaceholderResource: TrackedResource{}}{code}
> There are several incompatibilities:
> 1. The class name TrackedResource was not there before; now it is.
> 2. The instance type was outside the resource part before; now it is embedded.
> 3. The instance type was reported correctly before the change; now it is UNKNOWN.
> #3 may be a different issue, but we observed it at the same time.
> I think we should change the format back to the original one, as this is an incompatible change. What do you think, [~wilfreds], [~pbacsko], [~ccondit]?
> Thanks.
[jira] [Resolved] (YUNIKORN-2532) Resource usage report has an incompatible format change
[ https://issues.apache.org/jira/browse/YUNIKORN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2532. Resolution: Not A Bug > Resource usage report has an incompatible format change > --- > > Key: YUNIKORN-2532 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2532 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Major
[jira] [Resolved] (YUNIKORN-2529) Newly added nodes show 'ready:false' under node attributes
[ https://issues.apache.org/jira/browse/YUNIKORN-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2529. Fix Version/s: 1.6.0 Resolution: Implemented Resolving as this was fixed by YUNIKORN-2530. > Newly added nodes show 'ready:false' under node attributes > -- > > Key: YUNIKORN-2529 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2529 > Project: Apache YuniKorn > Issue Type: Bug > Components: webapp >Affects Versions: 1.5.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Fix For: 1.6.0 > > > In the web UI, the attributes column for nodes shows 'ready:false' for newly > added nodes.
[jira] [Resolved] (YUNIKORN-2530) Remove unnecessary ready flag on node
[ https://issues.apache.org/jira/browse/YUNIKORN-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2530. Fix Version/s: 1.6.0 Resolution: Fixed Merged all PRs to master. > Remove unnecessary ready flag on node > - > > Key: YUNIKORN-2530 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2530 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler, scheduler-interface, shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > YuniKorn has had a "ready" flag for nodes for a long time; however, this flag > is not set correctly and serves no purpose to the scheduler. In Kubernetes, > readiness is a far more complex concept anyway, and a single true/false value > is insufficient. Therefore, we should remove the ready flag to simplify the > interface. This will also fix a minor issue in the Web UI where ready:false > is shown for newly added nodes.
[jira] [Created] (YUNIKORN-2530) Remove unnecessary ready flag on node
Craig Condit created YUNIKORN-2530: -- Summary: Remove unnecessary ready flag on node Key: YUNIKORN-2530 URL: https://issues.apache.org/jira/browse/YUNIKORN-2530 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler, scheduler-interface, shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit YuniKorn has had a "ready" flag for nodes for a long time; however, this flag is not set correctly and serves no purpose to the scheduler. In Kubernetes, readiness is a far more complex concept anyway, and a single true/false value is insufficient. Therefore, we should remove the ready flag to simplify the interface. This will also fix a minor issue in the Web UI where ready:false is shown for newly added nodes.
[jira] [Created] (YUNIKORN-2529) Newly added nodes show 'ready:false' under node attributes
Craig Condit created YUNIKORN-2529: -- Summary: Newly added nodes show 'ready:false' under node attributes Key: YUNIKORN-2529 URL: https://issues.apache.org/jira/browse/YUNIKORN-2529 Project: Apache YuniKorn Issue Type: Bug Components: webapp Affects Versions: 1.5.0 Reporter: Craig Condit Assignee: Craig Condit In the web UI, the attributes column for nodes shows 'ready:false' for newly added nodes.
[jira] [Resolved] (YUNIKORN-2440) [UMBRELLA] Remove stateaware scheduling
[ https://issues.apache.org/jira/browse/YUNIKORN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2440. Fix Version/s: 1.6.0 Resolution: Fixed Resolving as all subtasks are complete. > [UMBRELLA] Remove stateaware scheduling > --- > > Key: YUNIKORN-2440 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2440 > Project: Apache YuniKorn > Issue Type: Task > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Craig Condit >Priority: Major > Labels: release-notes > Fix For: 1.6.0 > > > Umbrella jira to track all the work to remove state-aware scheduling: > * remove scheduling code > * remove documentation > * remove configuration options > * document way to achieve similar behaviour (FIFO with max applications)
[jira] [Resolved] (YUNIKORN-2509) Update documentation to remove state-aware scheduling
[ https://issues.apache.org/jira/browse/YUNIKORN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2509. Fix Version/s: 1.6.0 Resolution: Fixed > Update documentation to remove state-aware scheduling > - > > Key: YUNIKORN-2509 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2509 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Remove stateaware scheduling from documentation, including references to > starting state.
[jira] [Resolved] (YUNIKORN-2508) Remove APP_STARTING references from shim
[ https://issues.apache.org/jira/browse/YUNIKORN-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2508. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Remove APP_STARTING references from shim > > > Key: YUNIKORN-2508 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2508 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Now that APP_STARTING is gone, we need to update some references in the shim > to remove usages of it.
[jira] [Created] (YUNIKORN-2509) Update documentation to remove state-aware scheduling
Craig Condit created YUNIKORN-2509: -- Summary: Update documentation to remove state-aware scheduling Key: YUNIKORN-2509 URL: https://issues.apache.org/jira/browse/YUNIKORN-2509 Project: Apache YuniKorn Issue Type: Sub-task Components: documentation Reporter: Craig Condit Assignee: Craig Condit Remove stateaware scheduling from documentation, including references to starting state.
[jira] [Created] (YUNIKORN-2508) Remove APP_STARTING references from shim
Craig Condit created YUNIKORN-2508: -- Summary: Remove APP_STARTING references from shim Key: YUNIKORN-2508 URL: https://issues.apache.org/jira/browse/YUNIKORN-2508 Project: Apache YuniKorn Issue Type: Sub-task Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit Now that APP_STARTING is gone, we need to update some references in the shim to remove usages of it.
[jira] [Resolved] (YUNIKORN-2380) [UMBRELLA] YuniKorn 1.5.0 release efforts
[ https://issues.apache.org/jira/browse/YUNIKORN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2380. Fix Version/s: 1.5.0 Resolution: Fixed Resolving as release is complete. > [UMBRELLA] YuniKorn 1.5.0 release efforts > - > > Key: YUNIKORN-2380 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2380 > Project: Apache YuniKorn > Issue Type: Task > Components: release >Reporter: Wilfred Spiegelenburg >Assignee: TingYao Huang >Priority: Blocker > Fix For: 1.5.0 > > > This umbrella is to track the work items needed for 1.5.0 release. > Release manager: TBD > Multiple new features, enhancements and bug fixes are covered. Please see > [https://issues.apache.org/jira/issues/?jql=project%20%3D%20YUNIKORN%20AND%20%22Target%20Version%22%20%3D%201.5.0%20ORDER%20BY%20status%20ASC]
[jira] [Resolved] (YUNIKORN-2495) Remove App "Starting" state
[ https://issues.apache.org/jira/browse/YUNIKORN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2495. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Remove App "Starting" state > --- > > Key: YUNIKORN-2495 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2495 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The app Starting state has been in use for a while. Though it was introduced mainly > as part of state-aware app scheduling, all related code could be assessed and > removed if it is no longer needed anywhere.
[jira] [Resolved] (YUNIKORN-2419) [UMBRELLA] Generate reproducible binaries
[ https://issues.apache.org/jira/browse/YUNIKORN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2419. Fix Version/s: 1.6.0 Resolution: Fixed Resolving as all subtasks are now complete. > [UMBRELLA] Generate reproducible binaries > - > > Key: YUNIKORN-2419 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2419 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes, webapp >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: release-notes > Fix For: 1.6.0 > > > Currently, the binaries we build for YuniKorn differ from one build to the > next. We should attempt to standardize our build output so that independently > built binaries from the same source code can be validated.
[jira] [Resolved] (YUNIKORN-2487) [Release] Force REPRODUCIBLE_BUILDS=1 on release
[ https://issues.apache.org/jira/browse/YUNIKORN-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2487. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > [Release] Force REPRODUCIBLE_BUILDS=1 on release > > > Key: YUNIKORN-2487 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2487 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > With the updated REPRODUCIBLE_BUILDS logic in k8shim/web, we need to pass this > variable into the scripts.
[jira] [Resolved] (YUNIKORN-2480) Convert yunikorn-web build to use pnpm
[ https://issues.apache.org/jira/browse/YUNIKORN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2480. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Convert yunikorn-web build to use pnpm > -- > > Key: YUNIKORN-2480 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2480 > Project: Apache YuniKorn > Issue Type: Improvement > Components: webapp >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Our current yunikorn-web build is driven by yarn v1, which is very outdated > and slow. We should switch to using pnpm instead.
[jira] [Resolved] (YUNIKORN-2481) Convert yunikorn-site build to use pnpm
[ https://issues.apache.org/jira/browse/YUNIKORN-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2481. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > Convert yunikorn-site build to use pnpm > --- > > Key: YUNIKORN-2481 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2481 > Project: Apache YuniKorn > Issue Type: Improvement > Components: website >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Our current yunikorn-site build is driven by yarn v1, which is very outdated > and slow. We should switch to using pnpm instead.
[jira] [Resolved] (YUNIKORN-2486) [Web] Use docker to build reproducible binaries
[ https://issues.apache.org/jira/browse/YUNIKORN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2486. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > [Web] Use docker to build reproducible binaries > --- > > Key: YUNIKORN-2486 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2486 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: website >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0
[jira] [Resolved] (YUNIKORN-2485) [Shim] Use docker to build reproducible binaries
[ https://issues.apache.org/jira/browse/YUNIKORN-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2485. Fix Version/s: 1.6.0 Resolution: Fixed Merged to master. > [Shim] Use docker to build reproducible binaries > > > Key: YUNIKORN-2485 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2485 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The current build system (even for reproducible builds) results in > differences between environments. To eliminate these differences, we should > build in a docker container when in reproducible build mode.
[jira] [Created] (YUNIKORN-2488) SI: Remove stateaware constants
Craig Condit created YUNIKORN-2488: -- Summary: SI: Remove stateaware constants Key: YUNIKORN-2488 URL: https://issues.apache.org/jira/browse/YUNIKORN-2488 Project: Apache YuniKorn Issue Type: Sub-task Components: scheduler-interface Reporter: Craig Condit
[jira] [Created] (YUNIKORN-2487) [Release] Force REPRODUCIBLE_BUILDS=1 on release
Craig Condit created YUNIKORN-2487: -- Summary: [Release] Force REPRODUCIBLE_BUILDS=1 on release Key: YUNIKORN-2487 URL: https://issues.apache.org/jira/browse/YUNIKORN-2487 Project: Apache YuniKorn Issue Type: Sub-task Components: release Reporter: Craig Condit Assignee: Craig Condit With the updated REPRODUCIBLE_BUILDS logic in k8shim/web, we need to pass this variable into the scripts.
[jira] [Created] (YUNIKORN-2486) [Web] Use docker to build reproducible binaries
Craig Condit created YUNIKORN-2486: -- Summary: [Web] Use docker to build reproducible binaries Key: YUNIKORN-2486 URL: https://issues.apache.org/jira/browse/YUNIKORN-2486 Project: Apache YuniKorn Issue Type: Sub-task Components: website Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2485) [Shim] Use sanitized docker env to build reproducible binaries
Craig Condit created YUNIKORN-2485: -- Summary: [Shim] Use sanitized docker env to build reproducible binaries Key: YUNIKORN-2485 URL: https://issues.apache.org/jira/browse/YUNIKORN-2485 Project: Apache YuniKorn Issue Type: Sub-task Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit The current build system (even for reproducible builds) results in differences between environments. To eliminate these differences, we should build in a docker container when in reproducible build mode.
[jira] [Created] (YUNIKORN-2484) Shim: Remove stateaware logic
Craig Condit created YUNIKORN-2484: -- Summary: Shim: Remove stateaware logic Key: YUNIKORN-2484 URL: https://issues.apache.org/jira/browse/YUNIKORN-2484 Project: Apache YuniKorn Issue Type: Sub-task Components: shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2483) Remove stateaware scheduling logic
Craig Condit created YUNIKORN-2483: Summary: Remove stateaware scheduling logic Key: YUNIKORN-2483 URL: https://issues.apache.org/jira/browse/YUNIKORN-2483 Project: Apache YuniKorn Issue Type: Sub-task Components: core - scheduler Reporter: Craig Condit Assignee: Craig Condit
[jira] [Resolved] (YUNIKORN-2383) Branching and tagging for 1.5
[ https://issues.apache.org/jira/browse/YUNIKORN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2383. Fix Version/s: 1.5.0 Resolution: Fixed All tasks complete, resolving. > Branching and tagging for 1.5 > Key: YUNIKORN-2383 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2383 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release > Reporter: Wilfred Spiegelenburg > Assignee: TingYao Huang > Priority: Major > Fix For: 1.5.0 > branching & tagging for updating dependencies (SI/core/k8shim)
[jira] [Resolved] (YUNIKORN-2384) Release notes for 1.5.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2384. Fix Version/s: 1.5.0 Resolution: Fixed Merged to master. > Release notes for 1.5.0 > Key: YUNIKORN-2384 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2384 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: documentation > Reporter: Wilfred Spiegelenburg > Assignee: Craig Condit > Priority: Critical > Labels: pull-request-available > Fix For: 1.5.0 > Jiras have been tagged with release-notes for this version. These jiras need to be added with a special mention in the release notes. [https://issues.apache.org/jira/issues/?filter=12352474] This Jira might require multiple people to help write the release notes for the specific jiras mentioned.
[jira] [Created] (YUNIKORN-2481) Convert yunikorn-site build to use pnpm
Craig Condit created YUNIKORN-2481: Summary: Convert yunikorn-site build to use pnpm Key: YUNIKORN-2481 URL: https://issues.apache.org/jira/browse/YUNIKORN-2481 Project: Apache YuniKorn Issue Type: Improvement Components: website Reporter: Craig Condit Assignee: Craig Condit Our current yunikorn-site build is driven by yarn v1, which is very outdated and slow. We should switch to using pnpm instead.
[jira] [Created] (YUNIKORN-2480) Convert yunikorn-web build to use pnpm
Craig Condit created YUNIKORN-2480: Summary: Convert yunikorn-web build to use pnpm Key: YUNIKORN-2480 URL: https://issues.apache.org/jira/browse/YUNIKORN-2480 Project: Apache YuniKorn Issue Type: Improvement Components: webapp Reporter: Craig Condit Assignee: Craig Condit Our current yunikorn-web build is driven by yarn v1, which is very outdated and slow. We should switch to using pnpm instead.
[jira] [Resolved] (YUNIKORN-2469) Upgrade google.golang.org/protobuf to v1.33.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2469. Fix Version/s: 1.5.0 Resolution: Fixed Merged all PRs to master and cherry-picked to branch-1.5. > Upgrade google.golang.org/protobuf to v1.33.0 > Key: YUNIKORN-2469 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2469 > Project: Apache YuniKorn > Issue Type: Task > Components: core - common, release, shim - kubernetes > Reporter: Craig Condit > Assignee: Craig Condit > Priority: Critical > Labels: pull-request-available > Fix For: 1.5.0
[jira] [Resolved] (YUNIKORN-2468) Remove language around reproducible builds from README
[ https://issues.apache.org/jira/browse/YUNIKORN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2468. Fix Version/s: 1.5.0 Resolution: Fixed > Remove language around reproducible builds from README > Key: YUNIKORN-2468 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2468 > Project: Apache YuniKorn > Issue Type: Task > Components: release > Reporter: Craig Condit > Assignee: Craig Condit > Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > The reproducible builds feature is currently not functioning properly in the 1.5.0 release. We should remove references to it from the README.md file.
[jira] [Resolved] (YUNIKORN-2467) Remove AllocationAsk from the core when a pod is completed
[ https://issues.apache.org/jira/browse/YUNIKORN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YUNIKORN-2467. Fix Version/s: 1.5.0 Resolution: Fixed Merged to master and cherry-picked to branch-1.5.0. > Remove AllocationAsk from the core when a pod is completed > Key: YUNIKORN-2467 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2467 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.0 > A new issue was discovered while fixing YUNIKORN-2465. This also results in growing memory usage in the case of long-running applications. When a pod reaches a terminal state (Succeeded / Failed), we send an update request from the shim to the core ({{Task.releaseAllocation()}}). However, we only discard the allocation itself and do nothing about the ask, which is kept inside the Application object until the application becomes Completed.
[jira] [Reopened] (YUNIKORN-2419) [UMBRELLA] Generate reproducible binaries
[ https://issues.apache.org/jira/browse/YUNIKORN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit reopened YUNIKORN-2419: > [UMBRELLA] Generate reproducible binaries > Key: YUNIKORN-2419 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2419 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes, webapp > Reporter: Craig Condit > Assignee: Craig Condit > Priority: Major > Labels: release-notes > Fix For: 1.5.0 > Currently, the binaries we build for YuniKorn differ from one build to the next. We should attempt to standardize our build output so that independently built binaries from the same source code can be validated.
[jira] [Created] (YUNIKORN-2469) Upgrade google.golang.org/protobuf to v1.33.0
Craig Condit created YUNIKORN-2469: Summary: Upgrade google.golang.org/protobuf to v1.33.0 Key: YUNIKORN-2469 URL: https://issues.apache.org/jira/browse/YUNIKORN-2469 Project: Apache YuniKorn Issue Type: Task Components: core - common, release, scheduler-interface, shim - kubernetes Reporter: Craig Condit Assignee: Craig Condit
[jira] [Created] (YUNIKORN-2468) Remove language around reproducible builds from README
Craig Condit created YUNIKORN-2468: Summary: Remove language around reproducible builds from README Key: YUNIKORN-2468 URL: https://issues.apache.org/jira/browse/YUNIKORN-2468 Project: Apache YuniKorn Issue Type: Task Components: release Reporter: Craig Condit Assignee: Craig Condit The reproducible builds feature is currently not functioning properly in the 1.5.0 release. We should remove references to it from the README.md file.