[jira] [Created] (YUNIKORN-2780) Remove unnecessary node ExistingAllocations handling

2024-07-30 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2780:
--

 Summary: Remove unnecessary node ExistingAllocations handling
 Key: YUNIKORN-2780
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2780
 Project: Apache YuniKorn
  Issue Type: Task
  Components: core - scheduler, scheduler-interface, shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


As part of state initialization simplification, existing node allocations are 
no longer passed in the UpdateNode SI function. We should remove the field and 
the logic in the core as this is effectively now dead code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2779) Shim: Use UpdateAllocation for both asks and allocations

2024-07-30 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2779:
--

 Summary: Shim: Use UpdateAllocation for both asks and allocations
 Key: YUNIKORN-2779
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2779
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2778) Core: Use unified UpdateAllocation API for both asks and allocations

2024-07-30 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2778:
--

 Summary: Core: Use unified UpdateAllocation API for both asks and 
allocations
 Key: YUNIKORN-2778
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2778
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Resolved] (YUNIKORN-2760) `make tools` should check the version of tools

2024-07-30 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2760.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master. Thanks [~blue.tzuhua] for the contribution!

> `make tools` should check the version of tools
> --
>
> Key: YUNIKORN-2760
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2760
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> By default, the Makefile checks only whether a tool's file exists. Hence, 
> developers need to remove the tools folder (or call `make distclean`) manually 
> to trigger reinstallation after we update the version of a tool.
> However, how can developers become aware of tool updates? Personally, I 
> get suspicious when unexpected errors or warnings appear, but that signal is 
> implicit and noisy.
> To fix that, I'd like to introduce a new folder structure for the tools 
> folder: 
> {code:java}
> /tools/{tool_name}-{version}
> {code}
>  That gives each version of a tool a unique path, so developers will no 
> longer miss updates.
> *rejected proposal*
> {code:java}
> /tools/{tool_name}/{version}
> {code}
>  That also gives each version of a tool a unique path, so developers will no 
> longer miss updates.
> NOTE: we need to remove the existing tool binary if there is a naming conflict 
> when creating the new path. For example, creating /tools/golangci-lint/1.57.2 
> will fail if /tools/golangci-lint is an existing file.
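
The naming conflict noted above can be reproduced with a minimal Go sketch. This is an illustration only, not the Makefile logic that was merged: the helper names toolDir, nestedToolDir and demo are hypothetical, and a temporary directory stands in for the tools folder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// toolDir returns the flat, versioned directory for a tool,
// following the proposed {tool_name}-{version} layout.
func toolDir(base, name, version string) string {
	return filepath.Join(base, name+"-"+version)
}

// nestedToolDir returns the rejected {tool_name}/{version} layout.
func nestedToolDir(base, name, version string) string {
	return filepath.Join(base, name, version)
}

// demo places a legacy, unversioned binary in a temporary tools folder and
// then tries to create both layouts, returning the error from each attempt.
func demo() (flatErr, nestedErr error) {
	base, err := os.MkdirTemp("", "tools")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(base)

	// Legacy layout: the tool binary is a plain file named after the tool.
	legacy := filepath.Join(base, "golangci-lint")
	if err := os.WriteFile(legacy, []byte("binary"), 0o755); err != nil {
		return err, err
	}

	flatErr = os.MkdirAll(toolDir(base, "golangci-lint", "1.57.2"), 0o755)
	nestedErr = os.MkdirAll(nestedToolDir(base, "golangci-lint", "1.57.2"), 0o755)
	return flatErr, nestedErr
}

func main() {
	flatErr, nestedErr := demo()
	fmt.Println("flat layout error:  ", flatErr)
	fmt.Println("nested layout error:", nestedErr)
}
```

The flat layout coexists with the legacy file, whereas the nested layout needs the legacy binary removed first because a path component is already a regular file.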






[jira] [Resolved] (YUNIKORN-2459) Core: Merge ask and allocation objects

2024-07-30 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2459.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Core: Merge ask and allocation objects
> --
>
> Key: YUNIKORN-2459
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2459
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Merge the Ask and Allocation objects into a single Allocation object.






[jira] [Resolved] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods

2024-07-26 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2771.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Optimization: Use termination grace period of 0 seconds for placeholder pods
> 
>
> Key: YUNIKORN-2771
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> When we create placeholder pods for gang scheduling, we do not specify a 
> termination grace period, and therefore inherit the Kubernetes default of 30 
> seconds. This is unnecessary as the placeholders do not perform any logic and 
> therefore require no graceful termination.






[jira] [Closed] (YUNIKORN-2739) Core: Discuss removal of TODO regarding reflection

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2739.
--
Resolution: Won't Do

Closing as no further action is required.

> Core: Discuss removal of TODO regarding reflection
> --
>
> Key: YUNIKORN-2739
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2739
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/common/security/usergroup.go#L74]






[jira] [Closed] (YUNIKORN-2740) Core: Discuss removal of TODO regarding configurable reservation delay

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2740.
--
Resolution: Won't Do

Closing as no further work is required.

> Core: Discuss removal of TODO regarding configurable reservation delay
> --
>
> Key: YUNIKORN-2740
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2740
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/objects/application.go#L1448]






[jira] [Closed] (YUNIKORN-2742) Core: Discuss TODO regarding getting resolver from the config

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2742.
--
Resolution: Won't Do

Closing as no action is required.

> Core: Discuss TODO regarding getting resolver from the config
> -
>
> Key: YUNIKORN-2742
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2742
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition.go#L130]






[jira] [Closed] (YUNIKORN-2741) Core: Discuss removal of TODO regarding add mock for plugin to extend tests

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2741.
--
Resolution: Won't Do

Closing as no further action is required.

> Core: Discuss removal of TODO regarding add mock for plugin to extend tests
> ---
>
> Key: YUNIKORN-2741
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2741
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/objects/node_test.go#L111]






[jira] [Closed] (YUNIKORN-2743) Core: Consider time out waiting for draining and removal

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2743.
--
Resolution: Won't Do

Closing as Won't Do since this isn't required.

> Core: Consider time out waiting for draining and removal
> 
>
> Key: YUNIKORN-2743
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2743
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition_manager.go#L126]






[jira] [Closed] (YUNIKORN-2744) Core: Discuss making web server port configurable

2024-07-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2744.
--
Resolution: Won't Do

Closing as Won't Do since this is not needed.

> Core: Discuss making web server port configurable
> -
>
> Key: YUNIKORN-2744
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2744
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> The current Jira is intended to replace all "TODO" comments, which will be 
> removed by [https://github.com/apache/yunikorn-core/issues/915]. The purpose 
> of the Jira is to discuss whether the tasks described by these TODO comments 
> are worth executing.
> The file link is as follows:
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/webservice/webservice.go#L65]






[jira] [Created] (YUNIKORN-2771) Optimization: Use termination grace period of 0 seconds for placeholder pods

2024-07-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2771:
--

 Summary: Optimization: Use termination grace period of 0 seconds 
for placeholder pods
 Key: YUNIKORN-2771
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2771
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


When we create placeholder pods for gang scheduling, we do not specify a 
termination grace period, and therefore inherit the Kubernetes default of 30 
seconds. This is unnecessary as the placeholders do not perform any logic and 
therefore require no graceful termination.
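
A minimal sketch of the idea, assuming a simplified stand-in for the Kubernetes pod spec: podSpec, effectiveGracePeriod and newPlaceholderSpec are hypothetical names used for illustration and are not the shim's actual code. The only modelled behaviour is that an unset grace period falls back to the 30-second default, while an explicit zero does not.

```go
package main

import "fmt"

// podSpec is a hypothetical stand-in for the Kubernetes PodSpec type;
// only the field relevant to this issue is modelled.
type podSpec struct {
	// nil means "not set", in which case the Kubernetes default applies.
	TerminationGracePeriodSeconds *int64
}

// defaultGracePeriodSeconds is the Kubernetes default named in the issue.
const defaultGracePeriodSeconds int64 = 30

// effectiveGracePeriod returns the grace period that applies to the pod.
func effectiveGracePeriod(spec podSpec) int64 {
	if spec.TerminationGracePeriodSeconds == nil {
		return defaultGracePeriodSeconds
	}
	return *spec.TerminationGracePeriodSeconds
}

// newPlaceholderSpec builds a placeholder spec with an explicit zero
// grace period, so the default is never inherited.
func newPlaceholderSpec() podSpec {
	zero := int64(0)
	return podSpec{TerminationGracePeriodSeconds: &zero}
}

func main() {
	fmt.Println("unset:      ", effectiveGracePeriod(podSpec{}))
	fmt.Println("placeholder:", effectiveGracePeriod(newPlaceholderSpec()))
}
```

The field is a pointer so that "unset" and "zero" remain distinguishable, which is why the placeholder must set it explicitly.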






[jira] [Resolved] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked

2024-07-15 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2755.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> yunikorn-web: pnpm version should be locked
> ---
>
> Key: YUNIKORN-2755
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Now that we are using pnpm, we should lock the version that we are using to 
> prevent unexpected divergence of package.json and pnpm-lock.yaml.






[jira] [Created] (YUNIKORN-2755) yunikorn-web: pnpm version should be locked

2024-07-14 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2755:
--

 Summary: yunikorn-web: pnpm version should be locked
 Key: YUNIKORN-2755
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2755
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: webapp
Reporter: Craig Condit
Assignee: Craig Condit


Now that we are using pnpm, we should lock the version that we are using to 
prevent unexpected divergence of package.json and pnpm-lock.yaml.






[jira] [Resolved] (YUNIKORN-2230) Placement rule does not behave as expected

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2230.

 Fix Version/s: 1.6.0
Target Version: 1.6.0, 1.5.2
Resolution: Delivered

Resolved via other issues.

> Placement rule does not behave as expected
> --
>
> Key: YUNIKORN-2230
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2230
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> yunikorn configmap
> {code:yaml}
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
>   namespace: yunikorn
> data:
>   log.level: "DEBUG"
>   admissionController.filtering.defaultQueue: ""
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: provided
>             create: false
>           - name: tag
>             value: namespace
>             create: true
>         queues:
>           - name: root
>             submitacl: "*"
>             queues:
>               - name: sandbox
>                 submitacl: "*"
> {code}
> test pod
> {code:yaml}
> apiVersion: v1
> kind: Pod
> metadata:
>   labels:
>     app: sleep
>     applicationId: "application-sleep-0001"
>   name: task0
> spec:
>   schedulerName: yunikorn
>   restartPolicy: Never
>   containers:
>     - name: sleep-30s
>       image: "alpine:latest"
>       command: ["sleep", "30"]
>       resources:
>         requests:
>           cpu: "100m"
>           memory: "500M"
>  {code}
> Even though no queue name is specified for the sleep pod, it is still 
> submitted to root.sandbox (the shim's default queue value). We expected the 
> application to be placed through the 'tag' placement rule instead.
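
The behaviour described above can be modelled with a short Go sketch. This is a simplified, hypothetical model of rule evaluation and not the core's placement code: the types app and rule, and the functions providedRule, tagNamespaceRule and place, are invented for illustration. It assumes rules are tried in order and the first match wins.

```go
package main

import "fmt"

// app is a hypothetical, simplified application submission.
type app struct {
	Queue     string // queue name supplied with the pod; empty if none
	Namespace string
}

// rule is a simplified placement rule: it returns a queue name and
// whether the rule matched.
type rule func(a app, existing map[string]bool) (string, bool)

// providedRule matches when a queue name was supplied. With create=false
// the queue must already exist.
func providedRule(create bool) rule {
	return func(a app, existing map[string]bool) (string, bool) {
		if a.Queue == "" {
			return "", false
		}
		if !create && !existing[a.Queue] {
			return "", false
		}
		return a.Queue, true
	}
}

// tagNamespaceRule places the application in root.<namespace>.
func tagNamespaceRule(create bool) rule {
	return func(a app, existing map[string]bool) (string, bool) {
		if a.Namespace == "" {
			return "", false
		}
		queue := "root." + a.Namespace
		if !create && !existing[queue] {
			return "", false
		}
		return queue, true
	}
}

// place returns the queue chosen by the first matching rule.
func place(a app, rules []rule, existing map[string]bool) (string, bool) {
	for _, r := range rules {
		if queue, ok := r(a, existing); ok {
			return queue, true
		}
	}
	return "", false
}

func main() {
	rules := []rule{providedRule(false), tagNamespaceRule(true)}
	existing := map[string]bool{"root": true, "root.sandbox": true}

	// Queue pre-filled before rule evaluation: the 'provided' rule matches.
	prefilled, _ := place(app{Queue: "root.sandbox", Namespace: "default"}, rules, existing)
	// Queue left empty: evaluation falls through to the 'tag' rule.
	untouched, _ := place(app{Namespace: "default"}, rules, existing)

	fmt.Println("queue pre-filled:", prefilled)
	fmt.Println("queue left empty:", untouched)
}
```

In this model, filling in a queue name before the rules run makes the 'provided' rule match unconditionally, so the 'tag' rule is never reached.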
>  






[jira] [Resolved] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2720.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Use createRequest() in handlers_test.go
> ---
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> Use the createRequest() helper methods wherever applicable in 
> handlers_test.go, which has grown very large.
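
As a rough illustration of why such a helper shrinks a large test file, here is a sketch built on the standard net/http/httptest package. newTestRequest is a hypothetical helper and is not the actual createRequest() from handlers_test.go, whose signature is not shown in this thread; the path and query parameter are also made up.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// newTestRequest is a hypothetical helper (not the project's createRequest()):
// it builds a GET request for path with the given query parameters so that
// individual tests do not repeat the same setup code.
func newTestRequest(path string, query map[string]string) *http.Request {
	values := url.Values{}
	for key, value := range query {
		values.Set(key, value)
	}
	target := path
	if len(values) > 0 {
		target += "?" + values.Encode()
	}
	// httptest.NewRequest panics on malformed input, which is acceptable in tests.
	return httptest.NewRequest(http.MethodGet, target, nil)
}

func main() {
	// "/ws/v1/example" and "user" are illustrative values only.
	req := newTestRequest("/ws/v1/example", map[string]string{"user": "alice"})
	fmt.Println(req.Method, req.URL.Path, req.URL.Query().Get("user"))
}
```

Each test then needs a single call instead of repeating request construction and error handling.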






[jira] [Resolved] (YUNIKORN-2721) Improve template function's test coverage

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2721.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master.

> Improve template function's test coverage
> 
>
> Key: YUNIKORN-2721
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2721
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2722) Expose the IsOriginator flag in REST

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2722.

Fix Version/s: 1.6.0
   Resolution: Fixed

Resolving, as this appears to have been merged already.

> Expose the IsOriginator flag in REST
> 
>
> Key: YUNIKORN-2722
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2722
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Yu-Lin Chen
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The first real pod of each application is marked as the originator, and it is 
> typically considered the driver/owner pod. This flag is propagated to the core 
> and affects the preemption decision flow.
>  
> However, the current REST API does not expose the originator flag. Exposing 
> the flag will allow users to check which allocation is the originator and will 
> be beneficial for monitoring and troubleshooting.
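
A sketch of how a monitoring client might use such a flag once it is exposed. The allocationInfo type and the JSON field names "allocationKey" and "originator" are assumptions made for illustration; the actual REST payload may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// allocationInfo is a hypothetical REST payload entry; the real YuniKorn
// object and its JSON field names may differ.
type allocationInfo struct {
	AllocationKey string `json:"allocationKey"`
	Originator    bool   `json:"originator"`
}

// findOriginator returns the key of the first allocation flagged as originator.
func findOriginator(payload []byte) (string, error) {
	var allocations []allocationInfo
	if err := json.Unmarshal(payload, &allocations); err != nil {
		return "", err
	}
	for _, a := range allocations {
		if a.Originator {
			return a.AllocationKey, nil
		}
	}
	return "", errors.New("no originator allocation found")
}

func main() {
	// Sample data invented for this sketch.
	payload := []byte(`[
		{"allocationKey": "driver-pod", "originator": true},
		{"allocationKey": "executor-pod-1", "originator": false}
	]`)
	key, err := findOriginator(payload)
	fmt.Println(key, err)
}
```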






[jira] [Resolved] (YUNIKORN-2732) Improve allocation & queue_events function's test coverage

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2732.

 Fix Version/s: 1.6.0
Target Version: 1.6.0
Resolution: Fixed

Merged to master.

> Improve allocation & queue_events function's test coverage
> -
>
> Key: YUNIKORN-2732
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2732
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - common
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2711) Skip setting the queue name to default queue in the shim

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2711.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

Merged to master and branch-1.5.

> Skip setting the queue name to default queue in the shim
> 
>
> Key: YUNIKORN-2711
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2711
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> The admission controller and the scheduler currently check the pod for a 
> supplied queue name. If no queue name is provided, they set the queue to the 
> default queue 'root.default'.
> After the changes from YUNIKORN-2703, we no longer need to set the queue name 
> in the shim; the core should take care of setting the default queue.






[jira] [Resolved] (YUNIKORN-2703) Core: Fallback to default queue if no placement rules match

2024-07-11 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2703.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

Merged to master and backported (manually) to branch-1.5.

> Core: Fallback to default queue if no placement rules match
> ---
>
> Key: YUNIKORN-2703
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2703
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> YUNIKORN-1650 added an override for the default queue name in the config map 
> to handle the scenario where the provided placement rule is evaluated before 
> other rules.
> The scheduler also adds a default queue if the pod labels or annotations do 
> not define a queue name. Because this happens before the placement rules are 
> evaluated, we end up in the same situation: applications are placed in the 
> default queue and all other placement rules are ignored.






[jira] [Resolved] (YUNIKORN-2700) Use AllocationResult instead of Allocation in scheduler routines

2024-06-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2700.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Use AllocationResult instead of Allocation in scheduler routines
> 
>
> Key: YUNIKORN-2700
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2700
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
> Fix For: 1.6.0
>
>
> The Allocation object is currently abused as a generic return type in various 
> scheduler routines. This is most notable when reserving / unreserving. 
> Rather than returning an Allocation object directly, wrap it in an 
> AllocationResult object.






[jira] [Created] (YUNIKORN-2700) Use AllocationResult instead of Allocation in scheduler routines

2024-06-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2700:
--

 Summary: Use AllocationResult instead of Allocation in scheduler 
routines
 Key: YUNIKORN-2700
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2700
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


The Allocation object is currently abused as a generic return type in various 
scheduler routines. This is most notable when reserving / unreserving. Rather 
than returning an Allocation object directly, wrap it in an AllocationResult object.
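
A minimal sketch of the wrapping pattern, under the assumption that a result carries an outcome type alongside the allocation it refers to. All type, field and function names here (resultType, allocation, allocationResult, tryReserve, describe) are hypothetical and do not reproduce the core's definitions.

```go
package main

import "fmt"

// resultType is a hypothetical enumeration of scheduling outcomes.
type resultType int

const (
	resultNone resultType = iota
	resultAllocated
	resultReserved
	resultUnreserved
)

// allocation is a simplified stand-in for the core Allocation object.
type allocation struct {
	Key    string
	NodeID string // node the allocation is bound to; empty if not allocated
}

// allocationResult wraps an allocation together with the outcome of a
// scheduling attempt, so the allocation itself carries no transient state.
type allocationResult struct {
	Type       resultType
	Allocation *allocation
	NodeID     string // node the outcome refers to
}

// tryReserve reports a reservation without mutating the allocation.
func tryReserve(a *allocation, nodeID string) *allocationResult {
	return &allocationResult{Type: resultReserved, Allocation: a, NodeID: nodeID}
}

// describe shows how a caller can switch on the result type.
func describe(r *allocationResult) string {
	switch r.Type {
	case resultAllocated:
		return "allocated " + r.Allocation.Key + " on " + r.NodeID
	case resultReserved:
		return "reserved " + r.NodeID + " for " + r.Allocation.Key
	case resultUnreserved:
		return "unreserved " + r.NodeID + " for " + r.Allocation.Key
	default:
		return "no result"
	}
}

func main() {
	pending := &allocation{Key: "alloc-1"}
	fmt.Println(describe(tryReserve(pending, "node-1")))
}
```

Keeping the outcome in the wrapper means a reservation no longer has to be encoded by temporarily changing fields on the Allocation itself.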






[jira] [Resolved] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core

2024-06-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2698.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master. Also opened YUNIKORN-2699 to address e2e test failures in 
preemption.

> E2e tests for k8shim don't compile with latest core
> ---
>
> Key: YUNIKORN-2698
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2699) Preemption e2e tests fail in latest master

2024-06-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2699:
--

 Summary: Preemption e2e tests fail in latest master
 Key: YUNIKORN-2699
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2699
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Manikandan R


Output:

 
{noformat}
Preemption Verify_basic_preemption
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
  STEP: Creating development namespace: dev-anvkm @ 06/25/24 18:08:14.291
  STEP: A queue uses resource more than the guaranteed value even after 
removing one of the pods. The cluster doesn't have enough resource to deploy a 
pod in another queue which uses resource less than the guaranteed value. @ 
06/25/24 18:08:15.301
  STEP: Update root.sandbox1 and root.sandbox2 with guaranteed memory 4677M @ 
06/25/24 18:08:15.301
  STEP: Port-forward the scheduler pod @ 06/25/24 18:08:15.302
port-forward is already running
  STEP: Enabling new scheduling config @ 
06/25/24 18:08:15.302
  STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 06/25/24 
18:08:18.313
  STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 06/25/24 
18:08:22.518
  STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 06/25/24 
18:08:26.517
  STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 06/25/24 
18:08:30.518
  STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:08:38.517
  [FAILED] in [It] - 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
 @ 06/25/24 18:08:38.718
  Logging yk fullstatedump, spec: Verify_basic_preemption
  Created log file: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykFullStateDump.json
  Logging k8s cluster info, spec: Verify_basic_preemption
  Created log file: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_k8sClusterInfo.txt
  Logging yk container logs, spec: Verify_basic_preemption
  Created log file: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_basic_preemption_ykContainerLog.txt
  STEP: Tear down namespace: dev-anvkm @ 06/25/24 18:08:39.235
  STEP: Restoring YuniKorn configuration @ 06/25/24 18:08:40.118
  STEP: Restoring the old config maps @ 06/25/24 18:08:40.119
• [FAILED] [27.837 seconds]
Preemption [It] Verify_basic_preemption
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:139
  [FAILED] One of the pods in root.sandbox1 should be preempted
  Expected
      : 1
  to equal
      : 2
  In [It] at: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:198
 @ 06/25/24 18:08:38.718
--
Preemption Verify_preemption_on_priority_queue
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:333
  STEP: Creating development namespace: dev-u0kt7 @ 06/25/24 18:10:24.975
  STEP: A task can only preempt a task with lower or equal priority @ 06/25/24 
18:10:25.982
  STEP: Update root.sandbox1, root.low-priority, root.high-priority with 
guaranteed memory 4677M @ 06/25/24 18:10:25.982
  STEP: Port-forward the scheduler pod @ 06/25/24 18:10:25.983
port-forward is already running
  STEP: Enabling new scheduling config @ 
06/25/24 18:10:25.983
  STEP: Deploy the sleep pod sleepjob1 to the development namespace @ 06/25/24 
18:10:28.99
  STEP: Deploy the sleep pod sleepjob2 to the development namespace @ 06/25/24 
18:10:32.791
  STEP: Deploy the sleep pod sleepjob3 to the development namespace @ 06/25/24 
18:10:35.792
  STEP: Deploy the sleep pod sleepjob4 to the development namespace @ 06/25/24 
18:10:38.792
  STEP: Deploy the sleep pod sleepjob5 to the development namespace @ 06/25/24 
18:10:38.995
  STEP: The sleep pod sleepjob4 can't be scheduled @ 06/25/24 18:10:39.194
  STEP: The sleep pod sleepjob5 can be scheduled @ 06/25/24 18:10:41.392
  STEP: One of the pods in root.sanbox1 is preempted @ 06/25/24 18:10:46.392
  [FAILED] in [It] - 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/test/e2e/preemption/preemption_test.go:424
 @ 06/25/24 18:10:46.592
  Logging yk fullstatedump, spec: Verify_preemption_on_priority_queue
  Created log file: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_preemption_on_priority_queue_ykFullStateDump.json
  Logging k8s cluster info, spec: Verify_preemption_on_priority_queue
  Created log file: 
/home/runner/work/yunikorn-k8shim/yunikorn-k8shim/build/e2e/preemption/Verify_preemption_on_priority_queue_k8sClusterInfo.txt
  Logging yk container logs, spec: Verify_preemption_on_priority_queue
  Created log file: 

[jira] [Created] (YUNIKORN-2698) E2e tests for k8shim don't compile with latest core

2024-06-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2698:
--

 Summary: E2e tests for k8shim don't compile with latest core
 Key: YUNIKORN-2698
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2698
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Resolved] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType

2024-06-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2677.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Rename AllocationResult to AllocationResultType
> ---
>
> Key: YUNIKORN-2677
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2677
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> In preparation for other refactoring, rename the AllocationResult enum to 
> AllocationResultType.






[jira] [Closed] (YUNIKORN-2682) YuniKorn Gang Scheduling Issue: Executors Failing to Start When Running Multiple Applications

2024-06-18 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2682.
--
  Assignee: Craig Condit
Resolution: Workaround

> YuniKorn Gang Scheduling Issue: Executors Failing to Start When Running 
> Multiple Applications
> -
>
> Key: YUNIKORN-2682
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2682
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0
>Reporter: huangzhir
>Assignee: Craig Condit
>Priority: Major
> Attachments: image-2024-06-19-00-02-53-178.png, 
> image-2024-06-19-00-03-09-703.png
>
>
> h2. Description:
> While using YuniKorn's gang scheduling, we encountered a situation where the 
> scheduling process appears to succeed, but in reality, there is a problem. 
> When submitting two applications simultaneously, only the driver pods are 
> successfully running, and the executor pods fail to start due to insufficient 
> resources. The following error is observed in the scheduler logs:
> {code:java}
> 2024-06-18T15:15:27.933Z ERROR cache/placeholder_manager.go:99 failed to 
> create placeholder pod {"error": "pods 
> \"tg-spark-driver-spark-8e410a4c5ce44da2aa85ba-0\" is forbidden: failed 
> quota: spark-quota: must specify limits.cpu,limits.memory"}
> github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders
>   github.com/apache/yunikorn-k8shim/pkg/cache/placeholder_manager.go:99
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1
>   github.com/apache/yunikorn-k8shim/pkg/cache/application.go:542 {code}
> h2. Environment:
>  * YuniKorn version: 1.3.0
>  * Kubernetes version: 1.21.3
>  * Spark version: 3.2.2
> h2. *resource-quota.yaml*
> {code:java}
> apiVersion: v1
> kind: ResourceQuota
> metadata:
>   name: spark-quota
>   namespace: spark
> spec:
>   hard:
>     requests.cpu: "5"
>     requests.memory: "5Gi"
>     limits.cpu: "5"
>     limits.memory: "5Gi" {code}
> h2. yunikorn-configs.yaml 
> {code:java}
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
>   namespace: yunikorn
> data:
>   log.level: "-1"
>   log.admission.level: "-1"
>   log.core.config.level: "-1"
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: tag
>             value: namespace
>             create: true
>         queues:
>           - name: root
>             submitacl: '*'
>             properties:
>               application.sort.policy: fifo
>               placeholderTimeoutInSeconds: 60
>               schedulingStyle: Hard
>             queues:
>               - name: spark
>                 properties:
>                   application.sort.policy: fifo
>                   placeholderTimeoutInSeconds: 60
>                   schedulingStyle: Hard
>                 resources:
>                   guaranteed:
>                     vcore: 5
>                     memory: 5Gi
>                   max:
>                     vcore: 5
>                     memory: 5Gi {code}
> h2. Spark-submit command
> {code:java}
> ./bin/spark-submit \
>   --master k8s://https://10.10.10.10:6443 \
>   --deploy-mode cluster \
>   --name spark-pi \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \
>   --conf spark.kubernetes.namespace=spark \
>   --class org.apache.spark.examples.SparkPi \
>   --conf spark.executor.instances=1 \
>   --conf spark.executor.cores=1 \
>   --conf spark.executor.memory=1500m \
>   --conf spark.driver.cores=1 \
>   --conf spark.driver.memory=1500m \
>   --conf spark.kubernetes.driver.limit.cores=1 \
>   --conf spark.kubernetes.driver.limit.memory=2G \
>   --conf spark.kubernetes.executor.limit.cores=1 \
>   --conf spark.kubernetes.executor.limit.memory=2G \
>   --conf spark.kubernetes.driver.label.app=spark \
>   --conf spark.kubernetes.executor.label.app=spark \
>   --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
>   --conf spark.kubernetes.scheduler.name=yunikorn \
>   --conf spark.kubernetes.driver.label.queue=root.spark \
>   --conf spark.kubernetes.executor.label.queue=root.spark \
>   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
>   --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
>   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
>   --conf 
> spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": 
> "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": 
> "2Gi"},"nodeSelector": {"app": "spark"} }, {"name": "spark-executor", 
> "minMember": 1, "minResource": {"cpu": "1", "memory": 

[jira] [Created] (YUNIKORN-2677) Rename AllocationResult to AllocationResultType

2024-06-17 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2677:
--

 Summary: Rename AllocationResult to AllocationResultType
 Key: YUNIKORN-2677
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2677
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


In preparation for other refactorings, rename the AllocationResult object to 
AllocationResultType.






[jira] [Resolved] (YUNIKORN-2672) Upgrade to K8s 1.29.6

2024-06-13 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2672.

Fix Version/s: 1.6.0
   1.5.2
   Resolution: Fixed

Merged to master and cherry-picked to branch-1.5.

> Upgrade to K8s 1.29.6
> -
>
> Key: YUNIKORN-2672
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2672
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> A major performance regression was fixed in K8s that on analysis mainly 
> impacts the plugin implementation. The regression is part of the release 
> 1.29.4 we currently build against.
> See [https://github.com/kubernetes/kubernetes/pull/125197] for details






[jira] [Resolved] (YUNIKORN-2671) Convert Allocation releases field to singular

2024-06-12 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2671.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Convert Allocation releases field to singular
> -
>
> Key: YUNIKORN-2671
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2671
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Now that repeats are no longer allowed, we have no need to track multiple 
> releases for an allocation.






[jira] [Created] (YUNIKORN-2671) Convert Allocation releases field to single release field

2024-06-12 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2671:
--

 Summary: Convert Allocation releases field to single release field
 Key: YUNIKORN-2671
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2671
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


Now that repeats are no longer allowed, we have no need to track multiple 
releases for an allocation.






[jira] [Closed] (YUNIKORN-2664) Running YuniKorn as leader-elected controller with multiple replicas

2024-06-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2664.
--
Resolution: Won't Do

> Running YuniKorn as leader-elected controller with multiple replicas
> 
>
> Key: YUNIKORN-2664
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2664
> Project: Apache YuniKorn
>  Issue Type: Wish
>  Components: shim - kubernetes
>Reporter: Volodymyr Kot
>Priority: Major
>
> Hey, I noticed that by default YuniKorn is run as a Deployment with a single 
> replica:
> [https://github.com/apache/yunikorn-release/blob/aa9a2939eed81fc74fbbf7afbc0fe60c5aa0acd0/helm-charts/yunikorn/templates/deployment.yaml#L31]
> and leader election is disabled in scheduler configuration:
> [https://github.com/apache/yunikorn-k8shim/blob/36111c41d97658e168e640c284fe8d71921883b4/conf/scheduler-config.yaml#L20]
>  
> Is there anything about the architecture of YuniKorn that makes this hard or 
> impossible to do? Or would you be open to a PR that adds the ability to run 
> with multiple replicas and leader election?






[jira] [Resolved] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation

2024-05-28 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2641.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Ensure createTime has same semantics for ask and allocation
> ---
>
> Key: YUNIKORN-2641
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2641
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The createTime field in Allocation and AllocationAsk is not used 
> consistently. Ensure that the field is always set, and that it is not 
> modified later.






[jira] [Created] (YUNIKORN-2641) Ensure createTime has same semantics for ask and allocation

2024-05-23 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2641:
--

 Summary: Ensure createTime has same semantics for ask and 
allocation
 Key: YUNIKORN-2641
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2641
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


The createTime field in Allocation and AllocationAsk is not used consistently. 
Ensure that the field is always set, and that it is not modified later.






[jira] [Closed] (YUNIKORN-802) Supports to assign nodes to non-default partition

2024-05-15 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-802.
-

> Supports to assign nodes to non-default partition
> -
>
> Key: YUNIKORN-802
> URL: https://issues.apache.org/jira/browse/YUNIKORN-802
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>  Labels: pull-request-available
>
> see comment 
> (https://issues.apache.org/jira/browse/YUNIKORN-22?focusedCommentId=17398860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17398860)
> Currently, all nodes are hardcoded to be assigned to the "default" partition. 
> That brings two disadvantages:
>  # we can't select specific nodes from a cluster, for example nodes that are 
> used to execute Spark jobs only
>  # multiple partitions do not work, since a non-default partition can't get 
> nodes
> Future work:
>  # support changing the partition assignment of an existing node (in this PR, 
> the update request will be skipped)
>  # support removing an existing node which has been reassigned (in this PR, 
> removing such a node causes the error message "Failed to update non existing 
> node ...")






[jira] [Closed] (YUNIKORN-22) k8shim is hardcoded to the default partition

2024-05-15 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-22.


> k8shim is hardcoded to the default partition
> 
>
> Key: YUNIKORN-22
> URL: https://issues.apache.org/jira/browse/YUNIKORN-22
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Rainie Li
>Priority: Major
>
> In the application and node code the partition is hardcoded to use the 
> DefaultPartition constant when creating new objects:
>  * application.NewApplication
>  * schedulerNode.addExistingAllocation
>  This means that in the configuration for the core we must have that same 
> partition and that we currently would not be able to create a second shim for 
> the same core as they would interfere with each other.






[jira] [Resolved] (YUNIKORN-2593) Remove partition from Allocation/AllocationAsk

2024-05-08 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2593.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Remove partition from Allocation/AllocationAsk
> --
>
> Key: YUNIKORN-2593
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2593
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Remove the partitionName field from the Allocation and AllocationAsk objects. 
> Its use was inconsistent, and the partition name can be retrieved from other 
> contexts where needed.






[jira] [Resolved] (YUNIKORN-2610) Announce deprecation of plugin mode

2024-05-07 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2610.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Announce deprecation of plugin mode
> ---
>
> Key: YUNIKORN-2610
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2610
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available, release-notes
> Fix For: 1.6.0
>
>
> As discussed on the mailing lists and community meetings, the plan is to 
> deprecate the yunikorn plugin mode along the following schedule:
>  * {*}YuniKorn 1.6{*}: Deprecation announced
>  * {*}YuniKorn 1.7{*}: Scheduler will emit warnings if plugin mode is in use
>  * {*}YuniKorn 1.8{*}: YuniKorn will no longer ship plugin mode binaries
>  * {*}YuniKorn 1.9{*}: Implementation removed from codebase
> As a first step, for 1.6 we need to update the documentation to give notice 
> of the deprecation timeline. This will ensure that users have adequate notice 
> to move away from plugin mode.






[jira] [Created] (YUNIKORN-2610) Announce deprecation of plugin mode

2024-05-07 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2610:
--

 Summary: Announce deprecation of plugin mode
 Key: YUNIKORN-2610
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2610
 Project: Apache YuniKorn
  Issue Type: Task
  Components: documentation
Reporter: Craig Condit
Assignee: Craig Condit


As discussed on the mailing lists and community meetings, the plan is to 
deprecate the yunikorn plugin mode along the following schedule:
 * {*}YuniKorn 1.6{*}: Deprecation announced
 * {*}YuniKorn 1.7{*}: Scheduler will emit warnings if plugin mode is in use
 * {*}YuniKorn 1.8{*}: YuniKorn will no longer ship plugin mode binaries
 * {*}YuniKorn 1.9{*}: Implementation removed from codebase

As a first step, for 1.6 we need to update the documentation to give notice of 
the deprecation timeline. This will ensure that users have adequate notice to 
move away from plugin mode.






[jira] [Resolved] (YUNIKORN-2588) Shim: Convert AllocationID to AllocationKey

2024-04-27 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2588.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Shim: Convert AllocationID to AllocationKey
> ---
>
> Key: YUNIKORN-2588
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2588
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2594) Remove unused field AllocationAsk.execTimeout

2024-04-27 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2594.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Remove unused field AllocationAsk.execTimeout
> -
>
> Key: YUNIKORN-2594
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2594
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The AllocationAsk object contains an unused execTimeout field (it is set but 
> never used logically). It should be removed in preparation for merging 
> AllocationAsk and Allocation.






[jira] [Created] (YUNIKORN-2594) Core: Remove unused field AllocationAsk.execTimeout

2024-04-26 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2594:
--

 Summary: Core: Remove unused field AllocationAsk.execTimeout
 Key: YUNIKORN-2594
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2594
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


The AllocationAsk object contains an unused execTimeout field (it is set but 
never used logically). It should be removed in preparation for merging 
AllocationAsk and Allocation.






[jira] [Created] (YUNIKORN-2593) Simplify partition name

2024-04-26 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2593:
--

 Summary: Simplify partition name
 Key: YUNIKORN-2593
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2593
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit


Currently, partition names are treated differently in different places within 
the core. In some places they are bare (e.g. "default") and in others they are 
composite (e.g. "[rm:123]default"). This is confusing and unnecessary. It also 
hampers efforts to merge the AllocationAsk and Allocation objects, as the 
semantics differ between them. Switch to using the bare form ("default") 
everywhere instead.
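
To make the two forms concrete, the sketch below converts a composite name into the bare form. It is illustrative only: barePartitionName is a hypothetical helper, not a function from the core, and it assumes the composite form is always a "[rmID]" prefix followed by the bare name, as in the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// barePartitionName is a hypothetical helper (not taken from the YuniKorn
// code base). It strips a leading "[rmID]" prefix from a composite partition
// name and returns names that are already bare unchanged.
func barePartitionName(name string) string {
	if !strings.HasPrefix(name, "[") {
		return name
	}
	end := strings.Index(name, "]")
	if end < 0 {
		// No closing bracket: not a composite name, leave it untouched.
		return name
	}
	return name[end+1:]
}

func main() {
	for _, name := range []string{"default", "[rm:123]default"} {
		fmt.Printf("%q -> %q\n", name, barePartitionName(name))
	}
}
```

A helper like this would only be needed at the boundaries during a transition; once every caller uses the bare form, no conversion is required.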






[jira] [Resolved] (YUNIKORN-2589) Web: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2589.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Web: Convert AllocationID to AllocationKey
> --
>
> Key: YUNIKORN-2589
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2589
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2587) Core: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2587.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Core: Convert AllocationID to AllocationKey
> ---
>
> Key: YUNIKORN-2587
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2587
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2586) 3rd party license failure when GOROOT not set

2024-04-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2586.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> 3rd party license failure when GOROOT not set
> -
>
> Key: YUNIKORN-2586
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2586
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> go-license fails, and with it the rest of the build, when running on 
> go1.22.2 with GOROOT not set.






[jira] [Resolved] (YUNIKORN-2585) SI: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2585.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> SI: Convert AllocationID to AllocationKey
> -
>
> Key: YUNIKORN-2585
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2585
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Convert all usage of AllocationID to AllocationKey since they are the same 
> thing now.






[jira] [Created] (YUNIKORN-2588) Shim: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2588:
--

 Summary: Shim: Convert AllocationID to AllocationKey
 Key: YUNIKORN-2588
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2588
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2587) Core: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2587:
--

 Summary: Core: Convert AllocationID to AllocationKey
 Key: YUNIKORN-2587
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2587
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2589) Web: Convert AllocationID to AllocationKey

2024-04-25 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2589:
--

 Summary: Web: Convert AllocationID to AllocationKey
 Key: YUNIKORN-2589
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2589
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: webapp
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2586) 3rd party license failure when GOROOT not set

2024-04-24 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2586:
--

 Summary: 3rd party license failure when GOROOT not set
 Key: YUNIKORN-2586
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2586
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


go-license fails, and with it the rest of the build, when running on go1.22.2 
with GOROOT not set.
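
As a rough illustration of one possible workaround (the issue does not say how the build was actually fixed), a tool can fall back to asking the toolchain for its root when the environment variable is empty. The helper names resolveGoroot and queryGoEnv are hypothetical; "go env GOROOT" is the standard command for printing the toolchain root.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// resolveGoroot returns envValue when it is non-empty and otherwise falls
// back to query. The name and the fallback strategy are assumptions made for
// this sketch, not the fix applied in the shim build.
func resolveGoroot(envValue string, query func() (string, error)) (string, error) {
	if envValue != "" {
		return envValue, nil
	}
	return query()
}

// queryGoEnv asks the installed toolchain for its root via "go env GOROOT".
func queryGoEnv() (string, error) {
	out, err := exec.Command("go", "env", "GOROOT").Output()
	if err != nil {
		return "", fmt.Errorf("go env GOROOT: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	root, err := resolveGoroot(os.Getenv("GOROOT"), queryGoEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine GOROOT:", err)
		os.Exit(1)
	}
	fmt.Println("GOROOT:", root)
}
```

Passing the lookup as a function keeps the decision logic testable without a Go toolchain on the PATH.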






[jira] [Created] (YUNIKORN-2585) SI: Convert AllocationID to AllocationKey

2024-04-24 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2585:
--

 Summary: SI: Convert AllocationID to AllocationKey
 Key: YUNIKORN-2585
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2585
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: scheduler-interface
Reporter: Craig Condit
Assignee: Craig Condit


Convert all usage of AllocationID to AllocationKey since they are the same 
thing now.






[jira] [Resolved] (YUNIKORN-2584) Shim: Remove references to MaxAllocations

2024-04-24 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2584.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Shim: Remove references to MaxAllocations
> -
>
> Key: YUNIKORN-2584
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2584
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2584) Shim: Remove references to MaxAllocations

2024-04-24 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2584:
--

 Summary: Shim: Remove references to MaxAllocations
 Key: YUNIKORN-2584
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2584
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Resolved] (YUNIKORN-2458) Remove ask repeats from AllocationAsk

2024-04-24 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2458.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Remove ask repeats from AllocationAsk
> -
>
> Key: YUNIKORN-2458
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2458
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Simplify ask and allocation handling by removing support for repeated 
> requests in a single ask. This is functionality that is not used by the shim. 
> By removing support for repeated asks, we also ensure that there is a 1:1 
> relationship between ask and allocation.






[jira] [Resolved] (YUNIKORN-2574) totalPartitionResource should not be mutated with AddTo/SubFrom

2024-04-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2574.

Fix Version/s: 1.6.0, 1.5.1
   Resolution: Fixed

Merged to master and branch-1.5.

> totalPartitionResource should not be mutated with AddTo/SubFrom
> ---
>
> Key: YUNIKORN-2574
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.1
>
>
> There is a potential data race in {{PartitionContext}}: the field 
> {{totalPartitionResource}} is mutated in place. The problem is that the 
> method {{GetTotalPartitionResource()}} does not clone it.
> {noformat}
> func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
>   pc.RLock()
>   defer pc.RUnlock()
>   return pc.totalPartitionResource
> }
> {noformat}
> In general, we should prefer the immutable approach for variables like this, 
> just like in {{{}objects.Queue{}}}:
> {noformat}
> func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
>   // check this queue: failure stops checks if the allocation is not part of a node addition
>   newAllocated := resources.Add(sq.allocatedResource, alloc) // <-- new object
> [ ... removed ... ]
>   sq.Lock()
>   defer sq.Unlock()
>   // all OK update this queue
>   sq.allocatedResource = newAllocated
>   sq.updateAllocatedResourceMetrics()
>   return nil
> }
> // incPendingResource increments pending resource of this queue and its parents.
> func (sq *Queue) incPendingResource(delta *resources.Resource) {
>   // update the parent
>   if sq.parent != nil {
>     sq.parent.incPendingResource(delta)
>   }
>   // update this queue
>   sq.Lock()
>   defer sq.Unlock()
>   sq.pending = resources.Add(sq.pending, delta) // <-- new object
>   sq.updatePendingResourceMetrics()
> }
> {noformat}
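
The replace-instead-of-mutate pattern described above can be sketched in isolation. This is a minimal illustration, not the YuniKorn code: Resource, Add, Partition, GetTotal and AddNode are hypothetical stand-ins for resources.Resource, resources.Add and PartitionContext. The getter can hand out the stored value without cloning only because writers never modify a value once it has been published.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a hypothetical stand-in for resources.Resource: a map of
// resource name to quantity.
type Resource map[string]int64

// Add returns a new Resource holding left+right; neither input is modified.
func Add(left, right Resource) Resource {
	out := make(Resource, len(left)+len(right))
	for k, v := range left {
		out[k] = v
	}
	for k, v := range right {
		out[k] += v
	}
	return out
}

// Partition is a hypothetical, reduced stand-in for PartitionContext.
type Partition struct {
	sync.RWMutex
	total Resource
}

// GetTotal returns the current total. Writers never mutate a published map,
// so the returned value is a stable snapshot as long as callers only read it.
func (p *Partition) GetTotal() Resource {
	p.RLock()
	defer p.RUnlock()
	return p.total
}

// AddNode replaces the total with a freshly built object instead of
// mutating the existing one in place.
func (p *Partition) AddNode(capacity Resource) {
	p.Lock()
	defer p.Unlock()
	p.total = Add(p.total, capacity)
}

func main() {
	p := &Partition{total: Resource{"vcore": 1000}}
	snapshot := p.GetTotal()
	p.AddNode(Resource{"vcore": 500, "memory": 2048})
	// The earlier snapshot is unaffected by the later update.
	fmt.Println(snapshot["vcore"], p.GetTotal()["vcore"])
}
```

An in-place AddTo would instead change the map that an earlier GetTotal caller is still reading, outside the lock, which is the race the issue describes.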






[jira] [Resolved] (YUNIKORN-2521) Scheduler deadlock

2024-04-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2521.

 Fix Version/s: 1.6.0, 1.5.1
Target Version: 1.6.0, 1.5.1
    Resolution: Fixed

This was delivered as part of YUNIKORN-2544.

> Scheduler deadlock
> --
>
> Key: YUNIKORN-2521
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2521
> Project: Apache YuniKorn
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: Yunikorn: 1.5
> AWS EKS: v1.28.6-eks-508b6b3
>Reporter: Noah Yoshida
>Assignee: Craig Condit
>Priority: Critical
> Fix For: 1.6.0, 1.5.1
>
> Attachments: 0001-YUNIKORN-2539-core.patch, 
> 0002-YUNIKORN-2539-k8shim.patch, 4_4_goroutine-1.txt, 4_4_goroutine-2.txt, 
> 4_4_goroutine-3.txt, 4_4_goroutine-4.txt, 4_4_goroutine-5-state-dump.txt, 
> 4_4_profile001.png, 4_4_profile002.png, 4_4_profile003.png, 
> 4_4_scheduler-logs.txt, deadlock_2024-04-18.log, goroutine-4-3-1.out, 
> goroutine-4-3-2.out, goroutine-4-3-3.out, goroutine-4-3.out, 
> goroutine-4-5.out, goroutine-dump.txt, goroutine-while-blocking-2.out, 
> goroutine-while-blocking.out, logs-potential-deadlock-2.txt, 
> logs-potential-deadlock.txt, logs-splunk-ordered.txt, logs-splunk.txt, 
> profile001-4-5.gif, profile012.gif, profile013.gif, running-logs-2.txt, 
> running-logs.txt
>
>
> Discussion on Yunikorn slack: 
> [https://yunikornworkspace.slack.com/archives/CLNUW68MU/p1711048995187179]
> Occasionally, Yunikorn will deadlock and prevent any new pods from starting. 
> All pods stay in Pending. There are no error logs inside of the Yunikorn 
> scheduler indicating any issue. 
> Additionally, the pods all have the correct annotations / labels from the 
> admission service, so they are at least getting put into k8s correctly. 
> The issue was seen intermittently on Yunikorn version 1.5 in EKS, using 
> version `v1.28.6-eks-508b6b3`. 
> At least for me, we run about 25-50 nodes and 200-400 pods. Pods and nodes 
> are added and removed pretty frequently as we do ML workloads. 
> Attached is the goroutine dump. We were not able to get a statedump as the 
> endpoint kept timing out. 
> You can fix it by restarting the Yunikorn scheduler pod. Sometimes you also 
> have to delete any "Pending" pods that got stuck while the scheduler was 
> deadlocked as well, for them to get picked up by the new scheduler pod. 






[jira] [Resolved] (YUNIKORN-2579) SI: Remove maxAllocations field from AllocationAsk

2024-04-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2579.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> SI: Remove maxAllocations field from AllocationAsk
> --
>
> Key: YUNIKORN-2579
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2579
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Now that maxAllocations != 1 is no longer supported, we need to remove the 
> maxAllocations field from the AllocationAsk in the scheduler interface.






[jira] [Created] (YUNIKORN-2579) SI: Remove maxAllocations field from AllocationAsk

2024-04-23 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2579:
--

 Summary: SI: Remove maxAllocations field from AllocationAsk
 Key: YUNIKORN-2579
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2579
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: scheduler-interface
Reporter: Craig Condit
Assignee: Craig Condit


Now that maxAllocations != 1 is no longer supported, we need to remove the 
maxAllocations field from the AllocationAsk in the scheduler interface.






[jira] [Resolved] (YUNIKORN-2539) Add optional deadlock detection

2024-04-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2539.

Resolution: Fixed

> Add optional deadlock detection
> ---
>
> Key: YUNIKORN-2539
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2539
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.1
>
>
> We make heavy use of sync.Mutex and sync.RWMutex in our code. Unfortunately, 
> while these are very performant, they can lead to difficult-to-diagnose 
> deadlocks.
> If we substitute our own locking routines, we can optionally enable deadlock 
> detection. See [https://github.com/sasha-s/go-deadlock] for a possible 
> solution.






[jira] [Created] (YUNIKORN-2539) Add optional deadlock detection

2024-04-04 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2539:
--

 Summary: Add optional deadlock detection
 Key: YUNIKORN-2539
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2539
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler, shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


We make heavy use of sync.Mutex and sync.RWMutex in our code. Unfortunately, 
while these are very performant, they can lead to difficult-to-diagnose 
deadlocks.

If we substitute our own locking routines, we can optionally enable deadlock 
detection. See [https://github.com/sasha-s/go-deadlock] for a possible solution.






[jira] [Closed] (YUNIKORN-2534) [Yunikorn] Quota enforcement checks are failing when we have max-application set to 0

2024-04-04 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2534.
--

> [Yunikorn] Quota enforcement checks are failing when we have max-application 
> set to 0
> -
>
> Key: YUNIKORN-2534
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2534
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Rajesh Kanhaiya Lal
>Priority: Major
> Attachments: yunikorn-configs-fresh.yaml
>
>
> The max-application checks do not work when max-application is set to 0 in 
> the yunikorn-config file.
> Config validation is also skipped when max-application is set to 0. For 
> example, the check that a child queue's max-application is less than or 
> equal to the parent queue's does not run.
> The YuniKorn config file is attached.
> The user and group tracking API also does not include max-application in the 
> response.
>  
> {code:java}
> curl --location 'http://127.0.0.1:9080/ws/v1/partition/default/usage/users'
> [
>     {
>         "userName": "nobody",
>         "groups": {
>             "ts333w3": "*",
>             "ts433": "*",
>             "ts544": "*",
>             "ts633": "*"
>         },
>         "queues": {
>             "queuePath": "root",
>             "resourceUsage": {
>                 "Resources": {
>                     "memory": 3,
>                     "pods": 3,
>                     "vcore": 300
>                 }
>             },
>             "runningApplications": [
>                 "ts333w3",
>                 "ts433",
>                 "ts544"
>             ],
>             "children": [
>                 {
>                     "queuePath": "root.default",
>                     "resourceUsage": {
>                         "Resources": {
>                             "memory": 3,
>                             "pods": 3,
>                             "vcore": 300
>                         }
>                     },
>                     "runningApplications": [
>                         "ts333w3",
>                         "ts433",
>                         "ts544"
>                     ]
>                 }
>             ]
>         }
>     }
> ] {code}
> Could you please take a look?
>  






[jira] [Resolved] (YUNIKORN-2534) [Yunikorn] Quota enforcement checks are failing when we have max-application set to 0

2024-04-04 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2534.

  Assignee: (was: Manikandan R)
Resolution: Not A Bug

This is not a bug. A value of zero is indistinguishable from unset, and we 
explicitly treat it the same.
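
Why zero and unset cannot be told apart can be shown with a small sketch. It assumes the limit is stored in a plain unsigned integer field; QueueConfig, its JSON key, parse and canStart are hypothetical, and JSON is used here only to stay within the standard library (the actual configuration is YAML).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueueConfig is a hypothetical, reduced queue definition. The field name
// and JSON key are illustrative, not the actual YuniKorn configuration type.
type QueueConfig struct {
	Name            string `json:"name"`
	MaxApplications uint64 `json:"maxapplications,omitempty"`
}

// parse decodes a queue definition from JSON.
func parse(doc string) (QueueConfig, error) {
	var q QueueConfig
	err := json.Unmarshal([]byte(doc), &q)
	return q, err
}

// canStart reports whether one more application fits under the limit.
// A limit of zero is treated as "no limit", the same as leaving it unset.
func canStart(q QueueConfig, running uint64) bool {
	if q.MaxApplications == 0 {
		return true
	}
	return running < q.MaxApplications
}

func main() {
	unset, _ := parse(`{"name": "root.default"}`)
	zero, _ := parse(`{"name": "root.default", "maxapplications": 0}`)
	// Both documents decode to the same struct value.
	fmt.Println(unset == zero)
	fmt.Println(canStart(zero, 3))
}
```

Distinguishing the two cases would need a pointer or a separate "is set" flag, which is a different design from the one described in the resolution.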

> [Yunikorn] Quota enforcement checks are failing when we have max-application 
> set to 0
> -
>
> Key: YUNIKORN-2534
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2534
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Rajesh Kanhaiya Lal
>Priority: Major
> Attachments: yunikorn-configs-fresh.yaml
>
>
> The max-application checks do not work when max-application is set to 0 in 
> the yunikorn-config file.
> Config validation is also skipped when max-application is set to 0. For 
> example, the check that a child queue's max-application is less than or 
> equal to the parent queue's does not run.
> The YuniKorn config file is attached.
> The user and group tracking API also does not include max-application in the 
> response.
>  
> {code:java}
> curl --location 'http://127.0.0.1:9080/ws/v1/partition/default/usage/users'
> [
>     {
>         "userName": "nobody",
>         "groups": {
>             "ts333w3": "*",
>             "ts433": "*",
>             "ts544": "*",
>             "ts633": "*"
>         },
>         "queues": {
>             "queuePath": "root",
>             "resourceUsage": {
>                 "Resources": {
>                     "memory": 3,
>                     "pods": 3,
>                     "vcore": 300
>                 }
>             },
>             "runningApplications": [
>                 "ts333w3",
>                 "ts433",
>                 "ts544"
>             ],
>             "children": [
>                 {
>                     "queuePath": "root.default",
>                     "resourceUsage": {
>                         "Resources": {
>                             "memory": 3,
>                             "pods": 3,
>                             "vcore": 300
>                         }
>                     },
>                     "runningApplications": [
>                         "ts333w3",
>                         "ts433",
>                         "ts544"
>                     ]
>                 }
>             ]
>         }
>     }
> ] {code}
> Could you please take a look?
>  






[jira] [Closed] (YUNIKORN-2532) Resource usage report has an incompatible format change

2024-04-03 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2532.
--

> Resource usage report has an incompatible format change
> ---
>
> Key: YUNIKORN-2532
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2532
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> A recent change caused the application resource usage report to have a new 
> format.
> Prior to the change, the format was:
> {code:java}
> YK_APP_SUMMARY: {"appID": "adf53ee0-experiment-organicad-94520240-1-1", 
> "submissionTime": 1712169262131, "startTime": 1712169264134, "finishTime": 
> 1712173619983, "user": 
> "system:serviceaccount:spark-operator-02:spark-operator", "queue": 
> "root.queue-large", "state": "Completed", "rmID": "test-cluster", 
> "resourceUsage": 
> {"abc":{"memory":139178200478515200,"pods":1729129,"vcore":5183062000},"def":{"memory":113789789798400,"pods":1413,"vcore":4239000}},
>  "preemptedResource": {}}
>   {code}
> With the change, the new format is:
> {code:java}
>  2024-04-04T00:33:08.532Z INFO core.scheduler.application.usage
> objects/application_summary.go:60   YK_APP_SUMMARY: {ApplicationID: 
> afa303d0-test-trino-sparksql--20240404-2-1, SubmissionTime: 1712190615461, 
> StartTime: 1712190617496, FinishTime: 1712190788532, User: 
> system:serviceaccount:spark-operator-01:spark-operator, Queue: 
> root.queue-large, State: Completed, RmID: test-cluster, ResourceUsage: 
> TrackedResource{UNKNOWN:pods=177,UNKNOWN:vcore=354000,UNKNOWN:memory=1431454089216},
>  PreemptedResource: TrackedResource{}, PlaceholderResource: 
> TrackedResource{}}{code}
> There are several incompatibilities:
> 1. The class name TrackedResource was not there before; now it is.
> 2. The instance type was outside the resource part before; now it is embedded.
> 3. The instance type was reported correctly before the change; now it is 
> UNKNOWN.
> #3 may be a different issue, but we observed it at the same time.
> I think we should change the format back to the original one, as this is an 
> incompatible change. What do you think, [~wilfreds], [~pbacsko], [~ccondit]?
> Thanks.






[jira] [Resolved] (YUNIKORN-2532) Resource usage report has an incompatible format change

2024-04-03 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2532.

Resolution: Not A Bug

> Resource usage report has an incompatible format change
> ---
>
> Key: YUNIKORN-2532
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2532
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> A recent change caused the application resource usage report to have a new 
> format.
> Prior to the change, the format was:
> {code:java}
> YK_APP_SUMMARY: {"appID": "adf53ee0-experiment-organicad-94520240-1-1", 
> "submissionTime": 1712169262131, "startTime": 1712169264134, "finishTime": 
> 1712173619983, "user": 
> "system:serviceaccount:spark-operator-02:spark-operator", "queue": 
> "root.queue-large", "state": "Completed", "rmID": "test-cluster", 
> "resourceUsage": 
> {"abc":{"memory":139178200478515200,"pods":1729129,"vcore":5183062000},"def":{"memory":113789789798400,"pods":1413,"vcore":4239000}},
>  "preemptedResource": {}}
>   {code}
> With the change, the new format is:
> {code:java}
>  2024-04-04T00:33:08.532Z INFO core.scheduler.application.usage
> objects/application_summary.go:60   YK_APP_SUMMARY: {ApplicationID: 
> afa303d0-test-trino-sparksql--20240404-2-1, SubmissionTime: 1712190615461, 
> StartTime: 1712190617496, FinishTime: 1712190788532, User: 
> system:serviceaccount:spark-operator-01:spark-operator, Queue: 
> root.queue-large, State: Completed, RmID: test-cluster, ResourceUsage: 
> TrackedResource{UNKNOWN:pods=177,UNKNOWN:vcore=354000,UNKNOWN:memory=1431454089216},
>  PreemptedResource: TrackedResource{}, PlaceholderResource: 
> TrackedResource{}}{code}
> There are several incompatibilities:
> 1. The class name TrackedResource was not there before; now it is.
> 2. The instance type was outside the resource part before; now it is embedded.
> 3. The instance type was reported correctly before the change; now it is 
> UNKNOWN.
> #3 may be a different issue, but we observed it at the same time.
> I think we should change the format back to the original one, as this is an 
> incompatible change. What do you think, [~wilfreds], [~pbacsko], [~ccondit]?
> Thanks.
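
To make the incompatibility concrete, here is a sketch of how a downstream consumer might have parsed the older, JSON-style summary line. The AppSummary type and parseSummary helper are hypothetical; only the YK_APP_SUMMARY marker and the field names come from the samples quoted above. The same parser rejects the newer line because its payload is not JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AppSummary holds the subset of fields this hypothetical consumer reads.
// Resource usage is keyed by instance type, then by resource name.
type AppSummary struct {
	AppID         string                      `json:"appID"`
	Queue         string                      `json:"queue"`
	State         string                      `json:"state"`
	ResourceUsage map[string]map[string]int64 `json:"resourceUsage"`
}

// marker precedes the summary payload in the log line.
const marker = "YK_APP_SUMMARY: "

// parseSummary extracts the payload after the marker and decodes it as JSON.
func parseSummary(line string) (AppSummary, error) {
	var s AppSummary
	idx := strings.Index(line, marker)
	if idx < 0 {
		return s, fmt.Errorf("no %q marker in line", marker)
	}
	err := json.Unmarshal([]byte(line[idx+len(marker):]), &s)
	return s, err
}

func main() {
	// Shortened versions of the two samples from the issue.
	older := `YK_APP_SUMMARY: {"appID": "app-1", "queue": "root.queue-large", ` +
		`"state": "Completed", "resourceUsage": {"def": {"pods": 1413}}}`
	newer := `INFO YK_APP_SUMMARY: {ApplicationID: app-1, ResourceUsage: ` +
		`TrackedResource{UNKNOWN:pods=177}}`

	s, err := parseSummary(older)
	fmt.Println(s.AppID, s.ResourceUsage["def"]["pods"], err)

	_, err = parseSummary(newer)
	fmt.Println(err != nil) // the struct-style payload is not valid JSON
}
```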






[jira] [Resolved] (YUNIKORN-2529) Newly added nodes show 'ready:false' under node attributes

2024-04-02 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2529.

Fix Version/s: 1.6.0
   Resolution: Implemented

Resolving as this was fixed by YUNIKORN-2530.

> Newly added nodes show 'ready:false' under node attributes
> --
>
> Key: YUNIKORN-2529
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2529
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.5.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
> Fix For: 1.6.0
>
>
> In the web UI, the attributes column for nodes shows 'ready:false' for newly 
> added nodes.






[jira] [Resolved] (YUNIKORN-2530) Remove unnecessary ready flag on node

2024-04-02 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2530.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged all PRs to master.

> Remove unnecessary ready flag on node
> -
>
> Key: YUNIKORN-2530
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2530
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, scheduler-interface, shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> YuniKorn has had a "ready" flag for nodes for a long time; however, this flag 
> is not set correctly and serves no purpose to the scheduler. In Kubernetes, 
> readiness is a far more complex concept anyway, and a single true/false value 
> is insufficient. Therefore, we should remove the ready flag to simplify the 
> interface. This will also fix a minor issue in the Web UI where ready:false 
> is shown for newly added nodes.






[jira] [Created] (YUNIKORN-2530) Remove unnecessary ready flag on node

2024-04-02 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2530:
--

 Summary: Remove unnecessary ready flag on node
 Key: YUNIKORN-2530
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2530
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler, scheduler-interface, shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


YuniKorn has had a "ready" flag for nodes for a long time; however, this flag is 
not set correctly and serves no purpose to the scheduler. In Kubernetes, 
readiness is a far more complex concept anyway, and a single true/false value 
is insufficient. Therefore, we should remove the ready flag to simplify the 
interface. This will also fix a minor issue in the Web UI where ready:false is 
shown for newly added nodes.






[jira] [Created] (YUNIKORN-2529) Newly added nodes show 'ready:false' under node attributes

2024-04-02 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2529:
--

 Summary: Newly added nodes show 'ready:false' under node attributes
 Key: YUNIKORN-2529
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2529
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: webapp
Affects Versions: 1.5.0
Reporter: Craig Condit
Assignee: Craig Condit


In the web UI, the attributes column for nodes shows 'ready:false' for newly 
added nodes.






[jira] [Resolved] (YUNIKORN-2440) [UMBRELLA] Remove stateaware scheduling

2024-03-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2440.

Fix Version/s: 1.6.0
   Resolution: Fixed

Resolving as all subtasks are complete.

> [UMBRELLA] Remove stateaware scheduling
> ---
>
> Key: YUNIKORN-2440
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2440
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Major
>  Labels: release-notes
> Fix For: 1.6.0
>
>
> Umbrella Jira to track all the work to remove state-aware scheduling:
> * remove scheduling code
> * remove documentation
> * remove configuration options
> * document way to achieve similar behaviour (FIFO with max applications)






[jira] [Resolved] (YUNIKORN-2509) Update documentation to remove state-aware scheduling

2024-03-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2509.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Update documentation to remove state-aware scheduling
> -
>
> Key: YUNIKORN-2509
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2509
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Remove stateaware scheduling from documentation, including references to 
> starting state.






[jira] [Resolved] (YUNIKORN-2508) Remove APP_STARTING references from shim

2024-03-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2508.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Remove APP_STARTING references from shim
> 
>
> Key: YUNIKORN-2508
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2508
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Now that APP_STARTING is gone, we need to update some references in the shim 
> to remove usages of it.






[jira] [Created] (YUNIKORN-2509) Update documentation to remove state-aware scheduling

2024-03-20 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2509:
--

 Summary: Update documentation to remove state-aware scheduling
 Key: YUNIKORN-2509
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2509
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: documentation
Reporter: Craig Condit
Assignee: Craig Condit


Remove stateaware scheduling from documentation, including references to 
starting state.






[jira] [Created] (YUNIKORN-2508) Remove APP_STARTING references from shim

2024-03-20 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2508:
--

 Summary: Remove APP_STARTING references from shim
 Key: YUNIKORN-2508
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2508
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


Now that APP_STARTING is gone, we need to update some references in the shim to 
remove usages of it.






[jira] [Resolved] (YUNIKORN-2380) [UMBRELLA] YuniKorn 1.5.0 release efforts

2024-03-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2380.

Fix Version/s: 1.5.0
   Resolution: Fixed

Resolving as release is complete.

> [UMBRELLA] YuniKorn 1.5.0 release efforts
> -
>
> Key: YUNIKORN-2380
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2380
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release
>Reporter: Wilfred Spiegelenburg
>Assignee: TingYao Huang
>Priority: Blocker
> Fix For: 1.5.0
>
>
> This umbrella is to track the work items needed for 1.5.0 release.
> Release manager: TBD
> Multiple new features, enhancements and bug fixes are covered. Please see
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20YUNIKORN%20AND%20%22Target%20Version%22%20%3D%201.5.0%20ORDER%20BY%20status%20ASC






[jira] [Resolved] (YUNIKORN-2495) Remove App "Starting" state

2024-03-20 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2495.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Remove App "Starting" state
> ---
>
> Key: YUNIKORN-2495
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2495
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The app Starting state has been in use for a while. Although it was 
> introduced mainly as part of state-aware app scheduling, all related code 
> could be assessed and removed if it is no longer needed anywhere.






[jira] [Resolved] (YUNIKORN-2419) [UMBRELLA] Generate reproducible binaries

2024-03-19 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2419.

Fix Version/s: 1.6.0
   Resolution: Fixed

Resolving as all subtasks are now complete.

> [UMBRELLA] Generate reproducible binaries
> -
>
> Key: YUNIKORN-2419
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2419
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes, webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: release-notes
> Fix For: 1.6.0
>
>
> Currently, the binaries we build for YuniKorn differ from one build to the 
> next. We should attempt to standardize our build output so that independently 
> built binaries from the same source code can be validated.






[jira] [Resolved] (YUNIKORN-2487) [Release] Force REPRODUCIBLE_BUILDS=1 on release

2024-03-19 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2487.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> [Release] Force REPRODUCIBLE_BUILDS=1 on release
> 
>
> Key: YUNIKORN-2487
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2487
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> With the updated REPRODUCIBLE_BUILDS logic in k8shim/web, we need to pass this 
> variable into the scripts.






[jira] [Resolved] (YUNIKORN-2480) Convert yunikorn-web build to use pnpm

2024-03-19 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2480.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Convert yunikorn-web build to use pnpm
> --
>
> Key: YUNIKORN-2480
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2480
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Our current yunikorn-web build is driven by yarn v1, which is very outdated 
> and slow. We should switch to using pnpm instead.






[jira] [Resolved] (YUNIKORN-2481) Convert yunikorn-site build to use pnpm

2024-03-17 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2481.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> Convert yunikorn-site build to use pnpm
> ---
>
> Key: YUNIKORN-2481
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2481
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: website
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Our current yunikorn-site build is driven by yarn v1, which is very outdated 
> and slow. We should switch to using pnpm instead.






[jira] [Resolved] (YUNIKORN-2486) [Web] Use docker to build reproducible binaries

2024-03-15 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2486.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> [Web] Use docker to build reproducible binaries
> ---
>
> Key: YUNIKORN-2486
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2486
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: website
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Resolved] (YUNIKORN-2485) [Shim] Use docker to build reproducible binaries

2024-03-15 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2485.

Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master.

> [Shim] Use docker to build reproducible binaries
> 
>
> Key: YUNIKORN-2485
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2485
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The current build system (even for reproducible builds) results in 
> differences between environments. To eliminate these differences, we should 
> build in a docker container when in reproducible build mode.






[jira] [Created] (YUNIKORN-2488) SI: Remove stateaware constants

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2488:
--

 Summary: SI: Remove stateaware constants
 Key: YUNIKORN-2488
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2488
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: scheduler-interface
Reporter: Craig Condit









[jira] [Created] (YUNIKORN-2487) [Release] Force REPRODUCIBLE_BUILDS=1 on release

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2487:
--

 Summary: [Release] Force REPRODUCIBLE_BUILDS=1 on release
 Key: YUNIKORN-2487
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2487
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: release
Reporter: Craig Condit
Assignee: Craig Condit


With the updated REPRODUCIBLE_BUILDS logic in k8shim/web, we need to pass this 
variable into the scripts.






[jira] [Created] (YUNIKORN-2486) [Web] Use docker to build reproducible binaries

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2486:
--

 Summary: [Web] Use docker to build reproducible binaries
 Key: YUNIKORN-2486
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2486
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: website
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2485) [Shim] Use sanitized docker env to build reproducible binaries

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2485:
--

 Summary: [Shim] Use sanitized docker env to build reproducible 
binaries
 Key: YUNIKORN-2485
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2485
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit


The current build system (even for reproducible builds) results in differences 
between environments. To eliminate these differences, we should build in a 
docker container when in reproducible build mode.






[jira] [Created] (YUNIKORN-2484) Shim: Remove stateaware logic

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2484:
--

 Summary: Shim: Remove stateaware logic
 Key: YUNIKORN-2484
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2484
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: shim - kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2483) Remove stateaware scheduling logic

2024-03-13 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2483:
--

 Summary: Remove stateaware scheduling logic
 Key: YUNIKORN-2483
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2483
 Project: Apache YuniKorn
  Issue Type: Sub-task
  Components: core - scheduler
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Resolved] (YUNIKORN-2383) Branching and tagging for 1.5

2024-03-13 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2383.

Fix Version/s: 1.5.0
   Resolution: Fixed

All tasks complete, resolving.

> Branching and tagging for 1.5
> -
>
> Key: YUNIKORN-2383
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2383
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Wilfred Spiegelenburg
>Assignee: TingYao Huang
>Priority: Major
> Fix For: 1.5.0
>
>
> branching & tagging for updating dependencies (SI/core/k8shim)






[jira] [Resolved] (YUNIKORN-2384) Release notes for 1.5.0

2024-03-13 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2384.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master.

> Release notes for 1.5.0
> ---
>
> Key: YUNIKORN-2384
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2384
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Jiras have been tagged with release-notes for this version. These jiras need 
> to be added with a special mention in the release notes.
> [https://issues.apache.org/jira/issues/?filter=12352474]
> This Jira might require multiple people to help write the release notes for 
> the specific jiras mentioned.






[jira] [Created] (YUNIKORN-2481) Convert yunikorn-site build to use pnpm

2024-03-12 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2481:
--

 Summary: Convert yunikorn-site build to use pnpm
 Key: YUNIKORN-2481
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2481
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: website
Reporter: Craig Condit
Assignee: Craig Condit


Our current yunikorn-site build is driven by yarn v1, which is very outdated 
and slow. We should switch to using pnpm instead.






[jira] [Created] (YUNIKORN-2480) Convert yunikorn-web build to use pnpm

2024-03-12 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2480:
--

 Summary: Convert yunikorn-web build to use pnpm
 Key: YUNIKORN-2480
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2480
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: webapp
Reporter: Craig Condit
Assignee: Craig Condit


Our current yunikorn-web build is driven by yarn v1, which is very outdated and 
slow. We should switch to using pnpm instead.






[jira] [Resolved] (YUNIKORN-2469) Upgrade google.golang.org/protobuf to v1.33.0

2024-03-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2469.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged all PRs to master and cherry-picked to branch-1.5.

> Upgrade google.golang.org/protobuf to v1.33.0
> -
>
> Key: YUNIKORN-2469
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2469
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: core - common, release, shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>







[jira] [Resolved] (YUNIKORN-2468) Remove language around reproducible builds from README

2024-03-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2468.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Remove language around reproducible builds from README
> --
>
> Key: YUNIKORN-2468
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2468
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The reproducible builds feature is currently not functioning properly in the 
> 1.5.0 release. We should remove references to it from the README.md file.






[jira] [Resolved] (YUNIKORN-2467) Remove AllocationAsk from the core when a pod is completed

2024-03-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2467.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master and cherry-picked to branch-1.5.0.

> Remove AllocationAsk from the core when a pod is completed
> --
>
> Key: YUNIKORN-2467
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2467
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> A new issue was discovered while fixing YUNIKORN-2465. It also results in 
> growing memory usage for long-running applications.
> When a pod reaches a terminal state (Succeeded / Failed), we send an update 
> request from the shim to the core ({{Task.releaseAllocation()}}). However, we 
> only discard the allocation itself and do nothing about the ask, which 
> is kept inside the Application object until the application becomes Completed.
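The retention pattern described above can be sketched in Go with a simplified model. The application type and its methods are hypothetical stand-ins, not the YuniKorn core's actual data structures; the sketch only shows why releasing the allocation without the ask lets per-pod state accumulate for as long as the application runs.

```go
package main

import "fmt"

// application is a hypothetical, simplified stand-in for per-application
// bookkeeping: one ask and one allocation are tracked per task ID.
type application struct {
	asks        map[string]struct{}
	allocations map[string]struct{}
}

func newApplication() *application {
	return &application{
		asks:        make(map[string]struct{}),
		allocations: make(map[string]struct{}),
	}
}

// addTask records both the ask and the allocation for a task.
func (a *application) addTask(id string) {
	a.asks[id] = struct{}{}
	a.allocations[id] = struct{}{}
}

// releaseAllocationOnly models the behaviour described in the issue:
// the allocation is discarded but the ask stays behind.
func (a *application) releaseAllocationOnly(id string) {
	delete(a.allocations, id)
}

// releaseTask models the intended cleanup: both entries are removed.
func (a *application) releaseTask(id string) {
	delete(a.allocations, id)
	delete(a.asks, id)
}

func main() {
	leaky, fixed := newApplication(), newApplication()
	for i := 0; i < 1000; i++ {
		id := fmt.Sprintf("pod-%d", i)
		leaky.addTask(id)
		leaky.releaseAllocationOnly(id)
		fixed.addTask(id)
		fixed.releaseTask(id)
	}
	fmt.Println("asks retained without ask cleanup:", len(leaky.asks))
	fmt.Println("asks retained with ask cleanup:", len(fixed.asks))
}
```

With a long-running application that cycles through many pods, the first variant keeps one ask entry per completed pod, which matches the growing memory usage reported in the issue.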






[jira] [Reopened] (YUNIKORN-2419) [UMBRELLA] Generate reproducible binaries

2024-03-05 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reopened YUNIKORN-2419:


> [UMBRELLA] Generate reproducible binaries
> -
>
> Key: YUNIKORN-2419
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2419
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes, webapp
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: release-notes
> Fix For: 1.5.0
>
>
> Currently, the binaries we build for YuniKorn differ from one build to the 
> next. We should attempt to standardize our build output so that independently 
> built binaries from the same source code can be validated.






[jira] [Created] (YUNIKORN-2469) Upgrade google.golang.org/protobuf to v1.33.0

2024-03-05 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2469:
--

 Summary: Upgrade google.golang.org/protobuf to v1.33.0
 Key: YUNIKORN-2469
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2469
 Project: Apache YuniKorn
  Issue Type: Task
  Components: core - common, release, scheduler-interface, shim - 
kubernetes
Reporter: Craig Condit
Assignee: Craig Condit









[jira] [Created] (YUNIKORN-2468) Remove language around reproducible builds from README

2024-03-05 Thread Craig Condit (Jira)
Craig Condit created YUNIKORN-2468:
--

 Summary: Remove language around reproducible builds from README
 Key: YUNIKORN-2468
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2468
 Project: Apache YuniKorn
  Issue Type: Task
  Components: release
Reporter: Craig Condit
Assignee: Craig Condit


The reproducible builds feature is currently not functioning properly in the 
1.5.0 release. We should remove references to it from the README.md file.





