[jira] [Created] (YUNIKORN-2456) Remove weak ciphers from TLS

2024-02-26 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2456:
---

 Summary: Remove weak ciphers from TLS
 Key: YUNIKORN-2456
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2456
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: security, shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The TLS connection for the admission controller allows ciphers that are
considered weak.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2455) Fix incorrect configs of historical event tracing in the documentation

2024-02-26 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created YUNIKORN-2455:


 Summary: Fix incorrect configs of historical event tracing in the 
documentation
 Key: YUNIKORN-2455
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2455
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Chia-Ping Tsai


The configuration settings documented at
https://yunikorn.apache.org/docs/design/historical_usage_tracking/#configuration
differ significantly from those in the source code at
https://github.com/apache/yunikorn-core/blob/master/pkg/common/configs/configs.go#L36







[jira] [Resolved] (YUNIKORN-2439) Announce deprecation of state aware scheduling

2024-02-26 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2439.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master.

> Announce deprecation of state aware scheduling
> --
>
> Key: YUNIKORN-2439
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2439
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release-notes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available, release-notes
> Fix For: 1.5.0
>
>
> State aware scheduling was a simple scheduling algorithm that provided a stop-gap
> until gang scheduling was implemented. Gang scheduling and state aware scheduling
> do not work together. Gang scheduling is a more generic way of achieving almost
> the same behaviour.
> State aware scheduling has a number of drawbacks and could be used as an
> attack vector to slow down overall scheduling performance.
> We should deprecate it and remove it in an upcoming release.






[jira] [Created] (YUNIKORN-2454) Event streaming: send instanceUUID before the events

2024-02-26 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2454:
--

 Summary: Event streaming: send instanceUUID before the events
 Key: YUNIKORN-2454
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2454
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Peter Bacsko


The InstanceUUID must be sent before any events. This way, clients can
easily identify whether YuniKorn has restarted.






[jira] [Resolved] (YUNIKORN-2432) Add unit test coverage for UserTracker/GroupTracker/QueueTracker.canRunApp()

2024-02-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2432.

Fix Version/s: 1.6.0
   Resolution: Fixed

> Add unit test coverage for UserTracker/GroupTracker/QueueTracker.canRunApp()
> 
>
> Key: YUNIKORN-2432
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2432
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Currently, there is no direct coverage of {{QueueTracker.canRunApp()}}. 
> Although the method is tested indirectly through {{manager_test.go}}, it is 
> desirable to provide proper coverage directly in {{queue_tracker_test.go}}.
> The UserTracker and GroupTracker code can also get smaller tests that validate 
> that the proper arguments are passed to the underlying QueueTracker object.






[jira] [Resolved] (YUNIKORN-1706) We should clean up failed apps in shim side

2024-02-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-1706.

Fix Version/s: 1.5.0
   Resolution: Fixed

Merged to master & cherry-picked to branch-1.5.

> We should clean up failed apps in shim side
> ---
>
> Key: YUNIKORN-1706
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1706
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wei Huang
>Assignee: Qi Zhu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> I'm running a local dev environment (*make run_plugin*) based on 1.2.0, with no 
> admission controller configured. Additionally, I created a ConfigMap in the 
> default namespace:
> {code:yaml}
> apiVersion: v1
> data:
>   queues.yaml: |
>     partitions:
>       - name: default
>         nodesortpolicy:
>           type: binpacking
>         queues:
>           - name: root
>             submitacl: '*'
>             queues:
>               - name: app1
>                 submitacl: '*'
>                 properties:
>                   application.sort.policy: fifo
>                 resources:
>                   max:
>                     {memory: 200G, vcore: 1}
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
> {code}
> Then I create a Pod with the following config:
> {code:yaml}
> kind: Pod
> apiVersion: v1
> metadata:
>   name: pod-1
>   labels:
>     applicationId: "app1"
> spec:
>   schedulerName: yunikorn
>   containers:
>     - name: pause
>       image: registry.k8s.io/pause:3.6
>       resources:
>         requests:
>           cpu: 1
> {code}
> The pod cannot be scheduled and ends up with the status {*}ApplicationRejected{*}, 
> and I observed the following log in the shim:
> {code:bash}
> 2023-04-21T16:34:42.354-0700  INFO  cache/context.go:741  app added  {"appID": "app1"}
> 2023-04-21T16:34:42.354-0700  INFO  cache/context.go:831  task added  {"appID": "app1", "taskID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "taskState": "New"}
> 2023-04-21T16:34:42.355-0700  INFO  cache/context.go:841  app request originating pod added  {"appID": "app1", "original task": "d643a5ad-c93b-4d99-8eac-9418fbac18b0"}
> I0421 16:34:42.355111   46423 factory.go:344] "Unable to schedule pod; no fit; waiting" pod="default/pod-1" err="0/1 nodes are available: 1 Pod is not ready for scheduling."
> 2023-04-21T16:34:42.689-0700  INFO  cache/application.go:413  handle app submission  {"app": "applicationID: app1, queue: root.sandbox, partition: default, totalNumOfTasks: 1, currentState: Submitted", "clusterID": "mycluster"}
> 2023-04-21T16:34:42.692-0700  INFO  objects/application_state.go:132  Application state transition  {"appID": "app1", "source": "New", "destination": "Rejected", "event": "rejectApplication"}
> 2023-04-21T16:34:42.692-0700  ERROR  scheduler/context.go:540  Failed to add application to partition (placement rejected)  {"applicationID": "app1", "partitionName": "[mycluster]default", "error": "application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules"}
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateApplicationEvent
>   /Users/weih/go/src/github.pie.apple.com/apache/yunikorn-k8shim/vendor/github.com/apache/yunikorn-core/pkg/scheduler/context.go:540
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent
>   /Users/weih/go/src/github.pie.apple.com/apache/yunikorn-k8shim/vendor/github.com/apache/yunikorn-core/pkg/scheduler/scheduler.go:113
> 2023-04-21T16:34:42.693-0700  INFO  cache/application.go:565  app is rejected by scheduler  {"appID": "app1"}
> 2023-04-21T16:34:42.693-0700  INFO  cache/application.go:598  failApplication reason  {"applicationID": "app1", "errMsg": "ApplicationRejected: application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules"}
> 2023-04-21T16:34:42.694-0700  INFO  cache/application.go:585  setting pod to failed  {"podName": "pod-1"}
> 2023-04-21T16:34:42.712-0700  INFO  general/general.go:179  task completes  {"appType": "general", "namespace": "default", "podName": "pod-1", "podUID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "podStatus": "Failed"}
> 2023-04-21T16:34:42.714-0700  INFO  client/kubeclient.go:246  Successfully updated pod status  {"namespace": "default", "podName": "pod-1", "newStatus": "{Phase:Failed,Conditions:[]PodCondition{},Message: application 'app1' rejected, cannot create queue 'root.sandbox' without placement 
> 

Re: [DISCUSSION] Yunikorn release 1.5.0

2024-02-26 Thread Wilfred Spiegelenburg
The last change, YUNIKORN-1706, is approved. Peter will commit and backport it.
Please check the private@ list for the latest developments.

Wilfred

On Sat, 24 Feb 2024 at 18:09, TingYao  wrote:
>
> Hi Everyone,
>
> Update:
>
> We've moved some JIRAs to the next release, and we still have two JIRAs in
> progress.
> I have also created the YuniKorn 1.5 branch for all 4 repos (core, k8shim,
> interface, web). Once the blocker issue is fixed, I will start cherry-picking,
> tagging and making the go mod dependency changes.
>
> Thanks,
> Tingyao
>
> On Sun, 18 Feb 2024 at 20:45, TingYao wrote:
>
> > Hi Everyone,
> >
> > I would like to start the discussion for Release 1.5.0.
> >
> > Planned major features:
> >
> > YUNIKORN-970 Change queue metrics to labeled
> > YUNIKORN-2099 [Umbrella] K8shim simplification
> > YUNIKORN-2115 [Umbrella] Application tracking history - Phase 2
> > YUNIKORN-1362 filtering nodes in UI
> > YUNIKORN-1922 display pending resources in web UI
> > YUNIKORN-2140 Web UI: resource display rework
> >
> > Additionally, minor enhancements and bug fixes are included in this
> > release.
> >
> > There are some open items with target version 1.5.0:
> >
> > https://issues.apache.org/jira/browse/YUNIKORN-2030?jql=project%20%3D%20YUNIKORN%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20%22Target%20Version%22%20%3D%201.5.0%20ORDER%20BY%20priority%20DESC
> >
> > Please review this list and decide whether it's feasible to
> > complete them before code freeze. If not, I will retarget the tickets
> > to 1.6.0.
> >
> > There are some in-progress blocker or critical issues with target version
> > 1.5.0:
> >
> > YUNIKORN-2030 Need to check headroom when trying other nodes for reserved allocations
> > YUNIKORN-1706 We should clean up failed apps in shim side
> > YUNIKORN-1089 Application handling with invalid task group annotations
> >
> > I hope we can include those changes; otherwise we might need to postpone
> > the release.
> >
> > Here is the preliminary schedule:
> > Code freeze on 22 Feb
> > Branch on 23 Feb
> > First RC out latest by 1 March
> >
> > Allowing for the voting process, we can tentatively plan to release YuniKorn
> > 1.5.0 around the week of 4-8 March.
> >
> > Please feel free to share your thoughts.
> >
> > Thanks,
> > Tingyao
> >




[jira] [Resolved] (YUNIKORN-2042) REST API for specific queue

2024-02-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2042.
-
 Fix Version/s: 1.5.0
Target Version: 1.5.0  (was: 1.6.0)
Resolution: Fixed

Change committed and cherry-picked into branch-1.5.

> REST API for specific queue
> ---
>
> Key: YUNIKORN-2042
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2042
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Ted Lin
>Assignee: Ted Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Expose a REST API for a specific queue:
> /ws/v1/partition/%s/queue/%s/
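A small Go client sketch for the endpoint pattern above. The base address `http://localhost:9080` is an assumption about where the scheduler's REST service listens. Since the issue does not describe the response body, it is decoded into a generic map. `queueURL` and `fetchQueue` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// queueURL fills in the endpoint pattern /ws/v1/partition/%s/queue/%s/,
// escaping each path segment.
func queueURL(baseURL, partition, queue string) string {
	return baseURL + fmt.Sprintf("/ws/v1/partition/%s/queue/%s/",
		url.PathEscape(partition), url.PathEscape(queue))
}

// fetchQueue performs a GET on the queue endpoint and decodes the JSON body into a
// generic map, because this sketch does not assume a response schema.
func fetchQueue(baseURL, partition, queue string) (map[string]any, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(queueURL(baseURL, partition, queue))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var body map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body, nil
}

func main() {
	base := "http://localhost:9080" // assumed scheduler REST address
	fmt.Println(queueURL(base, "default", "root.test-queue"))
	if q, err := fetchQueue(base, "default", "root.test-queue"); err != nil {
		fmt.Println("request failed:", err)
	} else {
		fmt.Println(q)
	}
}
```

Escaping each segment with `url.PathEscape` keeps a queue name containing characters such as spaces from breaking the path.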






[jira] [Resolved] (YUNIKORN-2030) Need to check headroom when trying other nodes for reserved allocations

2024-02-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2030.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

change committed and cherry-picked into branch-1.5

thank you for the analysis and change.

> Need to check headroom when trying other nodes for reserved allocations
> ---
>
> Key: YUNIKORN-2030
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> As reported in YUNIKORN-1996, we are seeing many messages like the one below 
> from time to time:
> {code:java}
> WARN    objects/application.go:1504    queue update failed unexpectedly    {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts queue 'root.test-queue' over maximum allocation (map[memory:3300011278336 vcore:390584]), current usage (map[memory:3291983380480 pods:91 vcore:186000])"}
> {code}
> Restarting YuniKorn helps to stop it. Creating this Jira to investigate why it 
> happens. It is not supposed to happen, because we check whether there is 
> enough resource headroom before calling
> {code:java}
> func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation 
> {code}
> (which printed the above message), and only call it when there is enough 
> headroom.
> There may be a bug in the headroom check.
>  
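The numbers in the reported message already show the problem: the remaining memory headroom is 3300011278336 - 3291983380480 = 8027897856, while the allocation asks for 37580963840, so a headroom check should have rejected it before tryNode was called. The Go sketch below is a generic headroom check over a hypothetical resource map, not YuniKorn's implementation. It treats resource types that the maximum does not mention (such as pods here) as unlimited.

```go
package main

import "fmt"

// resource is a hypothetical simplified resource map (name -> quantity),
// loosely mirroring the map[...] values printed in the WARN message.
type resource map[string]int64

// fitsInHeadroom reports whether adding ask to usage stays within limits.
// Resource types missing from limits are treated as unlimited.
func fitsInHeadroom(limits, usage, ask resource) bool {
	for name, quantity := range ask {
		limit, limited := limits[name]
		if !limited {
			continue
		}
		if usage[name]+quantity > limit {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the WARN message in the issue.
	limits := resource{"memory": 3300011278336, "vcore": 390584}
	usage := resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := resource{"memory": 37580963840, "pods": 1, "vcore": 2000}
	fmt.Println(fitsInHeadroom(limits, usage, ask)) // false: memory would exceed the maximum
}
```

Whatever the exact code path, the point of the issue is that a check of this kind must run before every node is tried, including when other nodes are tried for a reserved allocation.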


