Re: [DISCUSSION] Yunikorn release 1.5.0

2024-02-23 Thread TingYao
Hi Everyone,

Update:

We've moved some JIRAs to the next release, and two JIRAs are still in
progress.
I have also created the YuniKorn 1.5 branch in all 4 repos (core, k8shim,
interface, web). Once the blocker issue is fixed, I will start cherry-picking,
tagging, and updating the go mod dependencies.

Thanks,
Tingyao

On Sun, 18 Feb 2024 at 8:45 PM, TingYao wrote:

> Hi Everyone,
>
> I would like to start the discussion for Release 1.5.0.
>
> Planned major features:
>
> YUNIKORN-970 Change queue metrics to labeled
> 
> YUNIKORN-2099 [Umbrella] K8shim simplification
> 
> YUNIKORN-2115 [Umbrella] Application tracking history - Phase 2
>  
> YUNIKORN-1362 filtering nodes in UI
> 
> YUNIKORN-1922 display pending resources in web UI
> 
> YUNIKORN-2140 Web UI: resource display rework
>
> Additionally, minor enhancements and bug fixes have been covered as part
> of this release.
>
> There are some open items with target version 1.5.0:
>
> https://issues.apache.org/jira/browse/YUNIKORN-2030?jql=project%20%3D%20YUNIKORN%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20%22Target%20Version%22%20%3D%201.5.0%20ORDER%20BY%20priority%20DESC
>
> Please review this list and decide whether it's feasible to
> complete them before code freeze. If not, I will retarget the tickets
> to 1.6.0.
>
> There are some in-progress blocker or critical issues with target version
> 1.5.0:
>
> YUNIKORN-2030 Need to check headroom when trying other nodes for reserved
> allocations
>  
> YUNIKORN-1706 We should clean up failed apps in shim side
> 
> YUNIKORN-1089 Application handling with invalid task group annotations
> 
>
> I hope we can include those changes; otherwise, we might need to postpone
> the release.
>
> Here is the preliminary schedule:
> Code freeze on 22 Feb
> Branch on 23 Feb
> First RC out latest by 1 March
>
> Depending on the voting process, we can tentatively plan to release
> YuniKorn 1.5.0 around the week of 4-8 March.
>
> Please feel free to share your thoughts.
>
> Thanks,
> Tingyao
>


[jira] [Created] (YUNIKORN-2451) add trackingType#String to decorate the logging output

2024-02-23 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created YUNIKORN-2451:


 Summary: add trackingType#String to decorate the logging output
 Key: YUNIKORN-2451
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2451
 Project: Apache YuniKorn
  Issue Type: Sub-task
Reporter: Chia-Ping Tsai


The tracking type is logged as an integer. We can add a String method to
trackingType to make the output more readable.

[https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/ugm/queue_tracker.go#L90]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Closed] (YUNIKORN-2243) Update k8shim scheduler plugin document

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2243.
--
Resolution: Won't Do

We leave the design documents as-is to reflect how the features were originally 
designed.

> Update k8shim scheduler plugin document
> ---
>
> Key: YUNIKORN-2243
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2243
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: JiaChi Wang
>Assignee: JiaChi Wang
>Priority: Minor
>
> This is a follow-up issue.
> Comparing YUNIKORN- with the [k8s scheduler 
> plugin|https://yunikorn.apache.org/docs/design/scheduler_plugin] document 
> shows that the document doesn't match the latest updates. Therefore, we 
> need to update the document.






[jira] [Closed] (YUNIKORN-2063) Add Pagination design to Application history

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2063.
--
Resolution: Won't Do

> Add Pagination design to Application history
> 
>
> Key: YUNIKORN-2063
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2063
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: PoAn Yang
>Assignee: JiaChi Wang
>Priority: Major
>
> API endpoint: /ws/v1/history/apps






[jira] [Closed] (YUNIKORN-2064) Add pagination design to container history

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2064.
--
Resolution: Won't Do

> Add pagination design to container history
> --
>
> Key: YUNIKORN-2064
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2064
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: PoAn Yang
>Assignee: Kuan-Po Tseng
>Priority: Major
>
> API endpoint: /ws/v1/history/containers






[jira] [Closed] (YUNIKORN-2061) [Umbrella] Add Pagination to APIs which may have lot of items in response

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2061.
--
Resolution: Won't Do

Closing in favor of compression, which is being worked on.

> [Umbrella] Add Pagination to APIs which may have lot of items in response
> -
>
> Key: YUNIKORN-2061
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2061
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common, webapp
>Reporter: PoAn Yang
>Assignee: JiaChi Wang
>Priority: Major
> Attachments: Screenshot 2023-10-21 at 8.01.26 PM.png, Screenshot 
> 2023-10-21 at 8.03.15 PM.png, getQueueApplication_after_compress.png, 
> getQueueApplication_before_compress.png
>
>
> The following APIs may return a lot of items in one response. It would be 
> good to add pagination to them.
>  * Queue applications 
> (/ws/v1/partition/\{partitionName}/queue/\{queueName}/applications)
>  * Application history (/ws/v1/history/apps)
>  * Container history (/ws/v1/history/containers)






[jira] [Closed] (YUNIKORN-2062) Add Pagination design to Queue applications API

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2062.
--
Resolution: Won't Do

> Add Pagination design to Queue applications API
> ---
>
> Key: YUNIKORN-2062
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2062
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: PoAn Yang
>Assignee: JiaChi Wang
>Priority: Major
>
> API endpoint: 
> /ws/v1/partition/\{partitionName}/queue/\{queueName}/applications






[jira] [Resolved] (YUNIKORN-2016) [Umbrella] K8Shim simplification

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2016.

 Fix Version/s: 1.5.0
Target Version: 1.5.0  (was: 1.6.0)
Resolution: Fixed

Resolving as all subtasks are complete.

> [Umbrella] K8Shim simplification
> 
>
> Key: YUNIKORN-2016
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2016
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Sunil G
>Priority: Major
> Fix For: 1.5.0
>
>
> As part of the recovery changes, structures were found in the k8shim that 
> can be simplified.
> Some of them are left over from YUNIKORN-1187; others were found later.






[jira] [Closed] (YUNIKORN-1987) Identity config fields not set explicitly

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1987.
--
Resolution: Not A Problem

> Identity config fields not set explicitly
> -
>
> Key: YUNIKORN-1987
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1987
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: JiaSheng Chen
>Priority: Major
>  Labels: pull-request-available
>
> We need to find a way to detect which fields in a config derived by 
> unmarshalling YAML content were set explicitly by users, as opposed to 
> holding default values.
> For example, take an int field in the config. By default, zero is assigned 
> during unmarshalling. There could be a validation in place that throws an 
> error on a zero value, so we need to identify whether the user set it 
> explicitly or it is just the default value.
> In short, we need a way to know whether a field has been set or not.
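One common Go approach to this problem is pointer fields: nil means "not present in the input", while a non-nil pointer to 0 means "explicitly set to zero". The sketch below uses encoding/json only to stay within the standard library; the same pointer technique works with YAML decoders such as gopkg.in/yaml.v3. The limitConfig struct and the maxapplications key are hypothetical:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// limitConfig is a hypothetical config section. MaxApplications is a pointer
// so that an omitted field (nil) differs from an explicit 0 (non-nil pointer).
type limitConfig struct {
	Name            string  `json:"name"`
	MaxApplications *uint64 `json:"maxapplications"`
}

// validate accepts an omitted field but rejects an explicit zero.
func validate(c limitConfig) error {
	if c.MaxApplications == nil {
		return nil // not set by the user: the default applies
	}
	if *c.MaxApplications == 0 {
		return errors.New("maxapplications must be greater than 0 when set")
	}
	return nil
}

// parse unmarshals the input and validates the result.
func parse(data string) (limitConfig, error) {
	var c limitConfig
	if err := json.Unmarshal([]byte(data), &c); err != nil {
		return c, err
	}
	return c, validate(c)
}

func main() {
	for _, in := range []string{
		`{"name":"dev"}`,
		`{"name":"dev","maxapplications":0}`,
		`{"name":"dev","maxapplications":5}`,
	} {
		_, err := parse(in)
		fmt.Printf("%-40s err=%v\n", in, err)
	}
}
```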






[jira] [Closed] (YUNIKORN-1941) Limit Yunikorn to only use certain nodes to schedule workloads

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1941.
--
Resolution: Won't Do

> Limit Yunikorn to only use certain nodes to schedule workloads
> --
>
> Key: YUNIKORN-1941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Marc Singer
>Assignee: Marc Singer
>Priority: Major
>
> We want to limit YuniKorn to use a specific part of the Kubernetes 
> cluster for its workloads. These nodes should have a label or annotation 
> that is configurable in the YuniKorn configuration and, if present, should 
> limit workloads to be scheduled only on these specific nodes.
> According to the Slack #dev channel, this should be achievable by 
> limiting the nodes returned from the kubernetes-shim.
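A minimal sketch of the filtering the issue describes, assuming a configurable label key and optional value. The node and nodeFilter types and the example.com/yunikorn label are hypothetical; the issue was closed without an implementation, and annotations are not covered here:

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a Kubernetes node seen by the shim.
type node struct {
	Name   string
	Labels map[string]string
}

// nodeFilter is a hypothetical label selector read from the scheduler config.
// An empty Key disables filtering; an empty Value matches any value of Key.
type nodeFilter struct {
	Key   string
	Value string
}

// allows reports whether the node carries the configured label (and value, if set).
func (f nodeFilter) allows(n node) bool {
	if f.Key == "" {
		return true
	}
	v, ok := n.Labels[f.Key]
	if !ok {
		return false
	}
	return f.Value == "" || v == f.Value
}

// filterNodes returns only the nodes the shim would forward to the core.
func filterNodes(nodes []node, f nodeFilter) []node {
	var out []node
	for _, n := range nodes {
		if f.allows(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []node{
		{Name: "worker-1", Labels: map[string]string{"example.com/yunikorn": "true"}},
		{Name: "worker-2", Labels: map[string]string{}},
		{Name: "worker-3", Labels: map[string]string{"example.com/yunikorn": "false"}},
	}
	f := nodeFilter{Key: "example.com/yunikorn", Value: "true"}
	for _, n := range filterNodes(nodes, f) {
		fmt.Println(n.Name) // worker-1
	}
}
```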






[jira] [Closed] (YUNIKORN-1347) Yunikorn triggers EKS auto-scaling even pods requests have exceeded the queue limit

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1347.
--
Resolution: Implemented

> Yunikorn triggers EKS auto-scaling even pods requests have exceeded the queue 
> limit 
> 
>
> Key: YUNIKORN-1347
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1347
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, shim - kubernetes
>Reporter: Anthony Wu
>Priority: Major
>
> Hi guys,
> We are trying to utilise Yunikorn to manage our AWS EKS infrastructure to 
> limit resource usage for different users and groups. We also use k8s cluster 
> auto-scaler 
> ([https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler]) 
> for auto scaling of the cluster when necessary.
> *Environment*
>  * AWS EKS on k8s 1.21
>  * Yunikorn 1.1 running as k8s scheduler plugin to be most compatible
>  * cluster-autoscaler V1.21.0
> *Issues*:
> Let's say we have a queue with the following limit:
> {code:yaml}
> queues:
> - name: dev
>   submitacl: "*"
>   resources:
>     max:
>       memory: 100Gi
>       vcore: 10
> {code}
>  
> Then we create 4 pods in the `dev` queue, each requesting 5 cores and 50Gi 
> of memory.
> We get 2 pods {{Running}} and 2 pods {{Pending}}, because the queue has 
> reached its limit of 100Gi memory and 10 CPUs.
> We would expect the queued pods not to trigger EKS auto-scaling, as they 
> cannot be allocated until other resources have been released in the queue.
> However, the queued pods still trigger cluster auto-scaling, as shown in 
> the example below:
> {code:java}
> Status:   Pending
> ...
> Conditions:
>   Type   Status
>   PodScheduled   False
> Events:
>   Type ReasonAgeFromMessage
>    --   ---
>   Warning  FailedScheduling  3m5s   yunikorn0/147 nodes are 
> available: 147 Pod is not ready for scheduling.
>   Warning  FailedScheduling  3m5s   yunikorn0/147 nodes are 
> available: 147 Pod is not ready for scheduling.
>   Normal   Scheduling3m3s   yunikorn
> yunikorn/dask-user-07ff5f3b-8qjkl8 is queued and waiting for allocation
>   Normal   TriggeredScaleUp  2m53s  cluster-autoscaler  pod triggered 
> scale-up: 
> [{eksctl-cluster-nodegroup-spot-xlarge-compute-1-NodeGroup-8VURTD4WKCYV 0->4 
> (max: 16)}]
> {code}
> As a result, EKS added some hosts that were never actually used, because 
> the pods had not yet been approved for scheduling.
> We also tried gang scheduling with the pods in a task group, but it has a 
> similar issue: even when the whole gang is not ready to be scheduled, 
> YuniKorn creates the placeholder pods, which trigger auto-scaling of the 
> EKS cluster.
> *Causes and potential solutions*
> We looked at the source code of both the auto-scaler and YuniKorn, and we 
> think the reason is simply that the auto-scaler does not know about 
> YuniKorn-specific events and pod state (Pending but not QuotaApproved). It 
> searches for all pods with `PodScheduled=False` and then checks whether it 
> needs to add resources for them.
> The issue could be resolved on either side:
>  - To solve it on the auto-scaler side, the auto-scaler needs to know about 
> YuniKorn's special events and state
>  - To solve it on the YuniKorn side, I think YuniKorn should not create the 
> pod, or at least not leave it in the `Pending` phase, until it is quota approved
>  ** not sure how hard this is to achieve, but as long as a pod is created and 
> goes to Pending, the auto-scaler will try to pick it up
> We think solving it on the YuniKorn side would be cleaner, since the 
> auto-scaler should not need to know the k8s scheduler implementation in order 
> to make a decision. Also, other auto-scaler alternatives such as AWS Karpenter 
> could suffer from the same issue when interacting with YuniKorn.
> We are wondering whether this issue report makes sense to you. Let us know if 
> there are any other solutions and whether it could be solved in the future :)
> Thanks a lot!
>  






[jira] [Resolved] (YUNIKORN-1278) Fix flaky E2E simple preemptor test suite runs

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-1278.

Fix Version/s: 1.5.0
   Resolution: Fixed

> Fix flaky E2E simple preemptor test suite runs
> --
>
> Key: YUNIKORN-1278
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1278
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: test - e2e
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 1.5.0
>
>
> The e2e preemption test seems to be flaky. It fails on different versions 
> of K8s for different PRs.






[jira] [Closed] (YUNIKORN-1255) yunikorn web shows negative number of containers in container history

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1255.
--
Resolution: Delivered

All known causes for this issue have been resolved.

> yunikorn web shows negative number of containers in container history
> -
>
> Key: YUNIKORN-1255
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1255
> Project: Apache YuniKorn
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0
> Environment: kubernetes 1.21.9
> yunikorn 1.0.0 as admission controller
>Reporter: Rafał Boniecki
>Priority: Major
> Attachments: Screenshot-2022071712-2237x499.png
>
>
> See attached image.






[jira] [Closed] (YUNIKORN-1139) Handle node "ready" attribute like any other node attributes

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1139.
--
Resolution: Won't Do

> Handle node "ready" attribute like any other node attributes
> 
>
> Key: YUNIKORN-1139
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1139
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Please see the discussions - 
> [https://github.com/apache/incubator-yunikorn-core/pull/387#issuecomment-1070773205]
> Handle the "Ready" attribute like any other node attribute. Managing each 
> attribute in a separate place is not clean or organised, and such attributes 
> may escape our attention at times. The attribute can also change during 
> UpdateNode, but we should still handle it in the {{initializeAttribute}} and 
> {{GetAttribute}} methods, using appropriate locks, in the same way listeners 
> are handled.
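A minimal sketch of the direction described above, where readiness lives in the same lock-protected attribute map as every other attribute instead of a separate field. The schedulingNode type, the "ready" key, SetAttribute and IsReady are hypothetical; only the initializeAttribute and GetAttribute names come from the issue:

```go
package main

import (
	"fmt"
	"sync"
)

// readyAttribute is a hypothetical key under which node readiness is stored.
const readyAttribute = "ready"

// schedulingNode is a simplified, hypothetical stand-in for the core's node object.
type schedulingNode struct {
	sync.RWMutex
	attributes map[string]string
}

// initializeAttribute replaces all attributes, readiness included, under the write lock.
func (n *schedulingNode) initializeAttribute(newAttributes map[string]string) {
	n.Lock()
	defer n.Unlock()
	n.attributes = make(map[string]string, len(newAttributes))
	for k, v := range newAttributes {
		n.attributes[k] = v
	}
}

// GetAttribute returns one attribute under the read lock ("" if missing).
func (n *schedulingNode) GetAttribute(key string) string {
	n.RLock()
	defer n.RUnlock()
	return n.attributes[key]
}

// SetAttribute is a hypothetical update path, e.g. a readiness change seen in UpdateNode.
func (n *schedulingNode) SetAttribute(key, value string) {
	n.Lock()
	defer n.Unlock()
	if n.attributes == nil {
		n.attributes = make(map[string]string)
	}
	n.attributes[key] = value
}

// IsReady derives readiness from the attribute map rather than a separate field.
func (n *schedulingNode) IsReady() bool {
	return n.GetAttribute(readyAttribute) == "true"
}

func main() {
	n := &schedulingNode{}
	n.initializeAttribute(map[string]string{readyAttribute: "false", "zone": "a"})
	fmt.Println(n.IsReady()) // false
	n.SetAttribute(readyAttribute, "true")
	fmt.Println(n.IsReady()) // true
}
```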






[jira] [Closed] (YUNIKORN-1153) Admission controller: first health check should be delayed

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1153.
--
Resolution: Won't Do

> Admission controller: first health check should be delayed
> --
>
> Key: YUNIKORN-1153
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1153
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Ryan Lo
>Priority: Minor
>
> When deploying Yunikorn locally, I often see the first health check failing:
> {noformat}
> Events:
>   Type     Reason     Age                   From               Message
>   ----     ------     ----                  ----               -------
>   Normal   Scheduled  3m12s                 default-scheduler  Successfully assigned default/yunikorn-admission-controller-78c775cfd9-6pp8d to minikube
>   Normal   Pulled     3m12s                 kubelet            Container image "apache/yunikorn:admission-latest" already present on machine
>   Normal   Created    3m12s                 kubelet            Created container yunikorn-admission-controller
>   Normal   Started    3m11s                 kubelet            Started container yunikorn-admission-controller
>   Warning  Unhealthy  2m52s (x2 over 3m2s)  kubelet            Startup probe failed: Get "https://192.168.49.2:9089/health": dial tcp 192.168.49.2:9089: connect: connection refused
> {noformat}
> We need to set {{initialDelaySeconds}} to delay the first probe. 
> 10-15 seconds is probably a good value.






[jira] [Closed] (YUNIKORN-999) [UMBRELLA] Define and publish YuniKorn Improvement Proposal (YIP)

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-999.
-
Resolution: Abandoned

Closing due to lack of interest.

> [UMBRELLA] Define and publish YuniKorn Improvement Proposal (YIP)
> -
>
> Key: YUNIKORN-999
> URL: https://issues.apache.org/jira/browse/YUNIKORN-999
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: community, documentation, website
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
>  Labels: documentation
>
> On the dev mailing list, we have discussed and voted to have a YuniKorn 
> Improvement Proposal (YIP) process.
>  
> The YIP process will define the following, including but not limited to:
> - what's considered a "major change" that needs a YIP
> - what should be included in a YIP (e.g. motivation/business justifications, 
> use case requirements, proposed changes, API changes, 
> migration/compatibility, rejected alternatives, etc)
> - who should initiate or be involved in a YIP
> - end-to-end process
>  
> This is an umbrella and will create subtasks.
>  
> We can publish the YIP process to the website and keep all finalized YIPs in 
> Confluence. There are projects that keep their XIPs on Confluence, but I found 
> that 1) it is hard to track changes and do version control, and 2) it is hard 
> to comment on or propose changes, as not everyone has Confluence access. 
> Keeping the YIPs themselves on the website will solve those issues and make 
> them easier to find.






[jira] [Closed] (YUNIKORN-936) app and node recovery event ordering

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-936.
-
Resolution: Delivered

Closing as this was resolved elsewhere.

> app and node recovery event ordering
> 
>
> Key: YUNIKORN-936
> URL: https://issues.apache.org/jira/browse/YUNIKORN-936
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Peter Bacsko
>Priority: Major
>
> While working on YUNIKORN-905, a number of unit tests failed due to event 
> ordering. Looking at the change, we might have had an issue in the RMProxy for 
> a long time.
> An update request could contain apps, asks and nodes, and processing was 
> ordered the same way. During recovery, the order was/is important. There was 
> never an ordering requirement on the events sent by a shim, nor any use of 
> complex update events by the shim to support this ordering.
> An event to recover a node could be a separate UpdateRequest from the 
> applications that should be recovered. That means we relied on goroutine 
> and event ordering to hopefully do things correctly, i.e. that events sent by 
> the shim to create new apps would be processed before node recovery started. 
> Even in the previous implementation there was no guarantee that all the 
> applications were added before a node was recovered. The unit tests in the 
> core relied on this processing-order dependency to make sure it worked.
> That is not the real-world scenario, and thus a dangerous assumption.






[jira] [Closed] (YUNIKORN-926) Scheduling 10 app * 5000 pod is faster than 1 app *50000 pod

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-926.
-
Resolution: Not A Problem

Many things can impact scheduling performance. Closing this for staleness as 
well as lack of actionable tasks.

> Scheduling 10 app * 5000 pod is faster than 1 app *50000 pod
> 
>
> Key: YUNIKORN-926
> URL: https://issues.apache.org/jira/browse/YUNIKORN-926
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Weiwei Yang
>Priority: Minor
>
> This was found during benchmark testing. On the same environment, when scheduling
> A. 10 apps * 5000 pods each, total 50k pods
> B. 1 app * 50000 pods, total 50k pods
> A is significantly faster than B. Not sure why this happens; ideally there 
> should be no difference in performance between A and B.






[jira] [Closed] (YUNIKORN-788) Make the scheduler max QPS match the default Kubernetes API server max requests inflight

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-788.
-
Target Version:   (was: 1.6.0)
Resolution: Not A Bug

Closing as this is no longer relevant. QPS has nothing to do with number of 
in-flight requests, and K8s Quality-of-Service should be configured 
administratively for busy clusters.

> Make the scheduler max QPS match the default Kubernetes API server max 
> requests inflight
> 
>
> Key: YUNIKORN-788
> URL: https://issues.apache.org/jira/browse/YUNIKORN-788
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Chaoran Yu
>Assignee: Chaoran Yu
>Priority: Major
>
> The current max QPS is configured as 1000, which is much higher than the 
> [400|https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/]
>  max requests inflight that the API server supports. This mismatch could 
> cause requests to be dropped/rejected by the API server when the load is high 
> (e.g. when a surge occurs in the number of pods that need to be scheduled). 
> We should make the YK default match the API server default.






[jira] [Closed] (YUNIKORN-495) Update core design doc to align with current implementation

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-495.
-
Resolution: Won't Do

The design docs reflect the thinking behind the original designs. They should 
not be updated.

> Update core design doc to align with current implementation
> ---
>
> Key: YUNIKORN-495
> URL: https://issues.apache.org/jira/browse/YUNIKORN-495
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Wilfred Spiegelenburg
>Assignee: Manikandan R
>Priority: Major
>
> The core design document has a number of errors and is out of date after 
> YUNIKORN-317.
> It needs to be rewritten from scratch to fix all of its issues. These are the 
> points I know of that are out of date based on the work since 0.8:
>  * queue layout
>  * partition details
>  * cache removal
>  * pre-emption
>  * scheduling overhaul






[jira] [Closed] (YUNIKORN-457) Find a way to pass the RMID to the webservice

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-457.
-
Resolution: Abandoned

No longer relevant with current REST APIs.

> Find a way to pass the RMID to the webservice
> -
>
> Key: YUNIKORN-457
> URL: https://issues.apache.org/jira/browse/YUNIKORN-457
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Kinga Marton
>Priority: Major
>
> When updating the configuration through the REST API, we need an RMId to 
> reflect the changes in the configmap as well. With the current approach, this 
> might not work properly if more than one RM is registered, or if no RM is 
> registered at all.






[jira] [Closed] (YUNIKORN-422) Automated install of OpenShift 4.5 with Yunikorn scheduler for BigData and ML on your laptop, virtual machine or baremetal server using CodeReady Containers

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-422.
-
Resolution: Abandoned

No movement in 3+ years, and most have moved on to kind for dev work.

> Automated install of OpenShift 4.5 with Yunikorn scheduler for BigData and ML 
> on your laptop, virtual machine or baremetal server using CodeReady Containers
> 
>
> Key: YUNIKORN-422
> URL: https://issues.apache.org/jira/browse/YUNIKORN-422
> Project: Apache YuniKorn
>  Issue Type: Test
>Reporter: marcredhat
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/marcredhat/crcdemos/blob/master/yunikorn/README.adoc






[jira] [Closed] (YUNIKORN-408) Result mismatch between Container_Status and Container_History

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-408.
-
Resolution: Delivered

No longer relevant in current releases.

> Result mismatch between Container_Status and Container_History
> --
>
> Key: YUNIKORN-408
> URL: https://issues.apache.org/jira/browse/YUNIKORN-408
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vishwas
>Assignee: Adam Antal
>Priority: Major
> Attachments: appStatus_vs_contStatus.PNG, scheduler.log
>
>
> I created a single pod that uses the YuniKorn scheduler, and it gets 
> allocated to the queue properly.
> When I delete the pod using the kubectl delete command, I still see the 
> container count as 1 in container_history, but the number of running 
> containers is 0, and the application status still shows it as running.
> I see the log below in the scheduler; not sure if it is related:
> {code:java}
> 2020-09-10T08:43:41.895Z  DEBUG  cache/context.go:241  failed to update pod in cache  {"podName": "app-sleep-0", "error": "pod e9a531a8-e001-4cc9-bd3e-2d63852eacd7 is not added to scheduler cache, so cannot be updated"}
> {code}
> I have attached a screenshot of the UI and the scheduler.log from the pod.
>  






[jira] [Closed] (YUNIKORN-305) Documentation for state aware app sorting user scenario

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-305.
-
Resolution: Abandoned

Closing, as we are deprecating state-aware scheduling for removal.

> Documentation for state aware app sorting user scenario
> ---
>
> Key: YUNIKORN-305
> URL: https://issues.apache.org/jira/browse/YUNIKORN-305
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Ayub Pathan
>Priority: Minor
>







[jira] [Closed] (YUNIKORN-1154) Add YuniKorn Release Procedure translation zh-cn

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1154.
--

> Add YuniKorn Release Procedure translation zh-cn
> 
>
> Key: YUNIKORN-1154
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1154
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: cdmikechen
>Assignee: Xiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.3.0
>
>







[jira] [Closed] (YUNIKORN-690) [Umbrella] UI usability enhancements 2

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-690.
-

> [Umbrella] UI usability enhancements 2
> --
>
> Key: YUNIKORN-690
> URL: https://issues.apache.org/jira/browse/YUNIKORN-690
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler, webapp
>Reporter: Weiwei Yang
>Assignee: Wen-Chien,Juan
>Priority: Critical
> Fix For: 1.4.0
>
>
> Continuous effort to improve the UI.






[jira] [Closed] (YUNIKORN-1510) Adding Chinese translation of Go module updates

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1510.
--

> Adding Chinese translation of Go module updates
> ---
>
> Key: YUNIKORN-1510
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1510
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Chen Yu Teng
>Assignee: Lin You-Xuan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>







[jira] [Closed] (YUNIKORN-1922) show queue pending resources on web UI

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1922.
--

> show queue pending resources on web UI
> --
>
> Key: YUNIKORN-1922
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1922
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Wilfred Spiegelenburg
>Assignee: Hsuan Zong Wu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> The REST response for the queue contains the pending resources. We ignore that 
> information when we show the queue details in the drawer on the right 
> side. Pending resources are a valuable piece of data to render:
> they provide insight into what is waiting where.






[jira] [Closed] (YUNIKORN-2127) stateAwareFilter doesn't work as description

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2127.
--

> stateAwareFilter doesn't work as description
> 
>
> Key: YUNIKORN-2127
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2127
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Major
>  Labels: pull-request-available
>
> According to the stateAwareFilter description, only one (1) application in a 
> non-running state is allowed in the list of candidates. The non-running state 
> can be Starting or Accepted. However, in the following case the last 
> assertion fails, because the resulting candidate list is [appID0, 
> appID1, appID2, appID3]. We should either update the description or fix the 
> function.
>  
> {noformat}
> func TestStateAwareFilter(t *testing.T) {
>     // stable sort is used so equal values stay where they were
>     res := resources.NewResourceFromMap(map[string]resources.Quantity{
>         "vcore": resources.Quantity(100)})
>     // setup all apps with pending resources, all moved to the Starting state
>     input := make(map[string]*Application, 4)
>     for i := 0; i < 4; i++ {
>         num := strconv.Itoa(i)
>         appID := "app-" + num
>         app := newApplication(appID, "partition", "queue")
>         app.pending = res
>         input[appID] = app
>         err := app.HandleApplicationEvent(RunApplication) // change app state from New to Accepted
>         assert.NilError(t, err, "state change failed for app %v", appID)
>         err = app.HandleApplicationEvent(RunApplication) // change app state from Accepted to Starting
>         assert.NilError(t, err, "state change failed for app %v", appID)
>         // make sure the time stamps differ at least a bit (tracking in nanoseconds)
>         time.Sleep(time.Nanosecond * 5)
>     }
>     list := sortApplications(input, policies.StateAwarePolicy, false, nil)
>     assertAppListLength(t, list, []string{appID0}, "should only have one starting app") // this will fail
> }{noformat}
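For comparison, the behaviour the description promises can be sketched in isolation. This is a minimal sketch, not the core's stateAwareFilter: the `app` and `appState` types are hypothetical stand-ins, and the input is assumed to be sorted already, so the first non-running application encountered is the one kept.

```go
package main

import "fmt"

// appState is a hypothetical stand-in for the core's application states.
type appState string

const (
	accepted appState = "Accepted"
	starting appState = "Starting"
	running  appState = "Running"
)

// app is a hypothetical, minimal application record used only by this sketch.
type app struct {
	ID    string
	State appState
}

// stateAwareFilter keeps at most one application that is not yet running
// (Starting or Accepted) and passes every other application through. The input
// is assumed to be sorted already (for example by submission time), so the first
// non-running application found is the one that is kept.
func stateAwareFilter(sorted []app) []app {
	out := make([]app, 0, len(sorted))
	nonRunningSeen := false
	for _, a := range sorted {
		if a.State == starting || a.State == accepted {
			if nonRunningSeen {
				continue // a non-running app is already a candidate
			}
			nonRunningSeen = true
		}
		out = append(out, a)
	}
	return out
}

func main() {
	// Same shape as the test in the issue: four apps, all in the Starting state.
	apps := []app{{"app-0", starting}, {"app-1", starting}, {"app-2", starting}, {"app-3", starting}}
	for _, a := range stateAwareFilter(apps) {
		fmt.Println(a.ID)
	}
}
```

Under these assumptions the four Starting apps are reduced to `app-0`, which is what the failing assertion expects.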






[jira] [Closed] (YUNIKORN-1787) Re-use predicate results for reservation

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1787.
--

> Re-use predicate results for reservation
> 
>
> Key: YUNIKORN-1787
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1787
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler, shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>
> As mentioned in 
> https://issues.apache.org/jira/browse/YUNIKORN-1378?focusedCommentId=17720232&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17720232,
> we don't need to run the predicates again for a reservation.
> If the cluster is busy and no suitable node was found for the request, we 
> check whether the predicate that failed is a reservation predicate. If it is, 
> there's no reason to run the reservation predicates again.
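The reuse idea can be sketched as a small decision function. This is a hypothetical illustration only: the `reservationPredicates` set and the `tryReserve` helper are assumptions, not the shim's API, and the predicate names are simply examples of Kubernetes scheduler plugin names.

```go
package main

import "fmt"

// reservationPredicates is a hypothetical set of predicate names that are also
// evaluated before a node is reserved. The real set lives in the shim and may differ.
var reservationPredicates = map[string]bool{
	"NodeResourcesFit": true,
}

// tryReserve decides whether a node may be reserved after an allocation attempt on
// it failed with failedPredicate. If that predicate is itself a reservation
// predicate, the reservation would fail for the same reason, so the earlier result
// is reused and runReservationPredicates is never called. It returns whether the
// reservation is allowed and whether the predicates were actually run.
func tryReserve(failedPredicate string, runReservationPredicates func() bool) (allowed, ran bool) {
	if reservationPredicates[failedPredicate] {
		return false, false
	}
	return runReservationPredicates(), true
}

func main() {
	calls := 0
	run := func() bool { calls++; return true }
	allowed, ran := tryReserve("NodeResourcesFit", run)
	fmt.Println(allowed, ran, calls)
	allowed, ran = tryReserve("NodeAffinity", run)
	fmt.Println(allowed, ran, calls)
}
```

The saving comes from never calling `runReservationPredicates` when the earlier failure already answers the question.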






[jira] [Closed] (YUNIKORN-474) Remove direct dependency of core internals from shim

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-474.
-

> Remove direct dependency of core internals from shim
> 
>
> Key: YUNIKORN-474
> URL: https://issues.apache.org/jira/browse/YUNIKORN-474
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Priority: Major
>
> Internal core implementations are used in the shim's unit tests.
> Unit tests should not depend on internal implementations of the core 
> structures. The parts of the core they need should be mocked rather than called 
> directly by the unit tests.
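The usual Go pattern for this is to depend on a narrow interface and inject a mock in tests. The sketch below assumes a hypothetical `coreAPI` interface and `appManager` component; the real scheduler interface has different methods and request types.

```go
package main

import (
	"errors"
	"fmt"
)

// coreAPI is a hypothetical, narrowed view of what a shim component needs from
// the core. The real scheduler interface has different methods and request types.
type coreAPI interface {
	UpdateApplication(appID string) error
}

// mockCore records calls so a unit test can assert on them without touching
// any core internals.
type mockCore struct {
	updated []string
}

func (m *mockCore) UpdateApplication(appID string) error {
	m.updated = append(m.updated, appID)
	return nil
}

// appManager is a hypothetical shim component that reaches the core only
// through the coreAPI interface, so tests can inject mockCore.
type appManager struct {
	core coreAPI
}

func (a *appManager) addApplication(appID string) error {
	if appID == "" {
		return errors.New("empty application ID")
	}
	return a.core.UpdateApplication(appID)
}

func main() {
	m := &mockCore{}
	mgr := &appManager{core: m}
	if err := mgr.addApplication("app-1"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(m.updated)
}
```

Because `appManager` only sees the interface, a test never needs to construct core structures.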






[jira] [Closed] (YUNIKORN-2117) Track user events

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2117.
--

> Track user events
> -
>
> Key: YUNIKORN-2117
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2117
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>







[jira] [Closed] (YUNIKORN-150) Add a link on queue’s detail info page that links to the apps page to show running in this queue

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-150.
-

> Add a link on queue’s detail info page that links to the apps page to show 
> running in this queue
> 
>
> Key: YUNIKORN-150
> URL: https://issues.apache.org/jira/browse/YUNIKORN-150
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: webapp
>Reporter: Weiwei Yang
>Assignee: Akhil PB
>Priority: Major
> Fix For: 1.0.0
>
>
> A common problem: when we look at queues, we don't know which apps are 
> using the queue resources.
> We could go back to the apps page and find them by going through all 
> apps, but that is time-consuming. It would be good to add a quick link 
> for this.






[jira] [Closed] (YUNIKORN-2166) Translate 1.4 release notes and roadmap to zh-cn

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2166.
--

> Translate 1.4 release notes and roadmap to zh-cn
> 
>
> Key: YUNIKORN-2166
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2166
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: Wilfred Spiegelenburg
>Assignee: JiaSheng Chen
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>
> Release 1.4 has been created and the website is updated. The notes have 
> only been partially translated and need further work.
> The roadmap has been updated for 1.5 and needs translating.






[jira] [Closed] (YUNIKORN-2206) Make user/queue headroom checks more performant

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2206.
--

> Make user/queue headroom checks more performant
> ---
>
> Key: YUNIKORN-2206
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2206
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>
> Inside Application.tryAllocate() and tryReservedAllocate(), we have the 
> following code:
> {noformat}
>   if !userHeadroom.FitInMaxUndef(ask.GetAllocatedResource()) {
>   continue
>   }
>   // check if this fits in the queue's headroom
>   if !headRoom.FitInMaxUndef(ask.GetAllocatedResource()) {
>   continue
>   }
> {noformat}
> These calls are relatively expensive but necessary. Calling them once isn't 
> a problem; however, repeated calls slow things down considerably. YuniKorn keeps 
> retrying to schedule asks, and until there's enough room for the request, these 
> checks keep failing. We need a way to speed this up.
> See YUNIKORN-2201 for details.
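One possible direction, sketched under loudly stated assumptions: remember which asks failed the fit check for a given headroom "version" and skip the check until the headroom changes. The `resource` type, the `headroomCache` type and the versioning scheme are all hypothetical; `fitInMaxUndef` only mirrors the assumed semantics of the core method (types missing from the headroom count as unlimited).

```go
package main

import "fmt"

// resource is a simplified stand-in for the core's Resource (type name -> quantity).
type resource map[string]int64

// fitInMaxUndef follows the assumed semantics of the core's FitInMaxUndef: every
// quantity in ask must fit in headroom, and types missing from headroom count as
// unlimited.
func fitInMaxUndef(headroom, ask resource) bool {
	for name, qty := range ask {
		if limit, ok := headroom[name]; ok && qty > limit {
			return false
		}
	}
	return true
}

// headroomCache remembers which asks did not fit for a given headroom version.
// It assumes the version is bumped on every headroom change and that an ask's
// size never changes; under those assumptions a failed ask cannot start to fit
// until the version changes, so the check can be skipped.
type headroomCache struct {
	version uint64
	failed  map[string]bool // keyed by ask ID
	checks  int             // number of real fit computations, for illustration
}

func newHeadroomCache() *headroomCache {
	return &headroomCache{failed: map[string]bool{}}
}

func (c *headroomCache) fits(version uint64, headroom resource, askID string, ask resource) bool {
	if version != c.version {
		// The headroom changed: earlier failures are no longer valid.
		c.version = version
		c.failed = map[string]bool{}
	}
	if c.failed[askID] {
		return false
	}
	c.checks++
	ok := fitInMaxUndef(headroom, ask)
	if !ok {
		c.failed[askID] = true
	}
	return ok
}

func main() {
	c := newHeadroomCache()
	ask := resource{"vcore": 200}
	for i := 0; i < 3; i++ {
		c.fits(1, resource{"vcore": 100}, "ask-1", ask) // same headroom version each time
	}
	fmt.Println("fit computations:", c.checks)
}
```

The trade-off is that every change to the user or queue headroom must bump the version; missing a bump would wrongly keep an ask blocked.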






[jira] [Closed] (YUNIKORN-2225) Consider using "ginkgo.Skip" if the env conditions are not matched

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2225.
--

> Consider using "ginkgo.Skip" if the env conditions are not matched
> --
>
> Key: YUNIKORN-2225
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2225
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Kuan-Po Tseng
>Priority: Minor
>
> For example: 
> https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/bin_packing/bin_packing_test.go#L72
> The spec requires at least 2 nodes, so it fails if the cluster has only a 
> single node. It seems to me "skip" is more suitable than "fail" for such a 
> spec.
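The precondition check can be sketched with only the standard library. `*testing.T` has a `Skipf(format string, args ...any)` method, and in a Ginkgo spec `ginkgo.Skip(message)` plays the same role. The `requireNodes` helper and `recordingSkipper` below are hypothetical, and how the node count is obtained is left to the e2e helpers.

```go
package main

import "fmt"

// skipper is the part of a test context this sketch needs; *testing.T has a
// Skipf method with this signature. In a Ginkgo spec, ginkgo.Skip(message)
// plays the same role.
type skipper interface {
	Skipf(format string, args ...any)
}

// requireNodes skips the spec, instead of failing it, when the cluster has fewer
// nodes than the spec needs. It returns false when the spec should not continue.
// How readyNodes is obtained is left to the e2e helpers and is not shown here.
func requireNodes(s skipper, readyNodes, needed int) bool {
	if readyNodes < needed {
		s.Skipf("spec needs at least %d nodes, cluster has %d", needed, readyNodes)
		return false
	}
	return true
}

// recordingSkipper is a stand-in for *testing.T that just records the message.
type recordingSkipper struct{ msg string }

func (r *recordingSkipper) Skipf(format string, args ...any) {
	r.msg = fmt.Sprintf(format, args...)
}

func main() {
	r := &recordingSkipper{}
	if !requireNodes(r, 1, 2) {
		fmt.Println("skipped:", r.msg)
	}
}
```

With a real `*testing.T`, `Skipf` stops the test immediately, so the `return false` only matters for contexts like the recording stand-in.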






[jira] [Closed] (YUNIKORN-2401) Update github.com/uber/jaeger-client-go and github.com/uber/jaeger-lib

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2401.
--

> Update github.com/uber/jaeger-client-go and github.com/uber/jaeger-lib
> --
>
> Key: YUNIKORN-2401
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2401
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Yu-Lin Chen
>Priority: Major
>  Labels: core
>
> As the title says: the latest version of jaeger-client-go is 2.30.0 
> (https://github.com/jaegertracing/jaeger-client-go/releases), and
> the latest version of jaeger-lib is v2.4.1 
> (https://github.com/jaegertracing/jaeger-lib/releases).






[jira] [Closed] (YUNIKORN-2226) Update mutation webhook process doc

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2226.
--

> Update mutation webhook process doc
> ---
>
> Key: YUNIKORN-2226
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2226
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dong-Lin Hsieh
>Assignee: Dong-Lin Hsieh
>Priority: Minor
>  Labels: pull-request-available
>
> Add updatePreemptionInfo to the doc






[jira] [Closed] (YUNIKORN-2402) Replace jaeger-client-go by OpenTelemetry

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2402.
--

> Replace jaeger-client-go by OpenTelemetry
> -
>
> Key: YUNIKORN-2402
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2402
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Priority: Major
>
> jaeger-client-go is deprecated and its replacement is OpenTelemetry 
> (https://opentelemetry.io). Please check 
> https://github.com/jaegertracing/jaeger-client-go?tab=readme-ov-file#-this-library-is-deprecated
> for more details.






[jira] [Resolved] (YUNIKORN-1950) Improving test coverage for whole user/group enforcement feature - Phase 2

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-1950.

Resolution: Fixed

> Improving test coverage for whole user/group enforcement feature - Phase 2
> --
>
> Key: YUNIKORN-1950
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1950
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: PoAn Yang
>Assignee: Dong-Lin Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> There are some remaining items that are not covered by the PR for YUNIKORN-1871.
>  
> Config changes:
> 15. Set a user limit only and ensure it is honoured. Change the same user 
> limit settings and ensure the new config is honoured.
> 16. Set group limits only and ensure they are honoured. Change the same 
> group limit settings and ensure the new config is honoured.
> 17. Set a user limit only and ensure it is honoured. Remove the user 
> limit settings, add a new user limit, and ensure the new config is 
> honoured.
> 18. Set group limits only and ensure they are honoured. Remove the group 
> limit settings, add a new group limit, and ensure the new config is 
> honoured.
> 19. Set a user limit and a wildcard user limit only and ensure they are 
> honoured. Change the same user limit and wildcard user settings and ensure the 
> new config is honoured.
> 20. Set a group limit and a wildcard group limit only and ensure they are 
> honoured. Change the same group limit and wildcard group settings and ensure 
> the new config is honoured.
> 21. Set group limits for more than one group (group A and group B) and a wildcard 
> group limit only, and ensure they are honoured. Remove group B, apply the 
> config changes, and ensure users of group B fall back to the wildcard group 
> settings. Also run some activity as users of another group, say group C. 
> Ensure users of group B and group C together do not exceed the wildcard 
> group quota, whereas users of group A do not exceed the group A 
> quota.
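The expectation in item 21 can be modelled with a single number per group. This sketch takes the item's wording literally (groups without their own limit share the wildcard quota cumulatively); it is a hypothetical test oracle, not the scheduler's implementation, and `groupQuota`, `tracker` and `tryAllocate` are illustrative names.

```go
package main

import "fmt"

// groupQuota is a hypothetical, single-number model of the configured group
// limits: explicit limits per group name plus one wildcard limit.
type groupQuota struct {
	explicit map[string]int64
	wildcard int64
}

// tracker applies the behaviour item 21 asks the tests to verify: a group with
// its own limit is checked against that limit, while all groups without one
// share the wildcard quota cumulatively.
type tracker struct {
	quota        groupQuota
	used         map[string]int64
	wildcardUsed int64
}

func newTracker(q groupQuota) *tracker {
	return &tracker{quota: q, used: map[string]int64{}}
}

// tryAllocate records the allocation and returns true if it stays within quota.
func (t *tracker) tryAllocate(group string, amount int64) bool {
	if limit, ok := t.quota.explicit[group]; ok {
		if t.used[group]+amount > limit {
			return false
		}
		t.used[group] += amount
		return true
	}
	if t.wildcardUsed+amount > t.quota.wildcard {
		return false
	}
	t.wildcardUsed += amount
	return true
}

func main() {
	// Config after group B was removed: only group A keeps an explicit limit.
	t := newTracker(groupQuota{explicit: map[string]int64{"A": 10}, wildcard: 5})
	fmt.Println(t.tryAllocate("B", 3), t.tryAllocate("C", 3), t.tryAllocate("C", 2))
	fmt.Println(t.tryAllocate("A", 10), t.tryAllocate("A", 1))
}
```

A test for item 21 could compare the scheduler's accept/reject decisions against an oracle like this one.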






[jira] [Reopened] (YUNIKORN-1950) Improving test coverage for whole user/group enforcement feature - Phase 2

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reopened YUNIKORN-1950:








[jira] [Closed] (YUNIKORN-2333) Redundant min and max computation in resources.go

2024-02-23 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-2333.
--

> Redundant min and max computation in resources.go
> -
>
> Key: YUNIKORN-2333
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2333
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Methods like `ComponentWiseMin` and `ComponentWiseMax` can avoid redundant 
> min and max calculations for resources that have already been computed. I'm not 
> really sure about the performance gain, but it is good to make this change.
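One way to read the suggestion is sketched below: compute each shared resource type once and skip it when walking the second map. The `resource` type is a simplified stand-in, and treating a type missing from one side as 0 is an assumption for this sketch; the core's `ComponentWiseMin` may use a different rule. `ComponentWiseMax` would follow the same pattern.

```go
package main

import "fmt"

// resource is a simplified stand-in for the core's Resource (type name -> quantity).
type resource map[string]int64

// componentWiseMin returns the per-type minimum of l and r. In this sketch a type
// missing from one side counts as quantity 0; check the core's own documentation
// for its exact rule. Types present in both maps are handled once in the first
// loop, and the second loop skips them instead of recomputing them.
func componentWiseMin(l, r resource) resource {
	out := make(resource, len(l)+len(r))
	for name, lq := range l {
		rq := r[name] // 0 when name is missing from r
		if rq < lq {
			out[name] = rq
		} else {
			out[name] = lq
		}
	}
	for name, rq := range r {
		if _, done := l[name]; done {
			continue // already computed in the first loop
		}
		if rq < 0 { // name is missing from l, so compare against 0
			out[name] = rq
		} else {
			out[name] = 0
		}
	}
	return out
}

func main() {
	fmt.Println(componentWiseMin(resource{"vcore": 5, "memory": 10}, resource{"vcore": 3, "gpu": 2}))
}
```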






[jira] [Created] (YUNIKORN-2450) Rename `updateLowestId` to `updateLowestID`, `TestLoggerIds` to `TestLoggerIDs`

2024-02-23 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created YUNIKORN-2450:


 Summary: Rename `updateLowestId` to `updateLowestID`, 
`TestLoggerIds` to `TestLoggerIDs`
 Key: YUNIKORN-2450
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2450
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Chia-Ping Tsai


This is a follow-up of https://issues.apache.org/jira/browse/YUNIKORN-2413

The methods are located here:

[https://github.com/apache/yunikorn-core/blob/master/pkg/events/event_ringbuffer.go#L206]
[https://github.com/apache/yunikorn-core/blob/master/pkg/log/logger_test.go#L38]






[jira] [Created] (YUNIKORN-2449) add PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 to github actions for web CI

2024-02-23 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created YUNIKORN-2449:


 Summary: add PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 to github actions 
for web CI
 Key: YUNIKORN-2449
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2449
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Chia-Ping Tsai


From [~wilfreds]'s comment (YUNIKORN-2477):

{quote}
In the Makefile we do have an extra argument 
{{PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1}} which we do not have in the GitHub 
Actions calls. There we just run the {{yarn}} commands directly. We might need 
to add the same to the GitHub Actions or call the make targets to get 
equivalent behaviour.
{quote}


