[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps

2024-10-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889433#comment-17889433
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2860:
-

Not possible, because the placeholder pods are also used to trigger autoscaling 
of a cluster. You need pods for that, as the cluster autoscaler needs to know 
the details. Without pods the cluster will not scale up, which causes all kinds 
of issues.

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>Reporter: shawn
>Assignee: Qi Zhu
>Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png, 
> image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png, 
> state-dump.txt, yunikorn-scheduler.txt
>
>
>   
>   I simultaneously submit 4 gang apps to YuniKorn; sometimes all 4 apps stay 
> pending while two pgs get running, which is not expected.
>  It can be reproduced as follows:
> queues
>       1.kubectl create configmap yunikorn-configs --from-file=queues.yaml -n 
> yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>          2.Simultaneously submit gang-scheduling-job-example1-4.yaml, while
> gang-scheduling-job-example1-4.yaml only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always 
> reproducible)
> !image-2024-09-11-15-41-12-142.png!
>  
> app state as follows
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> full state dump as state-dump.txt, yunikorn scheduler logs are in 
> yunikorn-scheduler.txt
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2926) The Pod using gang scheduling is stuck in the Pending state

2024-10-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889430#comment-17889430
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2926:
-

When the placeholders are released we do not immediately try to place the real 
pod (the larger one) on a node. We cannot do that as we need to track changes. 
The released placeholder must be processed before we look again.

If the real pod is larger than the placeholder, the larger resource requirement 
might not fit in the queue or on any node and thus never get scheduled. So we 
process them as normal allocations with all the checks. Depending on the 
difference you might be able to accommodate all real pods or just a fraction of 
them.

In these scenarios the pods stay pending and there is nothing wrong. You need 
to do a proper analysis of why the pod stays pending. Nothing provided here 
shows that we have a problem.

> The Pod using gang scheduling is stuck in the Pending state
> ---
>
> Key: YUNIKORN-2926
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2926
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: wangzhihui
>Priority: Minor
> Fix For: 1.5.0
>
> Attachments: image-2024-10-15-11-54-33-458.png, image.png
>
>
> desc:
>  The real allocation is larger than all placeholders, so all allocations are 
> released, causing all Pods to stay in the Pending state.
> !image-2024-10-15-11-54-33-458.png!
> !image.png!
> {code:java}
> // code placeholder
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: simple-gang-job
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "simple-gang-job"
>         queue: root.default
>       annotations:
>         yunikorn.apache.org/schedulingPolicyParameters: 
> "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
>         yunikorn.apache.org/task-group-name: task-group-example
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example",
>               "minMember": 1,
>               "minResource": {
>                 "cpu": "100m",
>                 "memory": "50M"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {},
>               "topologySpreadConstraints": []
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "alpine:latest"
>           command: ["sleep", ""]
>           resources:
>             requests:
>               cpu: "200m"
>               memory: "50M" {code}
> solution:
> If the app is in Hard mode, it will transition to a Failing state. If it is 
> in Soft mode, it will transition to a Resuming state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2925) Remove internal objects from application REST response

2024-10-14 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2925:
---

 Summary: Remove internal objects from application REST response
 Key: YUNIKORN-2925
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2925
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The REST API for application objects exposes an internal object type (resource) 
directly without conversion. That means any internal representation change will 
break REST compatibility. This should never have happened and needs to be 
reversed ASAP. All other REST calls convert internal objects before exposing 
them.

The other problem with the exposed information is that it is only accurate for 
the COMPLETING or COMPLETED state of an application. In any other state the 
data is incomplete, as it is only updated when an allocation finishes. Running 
allocations are not included. 
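
A minimal sketch of the direction this implies, with illustrative names only 
(not the actual core types): copy the internal resource into a plain, stable 
DAO before it is marshalled into the REST response, so internal changes no 
longer leak into the API.
{code:go}
package main

import (
	"encoding/json"
	"fmt"
)

// Resource stands in for the internal resource object (illustrative only).
type Resource struct {
	Resources map[string]int64
}

// ResourceDAO is the stable wire format exposed over REST; changing the
// internal Resource no longer breaks API compatibility.
type ResourceDAO map[string]int64

// toDAO copies the internal representation into the wire format.
func toDAO(r *Resource) ResourceDAO {
	dao := make(ResourceDAO, len(r.Resources))
	for name, value := range r.Resources {
		dao[name] = value
	}
	return dao
}

func main() {
	internal := &Resource{Resources: map[string]int64{"memory": 1073741824, "vcore": 1000}}
	out, _ := json.Marshal(toDAO(internal))
	fmt.Println(string(out)) // {"memory":1073741824,"vcore":1000}
}
{code}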



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2907) Queue config processing log spew

2024-10-07 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2907:
---

 Summary: Queue config processing log spew
 Key: YUNIKORN-2907
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2907
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


During configuration updates a shadow queue structure is built based on the new 
configuration. The shadow structure is then walked and compared to the existing 
queue structure. Actions are taken based on the comparison: queues that exist 
in only one of the two structures are added or removed, and queues that exist 
in both are updated if differences are found.

During the build of the shadow structure queue creations are logged. This logs 
the creation of the whole queue structure. The logs do not make clear that the 
queues are not really added, only that the shadow structure is being created. 
For large queue structures this causes a log spew and makes the log difficult 
to read.

The actions taken based on the comparison are logged clearly.

We need to be able to distinguish in the log between a real queue create and a 
shadow create. The same code is executed when we create the "real" queue. The 
creation of the shadow queue structure should either not log at all, log only 
at debug level, or log with a clear message that it is the shadow structure 
being created.
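
One possible shape of this (an illustrative sketch using zap, not the actual 
queue code): pass down whether this is the shadow build so it logs at debug 
level with an explicit marker, while the real create keeps its normal message.
{code:go}
package main

import "go.uber.org/zap"

// logQueueCreate distinguishes real queue creation from the shadow build that
// only exists to compare the new configuration against the old one.
func logQueueCreate(logger *zap.Logger, queuePath string, shadow bool) {
	if shadow {
		// Shadow build: debug level only, clearly marked, so a large
		// configuration update no longer floods the log.
		logger.Debug("configuration check: shadow queue created", zap.String("queue", queuePath))
		return
	}
	logger.Info("queue added to partition", zap.String("queue", queuePath))
}

func main() {
	logger, _ := zap.NewDevelopment()
	defer logger.Sync()
	logQueueCreate(logger, "root.my-dev", true)  // debug: shadow structure build
	logQueueCreate(logger, "root.my-dev", false) // info: real queue create
}
{code}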
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2901) when creating new queues, queue name is used as queue path

2024-10-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2901:

Target Version: 1.7.0

See the init in {{newDynamicQueueInternal()}}: it should follow the same setup 
for the queue path.

The path should be set to the full path. It does not affect scheduling but can 
affect metrics and logging. 
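
A minimal sketch of that setup (field and function names are illustrative, not 
the exact core code): derive the full path from the parent at construction 
time instead of starting out with the bare queue name.
{code:go}
package main

import "fmt"

// Queue is an illustrative stand-in for the scheduler queue object.
type Queue struct {
	Name      string
	QueuePath string
	parent    *Queue
}

// newQueue sets the full path at construction time so metrics and log fields
// emitted before any later fix-up already carry the right label.
func newQueue(name string, parent *Queue) *Queue {
	q := &Queue{Name: name, parent: parent}
	if parent == nil {
		q.QueuePath = name // root queue: path equals the name
	} else {
		q.QueuePath = parent.QueuePath + "." + name // e.g. "root.my-dev"
	}
	return q
}

func main() {
	root := newQueue("root", nil)
	child := newQueue("my-dev", root)
	fmt.Println(child.QueuePath) // root.my-dev
}
{code}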

> when creating new queues, queue name is used as queue path
> --
>
> Key: YUNIKORN-2901
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2901
> Project: Apache YuniKorn
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0
>Reporter: Hengzhe Guo
>Priority: Major
>
> At 
> [https://github.com/apache/yunikorn-core/blame/master/pkg/scheduler/objects/queue.go#L121]
>  in NewConfiguredQueue, the new queue's name is used as the path. For non-root 
> queues, the path is later correctly set to the full path at line 137. But several 
> actions between them use the name as the path, causing issues like emitting 
> metrics with a wrong label



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2901) when creating new queues, queue name is used as queue path

2024-10-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2901:

Affects Version/s: 1.6.0
   1.5.0
   1.4.0
   1.3.0
   1.2.0
   1.1.0
   1.0.0

> when creating new queues, queue name is used as queue path
> --
>
> Key: YUNIKORN-2901
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2901
> Project: Apache YuniKorn
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0
>Reporter: Hengzhe Guo
>Priority: Major
>
> At 
> [https://github.com/apache/yunikorn-core/blame/master/pkg/scheduler/objects/queue.go#L121]
>  in NewConfiguredQueue, the new queue's name is used as the path. For non-root 
> queues, the path is later correctly set to the full path at line 137. But several 
> actions between them use the name as the path, causing issues like emitting 
> metrics with a wrong label



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2895) Don't add duplicated allocation to node when the allocation ask fails

2024-10-01 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886295#comment-17886295
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2895:
-

There is something more broken. We should never see an ask at that point in the 
scheduling cycle that has been allocated. The application was write locked, and 
remains locked, a number of function calls before we get here. We filter all 
allocations and only proceed with the allocations that have not been allocated. 
We do that inside the lock.

The allocation should only be manipulated through the application, which means 
the allocated flag cannot be changed while scheduling is in progress.

I think the issue is located in the maintenance of the {{sortedRequests}} on 
the application. That list used to be rebuilt each cycle but now we 
insert/delete from the slice. During recovery I think we broke things. Recovery 
uses the same path as a node addition, so this *could* happen on any node add 
or maybe even on a simple add of a new ask.

When we call {{application.AddAllocationAsk}} we check that the object is not 
allocated. That check always passes, as we create a new ask object from the SI. 
So we skip to the next step. This triggers a check for an already known 
outstanding ask. If that outstanding ask is not allocated we replace the object 
with the new one. We also make sure that we update resources if those have 
changed on the queues and app (pending).

{*}First issue{*}: if the old ask _IS_ allocated we will still replace that 
allocation with the new one in the requests map. We skip adjusting the pending 
resources using the already registered ask. This is where it breaks down: the 
requests list should never contain already allocated objects. It means we have 
a reference leak, and thus a memory leak. Long after the allocation is removed 
a reference will be kept in requests that will not get removed until we clean 
up the application. The GC will thus not remove it. For long running 
applications with lots of requests this can become significant.

{*}Second issue{*}: also caused by the replacement. The new object is not 
marked allocated, which causes a big problem as we will try to schedule it. We 
could now have an unallocated and an allocated object with the same key, one in 
requests and one in allocations. After we schedule the second one the 
allocations list will be updated and we lose the original info.

{*}Third issue{*}: independent of the state, we proceed to add the ask to the 
requests. The requests are stored in a map keyed by the allocation key, which 
means we only ever track a single ask per key, never any duplicates. The sorted 
requests, however, are a sorted slice of references to objects. There is no 
check in the add into the sorted request slice to replace an existing entry; we 
will happily add a second one to the slice. With two objects for the same key, 
both are considered when scheduling, which can easily cause issues there.

This code for adding allocations and asks needs a proper review. Over time, 
with multiple changes on top of each other, we have introduced issues here.
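
An illustrative sketch of the third issue and the shape of a fix (type and 
field names are made up for the example, this is not the actual application 
code): the map keyed by allocation key can never hold duplicates, but a plain 
append into the sorted slice can, so the add has to replace an existing entry 
instead of appending a second one.
{code:go}
package main

import "fmt"

// ask is an illustrative stand-in for an allocation ask.
type ask struct {
	allocationKey string
	priority      int32
}

type application struct {
	requests       map[string]*ask // keyed by allocation key: one entry per key
	sortedRequests []*ask          // sorted slice: duplicates are possible
}

// addOrReplace keeps the slice consistent with the map: if an ask with the
// same key is already present it is replaced, never appended a second time.
func (app *application) addOrReplace(a *ask) {
	if _, ok := app.requests[a.allocationKey]; ok {
		for i, existing := range app.sortedRequests {
			if existing.allocationKey == a.allocationKey {
				app.sortedRequests[i] = a
				break
			}
		}
	} else {
		app.sortedRequests = append(app.sortedRequests, a)
	}
	app.requests[a.allocationKey] = a
}

func main() {
	app := &application{requests: map[string]*ask{}}
	app.addOrReplace(&ask{allocationKey: "alloc-1", priority: 1})
	app.addOrReplace(&ask{allocationKey: "alloc-1", priority: 5}) // same key again
	fmt.Println(len(app.requests), len(app.sortedRequests))      // 1 1, not 1 2
}
{code}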

> Don't add duplicated allocation to node when the allocation ask fails
> -
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
>
> While revisiting the new update allocation logic: a duplicated allocation can 
> be added to the node if the allocation is already allocated. We then try to 
> add the allocation to the node again and do not revert it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2885) Fix security vulnerabilities in dependencies

2024-09-30 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885865#comment-17885865
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2885:
-

Since updating to pnpm v9, dependabot, which we used as a tool to do this for 
us, no longer works. There is an open issue against dependabot for [pnpm v9 
support|https://github.com/dependabot/dependabot-core/issues/10534]. Until that 
gets fixed we need to make sure that we run this kind of check and update 
before each release.

We need to have this documented or tracked somewhere to make sure we do not 
forget when we get to YuniKorn 1.7 in a couple of months.

[~ccondit] / [~pbacsko] for some more visibility

> Fix security vulnerabilities in dependencies
> 
>
> Key: YUNIKORN-2885
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2885
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: webapp
>Reporter: JunHong Peng
>Assignee: JunHong Peng
>Priority: Major
>  Labels: pull-request-available
>
> {{pnpm audit}} report: 
> [audit-report.md|https://github.com/user-attachments/files/17089735/audit-report.md]
> 26 vulnerabilities found
> Severity: 12 moderate | 14 high
> After Upgrade Angular v18 (#YUNIKORN-2861) Audit Report: 
> [audit-report.md|https://github.com/user-attachments/files/17164041/audit-report.md]
> 8 vulnerabilities found
> Severity: 3 moderate | 5 high



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2886) update Spark operator documentation for YuniKorn integration

2024-09-23 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2886:
---

 Summary: update Spark operator documentation for YuniKorn 
integration
 Key: YUNIKORN-2886
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2886
 Project: Apache YuniKorn
  Issue Type: New Feature
  Components: documentation
Reporter: Wilfred Spiegelenburg


Spark Operator 2.0 has been released with full YuniKorn support. We need to 
update the website and push this information.

Spark Operator with YuniKorn details:
 * Support gang scheduling with Yunikorn
 * Set schedulerName to Yunikorn
 * Account for spark.executor.pyspark.memory in Yunikorn gang scheduling 

See [Spark Operator 
v2.0.0|https://github.com/kubeflow/spark-operator/releases/tag/v2.0.0] tag for 
details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2784) Scheduler stuck

2024-09-22 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883644#comment-17883644
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2784:
-

That pod is not scheduled by YuniKorn. You would need to debug the default 
scheduler to figure out why that pod is not scheduled. Only pods with the 
{{schedulerName}} set to YuniKorn are relevant for us to look at.

> Scheduler stuck
> ---
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Dmitry
>Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 
> 2024-08-02 at 1.20.23 PM.png, Screenshot 2024-09-18 at 7.26.17 PM.png, 
> dumps.tgz, logs
>
>
> Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending 
> (screenshot 1). Also all other ones, but these are the most visible and 
> should be running 100%.
> After restarting the scheduler, all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and 
> `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2784) Scheduler stuck

2024-09-18 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882842#comment-17882842
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2784:
-

Correct, there is no instant way to move. That is why we are looking at the 
change in YUNIKORN-2791. It will expose all pods inside YuniKorn, even the ones 
not scheduled by YuniKorn. Instead of those pods showing up only as usage on 
the node, we will see the pod and can look at possible preemption. This is the 
same for all pod types, not just daemonsets.

You have a limit range set on your cluster. The pods might be tiny when you 
create them, but they are not when you schedule them. The pod asks for 3GB of 
memory as each container is given a minimum of 1GB. Check the pod for details: 
it is annotated on the pod that the container resources were changed. The limit 
range is applied to every pod in the cluster. That means a pod with 3 
containers, each asking for 100MB of memory (300MB total for the pod), needs 
3GB when scheduling after the limit range is applied: a 10-fold increase. If 
that happens for all your pods you waste a huge amount of resources. It could 
also explain why the node is seen as "full" when you expect it to be empty.
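
A rough illustration of that arithmetic (assuming a limit range that raises 
every container's memory request to at least 1Gi; the container counts and 
sizes are the ones from the example above, not from the attached dumps):
{code:go}
package main

import "fmt"

// effectiveRequest applies a per-container minimum the way a limit range
// would before scheduling: any request below the minimum is bumped up to it.
func effectiveRequest(containerRequests []int64, perContainerMin int64) int64 {
	var total int64
	for _, req := range containerRequests {
		if req < perContainerMin {
			req = perContainerMin
		}
		total += req
	}
	return total
}

func main() {
	const mi = int64(1024 * 1024)
	requests := []int64{100 * mi, 100 * mi, 100 * mi} // 3 containers, 300MiB total as created
	minPerContainer := int64(1024) * mi               // 1GiB minimum applied by the limit range
	fmt.Println(effectiveRequest(requests, minPerContainer) / mi) // 3072 MiB scheduled, ~10x more
}
{code}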

> Scheduler stuck
> ---
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Dmitry
>Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 
> 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending 
> (screenshot 1). Also all other ones, but these are the most visible and 
> should be running 100%.
> After restarting the scheduler, all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and 
> `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2784) Scheduler stuck

2024-09-18 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882590#comment-17882590
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2784:
-

I think we can get into this situation when the node is full and all pods 
running on the node are either daemonset pods themselves or have a higher 
priority than the daemonset pod we are preempting for. That does not seem to be 
the case here.

Although it is not much different from what is described above, the reason we 
cannot find pods is slightly different.

What I can see is that the pod has a node selector for the node 
{{prp-perfsonar-1.ucsc.edu}} defined and we have reserved the node 
{{prp-perfsonar-1.ucsc.edu}}. That is the correct node. The allocation has 
the following annotation on it inside YuniKorn:
{code:java}
"yunikorn.apache.org/requiredNode": "prp-perfsonar-1.ucsc.edu" {code}
The question now is why the node does not allow this simple allocation. 
Tracking back through the state dump, the node shows that there are not enough 
resources available to place the pod. This is the partial node detail from the 
dump:
{code:java}
          "nodeID": "prp-perfsonar-1.ucsc.edu",
          "capacity": {
            "devices.kubevirt.io/kvm": 1000,
            "devices.kubevirt.io/tun": 1000,
            "devices.kubevirt.io/vhost-net": 1000,
            "ephemeral-storage": 609974506511,
            "hugepages-1Gi": 0,
            "hugepages-2Mi": 0,
            "memory": 16273350656,
            "pods": 110,
            "smarter-devices/fuse": 20,
            "smarter-devices/vfio": 20,
            "smarter-devices/vfio_vfio": 20,
            "vcore": 16000
          },
          "allocated": {
            "memory": 1073741824,
            "pods": 1,
            "vcore": 100
          },
          "occupied": {
            "memory": 12673089536,
            "pods": 15,
            "vcore": 1883
          },
          "available": {
            "devices.kubevirt.io/kvm": 1000,
            "devices.kubevirt.io/tun": 1000,
            "devices.kubevirt.io/vhost-net": 1000,
            "ephemeral-storage": 609974506511,
            "hugepages-1Gi": 0,
            "hugepages-2Mi": 0,
            "memory": 2526519296,
            "pods": 94,
            "smarter-devices/fuse": 20,
            "smarter-devices/vfio": 20,
            "smarter-devices/vfio_vfio": 20,
            "vcore": 14017
          },
{code}
The only other pod that YuniKorn is aware of on that node is another daemonset 
pod. That pod has the ID dae0ed3b-2cbd-4286-96b6-e220ffcaacb7. The pod we are 
trying to place is requesting 3Gi of memory and there is only about 2.5Gi 
available. So we reserve the node and try to preempt on that specific node. The 
other daemonset pod is filtered out, which leaves us with "nothing" to preempt, 
and thus we stop.
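
The memory numbers from the dump line up with that (a quick check, values in 
bytes):
{code:go}
package main

import "fmt"

func main() {
	// memory figures in bytes, taken from the node detail above
	capacity := int64(16273350656)
	allocated := int64(1073741824) // the daemonset pod YuniKorn placed
	occupied := int64(12673089536) // pods placed by the default scheduler
	available := capacity - allocated - occupied
	requested := int64(3) * 1024 * 1024 * 1024 // the 3Gi required-node pod
	fmt.Println(available, requested, available >= requested) // 2526519296 3221225472 false
}
{code}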

This is a side effect of running multiple schedulers in the cluster. The node 
is occupied with pods placed by the default scheduler. YuniKorn does not see 
those pods (yet), as per YUNIKORN-2791. That leaves us in a state where we 
cannot find anything to preempt and thus cannot get the pod up and running.

This is one of the main reasons not to run multiple schedulers in a cluster.

> Scheduler stuck
> ---
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Dmitry
>Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 
> 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending 
> (screenshot 1). Also all other ones, but these are the most visible and 
> should be running 100%.
> After restarting the scheduler, all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and 
> `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2868) [UMBRELLA] YuniKorn 1.6.0 release efforts

2024-09-17 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882562#comment-17882562
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2868:
-

[~pbacsko] before we build and publish the helm chart we need the linked PR 
committed and applied at least locally to the code from which we build the 
chart. Otherwise [artifact 
hub|https://artifacthub.io/packages/helm/yunikorn/yunikorn] will not show the 
correct K8s versions we support.

> [UMBRELLA] YuniKorn 1.6.0 release efforts
> -
>
> Key: YUNIKORN-2868
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2868
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2871) Update website for 1.6.0

2024-09-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2871:

Description: 
Multiple tasks all need to be done at once:
 * create versioned docs
 * create release announcement
 * update downloads page
 * update roadmap doc
 * update doap file
 * K8s supported versions update to add 1.30 and 1.31

  was:
Multiple tasks all need to be done at once:
 * create versioned docs
 * create release announcement
 * update downloads page
 * update roadmap doc
 * update doap file


> Update website for 1.6.0
> 
>
> Key: YUNIKORN-2871
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2871
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Multiple tasks all need to be done at once:
>  * create versioned docs
>  * create release announcement
>  * update downloads page
>  * update roadmap doc
>  * update doap file
>  * K8s supported versions update to add 1.30 and 1.31



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2870) Release notes for 1.6.0

2024-09-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2870:

Description: 
Jiras have been tagged with release-notes for this version. These jiras need to 
be added with a special mention in the release notes.

[https://issues.apache.org/jira/issues/?filter=12352474]

This Jira might require multiple people to help write the release notes for the 
specific jiras mentioned.

> Release notes for 1.6.0
> ---
>
> Key: YUNIKORN-2870
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2870
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Jiras have been tagged with release-notes for this version. These jiras need 
> to be added with a special mention in the release notes.
> [https://issues.apache.org/jira/issues/?filter=12352474]
> This Jira might require multiple people to help write the release notes for 
> the specific jiras mentioned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2871) Update website for 1.6.0

2024-09-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2871:

Description: 
Multiple tasks all need to be done at once:
 * create versioned docs
 * create release announcement
 * update downloads page
 * update roadmap doc
 * update doap file

> Update website for 1.6.0
> 
>
> Key: YUNIKORN-2871
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2871
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Multiple tasks all need to be done at once:
>  * create versioned docs
>  * create release announcement
>  * update downloads page
>  * update roadmap doc
>  * update doap file



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps

2024-09-15 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881884#comment-17881884
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2860:
-

There have been 542 jiras marked as fixed between 1.3.0 and 1.5.2 [1]. So there 
is a lot of change. 1.6.0 adds another 300+ jiras. Some of these kinds of 
changes came from cleaning up leaks, some were added as new functionality. Not 
sure if anyone can state exactly what came from where.

[1] jql search query: {{project = YUNIKORN AND status in (Resolved, Closed) AND 
fixVersion in (1.4.0, 1.5.0, 1.5.1, 1.5.2) ORDER BY key DESC}}

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>Reporter: shawn
>Assignee: Qi Zhu
>Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png, 
> image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png, 
> state-dump.txt, yunikorn-scheduler.txt
>
>
>   
>   I simultaneously submit 4 gang apps to YuniKorn; sometimes all 4 apps stay 
> pending while two pgs get running, which is not expected.
>  It can be reproduced as follows:
> queues
>       1.kubectl create configmap yunikorn-configs --from-file=queues.yaml -n 
> yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>          2.Simultaneously submit gang-scheduling-job-example1-4.yaml, while
> gang-scheduling-job-example1-4.yaml only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always 
> reproducible)
> !image-2024-09-11-15-41-12-142.png!
>  
> app state as follows
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> full state dump as state-dump.txt, yunikorn scheduler logs are in 
> yunikorn-scheduler.txt
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps

2024-09-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881159#comment-17881159
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2860 at 9/12/24 3:51 AM:
--

This is partially a K8s issue and partially ours. There is no guarantee that 
the pods we create for gang scheduling placeholders come back in a pre-defined 
order. Even if and when we process things serially there is no guarantee that 
we get them back in that order.

I have seen before that we create 10 placeholders and the first pod that K8s 
returns to us is the last pod we created. That becomes worse when multiple 
origins (applications) are involved. We pass on from the k8shim to the core 
serially. We then schedule based on that ordering.

I think we can improve this if we track the applications that have requested 
gangs in a queue in order of creation, and only service the ones that fit in 
the queue, or just the first one out of that list, until all placeholders are 
allocated. That could be investigated for a next release. Not sure if it can 
work, as something like this might have side effects we do not want or might 
cause bigger issues. For instance, if the placeholder pods continually fail 
placement due to predicates or something similar, we need to "escape" this.


was (Author: wifreds):
This is partially a K8s issue and partially ours. There is no guarantee that 
the pods we create for gang scheduling placeholders come back in a pre-defined 
order. Even if and when we process things serially there is no guarantee that 
we get them back in that order.

I have seen before that we create 10 placeholders and the first pod that K8s 
returns to us is the last pod we created. That becomes worse when multiple 
origins (applications) are involved. We pass on from the k8shim to the core 
serially.

I think we can improve this if we track the applications that have requested 
gangs in a queue in order of creation, and only service the ones that fit in 
the queue, or just the first one out of that list, until all placeholders are 
allocated. That could be investigated for a next release. Not sure if it can 
work, as something like this might have side effects we do not want or might 
cause bigger issues. For instance, if the placeholder pods continually fail 
placement due to predicates or something similar, we need to "escape" this.

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>Reporter: shawn
>Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, state-dump.txt, yunikorn-scheduler.txt
>
>
>   
>   I simultaneously submit 4 gang apps to YuniKorn; sometimes all 4 apps stay 
> pending while two pgs get running, which is not expected.
>  It can be reproduced as follows:
> queues
>       1.kubectl create configmap yunikorn-configs --from-file=queues.yaml -n 
> yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>          2.Simultaneously submit gang-scheduling-job-example1-4.yaml, while
> gang-scheduling-job-example1-4.yaml only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always 
> reproducible)
> !image-2024-0

[jira] [Updated] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps

2024-09-11 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2860:

Target Version: 1.7.0

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>Reporter: shawn
>Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, state-dump.txt, yunikorn-scheduler.txt
>
>
>   
>   I simultaneously submit 4 gang apps to YuniKorn; sometimes all 4 apps stay 
> pending while two pgs get running, which is not expected.
>  It can be reproduced as follows:
> queues
>       1.kubectl create configmap yunikorn-configs --from-file=queues.yaml -n 
> yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>          2.Simultaneously submit gang-scheduling-job-example1-4.yaml, while
> gang-scheduling-job-example1-4.yaml only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always 
> reproducible)
> !image-2024-09-11-15-41-12-142.png!
>  
> app state as follows
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> full state dump as state-dump.txt, yunikorn scheduler logs are in 
> yunikorn-scheduler.txt
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps

2024-09-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881159#comment-17881159
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2860:
-

This is partially a K8s issue and partially ours. There is no guarantee that 
the pods we create for gang scheduling placeholders come back in a pre-defined 
order. Even if and when we process things serially there is no guarantee that 
we get them back in that order.

I have seen before that we create 10 placeholders and the first pod that K8s 
returns to us is the last pod we created. That becomes worse when multiple 
origins (applications) are involved. We pass on from the k8shim to the core 
serially.

I think we can improve this if we track the applications that have requested 
gangs in a queue in order of creation, and only service the ones that fit in 
the queue, or just the first one out of that list, until all placeholders are 
allocated. That could be investigated for a next release. Not sure if it can 
work, as something like this might have side effects we do not want or might 
cause bigger issues. For instance, if the placeholder pods continually fail 
placement due to predicates or something similar, we need to "escape" this.

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>Reporter: shawn
>Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, state-dump.txt, yunikorn-scheduler.txt
>
>
>   
>   I simultaneously submit 4 gang apps to YuniKorn; sometimes all 4 apps stay 
> pending while two pgs get running, which is not expected.
>  It can be reproduced as follows:
> queues
>       1.kubectl create configmap yunikorn-configs --from-file=queues.yaml -n 
> yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>          2.Simultaneously submit gang-scheduling-job-example1-4.yaml, while
> gang-scheduling-job-example1-4.yaml only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always 
> reproducible)
> !image-2024-09-11-15-41-12-142.png!
>  
> app state as follows
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> full state dump as state-dump.txt, yunikorn scheduler logs are in 
> yunikorn-scheduler.txt
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2850) Watch configmap only in yunikorn's deployed namespace

2024-09-05 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2850:

Target Version: 1.7.0
Labels: newbie  (was: )

> Watch configmap only in yunikorn's deployed namespace
> -
>
> Key: YUNIKORN-2850
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2850
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Tae-kyeom, Kim
>Priority: Major
>  Labels: newbie
>
> Currently, YuniKorn uses a configmap informer to handle configuration hot 
> reload.
> However, in the current implementation the informer watches all namespaces, 
> even though it only needs to watch the namespace in which YuniKorn is 
> deployed. This causes inefficient behavior when syncing and caching configmap 
> state. If there are many unrelated configmaps in other namespaces, this 
> causes a long recovery time to list them and memory pressure to hold the 
> redundant configmap caches. If we could replace the configmap informer with a 
> namespace-restricted one, it would improve startup / recovery time and reduce 
> memory usage.
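
A minimal sketch of the namespace-restricted informer this suggests 
(illustrative wiring only, not the actual shim code), using the client-go 
factory option that limits the watch and cache to one namespace:
{code:go}
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newConfigMapInformer watches ConfigMaps only in the namespace YuniKorn is
// deployed in, instead of listing and caching ConfigMaps cluster wide.
func newConfigMapInformer(cfg *rest.Config, namespace string) (informers.SharedInformerFactory, error) {
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Second,
		informers.WithNamespace(namespace), // restrict the watch and cache to one namespace
	)
	_ = factory.Core().V1().ConfigMaps().Informer() // register the ConfigMap informer
	return factory, nil
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes running inside the cluster
	if err != nil {
		panic(err)
	}
	factory, err := newConfigMapInformer(cfg, "yunikorn")
	if err != nil {
		panic(err)
	}
	stop := make(chan struct{})
	factory.Start(stop)
}
{code}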



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2772) Scheduler restart does not preserve app start time

2024-09-03 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879088#comment-17879088
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2772:
-

This is a multi-step issue. We do not communicate a timestamp when we create 
an application or an allocation. The issue does not just exist for an 
application; the allocations are also involved. Sorting apps on a queue is one 
side of the problem, but sorting allocations within an application could also 
be off.

The k8shim creates the application based on what is considered the oldest pod 
it finds (allocated or still pending). That originator pod's create time should 
be set as the application create time. The second point is that each task which 
converts into an allocation should have a create time set based on the pod 
detail.

These two changes made on the k8shim side need to be communicated into the core, 
and the create steps should pick up these two new values instead of using a new 
timestamp. The create time is currently communicated through a tag on the 
application, as per the YUNIKORN-1155 changes to support the placeholder 
timeout fix on recovery. That tag is always set and could be used to set the 
create time. The allocation can follow the same principle.
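
A minimal sketch of the core-side handling this implies (the tag name and 
helper are illustrative placeholders, not the actual scheduler-interface 
constants): read the creation time from the application tags and only fall 
back to the current time when the tag is absent or malformed.
{code:go}
package main

import (
	"fmt"
	"strconv"
	"time"
)

// appCreateTime returns the create time carried in the application tags, or
// the current time when no usable tag is present. The tag name below is an
// illustrative placeholder, not the real scheduler-interface constant.
func appCreateTime(tags map[string]string) time.Time {
	if value, ok := tags["yunikorn.apache.org/app-create-time"]; ok {
		if seconds, err := strconv.ParseInt(value, 10, 64); err == nil {
			return time.Unix(seconds, 0)
		}
	}
	return time.Now() // fallback: behaves like the current (lossy) restart path
}

func main() {
	tags := map[string]string{"yunikorn.apache.org/app-create-time": "1717200000"}
	fmt.Println(appCreateTime(tags).UTC())
}
{code}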

Starting work on this

> Scheduler restart does not preserve app start time
> --
>
> Key: YUNIKORN-2772
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2772
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>
> Whenever the scheduler is restarted, every application's create time is set 
> to the current time, ignoring the original value that comes from the API 
> server.
> Due to this, FIFO sorting can show irregularity in scheduling.
> If there is an App1 that started 2 days ago and an App2 that started 1 day 
> ago, during a scheduler restart both apps will get almost the same create 
> time (nanoseconds apart). App2's create time can be just a few nanoseconds 
> ahead of App1's and hence App2 gets priority over App1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2846) Throw a warning if a pod has inconsistent metadata in admission controller

2024-09-02 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2846:

Fix Version/s: (was: 1.6.0)

> Throw a warning if a pod has inconsistent metadata in admission controller
> --
>
> Key: YUNIKORN-2846
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2846
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Yu-Lin Chen
>Assignee: Yu-Lin Chen
>Priority: Major
>  Labels: release-notes
>
> Similar to YUNIKORN-2810,
> If the same metadata (such as queue or applicationID) is configured 
> inconsistently when submitting a pod request, admission controller should 
> reject the request in 1.7.0
>  
> In 1.6.0, we only throw a warning. The rejection will be implemented in 1.7.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2840) sortQueues: fair max performance and correctness change

2024-08-26 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2840:
---

 Summary: sortQueues: fair max performance and correctness change
 Key: YUNIKORN-2840
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2840
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


In YUNIKORN-2678 the fair queue sorting was improved to take guaranteed quota 
into account correctly. During the review there were two minor points left over 
that would need improving:
 * performance
 * correctness on change 

Currently {{GetFairMaxResource()}} gets called for each child, and this does a 
recursive call back up the queue hierarchy. This is a performance loss, 
especially when sorting a deep hierarchy or a large number of children.

The parent details used for a real fair comparison between the children should 
also not change. When they do, as in the current implementation, two children 
might use different inputs when sorted.
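
An illustrative sketch of both points (not the actual sorting code): compute 
the parent's fair max once and hand the same value to every child comparison, 
instead of letting each child walk back up the hierarchy during the sort.
{code:go}
package main

import (
	"fmt"
	"sort"
)

// queue is an illustrative stand-in for the scheduler queue object.
type queue struct {
	name     string
	usage    int64
	max      int64 // 0 means "no own limit", inherit from the parent
	parent   *queue
	children []*queue
}

// fairMax walks up the hierarchy; calling it for every comparison is the cost
// the issue describes, so it is called once per sort instead.
func (q *queue) fairMax() int64 {
	if q.max > 0 || q.parent == nil {
		return q.max
	}
	return q.parent.fairMax()
}

// sortChildren snapshots the shared fair max once so every comparison uses
// the same, unchanging input.
func (q *queue) sortChildren() {
	parentFairMax := q.fairMax()
	sort.Slice(q.children, func(i, j int) bool {
		return ratio(q.children[i], parentFairMax) < ratio(q.children[j], parentFairMax)
	})
}

// ratio is the usage share of a child against its own or the inherited max.
func ratio(c *queue, fallbackMax int64) float64 {
	limit := c.max
	if limit == 0 {
		limit = fallbackMax
	}
	if limit == 0 {
		return 0
	}
	return float64(c.usage) / float64(limit)
}

func main() {
	root := &queue{name: "root", max: 100}
	a := &queue{name: "a", usage: 30, parent: root}
	b := &queue{name: "b", usage: 10, parent: root}
	root.children = []*queue{a, b}
	root.sortChildren()
	fmt.Println(root.children[0].name) // b: lowest usage share first
}
{code}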



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2809) Fix layout of node transition diagram

2024-08-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2809.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Thank you [~blue.tzuhua] for your first contribution to the SI repo.

Committed the change 

> Fix layout of node transition diagram
> -
>
> Key: YUNIKORN-2809
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2809
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: scheduler-interface
>Reporter: Wilfred Spiegelenburg
>Assignee: Tzu-Hua Lan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.6.0
>
> Attachments: image-2024-08-16-15-57-12-928.png
>
>
> Fix formatting of the node state transition diagram. It is missing white 
> space and the diagram is not readable at the moment. Screenshot taken from 
> file after the 
> [commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md]
> !image-2024-08-16-15-57-12-928.png|width=321,height=184!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2838) SI: Update protobuf dependencies

2024-08-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2838.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

protobuf and grpc updated to current latest versions

> SI: Update protobuf dependencies
> 
>
> Key: YUNIKORN-2838
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2838
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: scheduler-interface
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Kubernetes 1.31.0 has moved to grpc v1.65.0 and protobuf v1.34.2 upstream. We 
> should update our own dependencies in the scheduler interface to match.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state

2024-08-19 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875034#comment-17875034
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2818:
-

The metrics should be accurate, and right now they are not. This is what we 
track at the moment:

!image-2024-08-20-11-36-06-479.png!

It looks like Running is the only state we track correctly. We do not track 
New at all, and Accepted is broken. Failed and Completed are handled 
differently, which should not be the case.

We seem to have a similar issue in the scheduler metrics. Rejected is not 
tracked in all cases. We do not really know how many applications were 
submitted, as nothing tracks New and Accepted is broken there as well.
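
A sketch of the distinction being discussed, using the Prometheus Go client; 
the metric names and helper functions below are illustrative, not the real 
YuniKorn metrics code. A counter records totals and only ever grows; a gauge 
tracks the current number in a state and must be decremented when an app 
leaves the state.
{code:go}
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Illustrative metric names only; the real YuniKorn metric names differ.
var (
	appsSubmitted = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "queue_apps_submitted_total", // monotonic: total ever submitted
		Help: "Total applications submitted to the queue.",
	})
	appsAccepted = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "queue_apps_accepted", // gauge: apps currently in Accepted
		Help: "Applications currently in the Accepted state.",
	})
)

func init() {
	prometheus.MustRegister(appsSubmitted, appsAccepted)
}

// onAppAccepted would be called when an application enters Accepted.
func onAppAccepted() {
	appsSubmitted.Inc()
	appsAccepted.Inc()
}

// onAppLeftAccepted would be called when the application moves on to
// Running, Failed or Completed; without the Dec() the "accepted" value can
// only ever grow, which is the behaviour reported in this issue.
func onAppLeftAccepted() {
	appsAccepted.Dec()
}
{code}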

 

> 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps 
> leave the state
> --
>
> Key: YUNIKORN-2818
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2818
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Hengzhe Guo
>Priority: Major
> Attachments: image-2024-08-20-11-36-06-479.png
>
>
> Currently its behavior is the same as the applicationSubmission counter 
> metric, which only increases, but I think it should reflect the current 
> number of apps in that state in the queue. Like 'running', the metric should 
> decrease when an app leaves the state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state

2024-08-19 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2818:

Attachment: image-2024-08-20-11-36-06-479.png

> 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps 
> leave the state
> --
>
> Key: YUNIKORN-2818
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2818
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Hengzhe Guo
>Priority: Major
> Attachments: image-2024-08-20-11-36-06-479.png
>
>
> Currently its behavior is the same as the applicationSubmission counter 
> metric, which only increases, but I think it should reflect the current 
> number of apps in that state in the queue. Like 'running', the metric should 
> decrease when an app leaves the state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state

2024-08-19 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2818:

Attachment: (was: image-2024-08-20-11-35-06-312.png)

> 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps 
> leave the state
> --
>
> Key: YUNIKORN-2818
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2818
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Hengzhe Guo
>Priority: Major
>
> Currently its behavior is the same as the applicationSubmission counter 
> metric, which only increases, but I think it should reflect the current 
> number of apps in that state in the queue. Like 'running', the metric should 
> decrease when an app leaves the state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state

2024-08-19 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2818:

Attachment: image-2024-08-20-11-35-06-312.png

> 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps 
> leave the state
> --
>
> Key: YUNIKORN-2818
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2818
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Hengzhe Guo
>Priority: Major
>
> Currently its behavior is the same as the applicationSubmission counter 
> metric, which only increases, but I think it should reflect the current 
> number of apps in that state in the queue. Like 'running', the metric should 
> decrease when an app leaves the state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2803) Use FitIn for node check

2024-08-18 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2803:

Fix Version/s: (was: 1.6.0)

> Use FitIn for node check
> 
>
> Key: YUNIKORN-2803
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2803
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Use FitIn instead of FitInMaxUndef to know whether an ask fits in a node or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2803) Use FitIn for node check

2024-08-18 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2803:

Target Version: 1.6.0

> Use FitIn for node check
> 
>
> Key: YUNIKORN-2803
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2803
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> Use FitIn instead of FitInMaxUndef to know whether an ask fits in a node or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2253) Support retry when bind volume failed case instead of failing the task

2024-08-15 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874131#comment-17874131
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2253:
-

I think we need to abandon a code change for this and instead add a 
recommendation to the troubleshooting documentation to increase the timeout. We 
then start work on YUNIKORN-2804 as soon as we have forked YuniKorn 1.6. Adding 
code for something that can already be handled with a supported configuration 
value is a bad idea.

> Support retry when bind volume failed case instead of failing the task
> --
>
> Key: YUNIKORN-2253
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2253
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>
> Currently, we support passing a timeout parameter for the volume bind, but we 
> should also support retrying the volume bind, because a timeout is only one of 
> the errors that can make the bind fail.
> We will benefit a lot if a retry succeeds, as it keeps the task from failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2809) Fix layout of node transition diagram

2024-08-15 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2809:
---

 Summary: Fix layout of node transition diagram
 Key: YUNIKORN-2809
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2809
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: scheduler-interface
Reporter: Wilfred Spiegelenburg
 Attachments: image-2024-08-16-15-57-12-928.png

Fix formatting of the node state transition diagram. It is missing white space 
and the diagram is not readable at the moment. Screenshot taken from file after 
the 
[commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md]

!image-2024-08-16-15-57-12-928.png|width=321,height=184!




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2806) Deadlock in preemption after YUNIKORN-2769

2024-08-15 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2806:

Summary: Deadlock in preemption after YUNIKORN-2769  (was: Deadlock in 
preemption)

> Deadlock in preemption after YUNIKORN-2769
> --
>
> Key: YUNIKORN-2806
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2806
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> A deadlock exists in TryPreemption() where the current app gets locked twice: 
> once in TryAllocate(), and again in findEligiblePreemptionVictims() where 
> apps are iterated to find victims. The current app is not excluded like it 
> should be, resulting in a deadlock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2806) Deadlock in preemption

2024-08-15 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874105#comment-17874105
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2806:
-

To limit comments and requests for backports etc.:

This was introduced as part of YUNIKORN-2769; up until that point we had this 
specific check as part of the victim list. This issue has not been part of any 
release. It has only existed in master for ~3 days.

[PR diff 
snippet|https://github.com/apache/yunikorn-core/pull/923/files#diff-27632d48eb925e150a33bc92370ceaa66c31048018d11ca7a53a0b50ab7250acL1753]

> Deadlock in preemption
> --
>
> Key: YUNIKORN-2806
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2806
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> A deadlock exists in TryPreemption() where the current app gets locked twice: 
> once in TryAllocate(), and again in findEligiblePreemptionVictims() where 
> apps are iterated to find victims. The current app is not excluded like it 
> should be, resulting in a deadlock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2098) Change go lint SHA detection (following)

2024-08-13 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2098.
-
Fix Version/s: 1.6.0
   Resolution: Delivered

As a side effect of all the cleanup work around the linter we no longer use SHA 
detection, as the code base is lint clean. The lint command has been updated as 
part of other changes to remove all SHA detection code.

> Change go lint SHA detection (following)
> 
>
> Key: YUNIKORN-2098
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2098
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Dong-Lin Hsieh
>Assignee: Dong-Lin Hsieh
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Following https://issues.apache.org/jira/browse/YUNIKORN-285
> Currently, we will always use the "ORIGIN/HEAD" ref. Fallback to "HEAD^" when 
> "ORIGIN/HEAD" doesn't exist. 
> This will avoid the 'fatal: Needed a single revision' error in forked repos.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2802) Consider more resources and types which can be pruned when it is zero

2024-08-13 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873414#comment-17873414
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2802:
-

Pruning can be looked at for non-configuration-based values on the queues, for 
instance:
 * allocated
 * pending
 * preempting

However we cannot prune max or guaranteed, as that changes the semantics of the 
value.

In general: computed values could be pruned, configured values must never be 
pruned.
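
A minimal sketch of that rule, using a simplified type rather than the actual 
resources package: prune zero quantities from computed values such as 
allocated or pending, but never from configured max or guaranteed, since a 
configured zero means "none allowed" rather than "unknown".
{code:go}
package main

import "fmt"

// Quantities is a simplified stand-in for a resource list.
type Quantities map[string]int64

// pruneZeros returns a copy with zero-valued types removed. It should only be
// applied to computed values (allocated, pending, preempting); configured
// values like max or guaranteed keep their zeros.
func pruneZeros(in Quantities) Quantities {
	out := Quantities{}
	for k, v := range in {
		if v != 0 {
			out[k] = v
		}
	}
	return out
}

func main() {
	allocated := Quantities{"memory": 0, "vcore": 4, "nvidia.com/gpu": 0}
	fmt.Println(pruneZeros(allocated)) // map[vcore:4]
}
{code}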

> Consider more resources and types which can be pruned when it is zero
> -
>
> Key: YUNIKORN-2802
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2802
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Qi Zhu
>Priority: Minor
>
> We rethink to support more resources and types which can be pruned, see 
> details:
> [https://github.com/apache/yunikorn-core/pull/943#discussion_r1716250452]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2796) Root queue and partition should not have resource types with 0 values

2024-08-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871937#comment-17871937
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2796:
-

BTW: this is more of a display issue than an enforcement issue.

Requests for resource types that are not known or registered will fail to find 
a node to run on and will never reach the point where queue quotas are checked.

 

> Root queue and partition should not have resource types with 0 values
> -
>
> Key: YUNIKORN-2796
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2796
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>
> When we register a node the node available resources get added to the 
> partition and root queue. When we remove the node the resources get removed 
> again. Updates do a similar action.
> When we no longer have nodes that expose a specific resource we leave the 
> resource type in the root queue and partition with a 0. It looks strange to 
> have a maximum with 0 set for the partition or root and contradicts the quota 
> interpretation documented.
> A resource we do not have at a certain point in time should not have a quota 
> of 0 assigned in the root or partition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2796) Root queue and partition should not have resource types with 0 values

2024-08-08 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2796:
---

 Summary: Root queue and partition should not have resource types 
with 0 values
 Key: YUNIKORN-2796
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2796
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When we register a node the node available resources get added to the partition 
and root queue. When we remove the node the resources get removed again. 
Updates do a similar action.

When we no longer have nodes that expose a specific resource we leave the 
resource type in the root queue and partition with a 0. It looks strange to 
have a maximum with 0 set for the partition or root and contradicts the quota 
interpretation documented.

A resource we do not have at a certain point in time should not have a quota of 
0 assigned in the root or partition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2789) Queue internalGetMax should use permissive calculator

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871861#comment-17871861
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2789:
-

The only place where we use the {{ComponentWiseMin()}} function is in the queue 
call that we do not want anymore. Pushing through a refactor at the same time: 
rename {{ComponentWiseMinPermissive()}} to become just {{ComponentWiseMin()}}.
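
A sketch of the semantic difference, with simplified types rather than the 
actual resources package: the strict component-wise min yields 0 for any type 
missing from one side (which produces the unexpected zero maxima described 
below), while the permissive variant treats a missing type as not limited and 
keeps the defined value.
{code:go}
package main

import "fmt"

type Quantities map[string]int64

// minStrict returns 0 for any type not present in both inputs.
func minStrict(a, b Quantities) Quantities {
	out := Quantities{}
	for k := range a {
		out[k] = min64(a[k], b[k]) // a missing key reads as 0
	}
	for k := range b {
		out[k] = min64(a[k], b[k])
	}
	return out
}

// minPermissive treats a type missing from one side as "not limited" and
// keeps the value from the side that defines it.
func minPermissive(a, b Quantities) Quantities {
	out := Quantities{}
	for k, v := range a {
		if bv, ok := b[k]; ok {
			out[k] = min64(v, bv)
		} else {
			out[k] = v
		}
	}
	for k, v := range b {
		if _, ok := a[k]; !ok {
			out[k] = v
		}
	}
	return out
}

func min64(x, y int64) int64 {
	if x < y {
		return x
	}
	return y
}

func main() {
	parent := Quantities{"memory": 100}
	child := Quantities{"vcore": 100}
	fmt.Println(minStrict(parent, child))     // map[memory:0 vcore:0]
	fmt.Println(minPermissive(parent, child)) // map[memory:100 vcore:100]
}
{code}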

> Queue internalGetMax should use permissive calculator
> -
>
> Key: YUNIKORN-2789
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2789
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> We have documented for queue resources that:
> {quote}Resources that are not specified in the list are not limited, for max 
> resources, or guaranteed in the case of guaranteed resources.
> {quote}
> However in the implementation on the queue, internalGetMax, we call 
> resources.ComponentWiseMin(). This returns 0 values for each type that is not 
> defined in the two resources passed in. That does not line up.
> Example for getting the maximum resources of a queue using GetMaxQueueSet 
> what I would expect based on the documentation:
>  
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
>   => result child max{memory: 100G, vcore: 100}{code}
>  
>  
> currently we get:
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
>   => result child max{memory: 0, vcore: 0}{code}
> Similar when we add the root and call GetMaxResource:
> {code:java}
> root: max{memory: 100G, vcore: 200}
> root.parent: max{vcore: 100}
> root.parent.child: max{nvidia.com/gpu: 10}
>=> result parent max{memory: 0, vcore: 100}
>    => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code}
> The fact that the resource type does not exist, even in the root, should not 
> mean a zero set. The nodes that expose the specific resource might not have 
> been registered or scaled up yet.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2789) Queue internalGetMax should use permissive calculator

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2789:

Summary: Queue internalGetMax should use permissive calculator  (was: Queue 
internalGetMax should not use permissive calculator)

> Queue internalGetMax should use permissive calculator
> -
>
> Key: YUNIKORN-2789
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2789
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> We have documented for queue resources that:
> {quote}Resources that are not specified in the list are not limited, for max 
> resources, or guaranteed in the case of guaranteed resources.
> {quote}
> However in the implementation on the queue, internalGetMax, we call 
> resources.ComponentWiseMin(). This returns 0 values for each type that is not 
> defined in the two resources passed in. That does not line up.
> Example for getting the maximum resources of a queue using GetMaxQueueSet 
> what I would expect based on the documentation:
>  
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
>   => result child max{memory: 100G, vcore: 100}{code}
>  
>  
> currently we get:
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
>   => result child max{memory: 0, vcore: 0}{code}
> Similar when we add the root and call GetMaxResource:
> {code:java}
> root: max{memory: 100G, vcore: 200}
> root.parent: max{vcore: 100}
> root.parent.child: max{nvidia.com/gpu: 10}
>=> result parent max{memory: 0, vcore: 100}
>    => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code}
> The fact that the resource type does not exist, even in the root, should not 
> mean a zero set. The nodes that expose the specific resource might not have 
> been registered or scaled up yet.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871845#comment-17871845
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2790:
-

Example 2: kubelet restart with delayed custom resource registration; each node 
has 2 GPUs
 * root queue max is 1 vcore, 2 GPU (one node with GPU is registered)
 * root queue usage is 8000 vcore, 3 GPU (old GPU job from before the kubelet 
restart on the second node)
 * request is for 1000 vcore, 1 GPU

Currently: all allocations in the system are *blocked* as the root queue is 
always considered over quota

New behaviour: the allocation is *blocked* as it requests another GPU while the 
queue is already over quota

> GPU node restart could leave root queue always out of quota
> ---
>
> Key: YUNIKORN-2790
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: pull-request-available, release-notes
>
> On a node restart the pods assigned and running on a node are not checked 
> against the quota of the queue(s) they run in. This has multiple reasons. 
> Pods on a node that are scheduled by YuniKorn and already running must not be 
> rejected. Rejecting pods could cause lots of side effects.
> The combination of a node restart and the reconfiguring a GPU driver could 
> however cause a secondary issue. The node on restart might not expose the GPU 
> resource yet. Pods that ran before the restart can be using the GPU resource. 
> After those pods are added, ignoring quotas, the root queue will show a usage 
> for a resource that has not been registered yet.
> This fact prevents all scheduling from progressing. Even for pods not 
> requesting the GPU resource. Each scheduling action will check the root queue 
> quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
> be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2794) Resource: Change SubOnlyExisting() to same signature as AddOnlyExisting()

2024-08-07 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2794:
---

 Summary: Resource: Change SubOnlyExisting() to same signature as 
AddOnlyExisting()
 Key: YUNIKORN-2794
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2794
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


The AddOnlyExisting function takes two resource objects and returns a new 
object. The SubOnlyExisting method is called on a resource receiver and 
modifies that receiver object.

These two should use the same kind of signature: take two resource objects and 
return a new object. In most use cases for SubOnlyExisting we do a clone before 
we call it, inside a locked method on an object that contains the resource.

This clone becomes obsolete when we make the change.
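
A sketch of the proposed alignment, again with simplified types rather than 
the real resources package: both operations take two resources and return a 
new one, so the caller no longer needs a defensive clone before subtracting.
{code:go}
package main

import "fmt"

type Quantities map[string]int64

// addOnlyExisting-style shape: returns a new object; only types present in
// base are affected, extra types in delta are ignored.
func addOnlyExisting(base, delta Quantities) Quantities {
	out := Quantities{}
	for k, v := range base {
		out[k] = v + delta[k]
	}
	return out
}

// Proposed subOnlyExisting with the same shape: no receiver mutation, the
// caller gets a fresh object back and no clone is needed beforehand.
func subOnlyExisting(base, delta Quantities) Quantities {
	out := Quantities{}
	for k, v := range base {
		out[k] = v - delta[k]
	}
	return out
}

func main() {
	used := Quantities{"memory": 10, "vcore": 4}
	released := Quantities{"memory": 2, "nvidia.com/gpu": 1} // gpu type ignored
	fmt.Println(subOnlyExisting(used, released))             // map[memory:8 vcore:4]
	fmt.Println(addOnlyExisting(used, released))             // map[memory:12 vcore:4]
}
{code}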



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871829#comment-17871829
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2790:
-

To clarify what will change with this fix:

Example 1: lowering a configured queue quota
 * queue max is 1 vcore, 5 GPU (changed from 10 GPU to 5 GPU)
 * queue usage is 8000 vcore, 6 GPU
 * request is for 1000 vcore

Currently: allocation is *blocked* (queue is always considered over quota)

New behaviour: allocation is allowed

Example 2: kubelet restart with delayed custom resource registration
 * root queue max is 1 vcore, 0 GPU (no nodes with GPUs are registered yet)
 * root queue usage is 8000 vcore, 1 GPU (old GPU job from before the kubelet 
restart)
 * request is for 1000 vcore

Currently: all allocations in the system are *blocked* as the root queue is 
always considered over quota

New behaviour: allocation is allowed

> GPU node restart could leave root queue always out of quota
> ---
>
> Key: YUNIKORN-2790
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: pull-request-available, release-notes
>
> On a node restart the pods assigned and running on a node are not checked 
> against the quota of the queue(s) they run in. This has multiple reasons. 
> Pods on a node that are scheduled by YuniKorn and already running must not be 
> rejected. Rejecting pods could cause lots of side effects.
> The combination of a node restart and the reconfiguring a GPU driver could 
> however cause a secondary issue. The node on restart might not expose the GPU 
> resource yet. Pods that ran before the restart can be using the GPU resource. 
> After those pods are added, ignoring quotas, the root queue will show a usage 
> for a resource that has not been registered yet.
> This fact prevents all scheduling from progressing. Even for pods not 
> requesting the GPU resource. Each scheduling action will check the root queue 
> quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
> be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2782) Cleanup dead code in cache/context

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871821#comment-17871821
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2782:
-

thanks [~chia7712] for clarifying what I tried to say :)

> Cleanup dead code in cache/context
> --
>
> Key: YUNIKORN-2782
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2782
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Tzu-Hua Lan
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> In the cache context we have a number of functions that only get called from 
> tests. We need to clean up and only use one version:
>  * RemoveApplication & RemoveApplicationInternal
> We should only have RemoveApplication but the internal version is used 
> everywhere
>  * UpdateApplication is not used at all



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2678) Fair queue sorting is inconsistent

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2678:

Labels: pull-request-available release-notes  (was: pull-request-available)

> Fair queue sorting is inconsistent
> --
>
> Key: YUNIKORN-2678
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2678
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.5.1
> Environment: EKS 1.29
>Reporter: Paul Santa Clara
>Assignee: Paul Santa Clara
>Priority: Major
>  Labels: pull-request-available, release-notes
> Attachments: Screenshot 2024-08-06 at 5.18.18 PM.png, Screenshot 
> 2024-08-06 at 5.18.21 PM.png, Screenshot 2024-08-06 at 5.18.30 PM.png, 
> jira-queues.yaml, jira-tier0-screenshot.png, jira-tier1-screenshot.png, 
> jira-tier2-screenshot.png, jira-tier3-screenshot.png, 
> yunikorn-fair-4-tiers-complete.png, yunikorn-fair-4-tiers.png
>
>
> Please see the attached queue configuration(jira-queues.yaml). 
> I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 
> pods in Tier3.  Each Pod will require 1 VCore. Initially, there will be 0 
> suitable nodes to run the Pods and all will be Pending. Karpenter will soon 
> provision Nodes and Yunikorn will react by binding the Pods. 
> Given this 
> [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95],
>  I would expect Yunikorn to distribute the allocations such that each of the 
> Tier’ed queues reaches its Guarantees.  Instead, I observed a roughly even 
> distribution of allocation across all of the queues.
> Tier0 fails to meet its Gaurantees while Tier3, for instance, dramatically 
> overshoots them.
>  
> {code:java}
> > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
>    86
> > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
>    83
> > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
>    78
> > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
>    77
> {code}
> Please see attached screen shots for queue usage.
> Note, this situation can also be reproduced without the use of Karpenter by 
> simply setting Yunikorn's `service.schedulingInterval` to a high duration, 
> say 1m.  Doing so will force Yunikorn to react to 400 Pods -across 4 queues- 
> at roughly the same time forcing prioritization of queue allocations.
> Test code to generate Pods:
> {code:java}
> from kubernetes import client, config
> config.load_kube_config()
> v1 = client.CoreV1Api()
> def create_pod_manifest(tier, exec,):
> pod_manifest = {
> 'apiVersion': 'v1',
> 'kind': 'Pod',
> 'metadata': {
> 'name': f"rolling-test-tier-{tier}-exec-{exec}",
> 'namespace': 'finance',
> 'labels': {
> 'applicationId': f"MyOwnApplicationId-tier-{tier}",
> 'queue': f"root.tiers.{tier}"
> },
> "yunikorn.apache.org/user.info": 
> '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
> },
> 'spec': {
> "affinity": {
> "nodeAffinity" : {
> "requiredDuringSchedulingIgnoredDuringExecution" : {
> "nodeSelectorTerms" : [
> {
> "matchExpressions" : [
> {
> "key" : "di.rbx.com/dedicated",
> "operator" : "In",
> "values" : ["spark"]
> }
> ]
> }
> ]
> }
> },
> },
> "tolerations" : [
> {
> "effect" : "NoSchedule",
> "key": "dedicated",
> "operator" : "Equal",
> "value" : "spark"
> },
> ],
> "schedulerName": "yunikorn",
> 'restartPolicy': 'Always',
> 'containers': [{
> "name": "ubuntu",
> 'image': 'ubuntu',
> "command": ["sleep", "604800"],
> "imagePullPolicy": "IfNotPresent",
> "resources" : {
> "limits" : {
> 'cpu' : "1"
> },
> "requests" : {
> 'cpu' : "1"
> }
> }
>

[jira] [Updated] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2790:

Labels: release-notes  (was: )

> GPU node restart could leave root queue always out of quota
> ---
>
> Key: YUNIKORN-2790
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: release-notes
>
> On a node restart the pods assigned and running on a node are not checked 
> against the quota of the queue(s) they run in. This has multiple reasons. 
> Pods on a node that are scheduled by YuniKorn and already running must not be 
> rejected. Rejecting pods could cause lots of side effects.
> The combination of a node restart and the reconfiguring a GPU driver could 
> however cause a secondary issue. The node on restart might not expose the GPU 
> resource yet. Pods that ran before the restart can be using the GPU resource. 
> After those pods are added, ignoring quotas, the root queue will show a usage 
> for a resource that has not been registered yet.
> This fact prevents all scheduling from progressing. Even for pods not 
> requesting the GPU resource. Each scheduling action will check the root queue 
> quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
> be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871556#comment-17871556
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2790 at 8/7/24 7:28 AM:
-

Solution is to not check resource types that are not requested by the pod when 
we check for a fit in the queue. This will allow a pod asking for memory and 
vcores to be scheduled even if the root queue is out of GPU or storage.

Node registration delays should only affect the root queue this way. It could 
however happen for a different queue in the hierarchy if the quota on a queue 
has been changed and set to a lower value than the currently running workload. 
Lowering the GPU quota on a queue should still allow memory and vcore only pods 
to be scheduled.

This makes scheduling more resilient to configuration changes and custom 
resource registration delays.


was (Author: wifreds):
Solution is to not check resource types that are not requested by pods when we 
check for a fit in the queue. This will allow a pod asking for memory and 
vcores to be scheduled even if the root queue is out of GPU or storage. This 
should not happen on any other queue in the hierarchy unless the quota has been 
changed to become lower than the currently running workload.

This makes scheduling more resilient for configuration changes and custom 
resource registration delays.

> GPU node restart could leave root queue always out of quota
> ---
>
> Key: YUNIKORN-2790
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>
> On a node restart the pods assigned and running on a node are not checked 
> against the quota of the queue(s) they run in. This has multiple reasons. 
> Pods on a node that are scheduled by YuniKorn and already running must not be 
> rejected. Rejecting pods could cause lots of side effects.
> The combination of a node restart and the reconfiguring a GPU driver could 
> however cause a secondary issue. The node on restart might not expose the GPU 
> resource yet. Pods that ran before the restart can be using the GPU resource. 
> After those pods are added, ignoring quotas, the root queue will show a usage 
> for a resource that has not been registered yet.
> This fact prevents all scheduling from progressing. Even for pods not 
> requesting the GPU resource. Each scheduling action will check the root queue 
> quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
> be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-07 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871556#comment-17871556
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2790:
-

Solution is to not check resource types that are not requested by pods when we 
check for a fit in the queue. This will allow a pod asking for memory and 
vcores to be scheduled even if the root queue is out of GPU or storage. This 
should not happen on any other queue in the hierarchy unless the quota has been 
changed to become lower than the currently running workload.

This makes scheduling more resilient for configuration changes and custom 
resource registration delays.
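
A minimal sketch of the described fix, with simplified types (not the actual 
queue code): the headroom check only looks at resource types the request 
actually asks for, so usage of an unregistered type such as GPU no longer 
blocks memory/vcore-only pods.
{code:go}
package main

import "fmt"

// Quantities is a simplified stand-in for the scheduler's resource type.
type Quantities map[string]int64

// fitsRequestedOnly checks the request against the queue headroom, but only
// for resource types the request contains. Types the queue is over quota on,
// but which the pod does not ask for, are ignored.
func fitsRequestedOnly(request, max, used Quantities) bool {
	for k, ask := range request {
		limit, limited := max[k]
		if !limited {
			continue // type not limited on this queue
		}
		if used[k]+ask > limit {
			return false
		}
	}
	return true
}

func main() {
	maxRes := Quantities{"vcore": 16000, "nvidia.com/gpu": 0} // GPU not registered yet
	used := Quantities{"vcore": 8000, "nvidia.com/gpu": 1}    // pre-restart GPU pod still counted
	req := Quantities{"vcore": 1000}                          // pod does not ask for a GPU

	// A check across every type would fail on nvidia.com/gpu (1 > 0) and block
	// the pod; checking only the requested types lets it through.
	fmt.Println(fitsRequestedOnly(req, maxRes, used)) // true
}
{code}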

> GPU node restart could leave root queue always out of quota
> ---
>
> Key: YUNIKORN-2790
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>
> On a node restart the pods assigned and running on a node are not checked 
> against the quota of the queue(s) they run in. This has multiple reasons. 
> Pods on a node that are scheduled by YuniKorn and already running must not be 
> rejected. Rejecting pods could cause lots of side effects.
> The combination of a node restart and the reconfiguring a GPU driver could 
> however cause a secondary issue. The node on restart might not expose the GPU 
> resource yet. Pods that ran before the restart can be using the GPU resource. 
> After those pods are added, ignoring quotas, the root queue will show a usage 
> for a resource that has not been registered yet.
> This fact prevents all scheduling from progressing. Even for pods not 
> requesting the GPU resource. Each scheduling action will check the root queue 
> quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
> be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota

2024-08-06 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2790:
---

 Summary: GPU node restart could leave root queue always out of 
quota
 Key: YUNIKORN-2790
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2790
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


On a node restart the pods assigned and running on a node are not checked 
against the quota of the queue(s) they run in. This has multiple reasons. Pods 
on a node that are scheduled by YuniKorn and already running must not be 
rejected. Rejecting pods could cause lots of side effects.

The combination of a node restart and the reconfiguring a GPU driver could 
however cause a secondary issue. The node on restart might not expose the GPU 
resource yet. Pods that ran before the restart can be using the GPU resource. 
After those pods are added, ignoring quotas, the root queue will show a usage 
for a resource that has not been registered yet.

This fact prevents all scheduling from progressing. Even for pods not 
requesting the GPU resource. Each scheduling action will check the root queue 
quota and fail. This prevents the GPU driver pods to be placed and the GPU to 
be registered by the node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2789) Queue internalGetMax should not use permissive calculator

2024-08-06 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2789:
---

 Summary: Queue internalGetMax should not use permissive calculator
 Key: YUNIKORN-2789
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2789
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


We have documented for queue resources that:
{quote}Resources that are not specified in the list are not limited, for max 
resources, or guaranteed in the case of guaranteed resources.
{quote}
However in the implementation on the queue, internalGetMax, we call 
resources.ComponentWiseMin(). This returns 0 values for each type that is not 
defined in the two resources passed in. That does not line up.

Example for getting the maximum resources of a queue using GetMaxQueueSet what 
I would expect based on the documentation:

 
{code:java}
parent: max{memory: 100G}
parent.child: max{vcore: 100}
  => result child max{memory: 100G, vcore: 100}{code}
 

 

currently we get:
{code:java}
parent: max{memory: 100G}
parent.child: max{vcore: 100}
  => result child max{memory: 0, vcore: 0}{code}
Similar when we add the root and call GetMaxResource:
{code:java}
root: max{memory: 100G, vcore: 200}
root.parent: max{vcore: 100}
root.parent.child: max{nvidia.com/gpu: 10}
   => result parent max{memory: 0, vcore: 100}
   => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code}
The fact that the resource type does not exist, even in the root, should not 
mean a zero set. The nodes that expose the specific resource might not have 
been registered or scaled up yet.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2678) Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods.

2024-08-06 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871541#comment-17871541
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2678:
-

The current calculation is broken and I have gone through the slack discussion. 
I can see why we would want to use the max resource as a substitute for the 
guaranteed resource. Looking forward to a PR.

One point I would already make is that the max used should only rely on the 
configured values in the hierarchy. The current cluster size must not be taken 
into account, so the root maximum must be ignored when we look at this. Besides 
that, looking at the {{internalGetMax()}} code there is a bug there for which I 
will file a jira. That will most likely influence this sorting as it revolves 
around setting 0 values.

> Yunikorn does not appear to be considering Guaranteed resources when 
> allocating Pending Pods.
> -
>
> Key: YUNIKORN-2678
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2678
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.5.1
> Environment: EKS 1.29
>Reporter: Paul Santa Clara
>Assignee: Paul Santa Clara
>Priority: Major
> Attachments: Screenshot 2024-08-06 at 5.18.18 PM.png, Screenshot 
> 2024-08-06 at 5.18.21 PM.png, Screenshot 2024-08-06 at 5.18.30 PM.png, 
> jira-queues.yaml, jira-tier0-screenshot.png, jira-tier1-screenshot.png, 
> jira-tier2-screenshot.png, jira-tier3-screenshot.png
>
>
> Please see the attached queue configuration(jira-queues.yaml). 
> I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 
> pods in Tier3.  Each Pod will require 1 VCore. Initially, there will be 0 
> suitable nodes to run the Pods and all will be Pending. Karpenter will soon 
> provision Nodes and Yunikorn will react by binding the Pods. 
> Given this 
> [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95],
>  I would expect Yunikorn to distribute the allocations such that each of the 
> Tier’ed queues reaches its Guarantees.  Instead, I observed a roughly even 
> distribution of allocation across all of the queues.
> Tier0 fails to meet its Gaurantees while Tier3, for instance, dramatically 
> overshoots them.
>  
> {code:java}
> > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
>    86
> > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
>    83
> > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
>    78
> > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
>    77
> {code}
> Please see attached screen shots for queue usage.
> Note, this situation can also be reproduced without the use of Karpenter by 
> simply setting Yunikorn's `service.schedulingInterval` to a high duration, 
> say 1m.  Doing so will force Yunikorn to react to 400 Pods -across 4 queues- 
> at roughly the same time forcing prioritization of queue allocations.
> Test code to generate Pods:
> {code:java}
> from kubernetes import client, config
> config.load_kube_config()
> v1 = client.CoreV1Api()
> def create_pod_manifest(tier, exec,):
> pod_manifest = {
> 'apiVersion': 'v1',
> 'kind': 'Pod',
> 'metadata': {
> 'name': f"rolling-test-tier-{tier}-exec-{exec}",
> 'namespace': 'finance',
> 'labels': {
> 'applicationId': f"MyOwnApplicationId-tier-{tier}",
> 'queue': f"root.tiers.{tier}"
> },
> "yunikorn.apache.org/user.info": 
> '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
> },
> 'spec': {
> "affinity": {
> "nodeAffinity" : {
> "requiredDuringSchedulingIgnoredDuringExecution" : {
> "nodeSelectorTerms" : [
> {
> "matchExpressions" : [
> {
> "key" : "di.rbx.com/dedicated",
> "operator" : "In",
> "values" : ["spark"]
> }
> ]
> }
> ]
> }
> },
> },
> "tolerations" : [
> {
> "effect" : "NoSchedule",
> "key": "dedicated",
> "operator" : "Equal",
> "value" : "s

[jira] [Commented] (YUNIKORN-2678) Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods.

2024-08-04 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870913#comment-17870913
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2678:
-

I have not looked at this code in years.

However when I look at it now I think the issue is in the {{nil || zero}} 
check when we set the 
[used|https://github.com/apache/yunikorn-core/blob/v1.5.2/pkg/common/resources/resources.go#L488]
 value in the shares. That does not take into account that we have a large 
discrepancy between resource types in absolute values. Resources like memory or 
storage will always dominate over pods or GPUs.

Introducing max into the mix with guaranteed will have side effects. I create a 
queue with max memory set to 1TB, no guaranteed. I create a second queue with 
max set to 1TB but a guaranteed memory of 100GB. Both queues use 50GB. In that 
case the share of queue 1 will be 0.05 and queue 2 will have a share of 0.5. 
Queue 1 will win and get scheduled until it uses 500GB, with a guaranteed of 0. 
Queue 1 should not have a smaller share than queue 2 until all guaranteed is 
used.

That looks as broken as what we have now.
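
A worked sketch of the share calculation in that example, under the assumed 
semantics of used divided by guaranteed, falling back to used divided by max 
when no guarantee is set (the substitution being discussed); it reproduces the 
0.05 vs 0.5 outcome.
{code:go}
package main

import "fmt"

// share is a hypothetical per-queue share: used/guaranteed when a guarantee
// exists, otherwise used/max.
func share(used, guaranteed, max float64) float64 {
	if guaranteed > 0 {
		return used / guaranteed
	}
	return used / max
}

func main() {
	// Queue 1: max memory 1TB, no guaranteed, 50GB used.
	q1 := share(50, 0, 1000) // 0.05
	// Queue 2: max memory 1TB, guaranteed 100GB, 50GB used.
	q2 := share(50, 100, 1000) // 0.5
	fmt.Printf("queue1=%.2f queue2=%.2f\n", q1, q2)
	// queue1 < queue2, so queue 1 keeps winning allocations until it reaches
	// about 500GB used, even though it has no guarantee at all.
}
{code}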

I could see two options:
 # setting a fixed share value if not specified in guaranteed
 # not adding anything to the shares unless set in guaranteed

Both options above will fix that same issue. I think option 2 is the better 
solution. We want to schedule based on the guaranteed setting. We need to test 
if that still distributes fairly between the queues when one queue has a usage 
over its guaranteed compared to a second queue with no guaranteed.

If we really want to have a policy for "least used queue" we can build one 
based on the maximum resource and the usage.

The other option which would be nice to have would be adding a configurable 
resource weights option like we have in the node sorting. That would be a new 
feature...

> Yunikorn does not appear to be considering Guaranteed resources when 
> allocating Pending Pods.
> -
>
> Key: YUNIKORN-2678
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2678
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.5.1
> Environment: EKS 1.29
>Reporter: Paul Santa Clara
>Assignee: Paul Santa Clara
>Priority: Major
> Attachments: jira-queues.yaml, jira-tier0-screenshot.png, 
> jira-tier1-screenshot.png, jira-tier2-screenshot.png, 
> jira-tier3-screenshot.png
>
>
> Please see the attached queue configuration(jira-queues.yaml). 
> I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 
> pods in Tier3.  Each Pod will require 1 VCore. Initially, there will be 0 
> suitable nodes to run the Pods and all will be Pending. Karpenter will soon 
> provision Nodes and Yunikorn will react by binding the Pods. 
> Given this 
> [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95],
>  I would expect Yunikorn to distribute the allocations such that each of the 
> Tier’ed queues reaches its Guarantees.  Instead, I observed a roughly even 
> distribution of allocation across all of the queues.
> Tier0 fails to meet its Gaurantees while Tier3, for instance, dramatically 
> overshoots them.
>  
> {code:java}
> > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
>    86
> > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
>    83
> > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
>    78
> > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
>    77
> {code}
> Please see attached screen shots for queue usage.
> Note, this situation can also be reproduced without the use of Karpenter by 
> simply setting Yunikorn's `service.schedulingInterval` to a high duration, 
> say 1m.  Doing so will force Yunikorn to react to 400 Pods -across 4 queues- 
> at roughly the same time forcing prioritization of queue allocations.
> Test code to generate Pods:
> {code:java}
> from kubernetes import client, config
> config.load_kube_config()
> v1 = client.CoreV1Api()
> def create_pod_manifest(tier, exec,):
> pod_manifest = {
> 'apiVersion': 'v1',
> 'kind': 'Pod',
> 'metadata': {
> 'name': f"rolling-test-tier-{tier}-exec-{exec}",
> 'namespace': 'finance',
> 'labels': {
> 'applicationId': f"MyOwnApplicationId-tier-{tier}",
> 'queue': f"root.tiers.{tier}"
> },
> "yunikorn.apache.org/user.info": 
> '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
>  

[jira] [Updated] (YUNIKORN-2281) Support OIDC credentials in YuniKorn

2024-08-02 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2281:

Labels: release-notes  (was: )

> Support OIDC credentials in YuniKorn
> 
>
> Key: YUNIKORN-2281
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2281
> Project: Apache YuniKorn
>  Issue Type: New Feature
>Reporter: Dmitry
>Assignee: Manikandan R
>Priority: Major
>  Labels: release-notes
>
> Currently only alphanumeric chars are allowed in usernames. We're using 
> CiLogon OIDC users, in the form of "http://cilogon.org/serverA/users/123456";, 
> which is denied in configuration by the admission controller:
> > error: configmaps "yunikorn-configs" could not be patched: admission 
> > webhook "admission-webhook.yunikorn.validate-conf" denied the request: 
> > invalid limit user name 'http://cilogon.org/serverA/users/123456' in limit 
> > definition
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2646) Deadlock detected during preemption

2024-08-02 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870395#comment-17870395
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2646:
-

This is not the cause, it cannot be.

YuniKorn 1.5.2 has a deadlock fix as per YUNIKORN-2629.

If there is a lockup left we should see it for others too, especially when you 
say it happens really often. We do not have the evidence that confirms this, 
and we cannot fix or change anything without understanding what is broken. We 
need logs or a reproduction that shows the issue.

When you get to the "stuck" state collect the details and open a *_new_* jira:
 * scheduler logs
 * state dump via /ws/v1/fullstatedump
 * pprof output of /debug/pprof/goroutine?debug=2

If it really is a deadlock in the code the state dump will most likely fail. 
Logs and pprof never fail, so we should have a full goroutine dump. You can 
even collect two in a row to compare.

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2646) Deadlock detected during preemption

2024-08-01 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2646 at 8/2/24 4:29 AM:
-

It is a false positive detection. The code explicitly prevents the case from 
happening. See [this 
comment|https://issues.apache.org/jira/browse/YUNIKORN-2646?focusedCommentId=17850240&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17850240]
 here again worded slightly differently:

The detector is not smart enough to understand this part of the logic and just 
sees the order:
 # FIRST: Application A lock taken followed by Application B.
 # SECOND: Application B lock is taken followed by Application A.

That triggers the detection. The fact that this sequence is only possible because 
our code guarantees that, between FIRST and SECOND, all locks are released without 
exception cannot be expressed in the detector's rules.

BTW: running with deadlock detection in production is a really bad idea. It 
causes a lot of overhead.
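
For illustration only (hypothetical names, nothing from the YuniKorn code base): 
this is the shape of the ordering the detector records. Run one after the other 
the two phases can never deadlock, because every lock from the first phase is 
released before the second starts, yet a rule-based detector only sees A-then-B 
followed by B-then-A:
{code:java}
package main

import "sync"

// app stands in for an application object guarded by its own lock.
type app struct{ sync.Mutex }

// phaseOne locks A then B and releases both before returning.
func phaseOne(appA, appB *app) {
	appA.Lock()
	appB.Lock()
	appB.Unlock()
	appA.Unlock()
}

// phaseTwo locks B then A; it only ever runs after phaseOne has released everything.
func phaseTwo(appA, appB *app) {
	appB.Lock()
	appA.Lock()
	appA.Unlock()
	appB.Unlock()
}

func main() {
	a, b := &app{}, &app{}
	phaseOne(a, b) // order: A -> B
	phaseTwo(a, b) // order: B -> A, but phaseOne no longer holds anything
}
{code}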


was (Author: wifreds):
It is a false positive detection. The code explicitly prevents the case from 
happening. See this comment here again worded slightly differently:

The detector is not smart enough to understand this part of the logic and just 
sees the order:
 # FIRST: Application A lock taken followed by Application B.
 # SECOND: Application B lock is taken followed by Application A.

That triggers the detection. The fact this sequence is only possible because we 
have a guarantee in our code that between FIRST and SECOND all locks are 
released without exception cannot be expressed in rules.

BTW: running with deadlock detection in production is a really bad idea. It 
causes a lot of overhead.

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2646) Deadlock detected during preemption

2024-08-01 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2646 at 8/2/24 4:26 AM:
-

It is a false positive detection. The code explicitly prevents the case from 
happening. See this comment here again worded slightly differently:

The detector is not smart enough to understand this part of the logic and just 
sees the order:
 # FIRST: Application A lock taken followed by Application B.
 # SECOND: Application B lock is taken followed by Application A.

That triggers the detection. The fact this sequence is only possible because we 
have a guarantee in our code that between FIRST and SECOND all locks are 
released without exception cannot be expressed in rules.

BTW: running with deadlock detection in production is a really bad idea. It 
causes a lot of overhead.


was (Author: wifreds):
It is a false positive detection. The code explicitly prevents the case from 
happening. See this comment here again worded slightly differently:

The detector is not smart enough to understand this part of the logic and just 
sees the order:
 # FIRST: Application A lock taken followed by Application B.
 # SECOND: Application B lock is taken followed by Application A.

That triggers the detection. The fact this sequence is only possible because we 
have a guarantee in our code that between FIRST and SECOND all locks are 
released without exception cannot be expressed in rules.

BTW: running with deadlock detection in production is a really bad idea. It 
causes a lot of overhead.

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2646) Deadlock detected during preemption

2024-08-01 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2646:
-

It is a false positive detection. The code explicitly prevents the case from 
happening. See this comment here again worded slightly differently:

The detector is not smart enough to understand this part of the logic and just 
sees the order:
 # FIRST: Application A lock taken followed by Application B.
 # SECOND: Application B lock is taken followed by Application A.

That triggers the detection. The fact this sequence is only possible because we 
have a guarantee in our code that between FIRST and SECOND all locks are 
released without exception cannot be expressed in rules.

BTW: running with deadlock detection in production is a really bad idea. It 
causes a lot of overhead.

> Deadlock detected during preemption
> ---
>
> Key: YUNIKORN-2646
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Dmitry
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
> Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2782) Cleanup dead code in cache/context

2024-08-01 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2782:
---

 Summary: Cleanup dead code in cache/context
 Key: YUNIKORN-2782
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2782
 Project: Apache YuniKorn
  Issue Type: Task
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


In the cache context we have a number of functions that only get called from 
tests. We need to clean up and only use one version:
 * RemoveApplication & RemoveApplicationInternal
We should only have RemoveApplication but the internal version is used 
everywhere
 * UpdateApplication is not used at all



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2709) Update website for 1.5.2

2024-07-31 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2709.
-
Fix Version/s: 1.5.2
   Resolution: Fixed

release is done

> Update website for 1.5.2
> 
>
> Key: YUNIKORN-2709
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2709
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2708) Release notes for 1.5.2

2024-07-31 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2708.
-
Fix Version/s: 1.5.2
   Resolution: Fixed

release is done

> Release notes for 1.5.2
> ---
>
> Key: YUNIKORN-2708
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2708
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available, release
> Fix For: 1.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2764) Consider to log explicit placeholder release reason to originator pod

2024-07-24 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868536#comment-17868536
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2764:
-

This requires a change in the event processing in the k8shim. The current event 
does not allow us to add the message. The way the events are created uses a 
fixed list of values, so while the core sends the detail, the app event has no 
option to include it.

We need to have a good look at these app and task events and states in the next 
release as most are not really used.

> Consider to log explicit placeholder release reason to originator pod
> -
>
> Key: YUNIKORN-2764
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2764
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Yu-Lin Chen
>Priority: Major
> Attachments: image-2024-07-19-21-48-54-829.png
>
>
> When placeholders allocation are released with terminationType 
> `si.TerminationType_TIMEOUT`. The reason could be one of the following:
>  # "releasing allocated placeholders on placeholder timeout" 
> ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434])
>  
> ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456])
>  # "releasing placeholders on app complete" 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360])
>  # “cancel placeholder: resource incompatible” 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148])
> Those reasons are encapsulated in 
> *si.AllocationResponse([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901])
>  and passes to shim. However, the shim doesn’t expose them, it simply logs an 
> event to the originator pod with a generic reason 
> ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]):
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: Application XX placeholder has been timed out
> We could consider to expose the true reason to originator pod. Ex: (In 
> originator pod.)
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: placeholder xxx has been released. (reason: cancel placeholder: 
> resource incompatible)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2764) Consider to log explicit placeholder release reason to originator pod

2024-07-24 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2764:

Target Version: 1.7.0

> Consider to log explicit placeholder release reason to originator pod
> -
>
> Key: YUNIKORN-2764
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2764
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Yu-Lin Chen
>Priority: Major
> Attachments: image-2024-07-19-21-48-54-829.png
>
>
> When placeholders allocation are released with terminationType 
> `si.TerminationType_TIMEOUT`. The reason could be one of the following:
>  # "releasing allocated placeholders on placeholder timeout" 
> ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434])
>  
> ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456])
>  # "releasing placeholders on app complete" 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360])
>  # “cancel placeholder: resource incompatible” 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148])
> Those reasons are encapsulated in 
> *si.AllocationResponse([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901])
>  and passes to shim. However, the shim doesn’t expose them, it simply logs an 
> event to the originator pod with a generic reason 
> ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]):
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: Application XX placeholder has been timed out
> We could consider to expose the true reason to originator pod. Ex: (In 
> originator pod.)
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: placeholder xxx has been released. (reason: cancel placeholder: 
> resource incompatible)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2687) Placeholder Timeout and Replacement Failure in Gang Scheduling

2024-07-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865844#comment-17865844
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2687:
-

I agree with [~blue.tzuhua] this is normal gang scheduling. Using YuniKorn 1.3 
and K8s 1.21, you are working with really old releases. The K8s version is out 
of support and since 1.3 we have fixed numerous jiras.

Analysis: Looks like K8s never came back with a response that the placeholder 
for the driver was released when we tried to:
{code:java}
2024-06-20T17:48:27.093Z  INFO  objects/application.go:668  ask added successfully to application  {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta": "map[memory:1975517184 pods:1 vcore:1000]"}
...
2024-06-20T17:48:27.093Z  INFO  scheduler/partition.go:828  scheduler replace placeholder processed  {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid": "a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid": "bdb020e8-708c-4eb7-b48c-fba16155941c"}
...
2024-06-20T17:48:27.094Z  INFO  cache/application.go:637  try to release pod from application  {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": "bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType": "PLACEHOLDER_REPLACED"}

{code}
That means we just wait for that to happen. We cannot do more than that. Looks 
like you had an issue on the K8s side... 

> Placeholder Timeout and Replacement Failure in Gang Scheduling
> --
>
> Key: YUNIKORN-2687
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2687
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: huangzhir
>Assignee: Tzu-Hua Lan
>Priority: Blocker
>
> h1. *Description:*
> When using gang scheduling with YuniKorn, the driver pod encounters a 
> placeholder timeout, leading to a failure in replacement. The pod shows a 
> pending status for approximately 60 seconds.
> h2. *Observed Behavior:*
>  * The driver pod ({{{}spark-pi-d86d1d9036b8e8e9-driver{}}}) is queued and 
> waiting for allocation.
>  * The pod belongs to the {{spark-driver}} task group and is scheduled as a 
> gang member.
>  * A warning indicating "Placeholder timed out" is logged, and the 
> placeholder is not replaced successfully.
>  * The pod is eventually assigned and bound to a node, and the task completes.
>  * There is a 60-second pending period observed for the driver pod.
> h2. *Pod Status:*
> {code:java}
> kubectl get pod -n spark
> NAME   READY   STATUS
> RESTARTS   AGE
> spark-pi-6d2eea9036f9c838-driver   0/1 Pending   0
>   61s
> tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0 1/1 Terminating   0
>   60s
> tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0   1/1 Running   0
>   60s
> kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark
> ..
>   Type Reason AgeFrom  Message
>    --      ---
>   Normal   Scheduling 2m52s  yunikorn  
> spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation
>   Normal   GangScheduling 2m52s  yunikorn  Pod belongs to the 
> taskGroup spark-driver, it will be scheduled as a gang member
>   Warning  Placeholder timed out  113s   yunikorn  Application 
> spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out
>   Normal   Scheduled  100s   yunikorn  Successfully assigned 
> spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66
>   Normal   PodBindSuccessful  100s   yunikorn  Pod 
> spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node 
> 10.10.10.66
>   Normal   TaskCompleted  50syunikorn  Task 
> spark/spark-pi-d86d1d9036b8e8e9-driver is completed
>   Normal   Pulled 99skubelet   Container image 
> "apache/spark:v3.3.2" already present on machine
>   Normal   Created99skubelet   Created container 
> spark-kubernetes-driver
>   Normal   Started99skubelet   Started container 
> spark-kubernetes-driver{code}
> h2. *Scheduler Logs:*
> {code:java}
> 2024-06-20T17:49:26.093Z  INFO  objects/application.go:440  Placeholder timeout, releasing placeholders  {"AppID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1, "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z  DEBUG  rmproxy/rmproxy.go:59  enq

[jira] [Comment Edited] (YUNIKORN-2262) propagate the error message when queue creation gets failed

2024-07-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865832#comment-17865832
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2262 at 7/15/24 2:30 AM:
--

The implementation in fmt is smarter than I expected. It works because 
{{fmt.Errorf()}} under the hood does what Join does and creates an object that 
implements Unwrap etc. It does so by interpreting the format string, and when it 
sees a %w it takes the argument and places it in an array, adding format string 
scanning overhead in the process.

So why not make it explicit and make the code more performant and readable? 
Unless you need a combination of multiple format directives, 
{{fmt.Errorf}} will be slower than just {{errors.Join()}}.
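
A minimal standalone comparison (hypothetical error values, not the partition 
code): both forms wrap the inner error and work with {{errors.Is}}; the joined 
form simply skips the format string parsing:
{code:java}
package main

import (
	"errors"
	"fmt"
)

// errLeafQueue is a predefined error we want to be able to test for.
var errLeafQueue = errors.New("parent queue is already a leaf")

func main() {
	// explicit wrapping: no format string to scan
	joined := errors.Join(errors.New("failed to create rule based queue root.x"), errLeafQueue)

	// fmt.Errorf builds an equivalent wrapper, but has to parse the format string first
	formatted := fmt.Errorf("failed to create rule based queue root.x: %w", errLeafQueue)

	fmt.Println(errors.Is(joined, errLeafQueue))    // true
	fmt.Println(errors.Is(formatted, errLeafQueue)) // true
}
{code}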


was (Author: wifreds):
That works because {{fmt.Errorf()}} under the hood does what the join does and 
creates an object that implements Unwrap etc. It does so by interpreting the 
format string and when it sees a %w it takes the arg and places it in an array. 
Adding format string scanning overhead etc.

So why not make it explicit and make the code more performant and readable. 
Unless you need a combination of multiple format directives using 
{{fmt.Errorf}} will be slower than just {{errors.Join()}}

> propagate the error message when queue creation gets failed
> ---
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> the error message of root cause is swallowed, so it is hard to be inspired by 
> the common message "failed to create rule based queue ..."
> BTW, the error I met is the parent queue "is already a leaf". The error 
> message is helpful and it makes us catch up the root cause easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed

2024-07-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865832#comment-17865832
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2262:
-

That works because {{fmt.Errorf()}} under the hood does what the join does and 
creates an object that implements Unwrap etc. It does so by interpreting the 
format string and when it sees a %w it takes the arg and places it in an array. 
Adding format string scanning overhead etc.

So why not make it explicit and make the code more performant and readable. 
Unless you need a combination of multiple format directives using 
{{fmt.Errorf}} will be slower than just {{errors.Join()}}

> propagate the error message when queue creation gets failed
> ---
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> the error message of root cause is swallowed, so it is hard to be inspired by 
> the common message "failed to create rule based queue ..."
> BTW, the error I met is the parent queue "is already a leaf". The error 
> message is helpful and it makes us catch up the root cause easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2757) Consider adding new field `resolvedMaxResource` to queue dao to show the true limit

2024-07-14 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865830#comment-17865830
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2757:
-

Not sure this adds anything. Max resources is either set or not. Max resources 
of a child can never be more than what is set on the parent. Without taking 
usage into account it does not really matter what max is and where it came from.

Let's take an example: I have a hierarchy with 4 levels, level 1 being the root 
and level 4 the leaf. A max is set at level 2 only. The root has a dynamic 
maximum based on the cluster size. There is no way of interpreting a maximum at 
the leaf level 4. What is the impact of that maximum if I do not know what my 
queue structure looks like? Do I have 1 or 10 queues at level 3? How many 
children does each level 3 queue have? Are there any maximums set at level 3? Are 
there any siblings under the parent queue of the level 4 queue I am looking at? 
Do sibling queues of my level 4 queue have maximums set or not...

The only time the resolved maximum from the parent would come into play is when 
a gang is submitted. We reject the application if it is larger than this 
resolved maximum. If that rejection is not clear enough we can improve that.

For scheduling the resolved maximum is also irrelevant. We use the headroom of 
a queue to decide if the allocation fits. Current usage linked to maximums 
gives the correct picture. That only works at the specific queue level. 
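
As a rough sketch of that headroom idea (simplified, hypothetical types, not the 
YuniKorn implementation): each queue only needs its own maximum and usage, and 
the parent chain bounds the result, so no resolved maximum ever has to be stored 
on the child:
{code:java}
package main

import "fmt"

// queue is a simplified stand-in: max is the configured maximum (nil means unset),
// used is the aggregated usage of the queue.
type queue struct {
	parent *queue
	max    map[string]int64
	used   map[string]int64
}

// headroom is the room left on this queue, further limited by every ancestor.
func (q *queue) headroom() map[string]int64 {
	var room map[string]int64
	if q.parent != nil {
		room = q.parent.headroom()
	}
	for res, limit := range q.max {
		own := limit - q.used[res]
		if room == nil {
			room = map[string]int64{}
		}
		if cur, ok := room[res]; !ok || own < cur {
			room[res] = own
		}
	}
	return room
}

func main() {
	root := &queue{} // dynamic maximum, not modelled here
	level2 := &queue{parent: root, max: map[string]int64{"vcore": 10}, used: map[string]int64{"vcore": 3}}
	leaf := &queue{parent: level2} // no maximum of its own
	fmt.Println(leaf.headroom())   // map[vcore:7], bounded by the level-2 maximum only
}
{code}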

> Consider adding new field `resolvedMaxResource` to queue dao to show the true 
> limit
> ---
>
> Key: YUNIKORN-2757
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2757
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Priority: Major
>
> The true max resources of queue is based on all parents. It could be hard to 
> see/understand the true "max resources" of queue by human eyes if there is a 
> huge queue trees.
> Hence, it would be nice to add the "resolved" max resources to restful APIs. 
> Also, our UI can leverages the field to help users to understand which max 
> resource will be used by this queue



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2323) Gang scheduling user experience issues

2024-07-12 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865361#comment-17865361
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2323:
-

The issue #2 as mentioned has been fixed via [github PR 
#876|https://github.com/apache/yunikorn-k8shim/pull/876] 

We now send additional events for gang scheduling covering:
    * placeholder timeout (resuming state)
    * placeholder creation
    * placeholder create failure(s)

I think with that we can close this jira.

> Gang scheduling user experience issues
> --
>
> Key: YUNIKORN-2323
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2323
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Affects Versions: 1.4.0
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
>
> In case of any issues, users are finding it bit difficult to understand what 
> is going on with the gang app. 
> Issue 1:
> "driver pod is getting struck"
> At times, when driver pod is not able to run successfully for some reasons, 
> users are getting the perspective that pod is getting struck and app is 
> hanged, not moving further. Users are waiting for some time and don't 
> understand the clear picture. How do we close the gap quickly and communicate 
> accordingly through events?
> Issue 2:
> ResumeApplication is fired when all ph's are timed out. Do we need to inform 
> the users about this event as they may not clue any about this significant 
> change?
> Issue 3: 
> When Gang app ph's are in progress (and allocated), when there is request for 
> real asks and there is resource crunch, do we need to trigger auto scaling?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2738) Only check failure reason once not for every pod

2024-07-12 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2738.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

> Only check failure reason once not for every pod
> 
>
> Key: YUNIKORN-2738
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2738
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The reason for an application failure does not change and can be 
> pre-calculated for all pods when a failure is handled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed

2024-07-12 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865345#comment-17865345
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2262:
-

The join is a more generic way to handle it. It makes it really easy to use a 
predefined error like "parent rule returned a leaf" or "queue not found" etc 
and test for them. In other projects I have used similar constructs in test and 
production code.

This is an example to check if the exit of a http server was a crash or a 
normal shutdown:
{code:java}
if httpError != nil && !errors.Is(httpError, http.ErrServerClosed) {
log.Logger().Errorw("Failed to start web server", "error", httpError)
}
{code}
If you use the %w you would need to use string contains etc for these kinds of 
checks. Really fragile, the slightest change can break that. Using the join 
makes code readable and if we use predefined errors it will not break.

It also makes checks in tests simple: did we really fail for the right reason, or 
did someone break the code and we failed unexpectedly?

> propagate the error message when queue creation gets failed
> ---
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> the error message of root cause is swallowed, so it is hard to be inspired by 
> the common message "failed to create rule based queue ..."
> BTW, the error I met is the parent queue "is already a leaf". The error 
> message is helpful and it makes us catch up the root cause easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed

2024-07-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865303#comment-17865303
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2262:
-

Please use [error joining|https://pkg.go.dev/errors#Join] for this to show we 
have wrapped the error. It will make testing etc easier than re-writing using 
%w 

> propagate the error message when queue creation gets failed
> ---
>
> Key: YUNIKORN-2262
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2262
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chenchen Lai
>Priority: Minor
>
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334]
> the error message of root cause is swallowed, so it is hard to be inspired by 
> the common message "failed to create rule based queue ..."
> BTW, the error I met is the parent queue "is already a leaf". The error 
> message is helpful and it makes us catch up the root cause easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2743) Core: Remove TODO regarding time out waiting for draining and removal

2024-07-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865301#comment-17865301
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2743:
-

Yes, and with the proper implementation of YUNIKORN-2688, which should prevent 
new workloads from being added, it will all work.

> Core: Remove TODO regarding time out waiting for draining and removal
> -
>
> Key: YUNIKORN-2743
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2743
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Chenchen Lai
>Priority: Minor
>  Labels: newbie
>
> for remove //TODO comment
> in pkg/scheduler/partition_manager.go
> [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition_manager.go#L126]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2735) YuniKorn doesn't schedule correctly after some pods were marked as Unschedulable

2024-07-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865271#comment-17865271
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2735:
-

Reservation is not an under-the-hood optimisation: it prevents starvation for 
large(r) allocations or allocations with specific resource requests that fit 
only on specific nodes. If you turn it off you will see a large impact in 
real-world scenarios.

Disabling reservation will cause large allocations to be starved in busy 
clusters and we should never do that. The variable for turning it off was 
introduced during the development of the code and should have been removed. 
Surprised that it has survived this long.

We have/had a TODO in the code to make this configurable. Currently it is fixed 
to 2 seconds. It should be a reloadable configuration value. I would also argue 
that the current 2 seconds is too quick and 30 seconds would allow us to be a 
bit more eager.

I would propose the following setup:
 * configuration name: service.ReservationDelay
 * granularity: seconds
 * default: 30 seconds
 * minimum: 2 seconds (allow current behaviour)
 * maximum: 3600 seconds (prevent starvation and turning off reservations)
 * reloadable: true
 * notes:
 ** old reservations are not re-evaluated when the value is changed
 ** settings outside the minimum..maximum range will use the default
 ** when reloading the value is not changed if outside the range
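
To make the proposed range handling concrete, a small sketch of the initial-load 
rule only (the setting does not exist yet; the names and values below come from 
the proposal above, not from current code):
{code:java}
package main

import (
	"fmt"
	"time"
)

const (
	defaultReservationDelay = 30 * time.Second   // proposed default
	minReservationDelay     = 2 * time.Second    // allow current behaviour
	maxReservationDelay     = 3600 * time.Second // prevent turning reservations off
)

// resolveReservationDelay applies the proposed rule for initial load:
// anything outside the minimum..maximum range falls back to the default.
func resolveReservationDelay(configured time.Duration) time.Duration {
	if configured < minReservationDelay || configured > maxReservationDelay {
		return defaultReservationDelay
	}
	return configured
}

func main() {
	fmt.Println(resolveReservationDelay(10 * time.Second)) // 10s, inside the range
	fmt.Println(resolveReservationDelay(0))                // 30s, below the minimum
	fmt.Println(resolveReservationDelay(2 * time.Hour))    // 30s, above the maximum
}
{code}
On reload the proposal keeps the previous value instead when the new one is out 
of range; that branch is left out of the sketch.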

> YuniKorn doesn't schedule correctly after some pods were marked as 
> Unschedulable
> 
>
> Key: YUNIKORN-2735
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Volodymyr Kot
>Priority: Major
> Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master 
> - see steps and comments used below:
>  # Create a new cluster with kind, with 4 cpus/8Gb of memory
>  # Deploy YuniKorn using helm
>  # Set up service account for Spark
>  ## "kubectl create serviceaccount spark"
>  ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
> --serviceaccount=default:spark --namespace=default"
>  # Run kubectl proxy" to be able to run spark-submit
>  # Create Spark application* 1 with driver and 2 executors - fits fully, 
> placeholders are created and replaced
>  # Create Spark application 2 with driver and 2 executors - only one executor 
> placeholder is scheduled, rest of the pods are marked Unschedulable
>  # Delete one of the executors from application 1
>  # Spark driver re-creates the executor, it is marked as unschedulable
>  
> At that point scheduler is "stuck", and won't schedule either executor from 
> application 1 OR placeholder for executor from application 2 - it deems both 
> of those unschedulable. See logs below, and please let me know if I 
> misunderstood something/it is expected behavior!
>  
> *Script used to run spark-submit:
> {code:java}
> ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 
> --deploy-mode cluster --name spark-pi \
>    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi 
> \
>    --class org.apache.spark.examples.SparkPi \
>    --conf spark.executor.instances=2 \
>    --conf spark.kubernetes.executor.request.cores=0.5 \
>    --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
>    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>    --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
>    --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
>    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 3 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2735) YuniKorn doesn't schedule correctly after some pods were marked as Unschedulable

2024-07-11 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865266#comment-17865266
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2735:
-

{quote}At that point scheduler is "stuck", and won't schedule either executor 
from application 1 OR placeholder for executor from application 2 - it deems 
both of those unschedulable. See logs below, and please let me know if I 
misunderstood something/it is expected behavior!
{quote}
It is expected behaviour. The scheduler is not stuck. This will resolve itself.

First, spark application 2: as this is gang scheduling, the placeholders will 
time out (15 min by default). If not all of the placeholders were allocated at 
the point of the timeout, a cleanup is triggered. This removes all placeholder 
pods from the system. Depending on the gang style, hard or soft, we either fail 
the application or release the driver pod for scheduling. At that point you are 
unblocked.

Application 1 pods will get scheduled based on the availability of resources. 
When the placeholder pod(s) time out the existing pending pods will be 
scheduled. At that point the normal sorting rules apply. This _could_ mean that 
the re-submitted executor pod gets scheduled or some other pod that was waiting.

Gang scheduling allows you to reserve resources but it does not guarantee them 
after replacement. If you kill the executor pod and it gets restarted it is 
just another pod on the cluster that needs to be scheduled. It will thus depend 
on your config (FIFO, priority, pod definition etc) how and when that 
scheduling will happen. The newly started executor is really a new pod from the 
K8s view, different submit time etc. If you have FIFO configured it will end up 
in the back of the scheduling queue.

Gang scheduling with the soft style will also not prevent starving a cluster of 
resources. You could have the case that the total gang request is too large to 
fit into the free space on a busy cluster. That first triggers reservations, 
blocking resources for other applications. Then, after the timeout, you could 
slowly fill your cluster with driver pods that do not get what they want and 
thus progress slowly or not at all. The only option you have for that is to limit 
the number of applications you allow to run in a queue (MaxApplications). This 
case can easily happen in a cluster of any size.

None of these are real scheduler issues; they are cluster management issues. 
You cannot expect the scheduler to understand the workload you put on a cluster 
and magically adjust.

> YuniKorn doesn't schedule correctly after some pods were marked as 
> Unschedulable
> 
>
> Key: YUNIKORN-2735
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Volodymyr Kot
>Priority: Major
> Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master 
> - see steps and comments used below:
>  # Create a new cluster with kind, with 4 cpus/8Gb of memory
>  # Deploy YuniKorn using helm
>  # Set up service account for Spark
>  ## "kubectl create serviceaccount spark"
>  ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
> --serviceaccount=default:spark --namespace=default"
>  # Run kubectl proxy" to be able to run spark-submit
>  # Create Spark application* 1 with driver and 2 executors - fits fully, 
> placeholders are created and replaced
>  # Create Spark application 2 with driver and 2 executors - only one executor 
> placeholder is scheduled, rest of the pods are marked Unschedulable
>  # Delete one of the executors from application 1
>  # Spark driver re-creates the executor, it is marked as unschedulable
>  
> At that point scheduler is "stuck", and won't schedule either executor from 
> application 1 OR placeholder for executor from application 2 - it deems both 
> of those unschedulable. See logs below, and please let me know if I 
> misunderstood something/it is expected behavior!
>  
> *Script used to run spark-submit:
> {code:java}
> ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 
> --deploy-mode cluster --name spark-pi \
>    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi 
> \
>    --class org.apache.spark.examples.SparkPi \
>    --conf spark.executor.instances=2 \
>    --conf spark.kubernetes.executor.request.cores=0.5 \
>    --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
>    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>    --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
>    --conf spark.kubernetes.executor.podTemplateF

[jira] [Created] (YUNIKORN-2738) Only check failure reason once not for every pod

2024-07-10 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2738:
---

 Summary: Only check failure reason once not for every pod
 Key: YUNIKORN-2738
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2738
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The reason for an application failure does not change and can be pre-calculated 
for all pods when a failure is handled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2737) Cleanup handleFailApplicationEvent handling

2024-07-10 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2737:
---

 Summary: Cleanup handleFailApplicationEvent handling
 Key: YUNIKORN-2737
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2737
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


When we handle a failed application in the shim in 
{{handleFailApplicationEvent()}} we call the placeholder cleanup.
Three issues:
 * The cleanup needs the app lock after it takes the mgr lock. The app lock is 
already held when we process the event. We should place the cleanup last so we 
do not hold the manager lock longer than needed.
 * failing an application is triggered by the core which should do the cleanup 
already so this might be redundant to start with.
 * The failure handling also marks unassigned pods as failed which means there 
is an overlap between the failure handling and the placeholder cleanup which we 
should remove. Either ignore all placeholders in the failure or only cleanup 
assigned placeholders.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2734.
-
Fix Version/s: 1.6.0
   Resolution: Delivered

The TODO was removed as part of the changes in YUNIKORN-2729.

Since we do not want to make this configurable that is all we need, closing 
again with a link to the Jira that has the change.

> make configurable for pods in k8shim pkg/client/kubeclient.go
> -
>
> Key: YUNIKORN-2734
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2734
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Huang Guan Hao
>Priority: Trivial
>  Labels: newbie
> Fix For: 1.6.0
>
>
> for remove //TODO comment
> in pkg/client/kubeclient.go
> https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141
> Make the grace period for pod deletion configurable.
> Currently, the grace period for deleting pods is hardcoded to 3 seconds.
> This might not be suitable for all use cases, as some pods might require more 
> time to gracefully shut down. In the future, this value should be made 
> configurable, either through a function parameter, configuration file, or 
> environment variable, to provide more flexibility and accommodate different 
> scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864848#comment-17864848
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2734:
-

Can we just remove the TODO via this Jira or are we going to handle it as part 
of another jira? 

I might not have been clear in my earlier comment:
 * it should not be configurable
 * the TODO must be removed

> make configurable for pods in k8shim pkg/client/kubeclient.go
> -
>
> Key: YUNIKORN-2734
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2734
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Huang Guan Hao
>Priority: Minor
>
> for remove //TODO comment
> in pkg/client/kubeclient.go
> https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141
> Make the grace period for pod deletion configurable.
> Currently, the grace period for deleting pods is hardcoded to 3 seconds.
> This might not be suitable for all use cases, as some pods might require more 
> time to gracefully shut down. In the future, this value should be made 
> configurable, either through a function parameter, configuration file, or 
> environment variable, to provide more flexibility and accommodate different 
> scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2734:

Labels: newbie  (was: )

> make configurable for pods in k8shim pkg/client/kubeclient.go
> -
>
> Key: YUNIKORN-2734
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2734
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Huang Guan Hao
>Priority: Trivial
>  Labels: newbie
>
> for remove //TODO comment
> in pkg/client/kubeclient.go
> https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141
> Make the grace period for pod deletion configurable.
> Currently, the grace period for deleting pods is hardcoded to 3 seconds.
> This might not be suitable for all use cases, as some pods might require more 
> time to gracefully shut down. In the future, this value should be made 
> configurable, either through a function parameter, configuration file, or 
> environment variable, to provide more flexibility and accommodate different 
> scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2734:

Priority: Trivial  (was: Minor)

> make configurable for pods in k8shim pkg/client/kubeclient.go
> -
>
> Key: YUNIKORN-2734
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2734
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Huang Guan Hao
>Priority: Trivial
>
> for remove //TODO comment
> in pkg/client/kubeclient.go
> https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141
> Make the grace period for pod deletion configurable.
> Currently, the grace period for deleting pods is hardcoded to 3 seconds.
> This might not be suitable for all use cases, as some pods might require more 
> time to gracefully shut down. In the future, this value should be made 
> configurable, either through a function parameter, configuration file, or 
> environment variable, to provide more flexibility and accommodate different 
> scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2703) Scheduler does not honor default queue setting from the ConfigMap

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2703:

Target Version: 1.6.0, 1.5.2  (was: 1.6.0)

> Scheduler does not honor default queue setting from the ConfigMap
> -
>
> Key: YUNIKORN-2703
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2703
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
>  Labels: pull-request-available
>
> YUNIKORN-1650 added an override for default queue name in the config map to 
> solve for the scenario where the provided placement rule is evaluated before 
> other rules.
> Scheduler also adds a default queue if the pod labels or annotations does not 
> define a queue name. Because this happens before the placement rules are 
> evaluated, we end up in the same situation of applications getting placed in 
> the default queue and ignoring all other placement rules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Assigned] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YUNIKORN-2652:
---

Assignee: Rich Scott

> Expand getApplication() endpoint handler to optionally return resource usage
> 
>
> Key: YUNIKORN-2652
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2652
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Rich Scott
>Assignee: Rich Scott
>Priority: Major
>  Labels: pull-request-available
>
> Some users would like to be able to see resource usage (preempted, 
> placeholder resource, etc) for applications that have been completed. The 
> `getApplication()` endpoint handler should be enhanced to take an optional 
> parameter specifying that the user would like details about resources 
> included in the response, and a new `ApplicationXXXDAOInfo` object that is a 
> slight superset of `ApplicationDAOInfo` should be introduced, and can be used 
> in the response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2652:

Target Version: 1.6.0

> Expand getApplication() endpoint handler to optionally return resource usage
> 
>
> Key: YUNIKORN-2652
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2652
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Rich Scott
>Priority: Major
>  Labels: pull-request-available
>
> Some users would like to be able to see resource usage (preempted, 
> placeholder resource, etc) for applications that have been completed. The 
> `getApplication()` endpoint handler should be enhanced to take an optional 
> parameter specifying that the user would like details about resources 
> included in the response, and a new `ApplicationXXXDAOInfo` object that is a 
> slight superset of `ApplicationDAOInfo` should be introduced, and can be used 
> in the response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go

2024-07-10 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864611#comment-17864611
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2734:
-

This should only be used for placeholders. YuniKorn should not delete any other 
pods.

Therefore it should not be configurable. The TODO should not be there.

> make configurable for pods in k8shim pkg/client/kubeclient.go
> -
>
> Key: YUNIKORN-2734
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2734
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Huang Guan Hao
>Priority: Minor
>
> for remove //TODO comment
> in pkg/client/kubeclient.go
> https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141
> Make the grace period for pod deletion configurable.
> Currently, the grace period for deleting pods is hardcoded to 3 seconds.
> This might not be suitable for all use cases, as some pods might require more 
> time to gracefully shut down. In the future, this value should be made 
> configurable, either through a function parameter, configuration file, or 
> environment variable, to provide more flexibility and accommodate different 
> scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2688) new applications get placed in draining queue

2024-07-09 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864038#comment-17864038
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2688:
-

With the placement rules always in use we should already do most of this:
 * if the queue is not found and cannot be created by the rule, no queue name is 
returned and the next rule is checked
 * managed queues (via the config) get their state reset after YUNIKORN-2527 
when added back
 * dynamic queues get cleaned up regardless of their state

For a dynamic queue to be marked as _draining_, a configured parent of that 
dynamic queue must be removed. That case needs extra work in YUNIKORN-2689.

We need to check the queue state in the AppPlacementManager.PlaceApplication() 
method, like we do for the submit access of the user (see the sketch below). We 
should also clean up and dedupe the submit access check in that method.

It would be good to get this in for 1.6, in line with the changes for the 
default queue from YUNIKORN-2703 and YUNIKORN-2711.
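
A minimal, self-contained sketch of that check; all type and function names 
here are stand-ins for illustration, not the real core objects:

{code:go}
package main

import (
	"errors"
	"fmt"
)

type Queue struct {
	Name     string
	Draining bool
}

type App struct {
	ID        string
	QueuePath string
}

// placementRule stands in for a configured rule: it returns a queue name,
// or "" when the rule does not match.
type placementRule func(app *App) string

func placeApplication(app *App, rules []placementRule, queueFn func(string) *Queue) error {
	for _, rule := range rules {
		name := rule(app)
		if name == "" {
			continue // rule did not match, try the next one
		}
		// proposed check: an existing queue that is draining rejects the app,
		// just like a failed submit-ACL check does today
		if q := queueFn(name); q != nil && q.Draining {
			return fmt.Errorf("queue %s is draining, rejecting application %s", name, app.ID)
		}
		app.QueuePath = name
		return nil
	}
	return errors.New("application did not match any placement rule")
}

func main() {
	queues := map[string]*Queue{
		"root.testparent.testchild": {Name: "root.testparent.testchild", Draining: true},
	}
	app := &App{ID: "app1"}
	rules := []placementRule{func(*App) string { return "root.testparent.testchild" }}
	err := placeApplication(app, rules, func(n string) *Queue { return queues[n] })
	fmt.Println(err) // queue root.testparent.testchild is draining, rejecting application app1
}
{code}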

> new applications get placed in draining queue
> -
>
> Key: YUNIKORN-2688
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2688
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Hengzhe Guo
>Assignee: Hengzhe Guo
>Priority: Major
>
> The status of the queue isn't checked when placing new applications. We saw a 
> case where new applications keep getting submitted to a draining queue and 
> the queue can't really be deleted for days.
> A unit test can confirm this:
> {code:java}
> diff --git a/pkg/scheduler/placement/placement_test.go 
> b/pkg/scheduler/placement/placement_test.go
> index 14fe6ac..4f53e0b 100644
> --- a/pkg/scheduler/placement/placement_test.go
> +++ b/pkg/scheduler/placement/placement_test.go
> @@ -294,6 +294,20 @@ partitions:
>         if err == nil || queueName != "" {
>                 t.Errorf("parent queue: app should not have been placed, 
> queue: '%s', error: %v", queueName, err)
>         }
> +
> +       // user rule existing queue, the queue is draining
> +       tags = make(map[string]string)
> +       user = security.UserGroup{
> +               User:   "testchild",
> +               Groups: []string{},
> +       }
> +       app = newApplication("app1", "default", "", user, tags, nil, "")
> +       queueFunc("root.testparent.testchild").MarkQueueForRemoval()
> +       err = man.PlaceApplication(app)
> +       queueName = app.GetQueuePath()
> +       if err == nil || queueName != "" {
> +               t.Errorf("draining queue: app should not have been placed, 
> queue: '%s', error: %v", queueName, err)
> +       }
>  } func TestForcePlaceApp(t *testing.T) { {code}
> For a queue that cannot be created, we should expect the app to be rejected.
> For a queue that can be created, we should expect the queue to be transitioned 
> back to the active state, which is blocked by 
> [YUNIKORN-2689|https://issues.apache.org/jira/browse/YUNIKORN-2689]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Comment Edited] (YUNIKORN-2689) transition draining queues back to active if they are added back

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864030#comment-17864030
 ] 

Wilfred Spiegelenburg edited comment on YUNIKORN-2689 at 7/9/24 6:54 AM:
-

This might need an update for dynamic queues that were created below the 
managed queue. They would not get reset with the current fix.

Example for a dynamic queue: root.parent..

Placement rules:
 * User
 ** Tag for namespace
 *** fixed root.parent

Remove the queue root.parent and all queues below it will be marked as draining. 
When we add root.parent back we should reset the state of all dynamic queues 
below it to make YUNIKORN-2688 possible. We should *not* change the managed 
queues below parent.


was (Author: wifreds):
This might need an update for dynamic queues that were created below the 
managed queue. They would not get reset with the current fix.
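
For reference, a simplified and untested configuration that produces the 
hierarchy sketched above; the rule and attribute names follow the documented 
placement-rule format, but treat this as an assumption rather than a verified 
config:

{code:yaml}
partitions:
  - name: default
    placementrules:
      - name: user
        create: true
        parent:
          name: tag
          value: namespace
          create: true
          parent:
            name: fixed
            value: root.parent
    queues:
      - name: root
        queues:
          - name: parent   # managed queue: removing it drains everything below
{code}

Removing root.parent from the queues section marks all queues below it as 
draining; adding it back should reset only the dynamic namespace and user 
queues, not any managed queues configured below parent.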

> transition draining queues back to active if they are added back
> 
>
> Key: YUNIKORN-2689
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2689
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Hengzhe Guo
>Assignee: Hengzhe Guo
>Priority: Minor
>
> When a queue is removed but still has jobs running in it, it will be in the 
> 'draining' state. At this stage, if the queue is added back, we should expect 
> the queue to be transitioned back to the active state. However, no such 
> transition is found in the code base. We observed a case where a queue that was 
> removed and soon added back eventually ended up deleted after all jobs were 
> drained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2689) transition draining queues back to active if they are added back

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864030#comment-17864030
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2689:
-

This might need an update for dynamic queues that were created below the 
managed queue. They would not get reset with the current fix.

> transition draining queues back to active if they are added back
> 
>
> Key: YUNIKORN-2689
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2689
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Hengzhe Guo
>Assignee: Hengzhe Guo
>Priority: Minor
>
> When a queue is removed but still has jobs running in it, it will be in the 
> 'draining' state. At this stage, if the queue is added back, we should expect 
> the queue to be transitioned back to the active state. However, no such 
> transition is found in the code base. We observed a case where a queue that was 
> removed and soon added back eventually ended up deleted after all jobs were 
> drained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2689) transition draining queues back to active if they are added back

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864018#comment-17864018
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2689:
-

I think this has been fixed in YUNIKORN-2527 for queues that are configured in 
the configmap, i.e. managed queues: their state will be reset. For dynamic 
queues the state should not matter. That Jira was only fixed recently and will 
only be part of the 1.6 release.

> transition draining queues back to active if they are added back
> 
>
> Key: YUNIKORN-2689
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2689
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Hengzhe Guo
>Assignee: Hengzhe Guo
>Priority: Minor
>
> When a queue is removed but still has jobs running in it, it will be in the 
> 'draining' state. At this stage, if the queue is added back, we should expect 
> the queue to be transitioned back to the active state. However, no such 
> transition is found in the code base. We observed a case where a queue that was 
> removed and soon added back eventually ended up deleted after all jobs were 
> drained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2714) e2e test to ensure queue name with all allowed characters

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2714:

Labels: newbie  (was: )

> e2e test to ensure queue name with all allowed characters
> -
>
> Key: YUNIKORN-2714
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2714
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes, test - e2e
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> Create an e2e test to ensure a queue name with all allowed special characters 
> goes through successfully. This is mainly required to confirm there is no 
> breakage in the REST API URL because of special characters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2713) Use queue specific REST API directly

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2713:

Labels: newbie  (was: )

> Use queue specific REST API directly
> 
>
> Key: YUNIKORN-2713
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2713
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes, test - e2e
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> There are some places in the e2e tests that use the old way of fetching all 
> queues for the given partition and then fetching the queue-specific info in a 
> second call. Instead, the queue info can be fetched directly in a single call.
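
As a rough illustration of the direct call (the endpoint path and port are 
assumptions and should be double-checked against the REST API docs; this is 
not taken from the e2e code):

{code:go}
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// old pattern: GET /ws/v1/partition/default/queues, then walk the returned
	// tree to find root.my-dev in a second step
	// direct pattern: one call for exactly the queue that is needed
	resp, err := http.Get("http://localhost:9080/ws/v1/partition/default/queue/root.my-dev")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
{code}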



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2717:

Labels: newbie  (was: )

> Assert invalid queue name in get queue applications handler
> ---
>
> Key: YUNIKORN-2717
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> Assert the invalid queue name in the TestGetQueueApplicationsHandler test 
> method using assertQueueInvalid(). Also clean up the method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2712:

Priority: Minor  (was: Major)

> Missing specific param error for REST API
> -
>
> Key: YUNIKORN-2712
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Minor
>  Labels: newbie
>
> Some REST APIs throw a "missing specific param" kind of error, but not all; 
> for example, when the user name is missing. All mandatory parameters in the 
> other REST APIs can follow the same pattern. It is much clearer than a 
> "doesn't exist" kind of error.
> Suggestion given in 
> [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can 
> be used as reference for implementation.
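
A hedged sketch of the suggested pattern; the helper name, route and error 
wording below are illustrative only and not what the linked PR discussion 
settled on:

{code:go}
package main

import (
	"fmt"
	"net/http"
)

// requiredParam writes an explicit "missing parameter" error instead of
// letting a later lookup fail with a confusing "doesn't exist" message.
func requiredParam(w http.ResponseWriter, r *http.Request, name string) (string, bool) {
	value := r.URL.Query().Get(name)
	if value == "" {
		http.Error(w, fmt.Sprintf("required parameter %q is missing", name), http.StatusBadRequest)
		return "", false
	}
	return value, true
}

func getUserUsage(w http.ResponseWriter, r *http.Request) {
	user, ok := requiredParam(w, r, "user")
	if !ok {
		return
	}
	fmt.Fprintf(w, "resource usage for user %s\n", user)
}

func main() {
	http.HandleFunc("/usage/user", getUserUsage)
	_ = http.ListenAndServe(":9080", nil)
}
{code}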



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2717:

Priority: Minor  (was: Major)

> Assert invalid queue name in get queue applications handler
> ---
>
> Key: YUNIKORN-2717
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2717
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Minor
>  Labels: newbie
>
> Assert the invalid queue name in the TestGetQueueApplicationsHandler test 
> method using assertQueueInvalid(). Also clean up the method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2712:

Labels: newbie  (was: )

> Missing specific param error for REST API
> -
>
> Key: YUNIKORN-2712
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2712
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> Some REST APIs throw a "missing specific param" kind of error, but not all; 
> for example, when the user name is missing. All mandatory parameters in the 
> other REST APIs can follow the same pattern. It is much clearer than a 
> "doesn't exist" kind of error.
> Suggestion given in 
> [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can 
> be used as reference for implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2719) Assert invalid group name in Get Group REST API

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2719:

Labels: newbie  (was: )

> Assert invalid group name in Get Group REST API
> ---
>
> Key: YUNIKORN-2719
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2719
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> Assert invalid group name in Get Group REST API



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2720) Use createRequest() in handlers_test.go

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2720:

Labels: newbie  (was: )

> Use createRequest() in handlers_test.go
> ---
>
> Key: YUNIKORN-2720
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2720
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Manikandan R
>Priority: Major
>  Labels: newbie
>
> Use the createRequest() helper method wherever applicable in handlers_test.go. 
> handlers_test.go is huge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2729) remove `--new-from-rev` from Makefile

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864012#comment-17864012
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2729:
-

I am all for this as long as we fix all the issues. It is not good enough to 
fix most: we need a clean {{make lint}} result. If we do not, the pre-commit 
tests will fail, and when the pre-commit tests fail in the linter no unit tests 
are run, which means no commit.
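
For context, the golangci-lint flag in question; the exact invocation below is 
an assumption, not a copy of the real Makefile lint target:

{noformat}
# before: only code changed since the given revision is linted
golangci-lint run --new-from-rev=origin/master

# after YUNIKORN-2729: the whole tree is linted, so `make lint` must be clean
golangci-lint run
{noformat}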

> remove `--new-from-rev` from Makefile
> -
>
> Key: YUNIKORN-2729
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2729
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Huang Guan Hao
>Priority: Minor
>  Labels: pull-request-available
>
> It is time to show the power of lint :)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-07-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864000#comment-17864000
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2629:
-

[~jshmchenxi] The latest stack trace you attached shows no deadlock or even 
locking inside the core or shim code. You have a different issue, not related 
to deadlocks. Please open a new Jira for this.

There are 18 occurrences of calls that reference the semaphore code (locks):
 * 9 from K8s shared informers waiting for object updates to come from K8s
 * 9 from K8s network data readers

Those are expected. If no data is transmitted and processed by the K8s 
informers, they should sit there and wait.

No other code has any locks. When I look at the YuniKorn code references in the 
stack trace I can see an idle scheduler. Nothing is being processed on the 
K8shim side, and it is sleeping while waiting for changes. The core side is 
also not scheduling; it is sleeping as well.

There is one go routine that jumps out for me: 

 
{code:java}
goroutine 19661710185 [IO wait]
...
created by golang.org/x/net/http2.(*ClientConn).goRun in goroutine 19661710184 
golang.org/x/net@v0.23.0/http2/transport.go:369 +0x2d
{code}
The goroutine mentioned in the "created by" line does not exist in the dump. I 
am not sure whether that just means it still needs to time out or something 
else is happening, but this is not the deadlock covered by this Jira.

 

> Adding a node can result in a deadlock
> --
>
> Key: YUNIKORN-2629
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2629
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.5.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.2
>
> Attachments: updateNode_deadlock_trace.txt, 
> yunikorn-scheduler-20240627.log, yunikorn_stuck_stack_20240708.txt
>
>
> Adding a new node after Yunikorn state initialization can result in a 
> deadlock.
> The problem is that {{Context.addNode()}} holds a lock while we're waiting 
> for the {{NodeAccepted}} event:
> {noformat}
>dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, 
> func(event interface{}) {
>   nodeEvent, ok := event.(CachedSchedulerNodeEvent)
>   if !ok {
>   return
>   }
>   [...] removed for clarity
>   wg.Done()
>   })
>   defer dispatcher.UnregisterEventHandler(handlerID, 
> dispatcher.EventTypeNode)
>   if err := 
> ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode(&si.NodeRequest{
>   Nodes: nodesToRegister,
>   RmID:  schedulerconf.GetSchedulerConf().ClusterID,
>   }); err != nil {
>   log.Log(log.ShimContext).Error("Failed to register nodes", 
> zap.Error(err))
>   return nil, err
>   }
>   // wait for all responses to accumulate
>   wg.Wait()  <--- shim gets stuck here
>  {noformat}
> If tasks are being processed, then the dispatcher will try to retrieve the 
> event handler, which is returned from Context:
> {noformat}
> go func() {
>   for {
>   select {
>   case event := <-getDispatcher().eventChan:
>   switch v := event.(type) {
>   case events.TaskEvent:
>   getEventHandler(EventTypeTask)(v)  <--- 
> eventually calls Context.getTask()
>   case events.ApplicationEvent:
>   getEventHandler(EventTypeApp)(v)
>   case events.SchedulerNodeEvent:
>   getEventHandler(EventTypeNode)(v)  
> {noformat}
> Since {{addNode()}} is holding a write lock, the event processing loop gets 
> stuck, so {{registerNodes()}} will never progress.
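
To make the sequence above concrete, here is a toy, self-contained reproduction 
of the pattern: a write lock is held while waiting on a WaitGroup that only the 
serial dispatcher can release, while the dispatcher itself is stuck behind a 
task event that needs the same lock. All names are illustrative, not the real 
shim code.

{code:go}
package main

import "sync"

type kind int

const (
	taskEvent kind = iota
	nodeEvent
)

type event struct {
	kind    kind
	handler func()
}

type shimContext struct {
	lock sync.RWMutex
}

// getTask models Context.getTask(): it needs the read lock.
func (c *shimContext) getTask() {
	c.lock.RLock()
	defer c.lock.RUnlock()
}

// addNode models Context.addNode(): the write lock is held while waiting for
// the NodeAccepted event.
func (c *shimContext) addNode(events chan event) {
	c.lock.Lock()
	defer c.lock.Unlock()

	var wg sync.WaitGroup
	wg.Add(1)
	// in the real shim the task event arrives concurrently; queueing it here
	// first just makes the interleaving deterministic for the demo
	events <- event{kind: taskEvent}
	events <- event{kind: nodeEvent, handler: wg.Done}
	wg.Wait() // never returns: the dispatcher is stuck in getTask()
}

func main() {
	ctx := &shimContext{}
	events := make(chan event, 2)

	// serial dispatcher loop, like the shim's event dispatcher
	go func() {
		for ev := range events {
			switch ev.kind {
			case taskEvent:
				ctx.getTask() // blocks on the write lock held by addNode
			case nodeEvent:
				ev.handler() // would release addNode, but is never reached
			}
		}
	}()

	// the Go runtime usually aborts this run with
	// "fatal error: all goroutines are asleep - deadlock!"
	ctx.addNode(events)
}
{code}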



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-2651) Update the unchecked error for make lint warnings

2024-06-30 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-2651:

 Fix Version/s: (was: 1.6.0)
Target Version: 1.6.0

> Update the unchecked error for make lint warnings
> -
>
> Key: YUNIKORN-2651
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2651
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Yun Sun
>Priority: Major
>  Labels: pull-request-available
>
> Fix the lint warnings about "unhandled error".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org


