[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps
[ https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889433#comment-17889433 ]

Wilfred Spiegelenburg commented on YUNIKORN-2860:
-------------------------------------------------

Not possible, because the placeholder pods are also used to trigger autoscaling of a cluster. You need pods for that, as the cluster autoscaler needs to know the details. Without pods you will not scale up, which causes all kinds of issues.

> submit gang applications Simultaneously may cause unexpected pending apps
> -------------------------------------------------------------------------
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
> Reporter: shawn
> Assignee: Qi Zhu
> Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png, image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png, state-dump.txt, yunikorn-scheduler.txt
>
> I simultaneously submit 4 gang apps to YuniKorn; sometimes the 4 apps get pending while two pgs get running, which is not expected.
> It can be reproduced as follows:
>
> 1. kubectl create configmap yunikorn-configs --from-file=queues.yaml -n yunikorn
>
> queues.yaml:
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }
> {code}
> 2. Simultaneously submit gang-scheduling-job-example1-4.yaml; the files only differ in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>             "name": "task-group-example-0",
>             "minMember": 2,
>             "minResource": {
>               "cpu": "1",
>               "memory": "1G"
>             },
>             "nodeSelector": {},
>             "tolerations": [],
>             "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "9"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G"
> {code}
> Finally, kubectl get pods -n default gives an unexpected result (not always reproducible)
> !image-2024-09-11-15-41-12-142.png!
>
> App state as follows:
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> Full state dump in state-dump.txt; YuniKorn scheduler logs are in yunikorn-scheduler.txt

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2926) The Pod using gang scheduling is stuck in the Pending state
[ https://issues.apache.org/jira/browse/YUNIKORN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889430#comment-17889430 ]

Wilfred Spiegelenburg commented on YUNIKORN-2926:
-------------------------------------------------

When the placeholders are released we do not immediately try to place the real pod (the larger one) on a node. We cannot do that, as we need to track changes. The released placeholder must be processed before we look again. If the real pod is larger than the placeholder, the larger resource requirement might not fit in the queue or on any node and thus never get scheduled. So we process them as normal allocations, with all the checks. Depending on the difference you might be able to accommodate all real pods or just a fraction of them. In these scenarios the pods stay pending and there is nothing wrong. You need to do a proper analysis of why the pod stays pending. Nothing provided here shows that we have a problem.

> The Pod using gang scheduling is stuck in the Pending state
> -----------------------------------------------------------
>
> Key: YUNIKORN-2926
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2926
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: wangzhihui
> Priority: Minor
> Fix For: 1.5.0
>
> Attachments: image-2024-10-15-11-54-33-458.png, image.png
>
> desc:
> The real allocation is larger than all placeholders, so all allocations are released, causing all Pods to stay in the Pending state.
> !image-2024-10-15-11-54-33-458.png!
> !image.png!
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: simple-gang-job
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "simple-gang-job"
>         queue: root.default
>       annotations:
>         yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
>         yunikorn.apache.org/task-group-name: task-group-example
>         yunikorn.apache.org/task-groups: |-
>           [{
>             "name": "task-group-example",
>             "minMember": 1,
>             "minResource": {
>               "cpu": "100m",
>               "memory": "50M"
>             },
>             "nodeSelector": {},
>             "tolerations": [],
>             "affinity": {},
>             "topologySpreadConstraints": []
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "alpine:latest"
>           command: ["sleep", ""]
>           resources:
>             requests:
>               cpu: "200m"
>               memory: "50M"
> {code}
> solution:
> If the app is in Hard mode, it will transition to a Failing state. If it is in Soft mode, it will transition to a Resuming state.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
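The proposed solution above can be sketched as a small state transition. This is an illustrative sketch, not yunikorn-core code; the function name and state strings are assumptions based on the states named in the issue:

```python
# Hedged sketch of the proposed fix: when the real pods turn out to be larger
# than the placeholders allocated for them, a Hard-style gang app should fail
# while a Soft-style app resumes as a normal application.
# next_app_state is an illustrative name, not an actual yunikorn-core function.

def next_app_state(gang_style: str, real_fits_placeholder: bool) -> str:
    """State the application moves to after the placeholders are released."""
    if real_fits_placeholder:
        # real pods can take the placeholders' spots, normal gang flow
        return "Running"
    # real pods are larger than the placeholders they replace
    return "Failing" if gang_style == "Hard" else "Resuming"
```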
[jira] [Created] (YUNIKORN-2925) Remove internal objects from application REST response
Wilfred Spiegelenburg created YUNIKORN-2925:
--------------------------------------------

Summary: Remove internal objects from application REST response
Key: YUNIKORN-2925
URL: https://issues.apache.org/jira/browse/YUNIKORN-2925
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg

The REST API for application objects exposes an internal object type (resource) directly, without conversion. That means any internal representation change will break REST compatibility. This should never have happened and needs to be reversed ASAP. All other REST calls convert internal objects before exposing them.

The other problem with the exposed information is that it is only accurate in the COMPLETING or COMPLETED state of an application. The data is incomplete in any other state, as it is only updated when an allocation finishes. Running allocations are not included.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
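The fix direction described in the issue can be sketched as a conversion step between the internal type and what the REST layer serialises. All names here are illustrative assumptions, not the actual yunikorn-core types:

```python
# Illustrative sketch: convert the internal resource object into a plain,
# stable DAO before it reaches the REST response, so changes to the internal
# representation can no longer break REST compatibility.

from dataclasses import dataclass, field

@dataclass
class InternalResource:
    # hypothetical internal representation: quantities keyed by resource name
    resources: dict = field(default_factory=dict)

def to_resource_dao(res: InternalResource) -> dict:
    """Flatten the internal object into a plain map for the REST layer."""
    return {name: int(qty) for name, qty in res.resources.items()}
```

The REST handler would then serialise only the DAO, never the internal object itself.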
[jira] [Created] (YUNIKORN-2907) Queue config processing log spew
Wilfred Spiegelenburg created YUNIKORN-2907:
--------------------------------------------

Summary: Queue config processing log spew
Key: YUNIKORN-2907
URL: https://issues.apache.org/jira/browse/YUNIKORN-2907
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - common
Reporter: Wilfred Spiegelenburg

During configuration updates a shadow queue structure is built based on the new configuration. The shadow structure is then walked and compared to the existing queue structure. Actions are taken based on the comparison: queues that exist in only the new or only the existing structure are added or removed, and queues that exist in both are updated if differences are found.

During the build of the shadow structure, queue creations are logged. This logs the creation of the whole queue structure. The logs do not make clear that the queues are not really added, but that it is the shadow structure being created. With large queue structures this causes a log spew and makes the log difficult to read. The actions taken based on the comparison are logged clearly.

We need to be able to distinguish in the log between a real create and a shadow create, as the same code is executed when we create the "real" queue. The creation of the shadow queue structure should either not log, log only at debug level, or log with a clear message that it is the shadow structure being created.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
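One possible shape for the fix is a flag on the creation path, so the shadow build logs differently from a real create. This is an illustrative sketch under that assumption, not the actual queue code:

```python
# Hedged sketch: tag queue creation with a shadow flag so the shadow build
# logs at debug level with an explicit marker, while real queue creation
# keeps its normal info-level log line.

def queue_create_log(name: str, shadow: bool = False) -> tuple:
    """Return the (level, message) pair a queue creation would emit."""
    if shadow:
        # shadow build: debug only, and say clearly it is not a real add
        return ("DEBUG", f"shadow queue created for config comparison: {name}")
    return ("INFO", f"queue created: {name}")
```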
[jira] [Updated] (YUNIKORN-2901) when creating new queues, queue name is used as queue path
[ https://issues.apache.org/jira/browse/YUNIKORN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-2901:
--------------------------------------------
Target Version: 1.7.0

See the init in {{newDynamicQueueInternal()}}: it should follow the same setup for the queue path. The path should be set to the full path. It does not affect scheduling, but it can affect metrics and logging.

> when creating new queues, queue name is used as queue path
> ----------------------------------------------------------
>
> Key: YUNIKORN-2901
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2901
> Project: Apache YuniKorn
> Issue Type: Bug
> Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0
> Reporter: Hengzhe Guo
> Priority: Major
>
> At [https://github.com/apache/yunikorn-core/blame/master/pkg/scheduler/objects/queue.go#L121] in NewConfiguredQueue, the new queue's name is made the path. For non-root queues, the path is later correctly set as the full path at line 137. But several actions between them use this name as the path, causing issues like emitting metrics with the wrong label.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
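The fix described above amounts to deriving the full dot-separated path from the parent at construction time instead of temporarily using the short name. A minimal sketch, with an illustrative helper name rather than the real Go constructor:

```python
# Hedged sketch of the path setup: the queue path is the parent's full path
# plus the queue name, set once at construction, so metrics and logging never
# see the bare name used as a path.

def full_queue_path(parent_path: str, name: str) -> str:
    # root has no parent; every other queue is "<parent path>.<name>"
    return name if not parent_path else f"{parent_path}.{name}"
```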
[jira] [Updated] (YUNIKORN-2901) when creating new queues, queue name is used as queue path
[ https://issues.apache.org/jira/browse/YUNIKORN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-2901:
--------------------------------------------
Affects Version/s: 1.6.0, 1.5.0, 1.4.0, 1.3.0, 1.2.0, 1.1.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2895) Don't add duplicated allocation to node when the allocation ask fails
[ https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886295#comment-17886295 ]

Wilfred Spiegelenburg commented on YUNIKORN-2895:
-------------------------------------------------

There is something more broken. We should never see an ask at that point in the scheduling cycle that has been allocated. The application was write locked, and remains locked, a number of function calls before we get here. We filter all allocations and only proceed with the allocations that have not been allocated. We do that inside the lock. The allocation should only be manipulated through the application, which means the allocated flag cannot be changed while scheduling is in progress.

I think the issue is located in the maintenance of the {{sortedRequests}} on the application. That list used to be rebuilt each cycle, but now we insert into and delete from the slice. During recovery I think we broke things. Recovery uses the same path as a node addition, so this *could* happen on any node add, or maybe even on a simple add of a new ask.

When we call {{application.AddAllocationAsk}} we check that the object is not allocated. That is always true, as we create a new ask object from the SI. So we skip to the next step. This triggers a check for an already known outstanding ask. If that outstanding ask is not allocated we replace the object with the new one. We also make sure that we update resources on the queues and app (pending) if those have changed.

{*}First issue{*}: if the old ask _IS_ allocated we will still replace that allocation with the new one in the requests map. We skip adjusting the pending resources using the already registered ask. This is where it breaks down: the requests list should never contain already allocated objects. It means we have a reference leak, and thus a memory leak. Long after the allocation is removed, a reference will be kept in requests that will not get removed until we clean up the application. The GC will thus not remove it. For long running applications with lots of requests this can become significant.

{*}Second issue{*}: also caused by the replacement. The new object is not marked allocated, which causes a big problem as we will try to schedule it. We could now have an unallocated and an allocated object with the same key, one in requests and one in allocations. After we schedule the second one, the allocations list will be updated and we lose the original info.

{*}Third issue{*}: independent of the state, we proceed to add the ask to the requests. The requests are stored in a map based on the allocation key, which means we only ever track a single ask, never any duplicates. The sorted requests, however, are a sorted slice of references to objects. There is no check in the add into the sorted request slice to replace the existing entry. We will happily add a second one to the slice. With two objects with the same key both considered during scheduling, we can easily cause issues there.

This code for adding allocations and asks needs a proper review. Over time, with multiple changes on top of each other, we have introduced issues here.

> Don't add duplicated allocation to node when the allocation ask fails
> ----------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
>
> When I try to revisit the new update allocation logic, a duplicated allocation can potentially be added to the node if the allocation is already allocated and we try to add the allocation to the node again without reverting it.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
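The three issues above suggest two guards on the add path: never replace an already-allocated ask, and replace rather than append in the sorted slice. A hedged sketch with a hypothetical structure, not the real Application object:

```python
# Illustrative sketch of the missing guards described in the comment:
# - reject replacing an ask whose existing entry is already allocated
#   (first and second issue: no silent replacement, no leaked reference)
# - drop any existing sorted-slice entry for the key before appending
#   (third issue: never two objects with the same key in sortedRequests)

class Application:
    def __init__(self):
        self.requests = {}         # allocationKey -> ask
        self.sorted_requests = []  # slice of asks, kept sorted elsewhere

    def add_allocation_ask(self, ask: dict) -> bool:
        key = ask["key"]
        old = self.requests.get(key)
        if old is not None and old.get("allocated"):
            # never silently replace an allocated ask
            return False
        self.requests[key] = ask
        # replace instead of duplicating in the sorted slice
        self.sorted_requests = [a for a in self.sorted_requests if a["key"] != key]
        self.sorted_requests.append(ask)
        return True
```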
[jira] [Commented] (YUNIKORN-2885) Fix security vulnerabilities in dependencies
[ https://issues.apache.org/jira/browse/YUNIKORN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885865#comment-17885865 ]

Wilfred Spiegelenburg commented on YUNIKORN-2885:
-------------------------------------------------

Since updating to pnpm v9, dependabot, which we used as a tool to do this for us, no longer works. There is an open issue against dependabot for [pnpm v9 support|https://github.com/dependabot/dependabot-core/issues/10534]. Until that gets fixed we need to make sure that we run this kind of check and update before each release. We need to have this documented or tracked somewhere to make sure we do not forget when we get to YuniKorn 1.7 in a couple of months.

[~ccondit] / [~pbacsko] for some more visibility

> Fix security vulnerabilities in dependencies
> --------------------------------------------
>
> Key: YUNIKORN-2885
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2885
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: webapp
> Reporter: JunHong Peng
> Assignee: JunHong Peng
> Priority: Major
> Labels: pull-request-available
>
> {{pnpm audit}} report: [audit-report.md|https://github.com/user-attachments/files/17089735/audit-report.md]
> 26 vulnerabilities found
> Severity: 12 moderate | 14 high
>
> After Upgrade Angular v18 (#YUNIKORN-2861) Audit Report: [audit-report.md|https://github.com/user-attachments/files/17164041/audit-report.md]
> 8 vulnerabilities found
> Severity: 3 moderate | 5 high

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2886) update Spark operator documentation for YuniKorn integration
Wilfred Spiegelenburg created YUNIKORN-2886:
--------------------------------------------

Summary: update Spark operator documentation for YuniKorn integration
Key: YUNIKORN-2886
URL: https://issues.apache.org/jira/browse/YUNIKORN-2886
Project: Apache YuniKorn
Issue Type: New Feature
Components: documentation
Reporter: Wilfred Spiegelenburg

Spark Operator 2.0 has been released with full YuniKorn support. We need to update the website and push this information.

Spark Operator with YuniKorn details:
* Support gang scheduling with Yunikorn
* Set schedulerName to Yunikorn
* Account for spark.executor.pyspark.memory in Yunikorn gang scheduling

See the [Spark Operator v2.0.0|https://github.com/kubeflow/spark-operator/releases/tag/v2.0.0] tag for details.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2784) Scheduler stuck
[ https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883644#comment-17883644 ] Wilfred Spiegelenburg commented on YUNIKORN-2784: - That pod is not scheduled by YuniKorn. You would need to debug the default scheduler to figure out why that pod is not scheduled. Only pods with the {{schedulerName}} set to YuniKorn are relevant for us to look at. > Scheduler stuck > --- > > Key: YUNIKORN-2784 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2784 > Project: Apache YuniKorn > Issue Type: Bug >Reporter: Dmitry >Priority: Major > Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot > 2024-08-02 at 1.20.23 PM.png, Screenshot 2024-09-18 at 7.26.17 PM.png, > dumps.tgz, logs > > > Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending > (screenshot 1). Also all other ones, but these are the most visible and > should be running 100%. > After restarting the scheduler, all get scheduled immediately (screenshot 2). > Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and > `/debug/pprof/goroutine?debug=2` > Also logs from the scheduler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2784) Scheduler stuck
[ https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882842#comment-17882842 ]

Wilfred Spiegelenburg commented on YUNIKORN-2784:
-------------------------------------------------

Correct, there is no instant way to move. That is why we are looking at the change in YUNIKORN-2791. It will expose inside YuniKorn all pods, even the ones not scheduled by YuniKorn. Instead of the pods showing up only as usage on the node, we see the pod and can look at possible preemption. This is the same for all pod types, not just daemon sets.

You have a limit range set on your cluster. The pods might be tiny when you create them, but they are not when you schedule them. The pod asks for 3GB of memory, as each container is given a minimum of 1GB. Check the pod for details: it is annotated on the pod that the container resources were changed. The limit range will be applied to every pod in the cluster. That means a pod with 3 containers, each asking for 100MB of memory (300MB total for the pod), needs 3GB when scheduling after the limit range is applied. A 10 fold increase. If that happens for all your pods you waste a huge amount of resources. It could also explain why the node is seen as "full" when you expect it to be empty.

> Scheduler stuck
> ---------------
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Dmitry
> Priority: Major
>
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
> Shortly after switching to yunikorn, a bunch of tiny pods get stuck pending (screenshot 1). Also all other ones, but these are the most visible and should be running 100%.
> After restarting the scheduler, all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.
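The arithmetic in the comment above can be checked directly: with a LimitRange that raises every container to a 1GB minimum memory request, a 3-container pod asking for 100MB per container is scheduled as a 3GB pod, a tenfold increase. A small sketch of that calculation:

```python
# Effective pod memory under a LimitRange: each container's request is raised
# to at least the LimitRange minimum before the scheduler sees the pod.

MB = 1000 ** 2
GB = 1000 ** 3

def effective_pod_memory(container_requests, limit_range_min):
    """Sum of per-container requests after applying the LimitRange minimum."""
    return sum(max(req, limit_range_min) for req in container_requests)

before = sum([100 * MB] * 3)                          # 300MB as submitted
after = effective_pod_memory([100 * MB] * 3, 1 * GB)  # 3GB as scheduled
```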
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2784) Scheduler stuck
[ https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882590#comment-17882590 ]

Wilfred Spiegelenburg commented on YUNIKORN-2784:
-------------------------------------------------

I think we can get into this situation when the node is full and all pods running on the node are either daemonset pods already, or have a higher priority than the daemonset pod we are preempting for. That does not seem to be the case here. Although it is not much different from what is described above, the reason we cannot find pods is slightly different.

What I can see is that the pod has a node selector for the node {{prp-perfsonar-1.ucsc.edu}} defined, and we have reserved the node {{prp-perfsonar-1.ucsc.edu}}. That is the correct node. The allocation has the following annotation on it inside YuniKorn:
{code:java}
"yunikorn.apache.org/requiredNode": "prp-perfsonar-1.ucsc.edu"
{code}
The question now is: why does the node not allow the simple allocation? Tracking back through the state dump, the node shows that we do not have enough resources available to place the pod.
This is the partial node detail from the dump:
{code:java}
"nodeID": "prp-perfsonar-1.ucsc.edu",
"capacity": {
  "devices.kubevirt.io/kvm": 1000,
  "devices.kubevirt.io/tun": 1000,
  "devices.kubevirt.io/vhost-net": 1000,
  "ephemeral-storage": 609974506511,
  "hugepages-1Gi": 0,
  "hugepages-2Mi": 0,
  "memory": 16273350656,
  "pods": 110,
  "smarter-devices/fuse": 20,
  "smarter-devices/vfio": 20,
  "smarter-devices/vfio_vfio": 20,
  "vcore": 16000
},
"allocated": {
  "memory": 1073741824,
  "pods": 1,
  "vcore": 100
},
"occupied": {
  "memory": 12673089536,
  "pods": 15,
  "vcore": 1883
},
"available": {
  "devices.kubevirt.io/kvm": 1000,
  "devices.kubevirt.io/tun": 1000,
  "devices.kubevirt.io/vhost-net": 1000,
  "ephemeral-storage": 609974506511,
  "hugepages-1Gi": 0,
  "hugepages-2Mi": 0,
  "memory": 2526519296,
  "pods": 94,
  "smarter-devices/fuse": 20,
  "smarter-devices/vfio": 20,
  "smarter-devices/vfio_vfio": 20,
  "vcore": 14017
},
{code}
The only other pod that YuniKorn is aware of on that node is another daemonset pod. That pod has the ID dae0ed3b-2cbd-4286-96b6-e220ffcaacb7. The pod we are trying to place is requesting 3Gi of memory and there is only 2.5Gi available. So we reserve the node and try to preempt on that specific node. The other daemonset pod is filtered out, which leaves us with "nothing" to preempt, and we stop there.

This is a side effect of running multiple schedulers in the cluster. The node is occupied with pods placed by the default scheduler. YuniKorn does not see those pods (yet), as per YUNIKORN-2791. That leaves us in a state where we cannot find anything to preempt, and thus we cannot get the pod up and running. This is one of the main reasons not to run multiple schedulers in a cluster.
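The numbers from the state dump bear this out: the pod needs 3Gi of memory while the node has only 2526519296 bytes (~2.35Gi) available. A small check of that fit, using the dump's own figures:

```python
# Fit check against the "available" block of the node detail above: a request
# fits only when every requested quantity is within what the node has left.

GI = 1024 ** 3
available = {"memory": 2526519296, "vcore": 14017, "pods": 94}

def fits(request: dict, avail: dict) -> bool:
    """True when every requested quantity fits the remaining node resources."""
    return all(avail.get(name, 0) >= qty for name, qty in request.items())

pod_fits = fits({"memory": 3 * GI}, available)  # the 3Gi daemonset pod
```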
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2868) [UMBRELLA] YuniKorn 1.6.0 release efforts
[ https://issues.apache.org/jira/browse/YUNIKORN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882562#comment-17882562 ] Wilfred Spiegelenburg commented on YUNIKORN-2868: - [~pbacsko] before we build and publish the helm chart we need the linked PR committed and applied at least locally to the code from which we build the chart. Otherwise [artifact hub|https://artifacthub.io/packages/helm/yunikorn/yunikorn] will not show the correct K8s versions we support. > [UMBRELLA] YuniKorn 1.6.0 release efforts > - > > Key: YUNIKORN-2868 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2868 > Project: Apache YuniKorn > Issue Type: Task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2871) Update website for 1.6.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2871: Description: Multiple tasks all need to be done at once: * create versioned docs * create release announcement * update downloads page * update roadmap doc * update doap file * K8s supported versions update to add 1.30 and 1.31 was: Multiple tasks all need to be done at once: * create versioned docs * create release announcement * update downloads page * update roadmap doc * update doap file > Update website for 1.6.0 > > > Key: YUNIKORN-2871 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2871 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Multiple tasks all need to be done at once: > * create versioned docs > * create release announcement > * update downloads page > * update roadmap doc > * update doap file > * K8s supported versions update to add 1.30 and 1.31 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2870) Release notes for 1.6.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2870: Description: Jiras have been tagged with release-notes for this version. These jiras need to be added with a special mention in the release notes. [https://issues.apache.org/jira/issues/?filter=12352474] This Jira might require multiple people to help write the release notes for the specific jiras mentioned. > Release notes for 1.6.0 > --- > > Key: YUNIKORN-2870 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2870 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Jiras have been tagged with release-notes for this version. These jiras need > to be added with a special mention in the release notes. > [https://issues.apache.org/jira/issues/?filter=12352474] > This Jira might require multiple people to help write the release notes for > the specific jiras mentioned. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2871) Update website for 1.6.0
[ https://issues.apache.org/jira/browse/YUNIKORN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2871: Description: Multiple tasks all need to be done at once: * create versioned docs * create release announcement * update downloads page * update roadmap doc * update doap file > Update website for 1.6.0 > > > Key: YUNIKORN-2871 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2871 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Multiple tasks all need to be done at once: > * create versioned docs > * create release announcement > * update downloads page > * update roadmap doc > * update doap file -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps
[ https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881884#comment-17881884 ]

Wilfred Spiegelenburg commented on YUNIKORN-2860:
-------------------------------------------------

There have been 542 jiras marked as fixed between 1.3.0 and 1.5.2 [1]. So there is lots of change. 1.6.0 adds another 300+ jiras. Some of these kinds of changes came from cleanup of leaks, some came in as we added new functionality. Not sure if anyone can state exactly what came from where.

[1] jql search query: {{project = YUNIKORN AND status in (Resolved, Closed) AND fixVersion in (1.4.0, 1.5.0, 1.5.1, 1.5.2) ORDER BY key DESC}}

> submit gang applications Simultaneously may cause unexpected pending apps
> -------------------------------------------------------------------------
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
> Reporter: shawn
> Assignee: Qi Zhu
> Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png, image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png, image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png, state-dump.txt, yunikorn-scheduler.txt
>
> I simultaneously submit 4 gang apps to YuniKorn; sometimes the 4 apps get pending while two pgs get running, which is not expected.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Comment Edited] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps
[ https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881159#comment-17881159 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2860 at 9/12/24 3:51 AM: -- This is partially a K8s issue and partially ours. There is no guarantee that the pods we create for gang scheduling placeholders come back in a pre-defined order. Even if and when we process things serially there is no guarantee that we get them back in that order. I have seen before that we create 10 placeholders and the first pod that K8s returns to us is the last pod we created. That becomes worse when multiple origins (applications) are involved. We pass on from the k8shim to the core serially. We then schedule based on that ordering. I think we can improve this if we track the applications that have requested gangs in a queue in order of creation, and only service the ones that fit in the queue, or just the first one out of that list, until all placeholders are allocated. That could be investigated for a next release. Not sure if it can work, as something like this might have side effects we do not want or cause bigger issues. For instance, if the placeholder pods continually fail placements due to predicates or something, we need to "escape" this. was (Author: wifreds): This is partially a K8s issue and partially ours. There is no guarantee that the pods we create for gang scheduling placeholders come back in a pre-defined order. Even if and when we process things serially there is no guarantee that we get them back in that order. I have seen before that we create 10 placeholders and the first pod that K8s returns to us is the last pod we created. That becomes worse when multiple origins (applications) are involved. We pass on from the k8shim to the core serially. I think we can improve this if we track the applications that have requested gangs in a queue in order of creation, and only service the ones that fit in the queue, or just the first one out of that list, until all placeholders are allocated. That could be investigated for a next release. Not sure if it can work, as something like this might have side effects we do not want or cause bigger issues. For instance, if the placeholder pods continually fail placements due to predicates or something, we need to "escape" this.
[jira] [Updated] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps
[ https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2860: Target Version: 1.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2860) submit gang applications Simultaneously may cause unexpected pending apps
[ https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881159#comment-17881159 ] Wilfred Spiegelenburg commented on YUNIKORN-2860: - This is partially a K8s issue and partially ours. There is no guarantee that the pods we create for gang scheduling placeholders come back in a pre-defined order. Even if and when we process things serially there is no guarantee that we get them back in that order. I have seen before that we create 10 placeholders and the first pod that K8s returns to us is the last pod we created. That becomes worse when multiple origins (applications) are involved. We pass on from the k8shim to the core serially. I think we can improve this if we track the applications that have requested gangs in a queue in order of creation, and only service the ones that fit in the queue, or just the first one out of that list, until all placeholders are allocated. That could be investigated for a next release. Not sure if it can work, as something like this might have side effects we do not want or cause bigger issues. For instance, if the placeholder pods continually fail placements due to predicates or something, we need to "escape" this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2850) Watch configmap only in yunikorn's deployed namespace
[ https://issues.apache.org/jira/browse/YUNIKORN-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2850: Target Version: 1.7.0 Labels: newbie (was: ) > Watch configmap only in yunikorn's deployed namespace > - > > Key: YUNIKORN-2850 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2850 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Tae-kyeom, Kim >Priority: Major > Labels: newbie > > Currently, YuniKorn uses a configmap informer to handle configuration hot > reload. > However, in the current implementation the informer watches all namespaces, even though it only needs to watch the namespace in which yunikorn is deployed. This causes inefficient behavior when syncing and caching configmap state: many unrelated configmaps in other namespaces lead to a long recovery time to list them and memory pressure from caching redundant configmaps. > So, if we could replace the configmap informer with a namespace-restricted one, it would improve startup/recovery time and reduce memory usage -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
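The effect of restricting the watch can be sketched with a toy model; the `configMapRef`/`cacheFor` names are illustrative only. With client-go the real mechanism is the factory option `informers.NewSharedInformerFactoryWithOptions(client, resync, informers.WithNamespace(ns))`, which means configmaps outside the given namespace are never listed or cached at all.

```go
package main

import "fmt"

// configMapRef identifies a configmap by namespace and name.
type configMapRef struct{ namespace, name string }

// cacheFor models what an informer would list and cache. Restricting the
// watch to one namespace means unrelated configmaps are never listed,
// shrinking both startup/recovery time and cache memory.
func cacheFor(all []configMapRef, watchNS string) []configMapRef {
	var cached []configMapRef
	for _, cm := range all {
		if watchNS == "" || cm.namespace == watchNS { // "" models a cluster-wide watch
			cached = append(cached, cm)
		}
	}
	return cached
}

func main() {
	all := []configMapRef{
		{"yunikorn", "yunikorn-configs"},
		{"default", "app-config"},
		{"kube-system", "coredns"},
	}
	// cluster-wide watch caches everything; namespaced watch caches one entry
	fmt.Println(len(cacheFor(all, "")), len(cacheFor(all, "yunikorn"))) // 3 1
}
```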
[jira] [Commented] (YUNIKORN-2772) Scheduler restart does not preserve app start time
[ https://issues.apache.org/jira/browse/YUNIKORN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879088#comment-17879088 ] Wilfred Spiegelenburg commented on YUNIKORN-2772: - This is a multi-step issue. We do not communicate a timestamp when we create an application or an allocation. The issue does not just exist for an application: the allocations are also involved. Sorting apps on a queue is one side of the problem, but sorting allocations within an application could also be off. The k8shim creates the application based on what is considered the oldest pod it finds (allocated or still pending). That originator pod's create time should be set as the application create time. Second point: each task which converts into an allocation should have a create time set based on the pod detail. These two changes made on the k8shim side need to be communicated into the core, and the create steps should pick up these two new values instead of using a new timestamp. The create time is currently communicated through a tag on the application, as per the YUNIKORN-1155 changes to support the placeholder timeout fix on recovery. That tag is always set and could be used to set the create time. The allocation can follow the same principle. Starting work on this > Scheduler restart does not preserve app start time > -- > > Key: YUNIKORN-2772 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2772 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Mit Desai >Assignee: Wilfred Spiegelenburg >Priority: Critical > > Whenever the scheduler is restarted, all applications' create times are set > to the current time, ignoring the original value that comes from the API > server. > Due to this, FIFO sorting can show irregularity in scheduling. > If there is an App1 that started 2 days ago and an App2 that started 1 day ago, > during a scheduler restart both apps will get almost the same create time > (nanoseconds apart). App2's create time can be just a few nanoseconds ahead > of App1, and hence App2 gets priority over App1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2846) Throw a warning if a pod has inconsistent metadata in admission controller
[ https://issues.apache.org/jira/browse/YUNIKORN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2846: Fix Version/s: (was: 1.6.0) > Throw a warning if a pod has inconsistent metadata in admission controller > -- > > Key: YUNIKORN-2846 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2846 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Yu-Lin Chen >Assignee: Yu-Lin Chen >Priority: Major > Labels: release-notes > > Similar to YUNIKORN-2810, > If the same metadata (such as queue or applicationID) is configured > inconsistently when submitting a pod request, admission controller should > reject the request in 1.7.0 > > In 1.6.0, we only throw a warning. The rejection will be implemented in 1.7.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2840) sortQueues: fair max performance and correctness change
Wilfred Spiegelenburg created YUNIKORN-2840: --- Summary: sortQueues: fair max performance and correctness change Key: YUNIKORN-2840 URL: https://issues.apache.org/jira/browse/YUNIKORN-2840 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg In YUNIKORN-2678 the fair queue sorting was improved to take guaranteed quota into account correctly. During the review there were two minor points left over that would need improving: * performance * correctness on change Currently {{GetFairMaxResource()}} gets called for each child; this does a recursive call back up the queue hierarchy. This is a performance loss, especially when sorting a deep hierarchy or a larger number of children. The parent details for a real fair comparison between the children should also not change. When they do, as in the current implementation, two children might use different inputs when sorted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
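Both review points can be addressed by computing the parent's fair max once and handing the same value to every comparison, instead of each child recursing up the hierarchy. The sketch below uses a single scalar resource and illustrative names, not the actual {{GetFairMaxResource()}} signature.

```go
package main

import (
	"fmt"
	"sort"
)

// queue is a simplified child queue with a single scalar resource.
type queue struct {
	name   string
	usage  float64
	weight float64
}

// sortByFairShare sorts children by usage relative to their fair share of a
// parent fair-max value computed once before sorting: every comparison sees
// the same input (correctness), and the recursive hierarchy walk is not
// repeated per child (performance).
func sortByFairShare(children []queue, parentFairMax float64) {
	sort.Slice(children, func(i, j int) bool {
		return children[i].usage/(children[i].weight*parentFairMax) <
			children[j].usage/(children[j].weight*parentFairMax)
	})
}

func main() {
	// In the real scheduler this value would come from one recursive walk up
	// the queue hierarchy, performed here, before the sort starts.
	parentFairMax := 10.0
	children := []queue{
		{name: "etl", usage: 8, weight: 1},
		{name: "adhoc", usage: 2, weight: 1},
	}
	sortByFairShare(children, parentFairMax)
	fmt.Println(children[0].name) // adhoc: lowest relative usage sorts first
}
```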
[jira] [Resolved] (YUNIKORN-2809) Fix layout of node transition diagram
[ https://issues.apache.org/jira/browse/YUNIKORN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2809. - Fix Version/s: 1.6.0 Resolution: Fixed Thank you [~blue.tzuhua] for your first contribution to the SI repo. Committed the change > Fix layout of node transition diagram > - > > Key: YUNIKORN-2809 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2809 > Project: Apache YuniKorn > Issue Type: Improvement > Components: scheduler-interface >Reporter: Wilfred Spiegelenburg >Assignee: Tzu-Hua Lan >Priority: Trivial > Labels: pull-request-available > Fix For: 1.6.0 > > Attachments: image-2024-08-16-15-57-12-928.png > > > Fix formatting of the node state transition diagram. It is missing white > space and the diagram is not readable at the moment. Screenshot taken from > file after the > [commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md] > !image-2024-08-16-15-57-12-928.png|width=321,height=184! > {{}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2838) SI: Update protobuf dependencies
[ https://issues.apache.org/jira/browse/YUNIKORN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2838. - Fix Version/s: 1.6.0 Resolution: Fixed protobuf and grpc updated to current latest versions > SI: Update protobuf dependencies > > > Key: YUNIKORN-2838 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2838 > Project: Apache YuniKorn > Issue Type: Task > Components: scheduler-interface >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Kubernetes 1.31.0 has moved to grpc v1.65.0 and protobuf v1.34.2 upstream. We > should update our own dependencies in the scheduler interface to match. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state
[ https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875034#comment-17875034 ] Wilfred Spiegelenburg commented on YUNIKORN-2818: - The metrics should be accurate and currently they are not at all. This is what we track at the moment: !image-2024-08-20-11-36-06-479.png! Looks like Running is the only one we track correctly. We do not even track New, and Accepted is broken. Failed and Completed are handled differently, which should not be the case. We seem to have a similar issue in the scheduler metrics. Rejected is not tracked in all cases. We do not really know how many applications were submitted, as nothing tracks New and Accepted is broken in that case also. > 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps > leave the state > -- > > Key: YUNIKORN-2818 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2818 > Project: Apache YuniKorn > Issue Type: Bug >Reporter: Hengzhe Guo >Priority: Major > Attachments: image-2024-08-20-11-36-06-479.png > > > Currently its behavior is the same as the applicationSubmission counter > metric, which is increase-only, but I think it should reflect the current > number of apps in the state in the queue. Like 'running', the metric should > decrease when an app leaves the state -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
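"Tracking correctly" here means gauge semantics: on every state transition the old state's count goes down and the new state's goes up, unlike an increase-only submission counter. A minimal sketch of that invariant (not the actual YuniKorn metrics code, which uses Prometheus collectors):

```go
package main

import "fmt"

// appStateGauge keeps the current count of applications per state. On a
// state transition the old state is decremented and the new one
// incremented, which is what a gauge (unlike a counter) must do.
type appStateGauge map[string]int

// transition moves one application from state `from` to state `to`;
// an empty `from` means the application is newly submitted.
func (g appStateGauge) transition(from, to string) {
	if from != "" {
		g[from]--
	}
	g[to]++
}

func main() {
	g := appStateGauge{}
	g.transition("", "New")
	g.transition("New", "Accepted")
	g.transition("Accepted", "Running")
	// Accepted drops back to 0 once the app moves on; Running is 1.
	fmt.Println(g["Accepted"], g["Running"]) // 0 1
}
```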
[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state
[ https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2818: Attachment: image-2024-08-20-11-36-06-479.png -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state
[ https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2818: Attachment: (was: image-2024-08-20-11-35-06-312.png) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2818) 'accepted' state of appMetrics of Queue Metrics doesn't decrease when apps leave the state
[ https://issues.apache.org/jira/browse/YUNIKORN-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2818: Attachment: image-2024-08-20-11-35-06-312.png -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2803) Use FitIn for node check
[ https://issues.apache.org/jira/browse/YUNIKORN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2803: Fix Version/s: (was: 1.6.0) > Use FitIn for node check > > > Key: YUNIKORN-2803 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2803 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > > Use FitIn instead of FitInMaxUndef to know whether ask fits in node or not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2803) Use FitIn for node check
[ https://issues.apache.org/jira/browse/YUNIKORN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2803: Target Version: 1.6.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2253) Support retry when bind volume failed case instead of failing the task
[ https://issues.apache.org/jira/browse/YUNIKORN-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874131#comment-17874131 ] Wilfred Spiegelenburg commented on YUNIKORN-2253: - I think we need to abandon a code change for this and instead add a recommendation to increase the timeout to the troubleshooting docs. We then start work on YUNIKORN-2804 as soon as we have forked YuniKorn 1.6. Adding code for something that we can do just as well with an already supported configuration value is a bad idea. > Support retry when bind volume failed case instead of failing the task > -- > > Key: YUNIKORN-2253 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2253 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > > Currently, we support passing a timeout parameter for volume binding, but we > should also support retrying the bind, because a timeout is one of the errors > that makes a volume bind fail. > We would benefit a lot from a successful retry: it keeps the task from > failing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2809) Fix layout of node transition diagram
Wilfred Spiegelenburg created YUNIKORN-2809: --- Summary: Fix layout of node transition diagram Key: YUNIKORN-2809 URL: https://issues.apache.org/jira/browse/YUNIKORN-2809 Project: Apache YuniKorn Issue Type: Improvement Components: scheduler-interface Reporter: Wilfred Spiegelenburg Attachments: image-2024-08-16-15-57-12-928.png Fix formatting of the node state transition diagram. It is missing white space and the diagram is not readable at the moment. Screenshot taken from file after the [commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md] !image-2024-08-16-15-57-12-928.png|width=321,height=184! {{}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2806) Deadlock in preemption after YUNIKORN-2769
[ https://issues.apache.org/jira/browse/YUNIKORN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2806: Summary: Deadlock in preemption after YUNIKORN-2769 (was: Deadlock in preemption) > Deadlock in preemption after YUNIKORN-2769 > -- > > Key: YUNIKORN-2806 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2806 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0 > > > A deadlock exists in TryPreemption() where the current app gets locked twice: > once in TryAllocate(), and again in findEligiblePreemptionVictims() where > apps are iterated to find victims. The current app is not excluded like it > should be, resulting in a deadlock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2806) Deadlock in preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874105#comment-17874105 ] Wilfred Spiegelenburg commented on YUNIKORN-2806: - To limit comments and requests for backports etc: this was introduced as part of YUNIKORN-2769; up until that point we had this specific check as part of the victim list. This issue has not been part of any release. It has only existed in master for ~3 days. [PR diff snippet|https://github.com/apache/yunikorn-core/pull/923/files#diff-27632d48eb925e150a33bc92370ceaa66c31048018d11ca7a53a0b50ab7250acL1753] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
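The shape of the bug and the fix can be sketched with plain mutexes: re-acquiring a lock the caller already holds deadlocks, so the victim iteration must exclude the current app. This is a simplified illustration, not the actual core code.

```go
package main

import (
	"fmt"
	"sync"
)

// app is a simplified application protected by a mutex, mirroring the
// locking done per application during allocation; fields are illustrative.
type app struct {
	sync.Mutex
	id string
}

// findVictims iterates candidate apps while the caller (TryAllocate in the
// real code) already holds the current app's lock. Re-locking the current
// app would deadlock, so it must be skipped.
func findVictims(current *app, candidates []*app) []string {
	var victims []string
	for _, a := range candidates {
		if a == current {
			continue // already locked by the caller: re-locking would deadlock
		}
		a.Lock()
		victims = append(victims, a.id)
		a.Unlock()
	}
	return victims
}

func main() {
	cur := &app{id: "app-1"}
	other := &app{id: "app-2"}
	cur.Lock() // held by the allocation path, as TryAllocate() would do
	fmt.Println(findVictims(cur, []*app{cur, other})) // [app-2]
	cur.Unlock()
}
```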
[jira] [Resolved] (YUNIKORN-2098) Change go lint SHA detection (following)
[ https://issues.apache.org/jira/browse/YUNIKORN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2098. - Fix Version/s: 1.6.0 Resolution: Delivered As a side effect of all the cleanup work around the linter we no longer use SHA detection, as we are clean. The lint command has been updated as part of other changes to remove all SHA detection code. > Change go lint SHA detection (following) > > > Key: YUNIKORN-2098 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2098 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Dong-Lin Hsieh >Assignee: Dong-Lin Hsieh >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Following https://issues.apache.org/jira/browse/YUNIKORN-285 > Currently, we will always use the "ORIGIN/HEAD" ref, falling back to "HEAD^" when > "ORIGIN/HEAD" doesn't exist. > This will avoid the 'fatal: Needed a single revision' error in forked repos. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2802) Consider more resources and types which can be pruned when it is zero
[ https://issues.apache.org/jira/browse/YUNIKORN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873414#comment-17873414 ] Wilfred Spiegelenburg commented on YUNIKORN-2802: - Pruning can be looked at for non-configuration-based values on the queues, for instance: * allocated * pending * preempting However, we cannot prune max or guaranteed as that changes the semantics of the value. In general: computed values could be pruned, configured values must never be pruned > Consider more resources and types which can be pruned when it is zero > - > > Key: YUNIKORN-2802 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2802 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Qi Zhu >Priority: Minor > > We should rethink which resources and types can be pruned when zero, see > details: > [https://github.com/apache/yunikorn-core/pull/943#discussion_r1716250452] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
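A sketch of the distinction drawn above: computed values (allocated, pending, preempting) can drop zero-valued resource types, while configured values (max, guaranteed) must be left alone, because a configured zero carries meaning. The map-based resource type here is a simplification of the core's Resource type.

```go
package main

import "fmt"

// pruneZeros returns a copy of a computed resource value with all
// zero-valued resource types removed. It must only be applied to computed
// values such as allocated or pending, never to configured max or
// guaranteed quotas, where an explicit 0 means "none allowed".
func pruneZeros(res map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(res))
	for k, v := range res {
		if v != 0 {
			out[k] = v
		}
	}
	return out
}

func main() {
	allocated := map[string]int64{"memory": 1 << 30, "vcore": 0, "nvidia.com/gpu": 0}
	fmt.Println(pruneZeros(allocated)) // map[memory:1073741824]
}
```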
[jira] [Commented] (YUNIKORN-2796) Root queue and partition should not have resource types with 0 values
[ https://issues.apache.org/jira/browse/YUNIKORN-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871937#comment-17871937 ] Wilfred Spiegelenburg commented on YUNIKORN-2796: - BTW: this is more of a display issue than an enforcement issue. Scheduling of resource types that are not known or registered will fail to find a node to run on and will never make it to the point of the quota changes on a queue. > Root queue and partition should not have resource types with 0 values > - > > Key: YUNIKORN-2796 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2796 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > > When we register a node the node available resources get added to the > partition and root queue. When we remove the node the resources get removed > again. Updates do a similar action. > When we no longer have nodes that expose a specific resource we leave the > resource type in the root queue and partition with a 0. It looks strange to > have a maximum with 0 set for the partition or root and contradicts the quota > interpretation documented. > A resource we do not have at a certain point in time should not have a quota > of 0 assigned in the root or partition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2796) Root queue and partition should not have resource types with 0 values
Wilfred Spiegelenburg created YUNIKORN-2796: --- Summary: Root queue and partition should not have resource types with 0 values Key: YUNIKORN-2796 URL: https://issues.apache.org/jira/browse/YUNIKORN-2796 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When we register a node the node available resources get added to the partition and root queue. When we remove the node the resources get removed again. Updates do a similar action. When we no longer have nodes that expose a specific resource we leave the resource type in the root queue and partition with a 0. It looks strange to have a maximum with 0 set for the partition or root and contradicts the quota interpretation documented. A resource we do not have at a certain point in time should not have a quota of 0 assigned in the root or partition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2789) Queue internalGetMax should use permissive calculator
[ https://issues.apache.org/jira/browse/YUNIKORN-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871861#comment-17871861 ] Wilfred Spiegelenburg commented on YUNIKORN-2789: - The only place where we use the {{ComponentWiseMin()}} function is in the queue call that we no longer want. Pushing through a refactor at the same time: rename {{ComponentWiseMinPermissive()}} to become just {{ComponentWiseMin()}} > Queue internalGetMax should use permissive calculator > - > > Key: YUNIKORN-2789 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2789 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > We have documented for queue resources that: > {quote}Resources that are not specified in the list are not limited, for max > resources, or guaranteed in the case of guaranteed resources. > {quote} > However in the implementation on the queue, internalGetMax, we call > resources.ComponentWiseMin(). This returns 0 values for each type that is not > defined in the two resources passed in. That does not line up. > Example for getting the maximum resources of a queue using GetMaxQueueSet, > what I would expect based on the documentation: > {code:java} > parent: max{memory: 100G} > parent.child: max{vcore: 100} > => result child max{memory: 100G, vcore: 100}{code} > currently we get: > {code:java} > parent: max{memory: 100G} > parent.child: max{vcore: 100} > => result child max{memory: 0, vcore: 0}{code} > Similar when we add the root and call GetMaxResource: > {code:java} > root: max{memory: 100G, vcore: 200} > root.parent: max{vcore: 100} > root.parent.child: max{nvidia.com/gpu: 10} > => result parent max{memory: 0, vcore: 100} > => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code} > The fact that the resource type does not exist, even in the root, should not > mean a zero set. The nodes that expose the specific resource might not have > been registered or scaled up yet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
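The difference between the two calculators discussed in YUNIKORN-2789 can be sketched as follows. This is a hedged illustration based on the examples in the issue text, using a simplified `map[string]int64` rather than the real yunikorn-core `Resource` type; the function bodies are assumptions, not the project's implementation:

```go
package main

import "fmt"

type Resource map[string]int64

func min64(x, y int64) int64 {
	if x < y {
		return x
	}
	return y
}

// componentWiseMin sketches the strict calculator the comment retires:
// a type missing from either side reads as 0, so the result is zeroed out.
func componentWiseMin(a, b Resource) Resource {
	out := Resource{}
	for typ := range a {
		out[typ] = min64(a[typ], b[typ]) // missing in b reads as 0
	}
	for typ := range b {
		out[typ] = min64(a[typ], b[typ]) // missing in a reads as 0
	}
	return out
}

// componentWiseMinPermissive sketches the documented queue semantics:
// a type defined on only one side is not limited by the other side,
// so its value is carried through unchanged.
func componentWiseMinPermissive(a, b Resource) Resource {
	out := Resource{}
	for typ, v := range a {
		if w, ok := b[typ]; ok {
			out[typ] = min64(v, w)
		} else {
			out[typ] = v
		}
	}
	for typ, w := range b {
		if _, ok := a[typ]; !ok {
			out[typ] = w
		}
	}
	return out
}

func main() {
	parent := Resource{"memory": 100_000_000_000} // parent: max{memory: 100G}
	child := Resource{"vcore": 100}               // parent.child: max{vcore: 100}
	fmt.Println(componentWiseMin(parent, child))           // the bug: both types zero
	fmt.Println(componentWiseMinPermissive(parent, child)) // documented: both kept
}
```

With the permissive version the child's maximum becomes `{memory: 100G, vcore: 100}`, matching the documented "not specified means not limited" rule.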
[jira] [Updated] (YUNIKORN-2789) Queue internalGetMax should use permissive calculator
[ https://issues.apache.org/jira/browse/YUNIKORN-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2789: Summary: Queue internalGetMax should use permissive calculator (was: Queue internalGetMax should not use permissive calculator) > Queue internalGetMax should use permissive calculator > - > > Key: YUNIKORN-2789 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2789 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > We have documented for queue resources that: > {quote}Resources that are not specified in the list are not limited, for max > resources, or guaranteed in the case of guaranteed resources. > {quote} > However in the implementation on the queue, internalGetMax, we call > resources.ComponentWiseMin(). This returns 0 values for each type that is not > defined in the two resources passed in. That does not line up. > Example for getting the maximum resources of a queue using GetMaxQueueSet > what I would expect based on the documentation: > > {code:java} > parent: max{memory: 100G} > parent.child: max{vcore: 100} > => result child max{memory: 100G, vcore: 100}{code} > > > currently we get: > {code:java} > parent: max{memory: 100G} > parent.child: max{vcore: 100} > => result child max{memory: 0, vcore: 0}{code} > Similar when we add the root and call GetMaxResource: > {code:java} > root: max{memory: 100G, vcore: 200} > root.parent: max{vcore: 100} > root.parent.child: max{nvidia.com/gpu: 10} >=> result parent max{memory: 0, vcore: 100} > => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code} > The fact that the resource type does not exist, even in the root, should not > mean a zero set. The nodes that expose the specific resource might not have > been registered or scaled up yet. 
> -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
[ https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871845#comment-17871845 ] Wilfred Spiegelenburg commented on YUNIKORN-2790: - Example 2: kubelet restart with delayed custom resource registration, each node has 2 GPUs * root queue max is 1 vcore, 2 GPU (one node with GPU is registered) * root queue usage is 8000 vcore, 3 GPU (old GPU job from before the kubelet restart on the second node) * request is for 1000 vcore, 1 GPU Currently: all allocations in the system are *blocked* as the root queue is always considered over quota New behaviour: allocation is *blocked* as the allocation requests another GPU while already over quota > GPU node restart could leave root queue always out of quota > --- > > Key: YUNIKORN-2790 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available, release-notes > > On a node restart the pods assigned and running on a node are not checked > against the quota of the queue(s) they run in. This has multiple reasons. > Pods on a node that are scheduled by YuniKorn and already running must not be > rejected. Rejecting pods could cause lots of side effects. > The combination of a node restart and the reconfiguring of a GPU driver could > however cause a secondary issue. The node on restart might not expose the GPU > resource yet. Pods that ran before the restart can be using the GPU resource. > After those pods are added, ignoring quotas, the root queue will show a usage > for a resource that has not been registered yet. > This fact prevents all scheduling from progressing. Even for pods not > requesting the GPU resource. Each scheduling action will check the root queue > quota and fail. This prevents the GPU driver pods from being placed and the GPU from > being registered by the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2794) Resource: Change SubOnlyExisting() to same signature as AddOnlyExisting()
Wilfred Spiegelenburg created YUNIKORN-2794: --- Summary: Resource: Change SubOnlyExisting() to same signature as AddOnlyExisting() Key: YUNIKORN-2794 URL: https://issues.apache.org/jira/browse/YUNIKORN-2794 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The AddOnlyExisting function takes two resource objects and returns a new object. The SubOnlyExisting method is called on a resource receiver modifying the receiver object. These two should use the same kind of signature taking two resource objects and returning a new object. In most use cases for SubOnlyExisting we do a clone before we call within a locked method on an object that contains the resource. This clone becomes obsolete when we make the change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
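The signature change proposed in YUNIKORN-2794 can be sketched like this. The function names follow the jira; the bodies and the "subtract only types present on the left" semantics are my illustrative reading, not the actual yunikorn-core code:

```go
package main

import "fmt"

// Resource is a simplified stand-in for the yunikorn-core resource type.
type Resource struct {
	Resources map[string]int64
}

// Mutating style, as SubOnlyExisting works today: the receiver is modified
// in place, so most call sites Clone() first inside a locked section.
func (r *Resource) subOnlyExistingInPlace(other *Resource) {
	for typ := range r.Resources {
		if v, ok := other.Resources[typ]; ok {
			r.Resources[typ] -= v
		}
	}
}

// Proposed style, matching the shape of AddOnlyExisting: take two resources,
// return a new object, leave both inputs untouched. The defensive clone
// becomes unnecessary.
func subOnlyExisting(left, right *Resource) *Resource {
	out := &Resource{Resources: make(map[string]int64, len(left.Resources))}
	for typ, v := range left.Resources {
		if w, ok := right.Resources[typ]; ok {
			out.Resources[typ] = v - w
		} else {
			out.Resources[typ] = v
		}
	}
	return out
}

func main() {
	used := &Resource{Resources: map[string]int64{"memory": 10, "vcore": 4}}
	delta := &Resource{Resources: map[string]int64{"vcore": 1, "nvidia.com/gpu": 1}}
	res := subOnlyExisting(used, delta)
	// new object holds the result; the input is left unchanged
	fmt.Println(res.Resources["vcore"], used.Resources["vcore"]) // → 3 4
}
```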
[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
[ https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871829#comment-17871829 ] Wilfred Spiegelenburg commented on YUNIKORN-2790: - To clarify what will change with this fix: Example 1: lowering a configured queue quota * queue max is 1 vcore, 5 GPU (changed from 10 GPU to 5 GPU) * queue usage is 8000 vcore, 6 GPU * request is for 1000 vcore Currently: allocation is *blocked* (queue is always considered over quota) New behaviour: allocation is allowed Example 2: kubelet restart with delayed custom resource registration * root queue max is 1 vcore, 0 GPU (no nodes with GPUs are registered yet) * root queue usage is 8000 vcore, 1 GPU (old GPU job from before the kubelet restart) * request is for 1000 vcore Currently: all allocations in the system are *blocked* as the root queue is always considered over quota New behaviour: allocation is allowed > GPU node restart could leave root queue always out of quota > --- > > Key: YUNIKORN-2790 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available, release-notes > > On a node restart the pods assigned and running on a node are not checked > against the quota of the queue(s) they run in. This has multiple reasons. > Pods on a node that are scheduled by YuniKorn and already running must not be > rejected. Rejecting pods could cause lots of side effects. > The combination of a node restart and the reconfiguring of a GPU driver could > however cause a secondary issue. The node on restart might not expose the GPU > resource yet. Pods that ran before the restart can be using the GPU resource. > After those pods are added, ignoring quotas, the root queue will show a usage > for a resource that has not been registered yet. > This fact prevents all scheduling from progressing. Even for pods not > requesting the GPU resource. Each scheduling action will check the root queue > quota and fail. This prevents the GPU driver pods from being placed and the GPU from > being registered by the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2782) Cleanup dead code in cache/context
[ https://issues.apache.org/jira/browse/YUNIKORN-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871821#comment-17871821 ] Wilfred Spiegelenburg commented on YUNIKORN-2782: - thanks [~chia7712] for clarifying what I tried to say :) > Cleanup dead code in cache/context > -- > > Key: YUNIKORN-2782 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2782 > Project: Apache YuniKorn > Issue Type: Task > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Tzu-Hua Lan >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > In the cache context we have a number of functions that only get called from > tests. We need to clean up and only use one version: > * RemoveApplication & RemoveApplicationInternal > We should only have RemoveApplication but the internal version is used > everywhere > * UpdateApplication is not used at all -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2678) Fair queue sorting is inconsistent
[ https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2678: Labels: pull-request-available release-notes (was: pull-request-available) > Fair queue sorting is inconsistent > -- > > Key: YUNIKORN-2678 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2678 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Affects Versions: 1.5.1 > Environment: EKS 1.29 >Reporter: Paul Santa Clara >Assignee: Paul Santa Clara >Priority: Major > Labels: pull-request-available, release-notes > Attachments: Screenshot 2024-08-06 at 5.18.18 PM.png, Screenshot > 2024-08-06 at 5.18.21 PM.png, Screenshot 2024-08-06 at 5.18.30 PM.png, > jira-queues.yaml, jira-tier0-screenshot.png, jira-tier1-screenshot.png, > jira-tier2-screenshot.png, jira-tier3-screenshot.png, > yunikorn-fair-4-tiers-complete.png, yunikorn-fair-4-tiers.png > > > Please see the attached queue configuration(jira-queues.yaml). > I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 > pods in Tier3. Each Pod will require 1 VCore. Initially, there will be 0 > suitable nodes to run the Pods and all will be Pending. Karpenter will soon > provision Nodes and Yunikorn will react by binding the Pods. > Given this > [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95], > I would expect Yunikorn to distribute the allocations such that each of the > Tier’ed queues reaches its Guarantees. Instead, I observed a roughly even > distribution of allocation across all of the queues. > Tier0 fails to meet its Gaurantees while Tier3, for instance, dramatically > overshoots them. 
> > {code:java} > > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l > 86 > > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l > 83 > > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l > 78 > > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l > 77 > {code} > Please see attached screen shots for queue usage. > Note, this situation can also be reproduced without the use of Karpenter by > simply setting Yunikorn's `service.schedulingInterval` to a high duration, > say 1m. Doing so will force Yunikorn to react to 400 Pods -across 4 queues- > at roughly the same time forcing prioritization of queue allocations. > Test code to generate Pods: > {code:java} > from kubernetes import client, config > config.load_kube_config() > v1 = client.CoreV1Api() > def create_pod_manifest(tier, exec,): > pod_manifest = { > 'apiVersion': 'v1', > 'kind': 'Pod', > 'metadata': { > 'name': f"rolling-test-tier-{tier}-exec-{exec}", > 'namespace': 'finance', > 'labels': { > 'applicationId': f"MyOwnApplicationId-tier-{tier}", > 'queue': f"root.tiers.{tier}" > }, > "yunikorn.apache.org/user.info": > '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}' > }, > 'spec': { > "affinity": { > "nodeAffinity" : { > "requiredDuringSchedulingIgnoredDuringExecution" : { > "nodeSelectorTerms" : [ > { > "matchExpressions" : [ > { > "key" : "di.rbx.com/dedicated", > "operator" : "In", > "values" : ["spark"] > } > ] > } > ] > } > }, > }, > "tolerations" : [ > { > "effect" : "NoSchedule", > "key": "dedicated", > "operator" : "Equal", > "value" : "spark" > }, > ], > "schedulerName": "yunikorn", > 'restartPolicy': 'Always', > 'containers': [{ > "name": "ubuntu", > 'image': 'ubuntu', > "command": ["sleep", "604800"], > "imagePullPolicy": "IfNotPresent", > "resources" : { > "limits" : { > 'cpu' : "1" > }, > "requests" : { > 'cpu' : "1" > } > } >
[jira] [Updated] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
[ https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2790: Labels: release-notes (was: ) > GPU node restart could leave root queue always out of quota > --- > > Key: YUNIKORN-2790 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: release-notes > > On a node restart the pods assigned and running on a node are not checked > against the quota of the queue(s) they run in. This has multiple reasons. > Pods on a node that are scheduled by YuniKorn and already running must not be > rejected. Rejecting pods could cause lots of side effects. > The combination of a node restart and the reconfiguring a GPU driver could > however cause a secondary issue. The node on restart might not expose the GPU > resource yet. Pods that ran before the restart can be using the GPU resource. > After those pods are added, ignoring quotas, the root queue will show a usage > for a resource that has not been registered yet. > This fact prevents all scheduling from progressing. Even for pods not > requesting the GPU resource. Each scheduling action will check the root queue > quota and fail. This prevents the GPU driver pods to be placed and the GPU to > be registered by the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Comment Edited] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
[ https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871556#comment-17871556 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2790 at 8/7/24 7:28 AM: - Solution is to not check resource types that are not requested by pods when we check for a fit in the queue. This will allow a pod asking for memory and vcores to be scheduled even if the root queue is out of GPU or storage. This should not happen on any queue other than the root queue for node registration delays. It could happen for a different queue in the hierarchy if the quota on a queue has been changed and set to a lower value than the currently running workload. Lowering the GPU quota on a queue should still allow memory and vcore only pods to be scheduled. This makes scheduling more resilient to configuration changes and custom resource registration delays. was (Author: wifreds): Solution is to not check resource types that are not requested by pods when we check for a fit in the queue. This will allow a pod asking for memory and vcores to be scheduled even if the root queue is out of GPU or storage. This should not happen on any other queue in the hierarchy unless the quota has been changed to become lower than the currently running workload. This makes scheduling more resilient for configuration changes and custom resource registration delays. > GPU node restart could leave root queue always out of quota > --- > > Key: YUNIKORN-2790 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > > On a node restart the pods assigned and running on a node are not checked > against the quota of the queue(s) they run in. This has multiple reasons. > Pods on a node that are scheduled by YuniKorn and already running must not be > rejected. Rejecting pods could cause lots of side effects. > The combination of a node restart and the reconfiguring of a GPU driver could > however cause a secondary issue. The node on restart might not expose the GPU > resource yet. Pods that ran before the restart can be using the GPU resource. > After those pods are added, ignoring quotas, the root queue will show a usage > for a resource that has not been registered yet. > This fact prevents all scheduling from progressing. Even for pods not > requesting the GPU resource. Each scheduling action will check the root queue > quota and fail. This prevents the GPU driver pods from being placed and the GPU from > being registered by the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
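The fix described in the comment above, checking only the resource types a request actually asks for, can be sketched as follows. This is an illustrative Go sketch under assumed numbers, with a simplified `map[string]int64` resource type; `fitInQueue` is a made-up name, not the yunikorn-core function:

```go
package main

import "fmt"

type Resource map[string]int64

// fitInQueue sketches the fixed headroom check: only the resource types
// present in the request are compared against the queue's quota. Types the
// request does not mention (e.g. stale GPU usage above a now-unregistered
// GPU maximum) no longer block unrelated pods.
func fitInQueue(request, max, used Resource) bool {
	for typ, ask := range request {
		limit, limited := max[typ]
		if !limited {
			continue // no max configured for this type means not limited
		}
		if used[typ]+ask > limit {
			return false
		}
	}
	return true
}

func main() {
	// assumed numbers: root queue after a kubelet restart, GPU max gone
	// to 0 while a pre-restart GPU pod still counts against usage
	max := Resource{"vcore": 10000, "nvidia.com/gpu": 0}
	used := Resource{"vcore": 8000, "nvidia.com/gpu": 1}

	fmt.Println(fitInQueue(Resource{"vcore": 1000}, max, used))                      // vcore-only pod fits
	fmt.Println(fitInQueue(Resource{"vcore": 1000, "nvidia.com/gpu": 1}, max, used)) // GPU pod still blocked
}
```

This matches the two examples given earlier in the thread: a pod that does not touch the over-quota type schedules, while a pod that requests another GPU stays blocked.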
[jira] [Commented] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
[ https://issues.apache.org/jira/browse/YUNIKORN-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871556#comment-17871556 ] Wilfred Spiegelenburg commented on YUNIKORN-2790: - Solution is to not check resource types that are not requested by pods when we check for a fit in the queue. This will allow a pod asking for memory and vcores to be scheduled even if the root queue is out of GPU or storage. This should not happen on any other queue in the hierarchy unless the quota has been changed to become lower than the currently running workload. This makes scheduling more resilient for configuration changes and custom resource registration delays. > GPU node restart could leave root queue always out of quota > --- > > Key: YUNIKORN-2790 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > > On a node restart the pods assigned and running on a node are not checked > against the quota of the queue(s) they run in. This has multiple reasons. > Pods on a node that are scheduled by YuniKorn and already running must not be > rejected. Rejecting pods could cause lots of side effects. > The combination of a node restart and the reconfiguring a GPU driver could > however cause a secondary issue. The node on restart might not expose the GPU > resource yet. Pods that ran before the restart can be using the GPU resource. > After those pods are added, ignoring quotas, the root queue will show a usage > for a resource that has not been registered yet. > This fact prevents all scheduling from progressing. Even for pods not > requesting the GPU resource. Each scheduling action will check the root queue > quota and fail. This prevents the GPU driver pods to be placed and the GPU to > be registered by the node. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
Wilfred Spiegelenburg created YUNIKORN-2790: --- Summary: GPU node restart could leave root queue always out of quota Key: YUNIKORN-2790 URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg On a node restart the pods assigned and running on a node are not checked against the quota of the queue(s) they run in. This has multiple reasons. Pods on a node that are scheduled by YuniKorn and already running must not be rejected. Rejecting pods could cause lots of side effects. The combination of a node restart and the reconfiguring a GPU driver could however cause a secondary issue. The node on restart might not expose the GPU resource yet. Pods that ran before the restart can be using the GPU resource. After those pods are added, ignoring quotas, the root queue will show a usage for a resource that has not been registered yet. This fact prevents all scheduling from progressing. Even for pods not requesting the GPU resource. Each scheduling action will check the root queue quota and fail. This prevents the GPU driver pods to be placed and the GPU to be registered by the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2789) Queue internalGetMax should not use permissive calculator
Wilfred Spiegelenburg created YUNIKORN-2789: --- Summary: Queue internalGetMax should not use permissive calculator Key: YUNIKORN-2789 URL: https://issues.apache.org/jira/browse/YUNIKORN-2789 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg We have documented for queue resources that: {quote}Resources that are not specified in the list are not limited, for max resources, or guaranteed in the case of guaranteed resources. {quote} However in the implementation on the queue, internalGetMax, we call resources.ComponentWiseMin(). This returns 0 values for each type that is not defined in the two resources passed in. That does not line up. Example for getting the maximum resources of a queue using GetMaxQueueSet what I would expect based on the documentation: {code:java} parent: max{memory: 100G} parent.child: max{vcore: 100} => result child max{memory: 100G, vcore: 100}{code} currently we get: {code:java} parent: max{memory: 100G} parent.child: max{vcore: 100} => result child max{memory: 0, vcore: 0}{code} Similar when we add the root and call GetMaxResource: {code:java} root: max{memory: 100G, vcore: 200} root.parent: max{vcore: 100} root.parent.child: max{nvidia.com/gpu: 10} => result parent max{memory: 0, vcore: 100} => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code} The fact that the resource type does not exist, even in the root, should not mean a zero set. The nodes that expose the specific resource might not have been registered or scaled up yet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2678) Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods.
[ https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871541#comment-17871541 ] Wilfred Spiegelenburg commented on YUNIKORN-2678: - The current calculation is broken and I have gone through the slack discussion. I can see why we would want to use the max resource as a substitute for guaranteed resource. Looking forward to a PR. One point I would already make is that the max used should only rely on the configured values in the hierarchy. The current cluster size must not be taken into account. So root maximum must be ignored when we look at this. Besides that looking at the {{internalGetMax()}} code there is a bug there for which I will file a jira. That will most likely influence this sorting as it revolves around setting 0 values. > Yunikorn does not appear to be considering Guaranteed resources when > allocating Pending Pods. > - > > Key: YUNIKORN-2678 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2678 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Affects Versions: 1.5.1 > Environment: EKS 1.29 >Reporter: Paul Santa Clara >Assignee: Paul Santa Clara >Priority: Major > Attachments: Screenshot 2024-08-06 at 5.18.18 PM.png, Screenshot > 2024-08-06 at 5.18.21 PM.png, Screenshot 2024-08-06 at 5.18.30 PM.png, > jira-queues.yaml, jira-tier0-screenshot.png, jira-tier1-screenshot.png, > jira-tier2-screenshot.png, jira-tier3-screenshot.png > > > Please see the attached queue configuration(jira-queues.yaml). > I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 > pods in Tier3. Each Pod will require 1 VCore. Initially, there will be 0 > suitable nodes to run the Pods and all will be Pending. Karpenter will soon > provision Nodes and Yunikorn will react by binding the Pods. 
> Given this > [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95], > I would expect Yunikorn to distribute the allocations such that each of the > Tier’ed queues reaches its Guarantees. Instead, I observed a roughly even > distribution of allocation across all of the queues. > Tier0 fails to meet its Guarantees while Tier3, for instance, dramatically > overshoots them. > > {code:java} > > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l > 86 > > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l > 83 > > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l > 78 > > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l > 77 > {code} > Please see attached screenshots for queue usage. > Note, this situation can also be reproduced without the use of Karpenter by > simply setting Yunikorn's `service.schedulingInterval` to a high duration, > say 1m. Doing so will force Yunikorn to react to 400 Pods -across 4 queues- > at roughly the same time forcing prioritization of queue allocations. 
> Test code to generate Pods: > {code:java} > from kubernetes import client, config > config.load_kube_config() > v1 = client.CoreV1Api() > def create_pod_manifest(tier, exec,): > pod_manifest = { > 'apiVersion': 'v1', > 'kind': 'Pod', > 'metadata': { > 'name': f"rolling-test-tier-{tier}-exec-{exec}", > 'namespace': 'finance', > 'labels': { > 'applicationId': f"MyOwnApplicationId-tier-{tier}", > 'queue': f"root.tiers.{tier}" > }, > "yunikorn.apache.org/user.info": > '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}' > }, > 'spec': { > "affinity": { > "nodeAffinity" : { > "requiredDuringSchedulingIgnoredDuringExecution" : { > "nodeSelectorTerms" : [ > { > "matchExpressions" : [ > { > "key" : "di.rbx.com/dedicated", > "operator" : "In", > "values" : ["spark"] > } > ] > } > ] > } > }, > }, > "tolerations" : [ > { > "effect" : "NoSchedule", > "key": "dedicated", > "operator" : "Equal", > "value" : "s
[jira] [Commented] (YUNIKORN-2678) Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods.
[ https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870913#comment-17870913 ] Wilfred Spiegelenburg commented on YUNIKORN-2678: - I have not looked at this code in years. However, when I look at it now I think the issue is in the {{nil || zero}} check when we set the "[used|https://github.com/apache/yunikorn-core/blob/v1.5.2/pkg/common/resources/resources.go#L488]" value in the shares. That does not take into account that we have a large discrepancy between resource types in absolute values. Resources like memory or storage will always dominate above pods or GPUs. Introducing max in the mix with guarantee will have side effects. I create a queue with max memory set to 1TB, no guaranteed. I create a second queue with max set to 1TB but a guaranteed memory of 100GB. Both queues use 50GB. In that case the share of queue 1 will be 0.05 and queue 2 will have a share of 0.5. Queue 1 will win and get scheduled until it uses 500GB, with a guaranteed of 0. Queue 1 should not have a smaller share than queue 2 until all guaranteed is used. That looks as broken as what we have now. I could see two options: # setting a fixed share value if not specified in guaranteed # not adding anything to the shares unless set in guaranteed Both options above will fix that same issue. I think option 2 above is the better solution. We want to schedule on the guaranteed setting. We need to test if that still distributes fairly between the queues when one queue has a usage over its guaranteed compared to a second queue with no guaranteed. If we really want to have a policy for "least used queue" we can build one based on the maximum resource and the usage. The other option which would be nice to have would be adding a configurable resource weights option like we have in the node sorting. That would be a new feature... 
> Yunikorn does not appear to be considering Guaranteed resources when > allocating Pending Pods. > - > > Key: YUNIKORN-2678 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2678 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Affects Versions: 1.5.1 > Environment: EKS 1.29 >Reporter: Paul Santa Clara >Assignee: Paul Santa Clara >Priority: Major > Attachments: jira-queues.yaml, jira-tier0-screenshot.png, > jira-tier1-screenshot.png, jira-tier2-screenshot.png, > jira-tier3-screenshot.png > > > Please see the attached queue configuration (jira-queues.yaml). > I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100 > pods in Tier3. Each Pod will require 1 VCore. Initially, there will be 0 > suitable nodes to run the Pods and all will be Pending. Karpenter will soon > provision Nodes and Yunikorn will react by binding the Pods. > Given this > [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95], > I would expect Yunikorn to distribute the allocations such that each of the > Tier’ed queues reaches its Guarantees. Instead, I observed a roughly even > distribution of allocation across all of the queues. > Tier0 fails to meet its Guarantees while Tier3, for instance, dramatically > overshoots them. > > {code:java} > > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l > 86 > > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l > 83 > > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l > 78 > > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l > 77 > {code} > Please see attached screenshots for queue usage. > Note, this situation can also be reproduced without the use of Karpenter by > simply setting Yunikorn's `service.schedulingInterval` to a high duration, > say 1m. 
Doing so will force Yunikorn to react to 400 Pods -across 4 queues- > at roughly the same time forcing prioritization of queue allocations. > Test code to generate Pods: > {code:java} > from kubernetes import client, config > config.load_kube_config() > v1 = client.CoreV1Api() > def create_pod_manifest(tier, exec,): > pod_manifest = { > 'apiVersion': 'v1', > 'kind': 'Pod', > 'metadata': { > 'name': f"rolling-test-tier-{tier}-exec-{exec}", > 'namespace': 'finance', > 'labels': { > 'applicationId': f"MyOwnApplicationId-tier-{tier}", > 'queue': f"root.tiers.{tier}" > }, > "yunikorn.apache.org/user.info": > '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}' >
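The 1TB/100GB example from the comment above can be sketched with a toy share calculation. The function below is an illustrative model of the described behaviour (usage divided by guaranteed when guaranteed is set, falling back to usage divided by max otherwise), not the actual yunikorn-core implementation:

```go
package main

import "fmt"

// share models the simplified behaviour described in the comment above:
// if guaranteed is set the share is usage/guaranteed, otherwise the
// calculation falls back to usage/max. Illustrative only, not the
// actual yunikorn-core code.
func share(used, guaranteed, max int64) float64 {
	if guaranteed > 0 {
		return float64(used) / float64(guaranteed)
	}
	return float64(used) / float64(max)
}

func main() {
	// Queue 1: max 1000GB (1TB), no guaranteed.
	// Queue 2: max 1000GB, guaranteed 100GB.
	// Both currently use 50GB.
	q1 := share(50, 0, 1000)
	q2 := share(50, 100, 1000)
	fmt.Printf("queue1 share: %.2f, queue2 share: %.2f\n", q1, q2)
	// prints "queue1 share: 0.05, queue2 share: 0.50"
	// Queue 1, with no guarantee at all, has the smaller share and
	// keeps winning until it reaches 500GB of usage.
}
```

This makes the problem visible: a queue without any guarantee outranks a queue that is still well under its guarantee.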
[jira] [Updated] (YUNIKORN-2281) Support OIDC credentials in YuniKorn
[ https://issues.apache.org/jira/browse/YUNIKORN-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2281: Labels: release-notes (was: ) > Support OIDC credentials in YuniKorn > > > Key: YUNIKORN-2281 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2281 > Project: Apache YuniKorn > Issue Type: New Feature >Reporter: Dmitry >Assignee: Manikandan R >Priority: Major > Labels: release-notes > > Currently only alphanumeric chars are allowed in usernames. We're using > CiLogon OIDC users, in the form of "http://cilogon.org/serverA/users/123456", > which is denied in configuration by the admission controller: > > error: configmaps "yunikorn-configs" could not be patched: admission > > webhook "admission-webhook.yunikorn.validate-conf" denied the request: > > invalid limit user name 'http://cilogon.org/serverA/users/123456' in limit > > definition > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870395#comment-17870395 ] Wilfred Spiegelenburg commented on YUNIKORN-2646: - This is not the cause, it cannot be. YuniKorn 1.5.2 has a deadlock fix as per YUNIKORN-2629. If there is a lock-up left we should see it for others too, especially when you say it happens really often. We do not have the evidence that confirms this, and we cannot fix or change anything without understanding what is broken. We need logs or a reproduction that shows the issue. When you get to the "stuck" state collect the details and open a *_new_* jira: * scheduler logs * state dump via /ws/v1/fullstatedump * pprof output of /debug/pprof/goroutine?debug=2 If it really is a deadlock in the code the state dump will most likely fail. Logs and pprof never fail so we should have a full routine dump. You can even collect two in a row to compare them. > Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Comment Edited] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2646 at 8/2/24 4:29 AM: - It is a false positive detection. The code explicitly prevents the case from happening. See [this comment|https://issues.apache.org/jira/browse/YUNIKORN-2646?focusedCommentId=17850240&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17850240] here again worded slightly differently: The detector is not smart enough to understand this part of the logic and just sees the order: # FIRST: Application A lock taken followed by Application B. # SECOND: Application B lock is taken followed by Application A. That triggers the detection. The fact this sequence is only possible because we have a guarantee in our code that between FIRST and SECOND all locks are released without exception cannot be expressed in rules. BTW: running with deadlock detection in production is a really bad idea. It causes a lot of overhead. was (Author: wifreds): It is a false positive detection. The code explicitly prevents the case from happening. See this comment here again worded slightly differently: The detector is not smart enough to understand this part of the logic and just sees the order: # FIRST: Application A lock taken followed by Application B. # SECOND: Application B lock is taken followed by Application A. That triggers the detection. The fact this sequence is only possible because we have a guarantee in our code that between FIRST and SECOND all locks are released without exception cannot be expressed in rules. BTW: running with deadlock detection in production is a really bad idea. It causes a lot of overhead. 
> Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Comment Edited] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2646 at 8/2/24 4:26 AM: - It is a false positive detection. The code explicitly prevents the case from happening. See this comment here again worded slightly differently: The detector is not smart enough to understand this part of the logic and just sees the order: # FIRST: Application A lock taken followed by Application B. # SECOND: Application B lock is taken followed by Application A. That triggers the detection. The fact this sequence is only possible because we have a guarantee in our code that between FIRST and SECOND all locks are released without exception cannot be expressed in rules. BTW: running with deadlock detection in production is a really bad idea. It causes a lot of overhead. was (Author: wifreds): It is a false positive detection. The code explicitly prevents the case from happening. See this comment here again worded slightly differently: The detector is not smart enough to understand this part of the logic and just sees the order: # FIRST: Application A lock taken followed by Application B. # SECOND: Application B lock is taken followed by Application A. That triggers the detection. The fact this sequence is only possible because we have a guarantee in our code that between FIRST and SECOND all locks are released without exception cannot be expressed in rules. BTW: running with deadlock detection in production is a really bad idea. It causes a lot of overhead. 
> Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2646) Deadlock detected during preemption
[ https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870352#comment-17870352 ] Wilfred Spiegelenburg commented on YUNIKORN-2646: - It is a false positive detection. The code explicitly prevents the case from happening. See this comment here again worded slightly differently: The detector is not smart enough to understand this part of the logic and just sees the order: # FIRST: Application A lock taken followed by Application B. # SECOND: Application B lock is taken followed by Application A. That triggers the detection. The fact this sequence is only possible because we have a guarantee in our code that between FIRST and SECOND all locks are released without exception cannot be expressed in rules. BTW: running with deadlock detection in production is a really bad idea. It causes a lot of overhead. > Deadlock detected during preemption > --- > > Key: YUNIKORN-2646 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2646 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Dmitry >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz > > > Hitting deadlocks in 1.5.1 > The log is attached -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
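The FIRST/SECOND sequence described in the comment above can be illustrated with a minimal sketch. The outer lock stands in for the guarantee in the scheduler code that the two reversed lock orders never overlap; the names and structure here are hypothetical, not yunikorn-core code:

```go
package main

import (
	"fmt"
	"sync"
)

// Two "application" locks plus an outer lock that serialises the code
// paths taking them in opposite orders. A lock-order detector only sees
// A->B followed by B->A and reports a potential deadlock, but the outer
// lock guarantees the two sequences can never interleave, so no
// deadlock is actually possible. Illustrative sketch only.
var (
	outer sync.Mutex
	appA  sync.Mutex
	appB  sync.Mutex
)

func firstPath() {
	outer.Lock()
	defer outer.Unlock()
	appA.Lock() // FIRST: A taken, then B
	appB.Lock()
	appB.Unlock()
	appA.Unlock() // all application locks released before SECOND can run
}

func secondPath() {
	outer.Lock()
	defer outer.Unlock()
	appB.Lock() // SECOND: B taken, then A - the reversed order
	appA.Lock()
	appA.Unlock()
	appB.Unlock()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); firstPath() }()
		go func() { defer wg.Done(); secondPath() }()
	}
	wg.Wait()
	fmt.Println("no deadlock")
}
```

An order-based detector flags this pattern because it only records lock acquisition order, not the external invariant that serialises the two paths.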
[jira] [Created] (YUNIKORN-2782) Cleanup dead code in cache/context
Wilfred Spiegelenburg created YUNIKORN-2782: --- Summary: Cleanup dead code in cache/context Key: YUNIKORN-2782 URL: https://issues.apache.org/jira/browse/YUNIKORN-2782 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg In the cache context we have a number of functions that only get called from tests. We need to clean up and only use one version: * RemoveApplication & RemoveApplicationInternal We should only have RemoveApplication but the internal version is used everywhere * UpdateApplication is not used at all -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2709) Update website for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2709. - Fix Version/s: 1.5.2 Resolution: Fixed release is done > Update website for 1.5.2 > > > Key: YUNIKORN-2709 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2709 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2708) Release notes for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2708. - Fix Version/s: 1.5.2 Resolution: Fixed release is done > Release notes for 1.5.2 > --- > > Key: YUNIKORN-2708 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2708 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available, release > Fix For: 1.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2764) Consider to log explicit placeholder release reason to originator pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868536#comment-17868536 ] Wilfred Spiegelenburg commented on YUNIKORN-2764: - This requires a change in the event processing in the k8shim. The current event does not allow us to add the message. The way the events are created uses a fixed list of values. So while the core sends the detail the app event does not have the option to add this. We need to have a good look at these app and task events and states in the next release as most are not really used. > Consider to log explicit placeholder release reason to originator pod > - > > Key: YUNIKORN-2764 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2764 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Yu-Lin Chen >Priority: Major > Attachments: image-2024-07-19-21-48-54-829.png > > > When placeholders allocation are released with terminationType > `si.TerminationType_TIMEOUT`. 
The reason could be one of the following: > # "releasing allocated placeholders on placeholder timeout" > ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434]) > > ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456]) > # "releasing placeholders on app complete" > ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360]) > # “cancel placeholder: resource incompatible” > ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148]) > Those reasons are encapsulated in > *si.AllocationResponse([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901]) > and passes to shim. However, the shim doesn’t expose them, it simply logs an > event to the originator pod with a generic reason > ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]): > * Type: Warning > * Reason: GangScheduling > * Message: Application XX placeholder has been timed out > We could consider to expose the true reason to originator pod. Ex: (In > originator pod.) > * Type: Warning > * Reason: GangScheduling > * Message: placeholder xxx has been released. (reason: cancel placeholder: > resource incompatible) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2764) Consider to log explicit placeholder release reason to originator pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2764: Target Version: 1.7.0 > Consider to log explicit placeholder release reason to originator pod > - > > Key: YUNIKORN-2764 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2764 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Yu-Lin Chen >Priority: Major > Attachments: image-2024-07-19-21-48-54-829.png > > > When placeholders allocation are released with terminationType > `si.TerminationType_TIMEOUT`. The reason could be one of the following: > # "releasing allocated placeholders on placeholder timeout" > ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434]) > > ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456]) > # "releasing placeholders on app complete" > ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360]) > # “cancel placeholder: resource incompatible” > ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148]) > Those reasons are encapsulated in > *si.AllocationResponse([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901]) > and passes to shim. 
However, the shim doesn’t expose them, it simply logs an > event to the originator pod with a generic reason > ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]): > * Type: Warning > * Reason: GangScheduling > * Message: Application XX placeholder has been timed out > We could consider to expose the true reason to originator pod. Ex: (In > originator pod.) > * Type: Warning > * Reason: GangScheduling > * Message: placeholder xxx has been released. (reason: cancel placeholder: > resource incompatible) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2687) Placeholder Timeout and Replacement Failure in Gang Scheduling
[ https://issues.apache.org/jira/browse/YUNIKORN-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865844#comment-17865844 ] Wilfred Spiegelenburg commented on YUNIKORN-2687: - I agree with [~blue.tzuhua] this is normal gang scheduling. Using YuniKorn 1.3 and K8s 1.21, you are working with really old releases. The K8s version is out of support and since 1.3 we have fixed numerous jiras. Analysis: Looks like K8s never came back with a response that the placeholder for the driver was released when we tried to: {code:java} 2024-06-20T17:48:27.093ZINFOobjects/application.go:668 ask added successfully to application{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta": "map[memory:1975517184 pods:1 vcore:1000]"} ... 2024-06-20T17:48:27.093ZINFOscheduler/partition.go:828 scheduler replace placeholder processed{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid": "a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid": "bdb020e8-708c-4eb7-b48c-fba16155941c"} ... 2024-06-20T17:48:27.094ZINFOcache/application.go:637try to release pod from application{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": "bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType": "PLACEHOLDER_REPLACED"} {code} That means we just wait for that to happen. We cannot do more than that. Looks like you had an issue on the K8s side... > Placeholder Timeout and Replacement Failure in Gang Scheduling > -- > > Key: YUNIKORN-2687 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2687 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: huangzhir >Assignee: Tzu-Hua Lan >Priority: Blocker > > h1. *Description:* > When using gang scheduling with YuniKorn, the driver pod encounters a > placeholder timeout, leading to a failure in replacement. 
The pod shows a > pending status for approximately 60 seconds. > h2. *Observed Behavior:* > * The driver pod ({{{}spark-pi-d86d1d9036b8e8e9-driver{}}}) is queued and > waiting for allocation. > * The pod belongs to the {{spark-driver}} task group and is scheduled as a > gang member. > * A warning indicating "Placeholder timed out" is logged, and the > placeholder is not replaced successfully. > * The pod is eventually assigned and bound to a node, and the task completes. > * There is a 60-second pending period observed for the driver pod. > h2. *Pod Status:* > {code:java} > kubectl get pod -n spark > NAME READY STATUS > RESTARTS AGE > spark-pi-6d2eea9036f9c838-driver 0/1 Pending 0 > 61s > tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0 1/1 Terminating 0 > 60s > tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0 1/1 Running 0 > 60s > kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark > .. > Type Reason AgeFrom Message > -- --- > Normal Scheduling 2m52s yunikorn > spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation > Normal GangScheduling 2m52s yunikorn Pod belongs to the > taskGroup spark-driver, it will be scheduled as a gang member > Warning Placeholder timed out 113s yunikorn Application > spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out > Normal Scheduled 100s yunikorn Successfully assigned > spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66 > Normal PodBindSuccessful 100s yunikorn Pod > spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node > 10.10.10.66 > Normal TaskCompleted 50syunikorn Task > spark/spark-pi-d86d1d9036b8e8e9-driver is completed > Normal Pulled 99skubelet Container image > "apache/spark:v3.3.2" already present on machine > Normal Created99skubelet Created container > spark-kubernetes-driver > Normal Started99skubelet Started container > spark-kubernetes-driver{code} > h2. 
*Scheduler Logs:* > {code:java} > 2024-06-20T17:49:26.093ZINFOobjects/application.go:440 > Placeholder timeout, releasing placeholders{"AppID": > "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1, > "releasing placeholders": 1} > 2024-06-20T17:49:26.093ZDEBUGrmproxy/rmproxy.go:59 > enq
[jira] [Comment Edited] (YUNIKORN-2262) propagate the error message when queue creation gets failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865832#comment-17865832 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2262 at 7/15/24 2:30 AM: -- The implementation in fmt is smarter than I expected. It works because {{fmt.Errorf()}} under the hood does what Join does and creates an object that implements Unwrap etc. It does so by interpreting the format string and when it sees a %w it takes the arg and places it in an array. Adding format string scanning overhead etc. So why not make it explicit and make the code more performant and readable. Unless you need a combination of multiple format directives using {{fmt.Errorf}} will be slower than just {{errors.Join()}} was (Author: wifreds): That works because {{fmt.Errorf()}} under the hood does what the join does and creates an object that implements Unwrap etc. It does so by interpreting the format string and when it sees a %w it takes the arg and places it in an array. Adding format string scanning overhead etc. So why not make it explicit and make the code more performant and readable. Unless you need a combination of multiple format directives using {{fmt.Errorf}} will be slower than just {{errors.Join()}} > propagate the error message when queue creation gets failed > --- > > Key: YUNIKORN-2262 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2262 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Minor > > [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334] > the error message of root cause is swallowed, so it is hard to be inspired by > the common message "failed to create rule based queue ..." > BTW, the error I met is the parent queue "is already a leaf". The error > message is helpful and it makes us catch up the root cause easily. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865832#comment-17865832 ] Wilfred Spiegelenburg commented on YUNIKORN-2262: - That works because {{fmt.Errorf()}} under the hood does what the join does and creates an object that implements Unwrap etc. It does so by interpreting the format string and when it sees a %w it takes the arg and places it in an array. Adding format string scanning overhead etc. So why not make it explicit and make the code more performant and readable. Unless you need a combination of multiple format directives using {{fmt.Errorf}} will be slower than just {{errors.Join()}} > propagate the error message when queue creation gets failed > --- > > Key: YUNIKORN-2262 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2262 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Minor > > [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334] > the error message of root cause is swallowed, so it is hard to be inspired by > the common message "failed to create rule based queue ..." > BTW, the error I met is the parent queue "is already a leaf". The error > message is helpful and it makes us catch up the root cause easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2757) Consider adding new field `resolvedMaxResource` to queue dao to show the true limit
[ https://issues.apache.org/jira/browse/YUNIKORN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865830#comment-17865830 ] Wilfred Spiegelenburg commented on YUNIKORN-2757: - Not sure this adds anything. Max resources is either set or not. Max resources of a child can never be more than what is set on the parent. Without taking usage into account it does not really matter what max is and where it came from. Let's take an example: I have a hierarchy with 4 levels, level 1 being the root and level 4 the leaf. A max is set at level 2 only. The root has a dynamic maximum based on the cluster size. There is no way of interpreting a maximum at the leaf level 4. What is the impact of that maximum if I do not know what my queue structure looks like? Do I have 1 or 10 queues at level 3? How many children does each level 3 queue have? Are there any maximums set at level 3? Are there any siblings under the parent queue of the level 4 queue I am looking at? Do sibling queues of my level 4 queue have maximums set or not... The only time the resolved maximum from the parent would come into play is when a gang is submitted. We reject the application if it is larger than this resolved maximum. If that rejection is not clear enough we can improve that. For scheduling the resolved maximum is also irrelevant. We use the headroom of a queue to decide if the allocation fits. Current usage linked to maximums gives the correct picture. That only works at the specific queue level. > Consider adding new field `resolvedMaxResource` to queue dao to show the true > limit > --- > > Key: YUNIKORN-2757 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2757 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Priority: Major > > The true max resources of queue is based on all parents. It could be hard to > see/understand the true "max resources" of queue by human eyes if there is a > huge queue tree. 
> Hence, it would be nice to add the "resolved" max resources to restful APIs. > Also, our UI can leverages the field to help users to understand which max > resource will be used by this queue -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
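The headroom idea from the comment above can be sketched for a single resource. This is a simplified model (one memory dimension in GB, a max of 0 meaning "no limit set"), not the yunikorn-core implementation:

```go
package main

import "fmt"

// queue is a minimal model of a queue hierarchy with a single memory
// resource (GB). A max of 0 means no limit is set at that level.
type queue struct {
	name   string
	max    int64 // 0 = unlimited
	used   int64
	parent *queue
}

// headroom walks up the hierarchy: the effective room of a queue is the
// minimum of (max - used) over itself and every ancestor that has a max
// set. A result of -1 means no limit anywhere in the chain.
func (q *queue) headroom() int64 {
	room := int64(-1)
	for cur := q; cur != nil; cur = cur.parent {
		if cur.max == 0 {
			continue
		}
		r := cur.max - cur.used
		if room == -1 || r < room {
			room = r
		}
	}
	return room
}

func main() {
	// 4 levels, max set at level 2 only, as in the example above.
	root := &queue{name: "root"}
	l2 := &queue{name: "level2", max: 1000, used: 600, parent: root}
	l3 := &queue{name: "level3", used: 400, parent: l2}
	leaf := &queue{name: "leaf", used: 200, parent: l3}

	// An allocation fits in the leaf only if it fits the headroom of
	// the leaf and every ancestor; here the level 2 max is the cap.
	fmt.Println(leaf.headroom()) // 400
}
```

This is why current usage linked to maximums at each level already gives the correct picture for scheduling: the fit check uses headroom, not a statically resolved maximum.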
[jira] [Commented] (YUNIKORN-2323) Gang scheduling user experience issues
[ https://issues.apache.org/jira/browse/YUNIKORN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865361#comment-17865361 ] Wilfred Spiegelenburg commented on YUNIKORN-2323: - Issue 2 as mentioned has been fixed via [github PR #876|https://github.com/apache/yunikorn-k8shim/pull/876] We now send additional events for gang scheduling covering: * placeholder timeout (resuming state) * placeholder creation * placeholder create failure(s) I think with that we can close this jira. > Gang scheduling user experience issues > -- > > Key: YUNIKORN-2323 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2323 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Affects Versions: 1.4.0 >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > > In case of any issues, users are finding it a bit difficult to understand what > is going on with the gang app. > Issue 1: > "driver pod is getting stuck" > At times, when the driver pod is not able to run successfully for some reason, > users get the perspective that the pod is stuck and the app is hung, not > moving further. Users wait for some time and don't > get a clear picture. How do we close the gap quickly and communicate > accordingly through events? > Issue 2: > ResumeApplication is fired when all ph's are timed out. Do we need to inform > the users about this event as they may not have any clue about this significant > change? > Issue 3: > When gang app ph's are in progress (and allocated), and there is a request for > real asks and a resource crunch, do we need to trigger auto scaling? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2738) Only check failure reason once not for every pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2738. - Fix Version/s: 1.6.0 Resolution: Fixed > Only check failure reason once not for every pod > > > Key: YUNIKORN-2738 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > The reason for an application failure does not change and can be > pre-calculated for all pods when a failure is handled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865345#comment-17865345 ] Wilfred Spiegelenburg commented on YUNIKORN-2262: - The join is a more generic way to handle it. It makes it really easy to use a predefined error like "parent rule returned a leaf" or "queue not found" etc and test for them. In other projects I have used similar constructs in test and production code. This is an example to check if the exit of a http server was a crash or a normal shutdown: {code:java} if httpError != nil && !errors.Is(httpError, http.ErrServerClosed) { log.Logger().Errorw("Failed to start web server", "error", httpError) } {code} If you use %w you would need to use string contains etc for these kinds of checks. Really fragile, the slightest change can break that. Using the join makes the code readable and if we use predefined errors it will not break. It also makes checks in tests simple: did we really fail for the right reason, or did someone break the code and we failed unexpectedly? > propagate the error message when queue creation gets failed > --- > > Key: YUNIKORN-2262 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2262 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Minor > > [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334] > the error message of the root cause is swallowed, so it is hard to work out the > cause from the common message "failed to create rule based queue ..." > BTW, the error I met is the parent queue "is already a leaf". The error > message is helpful and it makes us catch the root cause easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2262) propagate the error message when queue creation gets failed
[ https://issues.apache.org/jira/browse/YUNIKORN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865303#comment-17865303 ] Wilfred Spiegelenburg commented on YUNIKORN-2262: - Please use [error joining|https://pkg.go.dev/errors#Join] for this to show we have wrapped the error. It will make testing etc easier than re-writing using %w > propagate the error message when queue creation gets failed > --- > > Key: YUNIKORN-2262 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2262 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chenchen Lai >Priority: Minor > > [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/partition.go#L334] > the error message of root cause is swallowed, so it is hard to be inspired by > the common message "failed to create rule based queue ..." > BTW, the error I met is the parent queue "is already a leaf". The error > message is helpful and it makes us catch up the root cause easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2743) Core: Remove TODO regarding time out waiting for draining and removal
[ https://issues.apache.org/jira/browse/YUNIKORN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865301#comment-17865301 ] Wilfred Spiegelenburg commented on YUNIKORN-2743: - Yes, and with the proper implementation of YUNIKORN-2688, which should prevent new workloads from being added, it will all work. > Core: Remove TODO regarding time out waiting for draining and removal > - > > Key: YUNIKORN-2743 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2743 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chenchen Lai >Priority: Minor > Labels: newbie > > for remove //TODO comment > in pkg/scheduler/partition_manager.go > [https://github.com/apache/yunikorn-core/blob/f82113c1cac5ff40d424413e7c100f55261ece01/pkg/scheduler/partition_manager.go#L126] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2735) YuniKorn doesn't schedule correctly after some pods were marked as Unschedulable
[ https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865271#comment-17865271 ] Wilfred Spiegelenburg commented on YUNIKORN-2735: - Reservation is not an under-the-hood optimisation: it prevents starvation for large(r) allocations or allocations with specific resource requests that fit only on specific nodes. If you turn it off you will see a large impact in real world scenarios. Disabling reservation will cause large allocations to be starved in busy clusters and we should never do that. The variable for turning it off was introduced during the development of the code and should have been removed. I am surprised that it has survived this long. We have/had a TODO in the code to make this configurable. Currently it is fixed to 2 seconds. It should be a reloadable configuration value. I would also argue that the current 2 seconds is too quick and 30 seconds would allow us to be a bit more eager. I would propose the following setup: * configuration name: service.ReservationDelay * granularity: seconds * default: 30 seconds * minimum: 2 seconds (allow current behaviour) * maximum: 3600 seconds (prevent starvation and turning off reservations) * reloadable: true * notes: ** old reservations are not re-evaluated when the value is changed ** settings outside the minimum..maximum range will use the default ** when reloading the value is not changed if outside the range > YuniKorn doesn't schedule correctly after some pods were marked as > Unschedulable > > > Key: YUNIKORN-2735 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2735 > Project: Apache YuniKorn > Issue Type: Bug >Reporter: Volodymyr Kot >Priority: Major > Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate > > > It is a bit of an edge case, but I can consistently reproduce this on master > - see steps and comments used below: > # Create a new cluster with kind, with 4 cpus/8Gb of memory > # Deploy YuniKorn 
using helm > # Set up service account for Spark > ## "kubectl create serviceaccount spark" > ## "kubectl create clusterrolebinding spark-role --clusterrole=edit > --serviceaccount=default:spark --namespace=default" > # Run "kubectl proxy" to be able to run spark-submit > # Create Spark application* 1 with driver and 2 executors - fits fully, > placeholders are created and replaced > # Create Spark application 2 with driver and 2 executors - only one executor > placeholder is scheduled, rest of the pods are marked Unschedulable > # Delete one of the executors from application 1 > # Spark driver re-creates the executor, it is marked as unschedulable > > At that point scheduler is "stuck", and won't schedule either executor from > application 1 OR placeholder for executor from application 2 - it deems both > of those unschedulable. See logs below, and please let me know if I > misunderstood something/it is expected behavior! > > *Script used to run spark-submit: > {code:java} > ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 > --deploy-mode cluster --name spark-pi \ > --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi > \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.executor.request.cores=0.5 \ > --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \ > --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \ > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 3 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2735) YuniKorn doesn't schedule correctly after some pods were marked as Unschedulable
[ https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865266#comment-17865266 ] Wilfred Spiegelenburg commented on YUNIKORN-2735: - {quote}At that point scheduler is "stuck", and won't schedule either executor from application 1 OR placeholder for executor from application 2 - it deems both of those unschedulable. See logs below, and please let me know if I misunderstood something/it is expected behavior! {quote} It is expected behaviour. The scheduler is not stuck. This will resolve itself. First, Spark application 2: as this is gang scheduling, the placeholders will time out (15 min by default). If not all of the placeholders were allocated at the point of the timeout a cleanup will be triggered. This removes all placeholder pods from the system. Depending on the gang style, hard or soft, we either fail the application or release the driver pod for scheduling. At that point you are unblocked. Application 1 pods will get scheduled based on the availability of resources. When the placeholder pod(s) time out the existing pending pods will be scheduled. At that point the normal sorting rules apply. This _could_ mean that the re-submitted executor pod gets scheduled or some other pod that was waiting. Gang scheduling allows you to reserve resources but it does not guarantee them after replacement. If you kill the executor pod and it gets restarted it is just another pod on the cluster that needs to be scheduled. It will thus depend on your config (FIFO, priority, pod definition etc) how and when that scheduling will happen. The newly started executor is really a new pod from the K8s view, different submit time etc. If you have FIFO configured it will end up at the back of the scheduling queue. Gang scheduling with the soft style will also not prevent starving a cluster of resources. 
You could have the case that the total gang request is too large to fit into the free space on a busy cluster. First it triggers reservations, blocking resources for other applications. Then after the timeout you could slowly fill your cluster with driver pods that do not get what they want and thus only slowly progress or do not progress at all. The only option you have for that is to limit the number of applications you allow to run in a queue (MaxApplications). This case can easily happen in any size cluster. None of these are real scheduler issues, they are cluster management issues. You cannot expect the scheduler to understand the workload you put on a cluster and magically adjust. > YuniKorn doesn't schedule correctly after some pods were marked as > Unschedulable > > > Key: YUNIKORN-2735 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2735 > Project: Apache YuniKorn > Issue Type: Bug >Reporter: Volodymyr Kot >Priority: Major > Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate > > > It is a bit of an edge case, but I can consistently reproduce this on master > - see steps and comments used below: > # Create a new cluster with kind, with 4 cpus/8Gb of memory > # Deploy YuniKorn using helm > # Set up service account for Spark > ## "kubectl create serviceaccount spark" > ## "kubectl create clusterrolebinding spark-role --clusterrole=edit > --serviceaccount=default:spark --namespace=default" > # Run "kubectl proxy" to be able to run spark-submit > # Create Spark application* 1 with driver and 2 executors - fits fully, > placeholders are created and replaced > # Create Spark application 2 with driver and 2 executors - only one executor > placeholder is scheduled, rest of the pods are marked Unschedulable > # Delete one of the executors from application 1 > # Spark driver re-creates the executor, it is marked as unschedulable > > At that point scheduler is "stuck", and won't schedule either executor from > application 1 OR placeholder for executor 
from application 2 - it deems both > of those unschedulable. See logs below, and please let me know if I > misunderstood something/it is expected behavior! > > *Script used to run spark-submit: > {code:java} > ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 > --deploy-mode cluster --name spark-pi \ > --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi > \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.executor.request.cores=0.5 \ > --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \ > --conf spark.kubernetes.executor.podTemplateF
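The placeholder-timeout behaviour described in the comment can be summarised in a small decision function. The style names "Hard" and "Soft" come from the discussion above; the function and return strings are an illustrative sketch, not the shim's actual code.

```go
package main

import "fmt"

// Gang scheduling style, set per application.
type gangStyle string

const (
	styleHard gangStyle = "Hard"
	styleSoft gangStyle = "Soft"
)

// onPlaceholderTimeout models what happens when the placeholder timeout
// (15 min by default) fires: if not all placeholders were allocated,
// all placeholders are cleaned up, then the style decides the outcome.
func onPlaceholderTimeout(style gangStyle, allAllocated bool) string {
	if allAllocated {
		return "no-op" // the timeout only matters while placeholders are pending
	}
	// Placeholder pods are removed in both cases.
	if style == styleHard {
		return "fail application"
	}
	return "resume: schedule real pods without reservation"
}

func main() {
	fmt.Println(onPlaceholderTimeout(styleHard, false))
	fmt.Println(onPlaceholderTimeout(styleSoft, false))
}
```

In the soft case the real pods (such as the Spark driver) simply re-enter normal scheduling, which is why, after the timeout, they compete with every other pending pod under the configured sorting rules.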
[jira] [Created] (YUNIKORN-2738) Only check failure reason once not for every pod
Wilfred Spiegelenburg created YUNIKORN-2738: --- Summary: Only check failure reason once not for every pod Key: YUNIKORN-2738 URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The reason for an application failure does not change and can be pre-calculated for all pods when a failure is handled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2737) Cleanup handleFailApplicationEvent handling
Wilfred Spiegelenburg created YUNIKORN-2737: --- Summary: Cleanup handleFailApplicationEvent handling Key: YUNIKORN-2737 URL: https://issues.apache.org/jira/browse/YUNIKORN-2737 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg When we handle a failed application in the shim in {{handleFailApplicationEvent()}} we call the placeholder cleanup. Three issues: * The cleanup needs the app lock after it takes the mgr lock. The app lock is already held when we process the event. We should place the cleanup last so we do not hold the manager lock for longer than needed. * Failing an application is triggered by the core, which should already do the cleanup, so this might be redundant to start with. * The failure handling also marks unassigned pods as failed, which means there is an overlap between the failure handling and the placeholder cleanup which we should remove. Either ignore all placeholders in the failure or only clean up assigned placeholders. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2734. - Fix Version/s: 1.6.0 Resolution: Delivered The TODO was removed as part of the changes in YUNIKORN-2729. Since we do not want to make this configurable that is all we need, closing again with a link to the Jira that has the change. > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Trivial > Labels: newbie > Fix For: 1.6.0 > > > for remove //TODO comment > in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. > Currently, the grace period for deleting pods is hardcoded to 3 seconds. > This might not be suitable for all use cases, as some pods might require more > time to gracefully shut down. In the future, this value should be made > configurable, either through a function parameter, configuration file, or > environment variable, to provide more flexibility and accommodate different > scenarios. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864848#comment-17864848 ] Wilfred Spiegelenburg commented on YUNIKORN-2734: - Can we just remove the TODO via this Jira or are we going to handle it as part of another jira? I might not have been clear in my earlier comment: * it should not be configurable * the TODO must be removed > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Minor > > for remove //TODO comment > in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. > Currently, the grace period for deleting pods is hardcoded to 3 seconds. > This might not be suitable for all use cases, as some pods might require more > time to gracefully shut down. In the future, this value should be made > configurable, either through a function parameter, configuration file, or > environment variable, to provide more flexibility and accommodate different > scenarios. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2734: Labels: newbie (was: ) > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Trivial > Labels: newbie > > for remove //TODO comment > in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. > Currently, the grace period for deleting pods is hardcoded to 3 seconds. > This might not be suitable for all use cases, as some pods might require more > time to gracefully shut down. In the future, this value should be made > configurable, either through a function parameter, configuration file, or > environment variable, to provide more flexibility and accommodate different > scenarios. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2734: Priority: Trivial (was: Minor) > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Trivial > > for remove //TODO comment > in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. > Currently, the grace period for deleting pods is hardcoded to 3 seconds. > This might not be suitable for all use cases, as some pods might require more > time to gracefully shut down. In the future, this value should be made > configurable, either through a function parameter, configuration file, or > environment variable, to provide more flexibility and accommodate different > scenarios. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2703) Scheduler does not honor default queue setting from the ConfigMap
[ https://issues.apache.org/jira/browse/YUNIKORN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2703: Target Version: 1.6.0, 1.5.2 (was: 1.6.0) > Scheduler does not honor default queue setting from the ConfigMap > - > > Key: YUNIKORN-2703 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2703 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Major > Labels: pull-request-available > > YUNIKORN-1650 added an override for default queue name in the config map to > solve for the scenario where the provided placement rule is evaluated before > other rules. > Scheduler also adds a default queue if the pod labels or annotations does not > define a queue name. Because this happens before the placement rules are > evaluated, we end up in the same situation of applications getting placed in > the default queue and ignoring all other placement rules. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Assigned] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage
[ https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reassigned YUNIKORN-2652: --- Assignee: Rich Scott > Expand getApplication() endpoint handler to optionally return resource usage > > > Key: YUNIKORN-2652 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2652 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Rich Scott >Assignee: Rich Scott >Priority: Major > Labels: pull-request-available > > Some users would like to be able to see resource usage (preempted, > placeholder resource, etc) for applications that have been completed. The > `getApplication()` endpoint handler should be enhanced to take an optional > parameter specifying that the user would like details about resources > included in the response, and a new `ApplicationXXXDAOInfo` object that is a > slight superset of `ApplicationDAOInfo` should be introduced, and can be used > in the response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-2652) Expand getApplication() endpoint handler to optionally return resource usage
[ https://issues.apache.org/jira/browse/YUNIKORN-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2652: Target Version: 1.6.0 > Expand getApplication() endpoint handler to optionally return resource usage > > > Key: YUNIKORN-2652 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2652 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Rich Scott >Priority: Major > Labels: pull-request-available > > Some users would like to be able to see resource usage (preempted, > placeholder resource, etc) for applications that have been completed. The > `getApplication()` endpoint handler should be enhanced to take an optional > parameter specifying that the user would like details about resources > included in the response, and a new `ApplicationXXXDAOInfo` object that is a > slight superset of `ApplicationDAOInfo` should be introduced, and can be used > in the response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864611#comment-17864611 ] Wilfred Spiegelenburg commented on YUNIKORN-2734: - This should only be used for placeholders. YuniKorn should not delete any other pods. Therefore it should not be configurable. The TODO should not be there. > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Minor > > for remove //TODO comment > in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. > Currently, the grace period for deleting pods is hardcoded to 3 seconds. > This might not be suitable for all use cases, as some pods might require more > time to gracefully shut down. In the future, this value should be made > configurable, either through a function parameter, configuration file, or > environment variable, to provide more flexibility and accommodate different > scenarios. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Commented] (YUNIKORN-2688) new applications get placed in draining queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864038#comment-17864038 ] Wilfred Spiegelenburg commented on YUNIKORN-2688: - With the placement rules now always in use we should already do most of this: * if the queue is not found and cannot be created by the rule no queue name is returned and the next rule is checked * managed queues (via the config) get their state reset after YUNIKORN-2527 when added back * dynamic queues get cleaned up regardless of their state For a dynamic queue to be marked as _draining_ a configured parent of that dynamic queue must be removed. That case needs extra work in YUNIKORN-2689 We need to check the queue state in the AppPlacementManager.PlaceApplication() method, like we do for the submit access of the user. We should also clean up and dedupe the submit access check in that method. Would be good to get this in for 1.6 in line with the changes for the default queue from YUNIKORN-2703 and YUNIKORN-2711 > new applications get placed in draining queue > - > > Key: YUNIKORN-2688 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2688 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Hengzhe Guo >Assignee: Hengzhe Guo >Priority: Major > > The status of the queue isn't checked when placing new applications. We saw a > case where new applications keep getting submitted to a draining queue and > the queue can't be really deleted for days. 
> a unit test can confirm: > {code:java} > diff --git a/pkg/scheduler/placement/placement_test.go > b/pkg/scheduler/placement/placement_test.go > index 14fe6ac..4f53e0b 100644 > --- a/pkg/scheduler/placement/placement_test.go > +++ b/pkg/scheduler/placement/placement_test.go > @@ -294,6 +294,20 @@ partitions: > if err == nil || queueName != "" { > t.Errorf("parent queue: app should not have been placed, > queue: '%s', error: %v", queueName, err) > } > + > + // user rule existing queue, the queue is draining > + tags = make(map[string]string) > + user = security.UserGroup{ > + User: "testchild", > + Groups: []string{}, > + } > + app = newApplication("app1", "default", "", user, tags, nil, "") > + queueFunc("root.testparent.testchild").MarkQueueForRemoval() > + err = man.PlaceApplication(app) > + queueName = app.GetQueuePath() > + if err == nil || queueName != "" { > + t.Errorf("draining queue: app should not have been placed, > queue: '%s', error: %v", queueName, err) > + } > } func TestForcePlaceApp(t *testing.T) { {code} > For a queue not creatable, we should expect the app to be rejected. > For a queue creatable, we should expect the queue to be transitioned back to > active state, which is blocked by > [YUNIKORN-2689|https://issues.apache.org/jira/browse/YUNIKORN-2689] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Comment Edited] (YUNIKORN-2689) transition draining queues back to active if they are added back
[ https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864030#comment-17864030 ] Wilfred Spiegelenburg edited comment on YUNIKORN-2689 at 7/9/24 6:54 AM: - This might need an update for dynamic queues that were created below the managed queue. They would not get reset with the current fix. Example for a dynamic queue: root.parent.. Placement rules: * User ** Tag for namespace *** fixed root.parent Remove the queue root.parent and all queues below it will be marked as draining. When we add root.parent back we should reset the state of all dynamic queues below it to make YUNIKORN-2688 possible. We should *not* change the queues below parent that are managed. was (Author: wifreds): This might need an update for dynamic queues that were created below the managed queue. They would not get reset with the current fix. > transition draining queues back to active if they are added back > > > Key: YUNIKORN-2689 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2689 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Hengzhe Guo >Assignee: Hengzhe Guo >Priority: Minor > > When a queue is removed but still has jobs running in it, it will be in > the 'draining' state. At this stage, if the queue is added back, we should expect > the queue to be transitioned back to the active state. However, no such transition > is found in the code base. We observed a case where a queue that was removed and > soon added back eventually ended up deleted after all jobs were drained. 
[jira] [Commented] (YUNIKORN-2689) transition draining queues back to active if they are added back
[ https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864030#comment-17864030 ] Wilfred Spiegelenburg commented on YUNIKORN-2689: - This might need an update for dynamic queues that were created below the managed queue. They would not get reset with the current fix. > transition draining queues back to active if they are added back > > > Key: YUNIKORN-2689 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2689 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Hengzhe Guo >Assignee: Hengzhe Guo >Priority: Minor > > When a queue is removed but still has jobs running in it, it will be in > the 'draining' state. At this stage, if the queue is added back, we should expect > the queue to be transitioned back to the active state. However, no such transition > is found in the code base. We observed a case where a queue that was removed and > soon added back eventually ended up deleted after all jobs were drained. 
[jira] [Commented] (YUNIKORN-2689) transition draining queues back to active if they are added back
[ https://issues.apache.org/jira/browse/YUNIKORN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864018#comment-17864018 ] Wilfred Spiegelenburg commented on YUNIKORN-2689: - I think this has been fixed in YUNIKORN-2527: for queues that are configured in the configmap, i.e. managed queues, the state will be reset. For dynamic queues the state should not be important. That Jira was only fixed recently and will only be part of the 1.6 release. > transition draining queues back to active if they are added back > > > Key: YUNIKORN-2689 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2689 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Hengzhe Guo >Assignee: Hengzhe Guo >Priority: Minor > > When a queue is removed but still has jobs running in it, it will be in > the 'draining' state. At this stage, if the queue is added back, we should expect > the queue to be transitioned back to the active state. However, no such transition > is found in the code base. We observed a case where a queue that was removed and > soon added back eventually ended up deleted after all jobs were drained. 
[jira] [Updated] (YUNIKORN-2714) e2e test to ensure queue name with all allowed characters
[ https://issues.apache.org/jira/browse/YUNIKORN-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2714: Labels: newbie (was: ) > e2e test to ensure queue name with all allowed characters > - > > Key: YUNIKORN-2714 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2714 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes, test - e2e >Reporter: Manikandan R >Priority: Major > Labels: newbie > > Create an e2e test to ensure a queue name with all allowed special characters > goes through successfully. This is mainly required to confirm there is no > breakage in the REST API URL because of special characters. 
[jira] [Updated] (YUNIKORN-2713) Use queue specific REST API directly
[ https://issues.apache.org/jira/browse/YUNIKORN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2713: Labels: newbie (was: ) > Use queue specific REST API directly > > > Key: YUNIKORN-2713 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2713 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes, test - e2e >Reporter: Manikandan R >Priority: Major > Labels: newbie > > There are some places in the e2e tests that use the old way of fetching all queues for > the given partition and then fetching queue-specific info in a second call. Instead, > queue info can be fetched directly in a single call. 
[jira] [Updated] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2717: Labels: newbie (was: ) > Assert invalid queue name in get queue applications handler > --- > > Key: YUNIKORN-2717 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2717 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Manikandan R >Priority: Major > Labels: newbie > > Assert an invalid queue name in the TestGetQueueApplicationsHandler test method > using > assertQueueInvalid(). Also clean up the method. 
[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2712: Priority: Minor (was: Major) > Missing specific param error for REST API > - > > Key: YUNIKORN-2712 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2712 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Manikandan R >Priority: Minor > Labels: newbie > > Some REST APIs throw a "missing specific param" kind of error, but not all of them do; > for example, when the user name is missing. All mandatory parameters in > the other REST APIs can follow the same pattern. It is much clearer than a > "doesn't exist" kind of error. > The suggestion given in > [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can > be used as a reference for the implementation. 
[jira] [Updated] (YUNIKORN-2717) Assert invalid queue name in get queue applications handler
[ https://issues.apache.org/jira/browse/YUNIKORN-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2717: Priority: Minor (was: Major) > Assert invalid queue name in get queue applications handler > --- > > Key: YUNIKORN-2717 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2717 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Manikandan R >Priority: Minor > Labels: newbie > > Assert an invalid queue name in the TestGetQueueApplicationsHandler test method > using > assertQueueInvalid(). Also clean up the method. 
[jira] [Updated] (YUNIKORN-2712) Missing specific param error for REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2712: Labels: newbie (was: ) > Missing specific param error for REST API > - > > Key: YUNIKORN-2712 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2712 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Manikandan R >Priority: Major > Labels: newbie > > Some REST APIs throw a "missing specific param" kind of error, but not all of them do; > for example, when the user name is missing. All mandatory parameters in > the other REST APIs can follow the same pattern. It is much clearer than a > "doesn't exist" kind of error. > The suggestion given in > [https://github.com/apache/yunikorn-core/pull/905#discussion_r1663068429] can > be used as a reference for the implementation. 
[jira] [Updated] (YUNIKORN-2719) Assert invalid group name in Get Group REST API
[ https://issues.apache.org/jira/browse/YUNIKORN-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2719: Labels: newbie (was: ) > Assert invalid group name in Get Group REST API > --- > > Key: YUNIKORN-2719 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2719 > Project: Apache YuniKorn > Issue Type: Test > Components: core - scheduler >Reporter: Manikandan R >Priority: Major > Labels: newbie > > Assert invalid group name in Get Group REST API 
[jira] [Updated] (YUNIKORN-2720) Use createRequest() in handlers_test.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2720: Labels: newbie (was: ) > Use createRequest() in handlers_test.go > --- > > Key: YUNIKORN-2720 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2720 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Manikandan R >Priority: Major > Labels: newbie > > Use the createRequest() helper methods wherever applicable in handlers_test.go. > handlers_test.go is huge. 
[jira] [Commented] (YUNIKORN-2729) remove `--new-from-rev` from Makefile
[ https://issues.apache.org/jira/browse/YUNIKORN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864012#comment-17864012 ] Wilfred Spiegelenburg commented on YUNIKORN-2729: - I am all for this as long as we fix all the issues. It is not good enough to fix most. We need a clean {{make lint}} result. If we do not, the pre-commit tests will fail. When the pre-commit tests fail in the linter, no unit tests are run, which means no commit. > remove `--new-from-rev` from Makefile > - > > Key: YUNIKORN-2729 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2729 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: Huang Guan Hao >Priority: Minor > Labels: pull-request-available > > It is time to show the power of lint :) 
[jira] [Commented] (YUNIKORN-2629) Adding a node can result in a deadlock
[ https://issues.apache.org/jira/browse/YUNIKORN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864000#comment-17864000 ] Wilfred Spiegelenburg commented on YUNIKORN-2629: - [~jshmchenxi] The latest stack trace you attached shows no deadlock or even locking inside the core or shim code. You have a different issue, not related to deadlocks. Please open a new Jira for this. There are 18 occurrences of calls that reference the semaphore code (locks): * 9 from K8s shared informers waiting for object updates to come from K8s * 9 from K8s network data readers Those are expected. If no data is transmitted and being processed by the K8s informers they should sit there and wait. No other code has any locks. When I look at the YuniKorn code references in the stack trace I can see an idle scheduler. Nothing is being processed on the K8shim side, and it is sleeping waiting for changes. The core side is also not scheduling and sleeping. There is one goroutine that jumps out for me: {code:java} goroutine 19661710185 [IO wait] ... created by golang.org/x/net/http2.(*ClientConn).goRun in goroutine 19661710184 golang.org/x/net@v0.23.0/http2/transport.go:369 +0x2d {code} The goroutine mentioned in the "created by" line does not exist in the dump. Not sure if that just means it still needs to time out or something else is happening, but this is not the deadlock as per this Jira. > Adding a node can result in a deadlock > -- > > Key: YUNIKORN-2629 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2629 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Affects Versions: 1.5.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.2 > > Attachments: updateNode_deadlock_trace.txt, > yunikorn-scheduler-20240627.log, yunikorn_stuck_stack_20240708.txt > > > Adding a new node after Yunikorn state initialization can result in a > deadlock. 
> The problem is that {{Context.addNode()}} holds a lock while we're waiting > for the {{NodeAccepted}} event: > {noformat} >dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, > func(event interface{}) { > nodeEvent, ok := event.(CachedSchedulerNodeEvent) > if !ok { > return > } > [...] removed for clarity > wg.Done() > }) > defer dispatcher.UnregisterEventHandler(handlerID, > dispatcher.EventTypeNode) > if err := > ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateNode(&si.NodeRequest{ > Nodes: nodesToRegister, > RmID: schedulerconf.GetSchedulerConf().ClusterID, > }); err != nil { > log.Log(log.ShimContext).Error("Failed to register nodes", > zap.Error(err)) > return nil, err > } > // wait for all responses to accumulate > wg.Wait() <--- shim gets stuck here > {noformat} > If tasks are being processed, then the dispatcher will try to retrieve the > event handler, which is returned from Context: > {noformat} > go func() { > for { > select { > case event := <-getDispatcher().eventChan: > switch v := event.(type) { > case events.TaskEvent: > getEventHandler(EventTypeTask)(v) <--- > eventually calls Context.getTask() > case events.ApplicationEvent: > getEventHandler(EventTypeApp)(v) > case events.SchedulerNodeEvent: > getEventHandler(EventTypeNode)(v) > {noformat} > Since {{addNode()}} is holding a write lock, the event processing loop gets > stuck, so {{registerNodes()}} will never progress. 
[jira] [Updated] (YUNIKORN-2651) Update the unchecked error for make lint warnings
[ https://issues.apache.org/jira/browse/YUNIKORN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-2651: Fix Version/s: (was: 1.6.0) Target Version: 1.6.0 > Update the unchecked error for make lint warnings > - > > Key: YUNIKORN-2651 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2651 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: Yun Sun >Priority: Major > Labels: pull-request-available > > fix the lint about "unhandled error" 