[jira] [Resolved] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue

2024-04-01 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2498.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

> Implement force create flag in k8shim for recovery queue
> 
>
> Key: YUNIKORN-2498
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2498
> Project: Apache YuniKorn
> Issue Type: Task
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
>
> As part of the initialisation changes, a new recovery queue was added to allow 
> already running allocations to be restored even if the queue config was 
> changed. The implementation on the k8shim side needs to be added to leverage 
> the forced create flag from YUNIKORN-1887.
> Without that, the changes added for the recovery queue will not be used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2526) Discrepancy between shim cache and core app/task list after scheduler restart

2024-04-01 Thread Shravan Achar (Jira)
Shravan Achar created YUNIKORN-2526:
---

 Summary: Discrepancy between shim cache and core app/task list 
after scheduler restart
 Key: YUNIKORN-2526
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2526
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Shravan Achar
 Attachments: log-snippet.txt, state-dump-4-1-3.json

When the scheduler restarts, it occasionally ends up in a state where an 
application is still in the Running state even though the application has been 
terminated in the cluster. This is confirmed by the attached state dump.

The scheduler core logs (also attached) show all nodes being evaluated for the 
non-existent application. CPU is being consumed by this unneeded evaluation.
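
One way to spot this kind of discrepancy is to compare the application IDs the core still tracks against the IDs the shim cache holds. The following is only a minimal sketch under that assumption: {{staleApps}} and its inputs are hypothetical names for illustration, not YuniKorn APIs, and the ID lists would have to be extracted from a state dump by other means.

```go
package main

import (
	"fmt"
	"sort"
)

// staleApps is a hypothetical helper (not part of YuniKorn). It returns the
// application IDs present in coreApps but missing from shimApps, sorted.
func staleApps(coreApps, shimApps []string) []string {
	// Build a set of the IDs the shim knows about.
	known := make(map[string]struct{}, len(shimApps))
	for _, id := range shimApps {
		known[id] = struct{}{}
	}
	// Collect every core-side ID that the shim does not have.
	var stale []string
	for _, id := range coreApps {
		if _, ok := known[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Made-up IDs purely for illustration.
	core := []string{"app-1", "app-2", "app-3"}
	shim := []string{"app-1", "app-3"}
	fmt.Println(staleApps(core, shim)) // prints [app-2]
}
```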






[jira] [Created] (YUNIKORN-2525) Make dispatcher.Stop() shut down quicker

2024-04-01 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2525:
--

 Summary: Make dispatcher.Stop() shut down quicker
 Key: YUNIKORN-2525
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2525
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


{{dispatcher.Stop()}} sometimes takes an extra 1 second to shut down properly. 
This slows down unit tests. On my machine, {{context_test.go}} runs for 19-20 
seconds. With some improvements, this can be reduced to 1 second.


