[ 
https://issues.apache.org/jira/browse/YUNIKORN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870395#comment-17870395
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-2646:
-------------------------------------------------

This is not the cause, it cannot be.

YuniKorn 1.5.2 has a deadlock fix as per YUNIKORN-2629.

If there were a lockup left we should see it for others too, especially when 
you say it happens really often. We do not have evidence that confirms this, 
and we cannot fix or change anything without understanding what is broken. We 
need logs or a reproduction that shows the issue.

When you get to the "stuck" state, collect the following details and open a *_new_* jira:
 * scheduler logs
 * state dump via /ws/v1/fullstatedump
 * pprof output of /debug/pprof/goroutine?debug=2

If it really is a deadlock in the code, the state dump will most likely fail. 
Logs and pprof never fail, so we should still have a full goroutine dump. You 
can even collect two in a row.
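
As a rough sketch of how the two HTTP items above could be gathered in one go: the script below fetches /ws/v1/fullstatedump and /debug/pprof/goroutine?debug=2 and keeps going if one of them fails, since the state dump is expected to fail when the scheduler really is deadlocked. The base URL (http://localhost:9080), the output file names and the helper names http_get and collect are assumptions made for illustration, not part of YuniKorn. Scheduler logs have to be collected separately.

```python
import http.client
import urllib.request
from pathlib import Path

# Output file name -> endpoint path (paths as listed in the comment above).
ENDPOINTS = {
    "fullstatedump.json": "/ws/v1/fullstatedump",
    "goroutine.txt": "/debug/pprof/goroutine?debug=2",
}


def http_get(url, timeout):
    """Return the raw response body for url, raising on error or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def collect(base_url, out_dir, fetch=http_get, timeout=30):
    """Fetch every endpoint and save each successful response to out_dir.

    A failed or timed-out request is recorded in the result and does not
    stop the remaining requests.
    Returns a dict of file name -> "ok" or "failed: <reason>".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for filename, path in ENDPOINTS.items():
        try:
            data = fetch(base_url + path, timeout)
        except (OSError, http.client.HTTPException) as err:
            # URLError and socket timeouts are OSError subclasses.
            results[filename] = f"failed: {err}"
            continue
        (out / filename).write_bytes(data)
        results[filename] = "ok"
    return results
```

Calling collect("http://localhost:9080", "dump-1") and then collect("http://localhost:9080", "dump-2") would give two goroutine dumps in a row; adjust the base URL to wherever the scheduler's REST port is reachable in your setup.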

> Deadlock detected during preemption
> -----------------------------------
>
>                 Key: YUNIKORN-2646
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2646
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Dmitry
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.6.0, 1.5.2
>
>         Attachments: yunikorn-logs-lock.txt.gz, yunikorn-logs.txt.gz
>
>
> Hitting deadlocks in 1.5.1
> The log is attached



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
