[ 
https://issues.apache.org/jira/browse/SPARK-39283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Pal updated SPARK-39283:
--------------------------------
    Description: 
We are seeing this deadlock between {{TaskMemoryManager}} and 
{{UnsafeExternalSorter}} quite often in our workload. Sometimes the retry 
succeeds, but at other times we have to resort to hacky ways of breaking the 
deadlock, such as explicitly taking down the worker machines. 

Below is the thread dump from the Spark UI showing the deadlock:



!DeadlockSparkTasks.png!

I believe there was a related Jira issue about a similar deadlock between the 
same threads, and it was resolved: 
https://issues.apache.org/jira/browse/SPARK-27338

 

 

  was:
We are seeing this deadlock between {{TaskMemoryManager}} and 
{{UnsafeExternalSorter}} quite often in our workload. Sometimes the retry 
succeeds, but at other times we have to resort to hacky ways of breaking the 
deadlock, such as explicitly taking down the worker machines. 

Below is the thread dump from the Spark UI showing the deadlock:
!image-2022-05-24-20-03-35-287.png!

 

I believe there was a related Jira issue about a similar deadlock between the 
same threads, and it was resolved: 
https://issues.apache.org/jira/browse/SPARK-27338

 

 


> Spark tasks stuck forever due to deadlock between TaskMemoryManager and 
> UnsafeExternalSorter
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39283
>                 URL: https://issues.apache.org/jira/browse/SPARK-39283
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Sandeep Pal
>            Priority: Critical
>         Attachments: DeadlockSparkTasks.png
>
>
> We are seeing this deadlock between {{TaskMemoryManager}} and 
> {{UnsafeExternalSorter}} quite often in our workload. Sometimes the retry 
> succeeds, but at other times we have to resort to hacky ways of breaking the 
> deadlock, such as explicitly taking down the worker machines. 
> Below is the thread dump from the Spark UI showing the deadlock:
> !DeadlockSparkTasks.png!
> I believe there was a related Jira issue about a similar deadlock between the 
> same threads, and it was resolved: 
> https://issues.apache.org/jira/browse/SPARK-27338
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
