[ 
https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-45057.
-----------------------------------------
    Fix Version/s: 3.3.4
                   3.5.1
                   4.0.0
                   3.4.2
       Resolution: Fixed

Issue resolved by pull request 43067
[https://github.com/apache/spark/pull/43067]

> Deadlock caused by rdd replication level of 2
> ---------------------------------------------
>
>                 Key: SPARK-45057
>                 URL: https://issues.apache.org/jira/browse/SPARK-45057
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Zhongwei Zhu
>            Assignee: Zhongwei Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2
>
>
>  
> When two tasks try to compute the same RDD with a replication level of 2 
> while running on only two executors, a deadlock occurs. A task releases its 
> write lock only after it has written the block locally and replicated it to 
> the remote executor.
>  
> ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
> |T0|write lock of rdd| | | |
> |T1| | |write lock of rdd| |
> |T2|replicate -> UploadBlockSync (blocked by T4)| | | |
> |T3| | | |Received UploadBlock request from T1 (blocked by T3)|
> |T4| | |replicate -> UploadBlockSync (blocked by T2)| |
> |T5| |Received UploadBlock request from T3 (blocked by T1)| | |
> |T6|Deadlock|Deadlock|Deadlock|Deadlock|
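
The interleaving in the table above can be reproduced with a small toy model. This is only a sketch in plain Python threads, not Spark code: the `Executor` class, its methods, and `simulate` are hypothetical names, a single `threading.Lock` stands in for the block's write lock, and a timeout is assumed so that the hang shows up as a failed upload instead of blocking forever. The two barriers exist only to force the ordering from the table.

```python
import threading


class Executor:
    """Hypothetical stand-in for a Spark executor holding one RDD block."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.block_lock = threading.Lock()  # models the block's write lock

    def handle_upload(self, timeout):
        # Shuffle-server side: storing the replica needs the local write lock.
        acquired = self.block_lock.acquire(timeout=timeout)
        if acquired:
            self.block_lock.release()
        return acquired

    def upload_block_sync(self, timeout):
        # Task side: hand the block to the peer's server thread and wait for it.
        reply = []
        server = threading.Thread(
            target=lambda: reply.append(self.peer.handle_upload(timeout)))
        server.start()
        server.join()
        return reply[0]

    def run_task(self, barrier, timeout, outcome):
        with self.block_lock:   # write lock is held across the replication
            barrier.wait()      # every task now holds its local lock
            outcome[self.name] = self.upload_block_sync(timeout)
            barrier.wait()      # keep the lock until every attempt has finished


def simulate(task_executors, timeout=0.2):
    """Run one task on each named executor; map executor name to upload success."""
    exe1, exe2 = Executor("exe1"), Executor("exe2")
    exe1.peer, exe2.peer = exe2, exe1
    executors = {"exe1": exe1, "exe2": exe2}
    barrier = threading.Barrier(len(task_executors))
    outcome = {}
    tasks = [
        threading.Thread(target=executors[name].run_task,
                         args=(barrier, timeout, outcome))
        for name in task_executors
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    return outcome


single = simulate(["exe1"])        # one task: the peer's lock is free
both = simulate(["exe1", "exe2"])  # two tasks: each upload waits on the other
```

With one task the upload succeeds (`single["exe1"]` is `True`). With a task on each executor, both uploads time out (`both` maps each executor to `False`), which corresponds to row T6 of the table; without the assumed timeout, all four threads would wait indefinitely.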



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
