Zhongwei Zhu created SPARK-45057:
------------------------------------

             Summary: Deadlock caused by rdd replication level of 2
                 Key: SPARK-45057
                 URL: https://issues.apache.org/jira/browse/SPARK-45057
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.1
            Reporter: Zhongwei Zhu


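When two tasks compute the same RDD block with a replication level of 2 and the application runs on only two executors, a deadlock can occur. Each task thread holds the block's write lock while it synchronously replicates the block to the other executor. On that executor, the shuffle server thread needs the same write lock to store the replica.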
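The lock interaction described above can be reproduced in miniature. This is a minimal Python sketch that assumes each executor guards its copy of the RDD block with a single lock, and that an incoming UploadBlock request must take that lock before storing the replica. The names `block_locks`, `handle_upload_block`, `task`, and `TIMEOUT` are hypothetical and do not come from Spark's BlockManager. The real code waits without a timeout. The timeout here only lets the sketch terminate and report that both uploads were stuck.

```python
import threading

# Toy model only (hypothetical names), not Spark's BlockManager code.
# Each executor guards its copy of the RDD block with one write lock.
TIMEOUT = 0.5  # seconds; the real code has no timeout and would hang forever

block_locks = {"exe1": threading.Lock(), "exe2": threading.Lock()}
both_locked = threading.Barrier(2)    # T0/T1: both task threads hold their lock
both_replied = threading.Barrier(2)   # keep locks held until both uploads return
upload_ok = {}

def handle_upload_block(executor):
    """Stands in for the remote shuffle server thread handling UploadBlock:
    it must take the block write lock on `executor` to store the replica."""
    if block_locks[executor].acquire(timeout=TIMEOUT):
        block_locks[executor].release()
        return True
    return False  # timed out: that executor's task thread still holds the lock

def task(local, remote):
    with block_locks[local]:          # task thread takes the local write lock
        both_locked.wait()

        def replicate():
            upload_ok[local] = handle_upload_block(remote)

        # Synchronous replication: block until the remote side has replied.
        server = threading.Thread(target=replicate)
        server.start()
        server.join()
        both_replied.wait()

def run():
    upload_ok.clear()
    tasks = [threading.Thread(target=task, args=("exe1", "exe2")),
             threading.Thread(target=task, args=("exe2", "exe1"))]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return dict(upload_ok)

if __name__ == "__main__":
    print(run() == {"exe1": False, "exe2": False})  # True: both uploads were stuck
```

The second barrier keeps each task's lock held until both simulated uploads have returned. This mirrors the real situation, in which neither task releases its lock because its own replication never completes, and it makes the outcome deterministic.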

||Time||Exe 1 (Task Thread 1)||Exe 1 (Shuffle Server Thread 2)||Exe 2 (Task Thread 3)||Exe 2 (Shuffle Server Thread 4)||
|T0|Acquires write lock of RDD block| | | |
|T1| | |Acquires write lock of RDD block| |
|T2|replicate -> UploadBlockSync (blocked by Shuffle Server Thread 4)| | | |
|T3| | | |Receives UploadBlock request (blocked by Task Thread 3)|
|T4| | |replicate -> UploadBlockSync (blocked by Shuffle Server Thread 2)| |
|T5| |Receives UploadBlock request (blocked by Task Thread 1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|
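The "blocked by" entries in the table from T2 to T5 form a wait-for graph. The state at T6 is a deadlock because that graph contains a cycle. The following sketch transcribes those edges. The node labels are hypothetical shorthand for the four threads in the table. Because each thread waits on exactly one other thread, following the edges from any thread is enough to find the cycle.

```python
# Wait-for edges transcribed from the table: key is blocked by value.
WAITS_FOR = {
    "task1@exe1": "server4@exe2",    # T2: UploadBlockSync waits on Exe 2's server
    "server4@exe2": "task3@exe2",    # T3: UploadBlock waits on Exe 2's write lock
    "task3@exe2": "server2@exe1",    # T4: UploadBlockSync waits on Exe 1's server
    "server2@exe1": "task1@exe1",    # T5: UploadBlock waits on Exe 1's write lock
}

def find_cycle(waits_for, start):
    """Follow the single outgoing wait-for edge from each node, starting at
    `start`; return the nodes on the cycle, or None if the chain ends."""
    path, seen = [], set()
    node = start
    while node in waits_for and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for[node]
    if node in seen:
        return path[path.index(node):]
    return None

print(find_cycle(WAITS_FOR, "task1@exe1"))
# ['task1@exe1', 'server4@exe2', 'task3@exe2', 'server2@exe1']
```

All four threads lie on the cycle, which matches the T6 row: no thread in the cycle can make progress until another thread in the cycle does.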



--
This message was sent by Atlassian Jira
(v8.20.10#820010)