[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2

2023-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45057:
---
Labels: pull-request-available  (was: )

> Deadlock caused by rdd replication level of 2
> -
>
> Key: SPARK-45057
> URL: https://issues.apache.org/jira/browse/SPARK-45057
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Zhongwei Zhu
> Priority: Major
> Labels: pull-request-available
>
>  
> When 2 tasks try to compute the same RDD with a replication level of 2 while
> running on only 2 executors, a deadlock will happen.
> A task releases its lock only after writing to the local machine and
> replicating to the remote executor.
>  
> ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
> |T0|write lock of rdd| | | |
> |T1| | |write lock of rdd| |
> |T2|replicate -> UploadBlockSync (blocked by T4)| | | |
> |T3| | | |Received UploadBlock request from T1 (blocked by T3)|
> |T4| | |replicate -> UploadBlockSync (blocked by T2)| |
> |T5| |Received UploadBlock request from T3 (blocked by T1)| | |
> |T6|Deadlock|Deadlock|Deadlock|Deadlock|
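
The wait cycle in the table can be checked mechanically. The sketch below is illustrative Python, not Spark code. It encodes the table as a wait-for graph, assuming that shuffle server thread T4 is blocked on the lock held by task thread T3 (as the earliest revision of the description states), and follows the edges until a thread repeats. `WAITS_FOR` and `find_cycle` are hypothetical names chosen for this example.

```python
# Wait-for graph for the four threads in the table (thread names follow the table).
# An entry A: B means "thread A cannot proceed until thread B does".
WAITS_FOR = {
    "T1": "T4",  # Exe 1 task: UploadBlockSync waits for Exe 2's shuffle server
    "T4": "T3",  # Exe 2 shuffle server: needs the block lock held by Exe 2's task
    "T3": "T2",  # Exe 2 task: UploadBlockSync waits for Exe 1's shuffle server
    "T2": "T1",  # Exe 1 shuffle server: needs the block lock held by Exe 1's task
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle reached, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not waiting
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "T1")
print(" -> ".join(cycle))  # prints "T1 -> T4 -> T3 -> T2 -> T1"
```

Every thread in the cycle waits on the next one, so none of the four can release what it holds.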



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2

2023-09-05 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-45057:
-
Description: 
 
When 2 tasks try to compute the same RDD with a replication level of 2 while
running on only 2 executors, a deadlock will happen.

A task releases its lock only after writing to the local machine and
replicating to the remote executor.

 
||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by T4)| | | |
|T3| | | |Received UploadBlock request from T1 (blocked by T3)|
|T4| | |replicate -> UploadBlockSync (blocked by T2)| |
|T5| |Received UploadBlock request from T3 (blocked by T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

  was:
 
When 2 tasks try to compute the same RDD with a replication level of 2 while
running on only 2 executors, a deadlock will happen.

 
||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by T4)| | | |
|T3| | | |Received UploadBlock request from T1 (blocked by T3)|
|T4| | |replicate -> UploadBlockSync (blocked by T2)| |
|T5| |Received UploadBlock request from T3 (blocked by T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|


> Deadlock caused by rdd replication level of 2
> -
>
> Key: SPARK-45057
> URL: https://issues.apache.org/jira/browse/SPARK-45057
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Zhongwei Zhu
> Priority: Major
>
>  
> When 2 tasks try to compute the same RDD with a replication level of 2 while
> running on only 2 executors, a deadlock will happen.
> A task releases its lock only after writing to the local machine and
> replicating to the remote executor.
>  
> ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
> |T0|write lock of rdd| | | |
> |T1| | |write lock of rdd| |
> |T2|replicate -> UploadBlockSync (blocked by T4)| | | |
> |T3| | | |Received UploadBlock request from T1 (blocked by T3)|
> |T4| | |replicate -> UploadBlockSync (blocked by T2)| |
> |T5| |Received UploadBlock request from T3 (blocked by T1)| | |
> |T6|Deadlock|Deadlock|Deadlock|Deadlock|
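
The lock-holding behaviour described above can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's block manager code, and it rests on several assumptions: each executor's lock on the block is modelled as a `threading.Lock`, the shuffle server thread is folded into the attempt to take the peer's lock, and a timeout replaces the indefinite wait so that the script terminates. All names (`simulate`, `exe1`, `exe2`) are hypothetical.

```python
import threading


def simulate(timeout=0.2):
    """Two 'executors' each hold a local block lock, then try to replicate to the peer."""
    locks = {"exe1": threading.Lock(), "exe2": threading.Lock()}
    both_locked = threading.Barrier(2)  # passes once both tasks hold their local lock
    both_tried = threading.Barrier(2)   # passes once both replication attempts finished
    replicated = {}

    def task(local, remote):
        with locks[local]:  # write lock on the local copy of the block
            both_locked.wait()
            # Replication needs the peer's lock on the same block. The timeout
            # stands in for the indefinite wait seen in the real deadlock.
            ok = locks[remote].acquire(timeout=timeout)
            replicated[local] = ok
            both_tried.wait()  # keep holding the local lock until the peer has tried too
            if ok:
                locks[remote].release()

    threads = [
        threading.Thread(target=task, args=("exe1", "exe2")),
        threading.Thread(target=task, args=("exe2", "exe1")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replicated


print(simulate())  # both values are False: neither side could replicate
```

Because each side keeps its local lock while waiting for the other, neither replication attempt can succeed. Without the timeout, both threads would wait forever.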






[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2

2023-09-01 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-45057:
-
Description: 
 
When 2 tasks try to compute the same RDD with a replication level of 2 while
running on only 2 executors, a deadlock will happen.

 
||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by T4)| | | |
|T3| | | |Received UploadBlock request from T1 (blocked by T3)|
|T4| | |replicate -> UploadBlockSync (blocked by T2)| |
|T5| |Received UploadBlock request from T3 (blocked by T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

  was:
 
When 2 tasks try to compute the same RDD with a replication level of 2 while
running on only 2 executors, a deadlock will happen.

 
||Time||Exe 1 (Task Thread 1)||Exe 1 (Shuffle Server Thread 2)||Exe 2 (Task Thread 3)||Exe 2 (Shuffle Server Thread 4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by shuffle server thread 4)| | | |
|T3| | | |Received UploadBlock request(blocked by task thread 3)|
|T4| | |replicate -> UploadBlockSync (blocked by shuffle server thread 2)| |
|T5| |Received UploadBlock request(blocked by task thread 1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|


> Deadlock caused by rdd replication level of 2
> -
>
> Key: SPARK-45057
> URL: https://issues.apache.org/jira/browse/SPARK-45057
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Zhongwei Zhu
> Priority: Major
>
>  
> When 2 tasks try to compute the same RDD with a replication level of 2 while
> running on only 2 executors, a deadlock will happen.
>  
> ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
> |T0|write lock of rdd| | | |
> |T1| | |write lock of rdd| |
> |T2|replicate -> UploadBlockSync (blocked by T4)| | | |
> |T3| | | |Received UploadBlock request from T1 (blocked by T3)|
> |T4| | |replicate -> UploadBlockSync (blocked by T2)| |
> |T5| |Received UploadBlock request from T3 (blocked by T1)| | |
> |T6|Deadlock|Deadlock|Deadlock|Deadlock|


