[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2
[ https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-45057:
-----------------------------------
    Labels: pull-request-available  (was: )

> Deadlock caused by rdd replication level of 2
> ---------------------------------------------
>
>                 Key: SPARK-45057
>                 URL: https://issues.apache.org/jira/browse/SPARK-45057
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Zhongwei Zhu
>            Priority: Major
>              Labels: pull-request-available
>
> When 2 tasks try to compute the same RDD with a replication level of 2 while
> running on only 2 executors, a deadlock will occur. A task releases its write
> lock only after writing the block to the local machine and replicating it to
> the remote executor.
>
> ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
> |T0|write lock of rdd| | | |
> |T1| | |write lock of rdd| |
> |T2|replicate -> UploadBlockSync (blocked by thread T4)| | | |
> |T3| | | |Received UploadBlock request from thread T1 (blocked by thread T3)|
> |T4| | |replicate -> UploadBlockSync (blocked by thread T2)| |
> |T5| |Received UploadBlock request from thread T3 (blocked by thread T1)| | |
> |T6|Deadlock|Deadlock|Deadlock|Deadlock|

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
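The interleaving described in the issue can be mimicked outside Spark with a small Python simulation. This is only a sketch under stated assumptions, not Spark's code: `Executor`, `handle_upload`, `run_task`, and `simulate` are hypothetical names, a plain `threading.Lock` stands in for the write lock of the RDD block, and a single-worker thread pool plays the shuffle server thread. Lock acquisition uses a timeout so that the script reports the would-be deadlock instead of hanging, and a second barrier keeps both locks held until both uploads have resolved so the outcome is deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Executor:
    """Hypothetical stand-in for a Spark executor; not Spark's real classes."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        # Stands in for the write lock of the RDD block on this executor.
        self.block_lock = threading.Lock()
        # A single worker plays the role of the shuffle server thread.
        self.server = ThreadPoolExecutor(max_workers=1)

    def handle_upload(self, timeout):
        """Server side of an upload: needs the local write lock to store the replica."""
        acquired = self.block_lock.acquire(timeout=timeout)
        if acquired:
            self.block_lock.release()
        return acquired  # False means the request would have blocked indefinitely

    def run_task(self, barrier, timeout, results, hold_lock):
        """Task thread: lock the block, write locally, then replicate synchronously."""
        self.block_lock.acquire()      # rows T0/T1: take the write lock
        if not hold_lock:
            self.block_lock.release()  # variant: release before replicating
        barrier.wait()                 # both tasks have finished the local write
        # Rows T2/T4: synchronous upload, served by the peer's server thread.
        future = self.peer.server.submit(self.peer.handle_upload, timeout)
        results[self.name] = future.result()
        barrier.wait()                 # keep locks held until both uploads resolve
        if hold_lock:
            self.block_lock.release()


def simulate(hold_lock, timeout=0.2):
    """Return {executor name: whether its replication to the peer succeeded}."""
    exe1, exe2 = Executor("exe1"), Executor("exe2")
    exe1.peer, exe2.peer = exe2, exe1
    barrier = threading.Barrier(2)
    results = {}
    tasks = [
        threading.Thread(target=exe.run_task, args=(barrier, timeout, results, hold_lock))
        for exe in (exe1, exe2)
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    for exe in (exe1, exe2):
        exe.server.shutdown()
    return results


if __name__ == "__main__":
    print(simulate(hold_lock=True))   # both replications fail (timed out)
    print(simulate(hold_lock=False))  # both replications succeed
```

With `hold_lock=True` each upload request times out because the lock it needs is held by a task that is itself waiting on the other executor. The `hold_lock=False` variant is included only to show that the circular wait disappears once the lock is not held across the synchronous upload; it is not a claim about how the issue is or should be fixed in Spark.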
[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2
[ https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhongwei Zhu updated SPARK-45057:
---------------------------------
    Description: 
When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will occur. A task releases its write lock only after writing the block to the local machine and replicating it to the remote executor.

||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by thread T4)| | | |
|T3| | | |Received UploadBlock request from thread T1 (blocked by thread T3)|
|T4| | |replicate -> UploadBlockSync (blocked by thread T2)| |
|T5| |Received UploadBlock request from thread T3 (blocked by thread T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

  was:
When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will occur.

||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by thread T4)| | | |
|T3| | | |Received UploadBlock request from thread T1 (blocked by thread T3)|
|T4| | |replicate -> UploadBlockSync (blocked by thread T2)| |
|T5| |Received UploadBlock request from thread T3 (blocked by thread T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

> Deadlock caused by rdd replication level of 2
> ---------------------------------------------
>
>                 Key: SPARK-45057
>                 URL: https://issues.apache.org/jira/browse/SPARK-45057
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Zhongwei Zhu
>            Priority: Major
[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2
[ https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhongwei Zhu updated SPARK-45057:
---------------------------------
    Description: 
When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will occur.

||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task Thread T3)||Exe 2 (Shuffle Server Thread T4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by thread T4)| | | |
|T3| | | |Received UploadBlock request from thread T1 (blocked by thread T3)|
|T4| | |replicate -> UploadBlockSync (blocked by thread T2)| |
|T5| |Received UploadBlock request from thread T3 (blocked by thread T1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

  was:
When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will occur.

||Time||Exe 1 (Task Thread 1)||Exe 1 (Shuffle Server Thread 2)||Exe 2 (Task Thread 3)||Exe 2 (Shuffle Server Thread 4)||
|T0|write lock of rdd| | | |
|T1| | |write lock of rdd| |
|T2|replicate -> UploadBlockSync (blocked by shuffle server thread 4)| | | |
|T3| | | |Received UploadBlock request (blocked by task thread 3)|
|T4| | |replicate -> UploadBlockSync (blocked by shuffle server thread 2)| |
|T5| |Received UploadBlock request (blocked by task thread 1)| | |
|T6|Deadlock|Deadlock|Deadlock|Deadlock|

> Deadlock caused by rdd replication level of 2
> ---------------------------------------------
>
>                 Key: SPARK-45057
>                 URL: https://issues.apache.org/jira/browse/SPARK-45057
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Zhongwei Zhu
>            Priority: Major
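The "blocked by" entries in the table can also be read as a wait-for graph, and the deadlock at row T6 corresponds to a cycle in that graph. The sketch below encodes the four edges and walks them. It assumes that row T3 means shuffle server thread T4 is waiting on task thread T3, the holder of Exe 2's write lock (the reading given by the older "blocked by task thread 3" wording). `WAIT_FOR` and `find_cycle` are illustrative names, not part of Spark.

```python
# Wait-for edges taken from the table: blocked thread -> thread it waits on.
WAIT_FOR = {
    "T1": "T4",  # row T2: task thread T1 waits for Exe 2's shuffle server thread
    "T4": "T3",  # row T3 (assumed reading): T4 needs the lock held by task thread T3
    "T3": "T2",  # row T4: task thread T3 waits for Exe 1's shuffle server thread
    "T2": "T1",  # row T5: T2 needs the lock held by task thread T1
}


def find_cycle(wait_for):
    """Return the threads on a wait-for cycle, in order, or None if there is none.

    `wait_for` maps each blocked thread to the single thread it waits on.
    """
    for start in wait_for:
        path = [start]
        node = wait_for[start]
        # Follow the chain until it leaves the graph or revisits a thread.
        while node in wait_for and node not in path:
            path.append(node)
            node = wait_for[node]
        if node == start:
            return path
    return None


if __name__ == "__main__":
    print(find_cycle(WAIT_FOR))  # ['T1', 'T4', 'T3', 'T2']
```

Every thread in the table appears on the cycle, which matches the final row where all four columns report a deadlock.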