[jira] [Assigned] (SPARK-42784) Fix the problem of incomplete creation of subdirectories in push merged localDir

2023-06-30 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-42784:
---

Assignee: Fencheng Mei

> Fix the problem of incomplete creation of subdirectories in push merged 
> localDir
> 
>
> Key: SPARK-42784
> URL: https://issues.apache.org/jira/browse/SPARK-42784
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 3.3.2
> Reporter: Fencheng Mei
> Assignee: Fencheng Mei
> Priority: Major
>
> After we enabled push-based shuffle widely in our production environment,
> we found warning messages in the server-side logs.
> The warnings look like this:
> ShuffleBlockPusher: Pushing block shufflePush_3_0_5352_935 to 
> BlockManagerId(shuffle-push-merger, zw06-data-hdp-dn08251.mt, 7337, None) 
> failed.
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot initialize 
> merged shuffle partition for appId application_1671244879475_44020960 
> shuffleId 3 shuffleMergeId 0 reduceId 935.
> After investigation, we identified what triggers the bug.
> The driver requested two different containers on the same physical machine.
> While creating the 'push-merged' directory, the first container
> (container_1) created the mergeDir first and then started creating the
> subDirs, whose count is given by the "spark.diskStore.subDirectories"
> parameter. However, the resources of container_1 were preempted while the
> sub-directories were being created, so only some of the subDirs were
> created. Because the mergeDir already existed, the second container
> (container_2) assumed that all directories had already been created and
> did not create the missing subDirs.
>  
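
The failure mode described above can be reproduced outside Spark with a
small sketch. This is not the Spark code or the eventual patch: the class
MergeDirSketch, its method names, the "mergeDir" directory name, the
two-digit hex subdirectory names, and the count of 64 are all assumptions
made for illustration. It contrasts a creation routine that skips all work
when the merge directory exists with one that checks every subdirectory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; names and layout are illustrative, not Spark's.
final class MergeDirSketch {

    // Naive variant: an existing mergeDir is taken as proof that every
    // subdirectory exists, so a partially populated mergeDir is never repaired.
    static void createNaive(Path mergeDir, int numSubDirs) throws IOException {
        if (Files.exists(mergeDir)) {
            return;
        }
        for (int i = 0; i < numSubDirs; i++) {
            Files.createDirectories(mergeDir.resolve(String.format("%02x", i)));
        }
    }

    // Idempotent variant: every subdirectory is ensured individually.
    // Files.createDirectories does nothing for a directory that already exists.
    static void createIdempotent(Path mergeDir, int numSubDirs) throws IOException {
        for (int i = 0; i < numSubDirs; i++) {
            Files.createDirectories(mergeDir.resolve(String.format("%02x", i)));
        }
    }

    // Counts the direct child directories of mergeDir.
    static long countSubDirs(Path mergeDir) throws IOException {
        try (Stream<Path> children = Files.list(mergeDir)) {
            return children.filter(Files::isDirectory).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("push-merged-sketch");
        Path mergeDir = root.resolve("mergeDir");

        // Simulate container_1 being preempted after creating 3 of 64 subDirs.
        createIdempotent(mergeDir, 3);

        // container_2 with the naive check: mergeDir exists, nothing is added.
        createNaive(mergeDir, 64);
        System.out.println("after naive: " + countSubDirs(mergeDir));      // 3

        // container_2 with per-subdirectory checks: the gaps are filled.
        createIdempotent(mergeDir, 64);
        System.out.println("after idempotent: " + countSubDirs(mergeDir)); // 64
    }
}
```

Checking each subdirectory costs a few extra filesystem calls at startup,
but it lets a later container repair a layout that an earlier one left
incomplete.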



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42784) Fix the problem of incomplete creation of subdirectories in push merged localDir

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42784:


Assignee: (was: Apache Spark)

> Fix the problem of incomplete creation of subdirectories in push merged 
> localDir
> 
>
> Key: SPARK-42784
> URL: https://issues.apache.org/jira/browse/SPARK-42784
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 3.3.2
> Reporter: Fencheng Mei
> Priority: Major



[jira] [Assigned] (SPARK-42784) Fix the problem of incomplete creation of subdirectories in push merged localDir

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42784:


Assignee: Apache Spark

> Fix the problem of incomplete creation of subdirectories in push merged 
> localDir
> 
>
> Key: SPARK-42784
> URL: https://issues.apache.org/jira/browse/SPARK-42784
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 3.3.2
> Reporter: Fencheng Mei
> Assignee: Apache Spark
> Priority: Major