[ https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan resolved SPARK-42784.
-----------------------------------------
    Resolution: Fixed

> Fix the problem of incomplete creation of subdirectories in push merged localDir
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42784
>                 URL: https://issues.apache.org/jira/browse/SPARK-42784
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Fencheng Mei
>            Assignee: Fencheng Mei
>            Priority: Major
>             Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> After we enabled push-based shuffle at scale in our production environment,
> we found warning messages in the server-side logs.
> The warnings look like this:
> ShuffleBlockPusher: Pushing block shufflePush_3_0_5352_935 to
> BlockManagerId(shuffle-push-merger, zw06-data-hdp-dn08251.mt, 7337, None)
> failed.
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot initialize
> merged shuffle partition for appId application_1671244879475_44020960
> shuffleId 3 shuffleMergeId 0 reduceId 935.
> After investigation, we identified the mechanism that triggers the bug.
> The driver requested two different containers on the same physical machine.
> While the 'push-merged' directory was being created in the first container
> (container_1), the mergeDir was created first, and then the subDirs were
> created based on the value of the "spark.diskStore.subDirectories" parameter.
> However, the resources of container_1 were preempted while the
> sub-directories were being created, so only some of the subDirs were created.
> Because the mergeDir already existed, the second container (container_2)
> did not create the remaining subDirs: it assumed that all directories had
> already been created.
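
The failure mode described above can be illustrated with a small, standalone sketch. This is not the actual Spark patch: the class name `MergeDirSketch`, the method names, and the two-digit hex sub-directory naming are assumptions made for illustration only. The first method skips all work when the parent directory already exists, which mirrors the behaviour the report describes; the second creates every sub-directory unconditionally, so it can be re-run safely after an interrupted attempt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration; not code from Spark.
public class MergeDirSketch {

    // Problematic pattern: treat "mergeDir exists" as "all subDirs exist".
    static void createIfParentMissing(Path mergeDir, int subDirs) throws IOException {
        if (Files.exists(mergeDir)) {
            return; // assumes an earlier container finished the job
        }
        Files.createDirectories(mergeDir);
        for (int i = 0; i < subDirs; i++) {
            Files.createDirectories(mergeDir.resolve(String.format("%02x", i)));
        }
    }

    // Idempotent pattern: always ensure every subDir exists.
    // Files.createDirectories does not fail if the directory is already there.
    static void createIdempotently(Path mergeDir, int subDirs) throws IOException {
        for (int i = 0; i < subDirs; i++) {
            Files.createDirectories(mergeDir.resolve(String.format("%02x", i)));
        }
    }

    // Counts the sub-directories directly under mergeDir.
    static long countSubDirs(Path mergeDir) throws IOException {
        try (Stream<Path> entries = Files.list(mergeDir)) {
            return entries.filter(Files::isDirectory).count();
        }
    }
}
```

If a first attempt leaves only two of eight sub-directories behind, calling `createIfParentMissing` afterwards leaves the count at two, while `createIdempotently` brings it to eight.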