[jira] [Commented] (SPARK-37023) Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry
[ https://issues.apache.org/jira/browse/SPARK-37023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437042#comment-17437042 ]

Apache Spark commented on SPARK-37023:
--

User 'rmcyang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34461

> Avoid fetching merge status when shuffleMergeEnabled is false for a
> shuffleDependency during retry
> --
>
> Key: SPARK-37023
> URL: https://issues.apache.org/jira/browse/SPARK-37023
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle
> Affects Versions: 3.2.0
> Reporter: Ye Zhou
> Priority: Major
>
> The assertion below in MapOutputTracker.getMapSizesByExecutorId is not
> guaranteed to hold:
> {code:java}
> assert(mapSizesByExecutorId.enableBatchFetch == true){code}
> The reason is that in some stage retry cases
> shuffleDependency.shuffleMergeEnabled is set to false, but merge statuses
> still exist because the Driver has already collected them for that shuffle
> dependency. In that case the current implementation sets enableBatchFetch
> to false, because merge statuses are present.
> Details can be found here:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L1492]
> We should improve the implementation here.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
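The condition described in the report can be illustrated with a small standalone model. This is only a sketch, not Spark's code. The class name, both method names, and the use of a list of strings for the collected merge statuses are assumptions made for illustration. The second method reflects the direction suggested by the issue title (skip merge status when shuffleMergeEnabled is false), not necessarily what the linked pull request does.

```java
import java.util.List;

// Hypothetical toy model of the decision described in SPARK-37023.
// None of these names exist in Spark; they only mirror the report's wording.
class BatchFetchSketch {

    // Behaviour as reported: the presence of any collected merge status
    // turns batch fetch off, regardless of shuffleMergeEnabled.
    static boolean enableBatchFetchAsReported(boolean shuffleMergeEnabled,
                                              List<String> mergeStatuses) {
        boolean hasMergeStatus = mergeStatuses != null && !mergeStatuses.isEmpty();
        return !hasMergeStatus;
    }

    // Direction suggested by the issue title: consult merge statuses only
    // when the dependency still has shuffle merge enabled.
    static boolean enableBatchFetchSuggested(boolean shuffleMergeEnabled,
                                             List<String> mergeStatuses) {
        boolean useMergeStatus = shuffleMergeEnabled
                && mergeStatuses != null && !mergeStatuses.isEmpty();
        return !useMergeStatus;
    }

    public static void main(String[] args) {
        // Stage retry: merge is disabled, but the driver still holds old statuses.
        List<String> staleStatuses = List.of("mergeStatus-0");

        // The reported behaviour breaks the assertion (prints false).
        System.out.println(enableBatchFetchAsReported(false, staleStatuses));
        // Ignoring stale statuses keeps the assertion true (prints true).
        System.out.println(enableBatchFetchSuggested(false, staleStatuses));
    }
}
```

The two methods differ only in the retry case, where shuffleMergeEnabled is false and statuses were already collected. For every other input they return the same value.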