[ https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422800#comment-17422800 ]
Min Shen commented on SPARK-36892:
----------------------------------

[~Gengliang.Wang] This issue and the ones fixed earlier surfaced as we started testing a variety of LinkedIn's internal workloads against a version of Spark based on the 3.2.0 RC. Note that when we previously productionized push-based shuffle internally at LinkedIn, it was developed on top of Spark 2.3/2.4. We are fairly confident in the major functionality of push-based shuffle, since it has been in production at LinkedIn for a year now. However, some of the code for push-based shuffle in the Spark 3.2.0 RC is new and had not been tested with our internal workloads until recently.

Hopefully this explains the context around the few recent issues. As for testing with real workloads, we are already doing it and will continue to do so to help with the 3.2.0 release.

> Disable batch fetch for a shuffle when push based shuffle is enabled
> --------------------------------------------------------------------
>
>                 Key: SPARK-36892
>                 URL: https://issues.apache.org/jira/browse/SPARK-36892
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.2.0
>            Reporter: Mridul Muralidharan
>            Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle
> output happens.
> Unfortunately, this currently interacts badly with
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]
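
The gating proposed in the description can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual Spark change: the helper `should_batch_fetch` is hypothetical, and `spark.shuffle.push.enabled` is assumed here to be the switch for push-based shuffle. Only `spark.sql.adaptive.fetchShuffleBlocksInBatch` is named in the issue itself.

```python
def should_batch_fetch(batch_fetch_requested: bool,
                       push_based_shuffle_enabled: bool) -> bool:
    """Hypothetical helper: allow batch fetch only when push-based shuffle is off."""
    return batch_fetch_requested and not push_based_shuffle_enabled


# Illustrative settings only. spark.shuffle.push.enabled is an assumption;
# the issue text names just the fetchShuffleBlocksInBatch key.
conf = {
    "spark.shuffle.push.enabled": "true",
    "spark.sql.adaptive.fetchShuffleBlocksInBatch": "true",
}

do_batch_fetch = should_batch_fetch(
    batch_fetch_requested=conf["spark.sql.adaptive.fetchShuffleBlocksInBatch"] == "true",
    push_based_shuffle_enabled=conf["spark.shuffle.push.enabled"] == "true",
)
print(do_batch_fetch)  # prints False: push-based shuffle overrides the batch-fetch request
```

With both settings on, the sketch turns batch fetch off, which matches the behaviour the description asks for; with push-based shuffle off, the batch-fetch setting is honoured unchanged.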