[ https://issues.apache.org/jira/browse/SPARK-30602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144060#comment-17144060 ]
Min Shen commented on SPARK-30602:
----------------------------------

Also want to share the production results we have so far. We have deployed Magnet, the new push-based shuffle service, to our biggest production cluster at LinkedIn. This is a very large-scale and busy cluster: we run 30K+ Spark applications daily, which shuffle ~5 PB of data. We have enabled a few fairly complex production flows in our cluster to start using the new push-based shuffle (see the configuration sketch at the end of this message). The result of using the new push-based shuffle mechanism is shown in the screenshot below. Here, we used an internal performance comparison tool to measure the effect of enabling push-based shuffle. We saw a huge reduction in shuffle read wait time, which also significantly brought down the total executor runtime.

!Screen Shot 2020-06-23 at 11.31.22 AM.jpg!

> SPIP: Support push-based shuffle to improve shuffle efficiency
> --------------------------------------------------------------
>
>                 Key: SPARK-30602
>                 URL: https://issues.apache.org/jira/browse/SPARK-30602
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Min Shen
>            Priority: Major
>        Attachments: Screen Shot 2020-06-23 at 11.31.22 AM.jpg, vldb_2020_magnet_shuffle.pdf
>
>
> In a large deployment of Spark compute infrastructure, Spark shuffle is becoming a potential scaling bottleneck and a source of inefficiency in the cluster. When running Spark on YARN at large scale, people usually enable the Spark external shuffle service and store the intermediate shuffle files on HDDs. Because the number of blocks generated for a particular shuffle grows quadratically with the size of the shuffled data (the numbers of mappers and reducers grow linearly with the size of the shuffled data, but the number of blocks is # mappers * # reducers), one general trend we have observed is that the more data a Spark application processes, the smaller the block size becomes. In a few production clusters we have seen, the average shuffle block size is only tens of KBs. Because of the inefficiency of performing random reads on HDDs for small amounts of data, the overall efficiency of the Spark external shuffle service in serving shuffle blocks degrades as an increasing number of Spark applications process an increasing amount of data. In addition, because the Spark external shuffle service is a shared service in a multi-tenant cluster, the inefficiency of one Spark application can propagate to other applications as well.
>
> In this ticket, we propose a solution to improve Spark shuffle efficiency in the above-mentioned environments with push-based shuffle. With push-based shuffle, shuffle is performed at the end of the map tasks, and blocks get pre-merged and moved towards the reducers. In our prototype implementation, we have seen significant efficiency improvements when performing large shuffles. We take a Spark-native approach to achieve this, i.e., we extend Spark's existing Netty-based shuffle protocol and the behaviors of Spark mappers, reducers, and drivers. This way, we can bring the benefits of more efficient shuffle to Spark without incurring the dependency or overhead of either a specialized storage layer or external infrastructure pieces.
>
> Link to dev mailing list discussion:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Enabling-push-based-shuffle-in-Spark-td28732.html
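
As a back-of-envelope illustration of the quadratic block-count growth described in the ticket above: since # blocks = # mappers * # reducers while both mapper and reducer counts grow roughly linearly with the shuffled data size, the average block size shrinks as the data grows. The sketch below uses made-up numbers chosen only to show the trend, not measurements from our clusters.

{code:scala}
object ShuffleBlockMath {
  // Average shuffle block size in KB, assuming the shuffled data is spread
  // evenly across the (# mappers * # reducers) blocks of a single shuffle.
  def avgBlockSizeKB(shuffleBytes: Double, mappers: Long, reducers: Long): Double =
    shuffleBytes / (mappers * reducers) / 1024

  def main(args: Array[String]): Unit = {
    val TB = math.pow(2, 40)
    // 1 TB shuffled by 5,000 mappers x 5,000 reducers = 25M blocks, ~43 KB each.
    println(f"1 TB: ${avgBlockSizeKB(1 * TB, 5000, 5000)}%.1f KB per block")
    // 4x the data with 4x the mappers and reducers: 400M blocks, ~11 KB each.
    println(f"4 TB: ${avgBlockSizeKB(4 * TB, 20000, 20000)}%.1f KB per block")
  }
}
{code}

Growing the data 4x shrinks the average block size by 4x, which is how clusters end up with shuffle blocks of only tens of KBs and the resulting small random reads on HDDs.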
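For reference, this is roughly what turning on push-based shuffle looks like from the application side in our deployment. The configuration keys below are a sketch only: spark.shuffle.service.enabled is an existing Spark setting, while the push-specific key is an assumed name for the client-side switch proposed in this SPIP and may differ from what is finally merged.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("push-based-shuffle-example")
  // Push-based shuffle builds on the external shuffle service.
  .config("spark.shuffle.service.enabled", "true")
  // Assumed client-side flag to opt a job into push-based shuffle;
  // the final config name is decided by the implementation, not this ticket.
  .config("spark.shuffle.push.enabled", "true")
  .getOrCreate()
{code}

On the node manager side, the Spark YARN shuffle service would additionally need to be deployed with the push/merge functionality, which is part of the server-side changes proposed here.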