Enabling push-based shuffle in Spark

mshen Tue, 21 Jan 2020 18:13:53 -0800

I'd like to start a discussion on enabling push-based shuffle in Spark.
This is meant to address the shuffle inefficiencies we see in large-scale
Spark compute infrastructure deployments.
Facebook's previous talks on SOS shuffle
<https://databricks.com/session/sos-optimizing-shuffle-i-o> and the Cosco
shuffle service
<https://databricks.com/session/cosco-an-efficient-facebook-scale-shuffle-service>
describe solutions to a similar problem.
Note that this is somewhat orthogonal to the work in SPARK-25299
<https://issues.apache.org/jira/browse/SPARK-25299>, which uses remote
storage to store shuffle data.
More details of our proposed design are in SPARK-30602
<https://issues.apache.org/jira/browse/SPARK-30602>, with the SPIP attached.
We would appreciate comments and discussion from the community.




-----
Min Shen
Staff Software Engineer
LinkedIn