I'd like to start a discussion on enabling push-based shuffle in Spark. This is meant to address the inefficiencies of the existing shuffle in large-scale Spark compute infrastructure deployments. Facebook's previous talks on SOS shuffle <https://databricks.com/session/sos-optimizing-shuffle-i-o> and the Cosco shuffle service <https://databricks.com/session/cosco-an-efficient-facebook-scale-shuffle-service> describe solutions to a similar problem. Note that this is largely orthogonal to the work in SPARK-25299 <https://issues.apache.org/jira/browse/SPARK-25299>, which uses remote storage to store shuffle data. More details of our proposed design are in SPARK-30602 <https://issues.apache.org/jira/browse/SPARK-30602>, with the SPIP attached. We would appreciate comments and discussion from the community.
-----
Min Shen
Staff Software Engineer
LinkedIn