[Feedback Requested] SPARK-25299: Using Distributed Storage for Persisting Shuffle Data

Matt Cheah Fri, 31 Aug 2018 17:42:27 -0700

Hi everyone,


I filed SPARK-25299 to promote discussion on how we can improve the shuffle 
operation in Spark. The basic premise is to discuss the ways we can leverage 
distributed storage to improve the reliability and isolation of Spark’s shuffle 
architecture.

 

A few designs and a full problem statement are outlined in this architecture 
discussion document.

 

This is a complex problem, and it would be great to get feedback from the 
community on the right direction for this work. Note that we have not yet 
committed to a specific implementation or architecture – there’s a lot 
that needs to be discussed for this improvement, so we hope to get as much 
input as possible before moving forward with a design.

 

Please feel free to leave comments and suggestions on the JIRA ticket or on the 
discussion document.

 

Thank you!

 

-Matt Cheah
