Hi everyone,

I filed SPARK-25299 to promote discussion on how we can improve the shuffle 
operation in Spark. The basic premise is to explore ways to leverage 
distributed storage to improve the reliability and isolation of Spark’s shuffle 
architecture.

 

A few designs and a full problem statement are outlined in this architecture 
discussion document.

This is a complex problem, and it would be great to get feedback from the 
community about the right direction for this work. Note that we have not yet 
committed to a specific implementation or architecture. There is a lot that 
needs to be discussed for this improvement, so we hope to get as much input as 
possible before moving forward with a design.

 

Please feel free to leave comments and suggestions on the JIRA ticket or on the 
discussion document.

Thank you!

-Matt Cheah
