Hi everyone,
I filed SPARK-25299 to promote discussion on how we can improve the shuffle operation in Spark. The basic premise is to discuss ways we can leverage distributed storage to improve the reliability and isolation of Spark’s shuffle architecture. A few designs and a full problem statement are outlined in this architecture discussion document.

This is a complex problem, and it would be great to get feedback from the community about the right direction to take this work in. Note that we have not yet committed to a specific implementation or architecture. There is a lot that needs to be discussed for this improvement, so we hope to get as much input as possible before moving forward with a design.

Please feel free to leave comments and suggestions on the JIRA ticket or on the discussion document.

Thank you!

-Matt Cheah