Hi everyone,

A working group in the community have been having ongoing discussions regarding 
how we can allow for flexible storage solutions for shuffle data that is 
compatible with containerized systems, more resilient to node failures, and can 
support disaggregated storage architectures.

One of the core challenges we have been trying to overcome is navigating the 
space of shuffle metadata tracking, and reasoning about how we approach 
recomputing lost shuffle blocks in the case when the shuffle file storage 
system is not resilient.

I have written a design document on the subject, and a proposed set of APIs to 
fix it. These should be considered as part of the APIs for 
SPARK-25299<https://issues.apache.org/jira/browse/SPARK-25299>. Once we have 
reached some common conclusion on the proper APIs to build, I can modify the 
original SPARK-25299 SPIP to reflect the choices we’ve made. But I wanted to 
write more extensively on this topic separately to encourage focused discussion 
on this subset of the problem space.

You can find the design document here: 
https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit?usp=sharing

If you would like to catch up on the discussions we have had so far, I give 
some background to the subject matter and have linked to other relevant design 
documents and discussion threads in this document.

Feedback is definitely appreciated – I acknowledge that this is a fairly 
complex space with lots of potential viable options, so I’m looking forward to 
engaging with dialogue moving forward.

Thanks!

-Matt Cheah

Reply via email to