Hi everyone,

we are facing the same problems Facebook had, where the shuffle service is a
bottleneck. For now we have worked around it by using a large task size (2g)
to reduce shuffle I/O.
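
For context, a rough sketch of what we mean by that workaround (assuming a
spark-shell session where `spark` is the SparkSession; the total-shuffle
figure below is illustrative, not our actual number):

    // pick a shuffle partition count so each task handles ~2 GB
    val totalShuffleBytes = 800L * 1024 * 1024 * 1024   // e.g. ~800 GB of shuffle data (illustrative)
    val targetTaskBytes   = 2L * 1024 * 1024 * 1024     // aim for ~2 GB per task
    val numShufflePartitions = (totalShuffleBytes / targetTaskBytes).toInt
    // fewer, larger partitions mean fewer shuffle files and less random I/O
    spark.conf.set("spark.sql.shuffle.partitions", numShufflePartitions.toString)

This helps, but it only trades one problem for another (very large tasks),
hence the interest in a proper fix.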

I saw a very nice presentation from Brian Cho on optimizing shuffle I/O at
large scale[1]. It is an implementation of the Riffle white paper[2].
At the end of the talk, Brian Cho mentioned plans to contribute it back to
Spark[3]. I checked the mailing list and the Spark JIRA and didn't find any
ticket on this topic.

Does anyone have a contact at Facebook who might know more about this? Or
are there any plans to bring a similar optimization to Spark?

[1] https://databricks.com/session/sos-optimizing-shuffle-i-o
[2] https://haoyuzhang.org/publications/riffle-eurosys18.pdf
[3] https://image.slidesharecdn.com/5brianchoerginseyfe-180613004126/95/sos-optimizing-shuffle-io-with-brian-cho-and-ergin-seyfe-30-638.jpg?cb=1528850545
