Re: Spark-optimized Shuffle (SOS) any update?

2019-01-02 Thread marek-simunek
Hi, thanks for the reply. I finally found time to glance through the design doc. It seems to have nothing to do with the paper I mentioned. The paper tries to solve the problem that the I/O ops required for shuffle grow quadratically with the number of tasks (shuffle files), therefore we

Re: Spark-optimized Shuffle (SOS) any update?

2018-12-19 Thread Ilan Filonenko
Recently, the community has been actively working on this. The JIRA to follow is: https://issues.apache.org/jira/browse/SPARK-25299. A group of companies, including Bloomberg and Palantir, is working on a WIP solution that implements a varied version of Option #5 (which is elaborated

Spark-optimized Shuffle (SOS) any update?

2018-12-19 Thread marek-simunek
Hi everyone, we are facing the same problems Facebook had, where the shuffle service is a bottleneck. For now we have worked around that with a large task size (2g) to reduce shuffle I/O. I saw a very nice presentation from Brian Cho on Optimizing shuffle I/O at large scale[1]. It is an implementation of white