Re: better compression codecs for shuffle blocks?

Reynold Xin Mon, 14 Jul 2014 16:10:21 -0700

Copying Jon here since he worked on the lzf library at Ning.

Jon - any comments on this topic?



On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> You can actually turn off shuffle compression by setting
> spark.shuffle.compress to false. Try that out; there will still be some
> buffers for the various OutputStreams, but they should be smaller.
>
> Matei
>
> On Jul 14, 2014, at 3:30 PM, Stephen Haberman <stephen.haber...@gmail.com>
> wrote:
>
> >
> > Just a comment from the peanut gallery, but these buffers are a real
> > PITA for us as well. Probably 75% of our non-user-error job failures
> > are related to them.
> >
> > Just naively, what about not doing compression on the fly? E.g. during
> > the shuffle just write straight to disk, uncompressed?
> >
> > For us, we always have plenty of disk space, and if you're concerned
> > about network transmission, you could add a separate compress step
> > after the blocks have been written to disk, but before being sent over
> > the wire.
> >
> > Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
> > see work in this area!
> >
> > - Stephen
> >
>
>
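
For reference, the setting Matei mentions in the quoted message can be
applied when the application builds its configuration. This is only a
minimal sketch, assuming a Scala application that creates its own SparkConf;
the object name, app name, and local master are placeholders and do not come
from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program; names here are illustrative only.
object ShuffleCompressionOff {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-compression-off") // placeholder app name
      .setMaster("local[2]")                 // placeholder; use your cluster master
      // Write shuffle output uncompressed, as suggested in the thread.
      .set("spark.shuffle.compress", "false")

    val sc = new SparkContext(conf)
    try {
      // Confirm the value the context actually picked up.
      println(sc.getConf.get("spark.shuffle.compress"))
    } finally {
      sc.stop()
    }
  }
}
```

The same key should also be settable outside the code, for example in
conf/spark-defaults.conf, if you would rather not hard-code it.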
