Copying Jon here since he worked on the lzf library at Ning. Jon - any comments on this topic?
On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> You can actually turn off shuffle compression by setting
> spark.shuffle.compress to false. Try that out, there will still be some
> buffers for the various OutputStreams, but they should be smaller.
>
> Matei
>
> On Jul 14, 2014, at 3:30 PM, Stephen Haberman <stephen.haber...@gmail.com>
> wrote:
>
> >
> > Just a comment from the peanut gallery, but these buffers are a real
> > PITA for us as well. Probably 75% of our non-user-error job failures
> > are related to them.
> >
> > Just naively, what about not doing compression on the fly? E.g. during
> > the shuffle just write straight to disk, uncompressed?
> >
> > For us, we always have plenty of disk space, and if you're concerned
> > about network transmission, you could add a separate compress step
> > after the blocks have been written to disk, but before being sent over
> > the wire.
> >
> > Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
> > see work in this area!
> >
> > - Stephen
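
For anyone following along, here is a minimal sketch of the setting Matei
mentions. It assumes the property is set cluster-wide in
conf/spark-defaults.conf; that file location is my assumption, not something
stated in the thread. The same key can also be passed per job through
spark-submit's --conf flag as spark.shuffle.compress=false.

    # conf/spark-defaults.conf (assumed location)
    # Write shuffle output uncompressed, as suggested above.
    spark.shuffle.compress    false

As Matei notes, this should shrink the buffers rather than remove them, since
the OutputStreams still keep some of their own.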