It's spark.local.dir.
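
For example, a minimal sketch that points it at a tmpfs-backed ramdisk
(the mount point and size below are illustrative, and note that on YARN
the node manager's local directories take precedence over this setting):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes a tmpfs mount was created on each node beforehand, e.g.:
    //   sudo mkdir -p /mnt/spark-ramdisk
    //   sudo mount -t tmpfs -o size=200g tmpfs /mnt/spark-ramdisk
    val conf = new SparkConf()
      .setAppName("ramdisk-shuffle-test")
      // Scratch space for shuffle files and spills; a comma-separated
      // list of directories is also accepted.
      .set("spark.local.dir", "/mnt/spark-ramdisk")

    val sc = new SparkContext(conf)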
On Fri, Apr 1, 2016 at 3:37 PM, Yong Zhang <java8...@hotmail.com> wrote:

> Is there a configuration in Spark for the location of "shuffle spilling"?
> I don't recall ever seeing one. Can you share it?
>
> It would be good to test writing to a RAM disk if that configuration is
> available.
>
> Thanks
>
> Yong
>
> ------------------------------
> From: r...@databricks.com
> Date: Fri, 1 Apr 2016 15:32:23 -0700
> Subject: Re: Eliminating shuffle write and spill disk IO reads/writes in Spark
> To: slavi...@gmail.com
> CC: mri...@gmail.com; dev@spark.apache.org; u...@spark.apache.org
>
> Michael - I'm not sure if you actually read my email, but spill has
> nothing to do with the shuffle files on disk. It was for the partitioning
> (i.e. sorting) process. If that flag is off, Spark will just run out of
> memory when data doesn't fit in memory.
>
> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch <slavi...@gmail.com> wrote:
>
> RAMdisk is a fine interim step, but there are a lot of layers eliminated
> by keeping things in memory unless there is a need for spillover. At one
> time there was support for turning off spilling. That was eliminated. Why?
>
> On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>
> I think Reynold's suggestion of using a RAM disk would be a good way to
> test whether these are the bottlenecks or something else is.
> For most practical purposes, pointing the local dir to a ramdisk should
> effectively give you 'similar' performance as shuffling from memory.
>
> Are there concerns with taking that approach to test? (I don't see any,
> but I am not sure if I missed something.)
>
> Regards,
> Mridul
>
> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com> wrote:
> > I totally disagree that it's not a problem.
> >
> > - Network fetch throughput on 40G Ethernet exceeds the throughput of
> >   NVMe drives.
> > - Spark depends on Linux's IO cache as an effective buffer pool. This is
> >   fine for small jobs but not for jobs with datasets in the TB/node range.
> > - On larger jobs, flushing the cache causes Linux to block.
> > - On a modern 56-hyperthread 2-socket host, the latency caused by
> >   multiple executors writing out to disk increases greatly.
> >
> > I thought the whole point of Spark was in-memory computing? It is in
> > fact in-memory for some things, but it uses spark.local.dir as a buffer
> > pool for others.
> >
> > Hence, the performance of Spark is gated by the performance of
> > spark.local.dir, even on large-memory systems.
> >
> > "Currently it is not possible to not write shuffle files to disk."
> >
> > What changes *would* make it possible?
> >
> > The only one that seems possible is to clone the shuffle service and
> > make it in-memory.
> >
> > On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:
> >
> > spark.shuffle.spill actually has nothing to do with whether we write
> > shuffle files to disk. Currently it is not possible to not write shuffle
> > files to disk, and typically it is not a problem because the network
> > fetch throughput is lower than what disks can sustain. In most cases,
> > especially with SSDs, there is little difference between putting all of
> > those in memory and on disk.
> >
> > However, it is becoming more common to run Spark on a small number of
> > beefy nodes (e.g. 2 nodes, each with 1TB of RAM). We do want to look
> > into improving performance for those. In the meantime, you can set up
> > local ramdisks on each node for shuffle writes.
> >
> > On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com> wrote:
> >>
> >> Hello;
> >>
> >> I'm working on Spark with very large memory systems (2TB+) and notice
> >> that Spark spills to disk during shuffle. Is there a way to force Spark
> >> to stay in memory when doing shuffle operations? The goal is to keep
> >> the shuffle data either in the heap or in off-heap memory (in 1.6.x)
> >> and never touch the IO subsystem. I am willing to have the job fail if
> >> it runs out of RAM.
> >>
> >> spark.shuffle.spill true is deprecated in 1.6 and does not work with
> >> the Tungsten sort in 1.5.x:
> >>
> >> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but
> >> this is ignored by the tungsten-sort shuffle manager; its optimized
> >> shuffles will continue to spill to disk when necessary."
> >>
> >> If this is impossible via configuration changes, what code changes
> >> would be needed to accomplish this?
>
> --
> Michael Slavitch
> 62 Renfrew Ave.
> Ottawa Ontario
> K1S 1Z5
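
As a rough way to run the test Mridul suggests above, here is a minimal
sketch of a shuffle-heavy job (reusing the SparkContext from the earlier
sketch; the dataset size and partition count are illustrative) that could
be run once with spark.local.dir on disk and once on the ramdisk,
comparing wall-clock times:

    // Forces a full shuffle: run under each spark.local.dir setting and
    // compare. Scale the range and partition count until the shuffle
    // data is large enough to matter on your hardware.
    val data = sc.range(0L, 500000000L, numSlices = 400)
    val counts = data
      .map(x => (x % 10000, 1L))  // bucket keys to spread the shuffle
      .reduceByKey(_ + _)
    println(counts.count())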