Yes, we see it on the final write. Our preference is to eliminate this.

On Fri, Apr 1, 2016, 7:25 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Hi Michael, shuffle data (mapper output) has to be materialized to disk
> in the end, no matter how much memory you have; that is by design in
> Spark. In your scenario, since you have a lot of memory, shuffle spill
> should not happen frequently, so most of the disk IO you see is probably
> the final shuffle file write.
>
> So if you want to avoid this disk IO, you could use a ramdisk as Reynold
> suggested. If you want to avoid the FS overhead of a ramdisk, you could
> try to hack a new shuffle implementation, since the shuffle framework is
> pluggable.
>
>
> On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch <slavi...@gmail.com>
> wrote:
>
>> As I mentioned earlier this flag is now ignored.
>>
>>
>> On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch <slavi...@gmail.com> wrote:
>>
>>> Shuffling a 1 TB set of keys and values (aka sort by key) results in
>>> about 500 GB of IO to disk if compression is enabled. Is there any way
>>> to eliminate the IO caused by shuffling?
>>>
>>> On Fri, Apr 1, 2016, 6:32 PM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Michael - I'm not sure if you actually read my email, but spill has
>>>> nothing to do with the shuffle files on disk. It was for the
>>>> partitioning (i.e. sorting) process. If that flag is off, Spark will
>>>> just run out of memory when data doesn't fit in memory.
>>>>
>>>>
>>>> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch <slavi...@gmail.com>
>>>> wrote:
>>>>
>>>>> RAMdisk is a fine interim step but there are a lot of layers
>>>>> eliminated by keeping things in memory unless there is need for
>>>>> spillover. At one time there was support for turning off spilling.
>>>>> That was eliminated. Why?
>>>>>
>>>>>
>>>>> On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think Reynold's suggestion of using ram disk would be a good way to
>>>>>> test if these are the bottlenecks or something else is.
>>>>>> For most practical purposes, pointing local dir to ramdisk should
>>>>>> effectively give you 'similar' performance as shuffling from memory.
>>>>>>
>>>>>> Are there concerns with taking that approach to test? (I don't see
>>>>>> any, but I am not sure if I missed something.)
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com>
>>>>>> wrote:
>>>>>> > I totally disagree that it's not a problem.
>>>>>> >
>>>>>> > - Network fetch throughput on 40G Ethernet exceeds the throughput
>>>>>> > of NVME drives.
>>>>>> > - What Spark is depending on is Linux's IO cache as an effective
>>>>>> > buffer pool. This is fine for small jobs but not for jobs with
>>>>>> > datasets in the TB/node range.
>>>>>> > - On larger jobs flushing the cache causes Linux to block.
>>>>>> > - On a modern 56-hyperthread 2-socket host the latency caused by
>>>>>> > multiple executors writing out to disk increases greatly.
>>>>>> >
>>>>>> > I thought the whole point of Spark was in-memory computing? It's
>>>>>> > in fact in-memory for some things but uses spark.local.dir as a
>>>>>> > buffer pool for others.
>>>>>> >
>>>>>> > Hence, the performance of Spark is gated by the performance of
>>>>>> > spark.local.dir, even on large memory systems.
>>>>>> >
>>>>>> > "Currently it is not possible to not write shuffle files to disk."
>>>>>> >
>>>>>> > What changes >would< make it possible?
>>>>>> >
>>>>>> > The only one that seems possible is to clone the shuffle service
>>>>>> > and make it in-memory.
>>>>>> >
>>>>>> >
>>>>>> > On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > spark.shuffle.spill actually has nothing to do with whether we
>>>>>> > write shuffle files to disk.
>>>>>> > Currently it is not possible to not write shuffle files to
>>>>>> > disk, and typically it is not a problem because the network fetch
>>>>>> > throughput is lower than what disks can sustain. In most cases,
>>>>>> > especially with SSDs, there is little difference between putting
>>>>>> > all of those in memory and on disk.
>>>>>> >
>>>>>> > However, it is becoming more common to run Spark on a small number
>>>>>> > of beefy nodes (e.g. 2 nodes each with 1TB of RAM). We do want to
>>>>>> > look into improving performance for those. In the meantime, you can
>>>>>> > set up local ramdisks on each node for shuffle writes.
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch
>>>>>> > <slavi...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Hello;
>>>>>> >>
>>>>>> >> I'm working on spark with very large memory systems (2TB+) and
>>>>>> >> notice that Spark spills to disk in shuffle. Is there a way to
>>>>>> >> force spark to stay in memory when doing shuffle operations? The
>>>>>> >> goal is to keep the shuffle data either in the heap or in off-heap
>>>>>> >> memory (in 1.6.x) and never touch the IO subsystem. I am willing
>>>>>> >> to have the job fail if it runs out of RAM.
>>>>>> >>
>>>>>> >> spark.shuffle.spill true is deprecated in 1.6 and does not work in
>>>>>> >> Tungsten sort in 1.5.x
>>>>>> >>
>>>>>> >> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false,
>>>>>> >> but this is ignored by the tungsten-sort shuffle manager; its
>>>>>> >> optimized shuffles will continue to spill to disk when necessary."
>>>>>> >>
>>>>>> >> If this is impossible via configuration changes what code changes
>>>>>> >> would be needed to accomplish this?
--
Michael Slavitch
62 Renfrew Ave.
Ottawa Ontario
K1S 1Z5
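
For anyone who wants to try the ramdisk route suggested in the thread, here is a minimal sketch of how the job submission could look. It only builds the spark-submit argument list; it does not mount anything. The mount point /mnt/ramdisk/spark, the helper name ramdisk_submit_args and the script name sort_by_key.py are all hypothetical, and the tmpfs size is left for you to choose. Note also that, depending on the cluster manager, spark.local.dir may be overridden by environment variables such as SPARK_LOCAL_DIRS (standalone) or LOCAL_DIRS (YARN), so treat this as a starting point rather than a guaranteed fix.

```python
import shlex


def ramdisk_submit_args(app, ramdisk_dir="/mnt/ramdisk/spark", extra_conf=None):
    """Build a spark-submit argument list that points Spark's scratch
    space (spark.local.dir, where shuffle files are written) at a ramdisk.

    ramdisk_dir is a hypothetical tmpfs mount point. It must already exist
    on every node, for example created beforehand with something like:
        mount -t tmpfs -o size=<SIZE> tmpfs /mnt/ramdisk
    where <SIZE> is chosen to fit the expected shuffle output.
    """
    conf = {"spark.local.dir": ramdisk_dir}
    # Caller-supplied settings are merged in and may override the default.
    conf.update(extra_conf or {})

    args = ["spark-submit"]
    # Sort keys so the generated command line is deterministic.
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    args.append(app)
    return args


if __name__ == "__main__":
    # Print a shell-safe command line for a hypothetical job script.
    print(" ".join(shlex.quote(a) for a in ramdisk_submit_args("sort_by_key.py")))
```

This keeps the shuffle files on a filesystem, so the FS overhead Saisai mentions is still there; it only takes the physical disks out of the path.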