I've already done that:

From SparkUI Environment, Spark properties has:

spark.shuffle.spill    false
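For reference, a minimal sketch of how these properties can be set programmatically before the StreamingContext is created (Spark 1.x property names; the app name and the 10-second batch interval are placeholders matching what's described below, not taken from my actual app):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: apply the shuffle settings discussed in this thread.
val conf = new SparkConf()
  .setAppName("SimpleApp")                      // placeholder app name
  .set("spark.shuffle.spill", "false")          // disable shuffle spill (Spark 1.x)
  .set("spark.shuffle.memoryFraction", "0.8")   // enlarge shuffle memory fraction
val ssc = new StreamingContext(conf, Seconds(10)) // 10-second slide window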
On Wed, Mar 18, 2015 at 6:34 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> I think you can disable it with spark.shuffle.spill=false
>
> Thanks
> Best Regards
>
> On Wed, Mar 18, 2015 at 3:39 PM, Darren Hoo <darren....@gmail.com> wrote:
>
>> Thanks, Shao
>>
>> On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai <saisai.s...@intel.com>
>> wrote:
>>
>>> Yeah, as I said, your job's processing time is much larger than the
>>> sliding window, and streaming jobs are executed one by one in sequence,
>>> so the next job will wait until the first job is finished, and the
>>> total latency accumulates.
>>>
>>> I think you need to identify the bottleneck of your job first. If the
>>> shuffle is so slow, you could enlarge the shuffle fraction of memory to
>>> reduce the spill, but in the end the shuffle data will still be written
>>> to disk; this cannot be disabled, unless you mount your spark.tmp.dir
>>> on a ramdisk.
>>>
>> I have increased spark.shuffle.memoryFraction to 0.8, which I can see
>> from SparkUI's environment variables.
>>
>> But the spill always happens, even from the start when the latency is
>> less than the slide window (I changed it to 10 seconds); the shuffle
>> data written to disk is really a snowball effect, and it slows down
>> eventually.
>>
>> I noticed that the files spilled to disk are all very small in size but
>> huge in number:
>>
>> total 344K
>> drwxr-xr-x  2 root root 4.0K Mar 18 16:55 .
>> drwxr-xr-x 66 root root 4.0K Mar 18 16:39 ..
>> -rw-r--r--  1 root root  80K Mar 18 16:54 shuffle_47_519_0.data
>> -rw-r--r--  1 root root  75K Mar 18 16:54 shuffle_48_419_0.data
>> -rw-r--r--  1 root root  36K Mar 18 16:54 shuffle_48_518_0.data
>> -rw-r--r--  1 root root  69K Mar 18 16:55 shuffle_49_319_0.data
>> -rw-r--r--  1 root root  330 Mar 18 16:55 shuffle_49_418_0.data
>> -rw-r--r--  1 root root  65K Mar 18 16:55 shuffle_49_517_0.data
>>
>> MemoryStore says:
>>
>> 15/03/18 17:59:43 WARN MemoryStore: Failed to reserve initial memory
>> threshold of 1024.0 KB for computing block rdd_1338_2 in memory.
>> 15/03/18 17:59:43 WARN MemoryStore: Not enough space to cache rdd_1338_2
>> in memory! (computed 512.0 B so far)
>> 15/03/18 17:59:43 INFO MemoryStore: Memory use = 529.0 MB (blocks) +
>> 0.0 B (scratch space shared across 0 thread(s)) = 529.0 MB. Storage
>> limit = 529.9 MB.
>>
>> Not enough space even for 512 bytes??
>>
>> The executors still have plenty of free memory:
>>
>> 0         slave1:40778  0    0.0 B / 529.9 MB    0.0 B  16  0  15047  15063  2.17 h  0.0 B     402.3 MB  768.0 B
>> 1         slave2:50452  0    0.0 B / 529.9 MB    0.0 B  16  0  14447  14463  2.17 h  0.0 B     388.8 MB  1248.0 B
>> 1         lvs02:47325   116  27.6 MB / 529.9 MB  0.0 B  8   0  58169  58177  3.16 h  893.5 MB  624.0 B   1189.9 MB
>> <driver>  lvs02:47041   0    0.0 B / 529.9 MB    0.0 B  0   0  0      0      0 ms    0.0 B     0.0 B     0.0 B
>>
>>> Besides, if CPU or network is the bottleneck, you might need to add
>>> more resources to your cluster.
>>>
>> 3 dedicated servers, each with 16 CPU cores + 16 GB memory and Gigabit
>> network.
>> CPU load is quite low, about 1~3 from top, and network usage is far
>> from saturated.
>>
>> I don't even do any useful complex calculations in this small Simple
>> App yet.
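P.S. The same check can also be done from the driver instead of the SparkUI Environment tab; a sketch, assuming an existing SparkContext `sc` (the defaults shown are the Spark 1.x defaults):

// Print the effective shuffle settings, mirroring the Environment tab.
println(sc.getConf.get("spark.shuffle.spill", "true"))          // default: true
println(sc.getConf.get("spark.shuffle.memoryFraction", "0.2"))  // default: 0.2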