I think you can disable it with spark.shuffle.spill=false

Thanks
Best Regards

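For reference, a minimal sketch of how the two shuffle settings mentioned in this thread could be passed to spark-submit as `--conf key=value` pairs. The helper `conf_args` and the jar name `your_app.jar` are made up for illustration and are not part of Spark. The property names are the Spark 1.x ones discussed in this thread.

```python
# Hypothetical helper (not part of Spark): turns a dict of Spark properties
# into repeated "--conf key=value" arguments for spark-submit.
def conf_args(props):
    args = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Settings taken from this thread (Spark 1.x property names).
settings = {
    "spark.shuffle.spill": "false",
    "spark.shuffle.memoryFraction": "0.8",
}

# "your_app.jar" is a placeholder, not a real artifact.
command = ["spark-submit"] + conf_args(settings) + ["your_app.jar"]
print(" ".join(command))
```

The same key/value pairs could instead go into spark-defaults.conf or be set on the SparkConf in the application. Which one is more convenient depends on how the job is deployed.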
On Wed, Mar 18, 2015 at 3:39 PM, Darren Hoo <darren....@gmail.com> wrote:

> Thanks, Shao
>
> On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai <saisai.s...@intel.com>
> wrote:
>
>> Yeah, as I said, your job's processing time is much larger than the
>> sliding window, and streaming jobs are executed one by one in sequence,
>> so the next job waits until the previous one has finished and the total
>> latency accumulates.
>>
>> I think you need to identify the bottleneck of your job first. If the
>> shuffle is that slow, you could enlarge the shuffle fraction of memory
>> to reduce spilling, but in the end the shuffle data will still be
>> written to disk. This cannot be disabled, unless you mount your
>> spark.local.dir on a ramdisk.
>
> I have increased spark.shuffle.memoryFraction to 0.8, which I can confirm
> from the Spark UI's environment variables.
>
> But spilling always happens, even from the start, when latency is still
> less than the slide window (I changed it to 10 seconds). The shuffle data
> written to disk has a real snowball effect, and the job slows down
> eventually.
>
> I noticed that the files spilled to disk are all very small in size but
> huge in number:
>
> total 344K
> drwxr-xr-x  2 root root 4.0K Mar 18 16:55 .
> drwxr-xr-x 66 root root 4.0K Mar 18 16:39 ..
> -rw-r--r--  1 root root  80K Mar 18 16:54 shuffle_47_519_0.data
> -rw-r--r--  1 root root  75K Mar 18 16:54 shuffle_48_419_0.data
> -rw-r--r--  1 root root  36K Mar 18 16:54 shuffle_48_518_0.data
> -rw-r--r--  1 root root  69K Mar 18 16:55 shuffle_49_319_0.data
> -rw-r--r--  1 root root  330 Mar 18 16:55 shuffle_49_418_0.data
> -rw-r--r--  1 root root  65K Mar 18 16:55 shuffle_49_517_0.data
>
> MemoryStore says:
>
> 15/03/18 17:59:43 WARN MemoryStore: Failed to reserve initial memory threshold of 1024.0 KB for computing block rdd_1338_2 in memory.
> 15/03/18 17:59:43 WARN MemoryStore: Not enough space to cache rdd_1338_2 in memory! (computed 512.0 B so far)
> 15/03/18 17:59:43 INFO MemoryStore: Memory use = 529.0 MB (blocks) + 0.0 B (scratch space shared across 0 thread(s)) = 529.0 MB. Storage limit = 529.9 MB.
>
> Not enough space even for 512 bytes??
>
> The executors still have plenty of free memory:
>
> 0         slave1:40778  0    0.0 B / 529.9 MB    0.0 B  16  0  15047  15063  2.17 h  0.0 B     402.3 MB  768.0 B
> 1         slave2:50452  0    0.0 B / 529.9 MB    0.0 B  16  0  14447  14463  2.17 h  0.0 B     388.8 MB  1248.0 B
> 1         lvs02:47325   116  27.6 MB / 529.9 MB  0.0 B  8   0  58169  58177  3.16 h  893.5 MB  624.0 B   1189.9 MB
> <driver>  lvs02:47041   0    0.0 B / 529.9 MB    0.0 B  0   0  0      0      0 ms    0.0 B     0.0 B     0.0 B
>
>> Besides, if CPU or network is the bottleneck, you might need to add more
>> resources to your cluster.
>
> 3 dedicated servers, each with 16 CPU cores + 16 GB memory and a Gigabit
> network. CPU load is quite low, about 1~3 according to top, and network
> usage is far from saturated.
>
> I don't even do any useful complex calculations in this small Simple
> App yet.
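
A small sketch of the latency build-up Saisai describes in the quoted text, assuming batches become ready at a fixed interval and are processed strictly one at a time. The function name `scheduling_delays` is made up for illustration. The 10-second interval matches the slide window mentioned in the quoted text, while the 15-second and 8-second processing times are hypothetical numbers, not measurements from this job.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Seconds each batch waits before it starts, with batches run in sequence."""
    delays = []
    free_at = 0  # time at which the previous batch finishes
    for i in range(num_batches):
        ready_at = (i + 1) * batch_interval  # batch i is ready when its interval ends
        start = max(ready_at, free_at)       # it cannot start before the previous one is done
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays


# Hypothetical case: processing (15 s) is slower than the interval (10 s).
print(scheduling_delays(10, 15, 5))  # [0, 5, 10, 15, 20]

# Hypothetical case: processing (8 s) fits inside the interval (10 s).
print(scheduling_delays(10, 8, 5))   # [0, 0, 0, 0, 0]
```

In this simplified model the delay grows by (processing_time - batch_interval) with every batch once processing is slower than the interval, and stays at zero otherwise.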