I think you can disable it with spark.shuffle.spill=false.
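For example, one way to apply that setting at submit time (the class and jar names below are placeholders, and note this property only exists in Spark 1.x; disabling spill trades disk I/O for OOM risk if the shuffle data does not fit in memory):

```shell
# Spark 1.x only -- spark.shuffle.spill was removed in later releases.
# With spill disabled, aggregation state must fit in memory or the job OOMs.
spark-submit \
  --conf spark.shuffle.spill=false \
  --class MySparkApp \
  my-spark-app.jar
```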

Thanks
Best Regards

On Wed, Mar 18, 2015 at 3:39 PM, Darren Hoo <darren....@gmail.com> wrote:

> Thanks, Shao
>
> On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai <saisai.s...@intel.com>
> wrote:
>
>>  Yeah, as I said, your job's processing time is much larger than the
>> sliding window, and streaming jobs are executed one by one in sequence, so
>> each job waits until the previous one finishes and the total latency
>> accumulates.
>>
>>
>>
>> I think you need to identify the bottleneck of your job first. If the
>> shuffle is slow, you could enlarge the shuffle fraction of memory to
>> reduce spilling, but the shuffle data will still be written to disk in
>> the end; this cannot be disabled unless you mount spark.local.dir on a
>> ramdisk.
>>
>>
>>
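A sketch of the ramdisk option mentioned above (the mount point and size here are assumptions, not from the thread; tmpfs is volatile and capped by RAM):

```shell
# Back Spark's scratch directory with RAM so spill files never touch disk.
# Size the tmpfs to the expected peak spill volume.
sudo mkdir -p /mnt/spark-local
sudo mount -t tmpfs -o size=4g tmpfs /mnt/spark-local

# Then point Spark at it, e.g. in conf/spark-defaults.conf:
#   spark.local.dir  /mnt/spark-local
```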
> I have increased spark.shuffle.memoryFraction to 0.8, which I can confirm
> on the Spark UI's environment page.
>
> But spilling happens even from the start, while latency is still less than
> the slide window (I changed it to 10 seconds). The shuffle data written to
> disk really snowballs, and the job slows down eventually.
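Worth noting: in the Spark 1.x memory model, spark.shuffle.memoryFraction and spark.storage.memoryFraction (default 0.6) are both carved out of the same executor heap, so raising the shuffle fraction to 0.8 without lowering the storage fraction oversubscribes it. A balanced sketch for conf/spark-defaults.conf (the exact split is a judgment call, not from the thread):

```shell
# Spark 1.x: both fractions come out of the same executor heap,
# so together they should stay well under 1.0.
spark.shuffle.memoryFraction   0.5
spark.storage.memoryFraction   0.3
```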
>
> I noticed that the files spilled to disk are all very small in size but
> huge in number:
>
> total 344K
>
> drwxr-xr-x  2 root root 4.0K Mar 18 16:55 .
>
> drwxr-xr-x 66 root root 4.0K Mar 18 16:39 ..
>
> -rw-r--r--  1 root root  80K Mar 18 16:54 shuffle_47_519_0.data
>
> -rw-r--r--  1 root root  75K Mar 18 16:54 shuffle_48_419_0.data
>
> -rw-r--r--  1 root root  36K Mar 18 16:54 shuffle_48_518_0.data
>
> -rw-r--r--  1 root root  69K Mar 18 16:55 shuffle_49_319_0.data
>
> -rw-r--r--  1 root root  330 Mar 18 16:55 shuffle_49_418_0.data
>
> -rw-r--r--  1 root root  65K Mar 18 16:55 shuffle_49_517_0.data
>
> MemoryStore says:
>
> 15/03/18 17:59:43 WARN MemoryStore: Failed to reserve initial memory 
> threshold of 1024.0 KB for computing block rdd_1338_2 in memory.
> 15/03/18 17:59:43 WARN MemoryStore: Not enough space to cache rdd_1338_2 in 
> memory! (computed 512.0 B so far)
> 15/03/18 17:59:43 INFO MemoryStore: Memory use = 529.0 MB (blocks) + 0.0 B 
> (scratch space shared across 0 thread(s)) = 529.0 MB. Storage limit = 529.9 
> MB.
>
> Not enough space even for 512 bytes?
>
>
> The executors still have plenty of free memory:
> (Columns as on the Spark UI executors page: ID, Address, RDD Blocks,
> Memory Used, Disk Used, Active/Failed/Complete/Total Tasks, Task Time,
> Input, Shuffle Read, Shuffle Write)
>
> 0        slave1:40778  0    0.0 B / 529.9 MB    0.0 B  16  0  15047  15063  2.17 h  0.0 B     402.3 MB  768.0 B
> 1        slave2:50452  0    0.0 B / 529.9 MB    0.0 B  16  0  14447  14463  2.17 h  0.0 B     388.8 MB  1248.0 B
> 1        lvs02:47325   116  27.6 MB / 529.9 MB  0.0 B  8   0  58169  58177  3.16 h  893.5 MB  624.0 B   1189.9 MB
> <driver> lvs02:47041   0    0.0 B / 529.9 MB    0.0 B  0   0  0      0      0 ms    0.0 B     0.0 B     0.0 B
>
>
>> Besides, if CPU or network is the bottleneck, you might need to add more
>> resources to your cluster.
>>
>  3 dedicated servers, each with 16 CPU cores + 16 GB memory, on a Gigabit
> network.
>  CPU load is quite low, about 1~3 in top, and network usage is far from
> saturated.
>
>  I don't even do any useful complex calculations in this small Simple
> App yet.
>
>
>
