Please see the inline comments.

Thanks
Jerry

From: Darren Hoo [mailto:darren....@gmail.com]
Sent: Wednesday, March 18, 2015 9:30 PM
To: Shao, Saisai
Cc: user@spark.apache.org; Akhil Das
Subject: Re: [spark-streaming] can shuffle write to disk be disabled?



On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai 
<saisai.s...@intel.com<mailto:saisai.s...@intel.com>> wrote:

>From the log you pasted I think this (-rw-r--r--  1 root root  80K Mar 18 
>16:54 shuffle_47_519_0.data) is not shuffle spilled data, but the final 
>shuffle result.

why the shuffle result  is written to disk?

This is the internal mechanism for Spark.



As I said, did you think shuffle is the bottleneck which makes your job running 
slowly?

I am quite new to spark, So I am just doing wild guesses. which information 
should I provide further that
can help to find the real bottleneck?

You can monitor the system metrics, as well as JVM, also information from web 
UI is very useful.



Maybe you should identify the cause at first. Besides from the log it looks 
your memory is not enough the cache the data, maybe you should increase the 
memory size of the executor.



 running two executors, the memory ussage is quite low:

executor 0  8.6 MB / 4.1 GB
executor 1  23.9 MB / 4.1 GB
<driver>     0.0B / 529.9 MB


submitted with args : --executor-memory 8G  --num-executors 2 --driver-memory 1G


Reply via email to