On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai <saisai.s...@intel.com> wrote:

>  From the log you pasted I think this (-rw-r--r--  1 root root  80K Mar
> 18 16:54 shuffle_47_519_0.data) is not shuffle spilled data, but the
> final shuffle result.
>

Why is the shuffle result written to disk?


> As I said, did you think shuffle is the bottleneck which makes your job
> running slowly?
>

I am quite new to Spark, so I am just making wild guesses. What further
information should I provide to help find the real bottleneck?

> Maybe you should identify the cause first. Besides, from the log it looks
> like your memory is not enough to cache the data; maybe you should increase
> the memory size of the executor.
>

Running two executors, the memory usage is quite low:

executor 0  8.6 MB / 4.1 GB
executor 1  23.9 MB / 4.1 GB
<driver>     0.0B / 529.9 MB
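
As a side note, the 4.1 GB the UI reports per executor is the storage memory pool, not the full 8 GB heap. In Spark 1.x that pool is roughly usable heap × spark.storage.memoryFraction (default 0.6) × spark.storage.safetyFraction (default 0.9). A rough back-of-envelope check (the 0.95 usable-heap factor is an assumption for JVM overhead):

```python
# Rough estimate of the storage memory Spark 1.x shows in the UI.
# Assumptions: default spark.storage.memoryFraction = 0.6,
# spark.storage.safetyFraction = 0.9, and ~95% of -Xmx usable as heap.
executor_heap_gb = 8 * 0.95          # approximate usable heap for an 8G executor
storage_pool_gb = executor_heap_gb * 0.6 * 0.9
print(round(storage_pool_gb, 1))     # roughly matches the 4.1 GB in the UI
```

So the "8.6 MB / 4.1 GB" readout only says the cache is nearly empty, not that the executor is short on heap.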

Submitted with args: --executor-memory 8G --num-executors 2
--driver-memory 1G
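
For completeness, the full invocation would look something like this (the master URL and application jar name are placeholders, not from the original job):

```shell
# Sketch of the spark-submit command with the flags above;
# --master and the jar path are assumptions for illustration.
spark-submit \
  --master yarn \
  --executor-memory 8G \
  --num-executors 2 \
  --driver-memory 1G \
  your-app.jar
```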
