On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai <saisai.s...@intel.com> wrote:
> From the log you pasted I think this (-rw-r--r-- 1 root root 80K Mar 18
> 16:54 shuffle_47_519_0.data) is not shuffle spilled data, but the final
> shuffle result.
>
>> why is the shuffle result written to disk?
>
> As I said, do you think shuffle is the bottleneck that makes your job
> run slowly?

I am quite new to Spark, so I am just making wild guesses. What further
information should I provide to help find the real bottleneck?

> Maybe you should identify the cause first. Besides, from the log it looks
> like your memory is not enough to cache the data; maybe you should
> increase the memory size of the executor.

I am running two executors, and the memory usage is quite low:

    executor 0    8.6 MB / 4.1 GB
    executor 1    23.9 MB / 4.1 GB
    <driver>      0.0 B / 529.9 MB

Submitted with args: --executor-memory 8G --num-executors 2 --driver-memory 1G
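As a side note on the numbers above: the ~4.1 GB the UI reports against an 8G executor is roughly what Spark 1.x reserves for block storage out of the heap. This is a minimal sketch of that arithmetic, assuming the default values of spark.storage.memoryFraction (0.6) and spark.storage.safetyFraction (0.9); the UI figure comes out slightly lower than this estimate because the JVM's usable heap is a bit less than -Xmx.

```python
# Rough estimate (assumption, Spark 1.x defaults) of the storage memory
# the UI shows for an executor started with --executor-memory 8G.

executor_memory_gb = 8.0       # --executor-memory 8G
storage_memory_fraction = 0.6  # spark.storage.memoryFraction (default)
safety_fraction = 0.9          # spark.storage.safetyFraction (default)

# Heap fraction reserved for caching RDD blocks:
storage_memory_gb = executor_memory_gb * storage_memory_fraction * safety_fraction
print(f"storage memory ~ {storage_memory_gb:.2f} GB")  # ~4.32 GB upper bound
```

So the 8.6 MB / 4.1 GB readout means almost none of the cache space is in use, which is consistent with memory not being the constraint here.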