Thanks a lot!
I just realized that Spark is not really an in-memory version of MapReduce :)
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, January 13, 2015 3:53 PM
To: Shuai Zheng
Cc: user@spark.apache.org
Subject: Re: Why always spilling to disk and how to improve it?
You could try setting the following to tweak the application a little bit:
.set("spark.rdd.compress", "true")
.set("spark.storage.memoryFraction", "1")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
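A minimal sketch of how these settings could be applied when building the SparkContext (the appName and master values here are placeholders, not from the original thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application setup; adjust appName and master for your cluster.
val conf = new SparkConf()
  .setAppName("MyApp")     // placeholder application name
  .setMaster("local[*]")   // placeholder master URL
  // Compress serialized RDD partitions to reduce memory/shuffle footprint.
  .set("spark.rdd.compress", "true")
  // Fraction of heap used for cached RDD storage (legacy setting in older Spark versions).
  .set("spark.storage.memoryFraction", "1")
  // Use Kryo instead of Java serialization for faster, more compact serialization.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)
```

Note that setting spark.storage.memoryFraction to 1 leaves no headroom for shuffle or task execution, so it's worth testing smaller values as well.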
For shuffle behavior, you can look at this document.