RE: Why always spilling to disk and how to improve it?

2015-01-14 Thread Shuai Zheng
Thanks a lot! I just realized that Spark is not really an in-memory version of MapReduce :)

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, January 13, 2015 3:53 PM
To: Shuai Zheng
Cc: user@spark.apache.org
Subject: Re: Why always spilling to disk and how to improve it?

Re: Why always spilling to disk and how to improve it?

2015-01-13 Thread Akhil Das
You could try setting the following to tweak the application a little bit:

.set("spark.rdd.compress", "true")
.set("spark.storage.memoryFraction", "1")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

For shuffle behavior, you can look at this document
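
For reference, here is a minimal sketch of how the three settings suggested above might be applied to a SparkConf in a Spark 1.x Scala application. The object name and the application name are placeholders made up for illustration, not something from this thread, and the values are copied from the message as-is rather than tuned for any particular job.

```scala
import org.apache.spark.SparkConf

// Hypothetical wrapper object, used only to make the sketch self-contained.
object SpillTuningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spill-tuning-sketch") // placeholder name (assumption)
      // Compress serialized RDD partitions to reduce their memory footprint.
      .set("spark.rdd.compress", "true")
      // Fraction of the heap reserved for cached RDDs (Spark 1.x setting);
      // the value 1 is the one suggested in the message above.
      .set("spark.storage.memoryFraction", "1")
      // Use Kryo instead of the default Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    // Print the resulting key/value pairs to confirm what was set.
    println(conf.toDebugString)
  }
}
```

SparkConf values are passed as strings, so the boolean and the fraction are quoted. The conf would then normally be handed to `new SparkContext(conf)`. For comparison, Spark 1.x documents a default of 0.6 for spark.storage.memoryFraction.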