You should probably increase executor memory by setting "spark.executor.memory".

A full list of the available configuration properties can be found here: http://spark.apache.org/docs/latest/configuration.html
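For example, with 8GB of RAM allocated to each worker, something along the following lines in spark-defaults.conf could be a reasonable starting point (the 6g value is only an illustration; it needs to leave headroom for the OS and the worker daemon):

spark.executor.memory    6g

The same setting can also be passed when submitting the job, e.g. spark-submit --conf spark.executor.memory=6g, or with the equivalent --executor-memory 6g flag.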

Cheng

On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
Hi there,

I was trying the new DataFrame API with some basic operations on a Parquet dataset. I have 7 nodes, each with 12 cores and 8GB of RAM allocated to the worker, running in standalone cluster mode.
The code is the following:

import org.apache.spark.sql.functions.sum

val people = sqlContext.parquetFile("/data.parquet")
val res = people.groupBy("name", "date").agg(sum("power"), sum("supply")).take(10)
System.out.println(res)

The dataset consists of 16 billion entries.
The error I get is: java.lang.OutOfMemoryError: GC overhead limit exceeded

My configuration is:

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory    6g
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.shuffle.manager    sort

Any idea how I can work around this?

Thanks a lot

