You should probably increase executor memory by setting
"spark.executor.memory".
The full list of available configurations can be found here:
http://spark.apache.org/docs/latest/configuration.html
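For example, a minimal sketch of setting it programmatically (the app name and the 6g value are just illustrative assumptions; the same key can also go into spark-defaults.conf next to your other settings):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Illustrative values only; size executor memory to what each worker can spare.
val conf = new SparkConf()
  .setAppName("parquet-aggregation")
  .set("spark.executor.memory", "6g")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)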
Cheng
On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
Hi there,
I was trying the new DataFrame API with some basic operations on a
parquet dataset.
I have 7 nodes with 12 cores each and 8GB of RAM allocated to each worker, in
standalone cluster mode.
The code is the following:
import org.apache.spark.sql.functions._

val people = sqlContext.parquetFile("/data.parquet")
// aggregate power and supply per (name, date) and take the first 10 rows
val res = people.groupBy("name", "date").agg(sum("power"), sum("supply")).take(10)
res.foreach(println)
The dataset consists of 16 billion entries.
The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded
My configuration is:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 6g
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.shuffle.manager sort
Any idea how I can work around this?
Thanks a lot