You should probably increase executor memory by setting "spark.executor.memory".

A full list of the available configuration properties can be found here: http://spark.apache.org/docs/latest/configuration.html
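For example, with 8GB of RAM allocated to each worker, something along the following lines in spark-defaults.conf could be a reasonable starting point (the 6g value is only an illustration; it needs to leave headroom for the OS and the worker daemon):

spark.executor.memory    6g

The same setting can also be passed when submitting the job, e.g. spark-submit --conf spark.executor.memory=6g, or with the equivalent --executor-memory 6g flag.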

Cheng

On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
Hi there,

I was trying the new DataFrame API with some basic operations on a Parquet dataset. I have 7 nodes, each with 12 cores and 8GB of RAM allocated to the worker, running in standalone cluster mode.
The code is the following:

import org.apache.spark.sql.functions.sum

val people = sqlContext.parquetFile("/data.parquet")
val res = people.groupBy("name", "date").agg(sum("power"), sum("supply")).take(10)
System.out.println(res)

The dataset consists of 16 billion entries.
The error I get is: java.lang.OutOfMemoryError: GC overhead limit exceeded

My configuration is:

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory    6g
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.shuffle.manager    sort

Any idea how I can work around this?

Thanks a lot

