> export SPARK_WORKER_MEMORY=4g
Maybe you could increase the max heap size on the workers? If the
OutOfMemoryError is coming from the driver, then you may want to set the
driver memory explicitly as well.
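
For example (just a sketch for standalone client mode; the sizes and the
class/jar names below are placeholders to tune for your own cluster), you
could raise the per-worker memory pool in spark-env.sh and pass the executor
and driver heaps at submit time. Note that spark.driver.memory set in
SparkConf has no effect in client mode because the driver JVM is already
running by then, so use --driver-memory or spark-defaults.conf instead:

# spark-env.sh on each worker node (hypothetical sizes)
export SPARK_WORKER_MEMORY=8g

# at submit time: executor heap plus an explicit driver heap
spark-submit \
  --executor-memory 6g \
  --driver-memory 4g \
  --class your.main.Class your-app.jar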

Thanks,



On Tue, Jan 12, 2016 at 2:04 AM, Barak Yaish <barak.ya...@gmail.com> wrote:

> Hello,
>
> I have a 5-node cluster which hosts both HDFS datanodes and Spark workers.
> Each node has 8 CPUs and 16G of memory. The Spark version is 1.5.2, and
> spark-env.sh is as follows:
>
> export SPARK_MASTER_IP=10.52.39.92
>
> export SPARK_WORKER_INSTANCES=4
>
> export SPARK_WORKER_CORES=8
> export SPARK_WORKER_MEMORY=4g
>
> And more settings done in the application code:
>
>
> sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
>
> sparkConf.set("spark.kryo.registrator",InternalKryoRegistrator.class.getName());
> sparkConf.set("spark.kryo.registrationRequired","true");
> sparkConf.set("spark.kryoserializer.buffer.max.mb","512");
> sparkConf.set("spark.default.parallelism","300");
> sparkConf.set("spark.rpc.askTimeout","500");
>
> I'm trying to load data from HDFS and run some SQL on it (mostly group
> by) using DataFrames. The logs keep saying that tasks are lost due to
> OutOfMemoryError (GC overhead limit exceeded).
>
> Can you advise what the recommended settings are (memory, cores,
> partitions, etc.) for the given hardware?
>
> Thanks!
>
