This is probably because your config options do not actually take effect. Please 
refer to the email thread titled “How to set memory for SparkR with 
master="local[*]"”, which may answer your question.

I recommend trying SparkR built from the master branch, which contains two 
fixes that may help in your use case:
https://issues.apache.org/jira/browse/SPARK-11340
https://issues.apache.org/jira/browse/SPARK-11258

BTW, it looks like there is a config conflict in your settings:
spark.driver.memory="30g",
spark.driver.extraJavaOptions="-Xms5g -Xmx5g
The -Xmx5g flag asks for a 5 GB driver heap while spark.driver.memory asks for 
30 GB, and Spark's docs advise against setting -Xmx via 
spark.driver.extraJavaOptions at all.
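If the intention is a 30 GB driver heap, a non-conflicting variant (only a 
sketch based on the settings quoted below, and it still needs the SPARK-11340 
fix or --driver-memory to actually take effect) would drop the -Xms/-Xmx flags 
and keep just the PermGen option:

library(SparkR)
spark.default.confs <- list(spark.cores.max = "24",
                            spark.executor.memory = "50g",
                            spark.driver.memory = "30g",
                            spark.driver.extraJavaOptions = "-XX:MaxPermSize=1024M")
sc <- sparkR.init(master = "local[24]", sparkEnvir = spark.default.confs)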


From: Dhaval Patel [mailto:dhaval1...@gmail.com]
Sent: Saturday, November 7, 2015 12:26 AM
To: Spark User Group
Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit 
exceeded

I have been struggling with this error for the past 3 days and have tried every 
suggestion people have provided on Stack Overflow and here in this group.

I am trying to read a Parquet file using SparkR and convert it into an R 
data frame for further use. The file is not that big: ~4 GB and 250 million 
records.

My standalone cluster has more than enough memory and processing power: 24 
cores, 128 GB RAM. Here is the configuration to give an idea:

I tried this on both Spark 1.4.1 and 1.5.1 and have attached both stack 
traces/logs. The Parquet file has 24 partitions.

spark.default.confs=list(spark.cores.max="24",
                         spark.executor.memory="50g",
                         spark.driver.memory="30g",
                         spark.driver.extraJavaOptions="-Xms5g -Xmx5g 
-XX:MaxPermSize=1024M")
sc <- sparkR.init(master="local[24]",sparkEnvir = spark.default.confs)
.......
........ reading parquet file and storing in R dataframe
med.Rdf <- collect(mednew.DF)




--- Begin Message ---
Hi, Matej,

For the convenience of SparkR users who start SparkR without using bin/sparkR 
(for example, in RStudio), https://issues.apache.org/jira/browse/SPARK-11340 
makes setting “spark.driver.memory” (and similar options such as 
spark.driver.extraClassPath, spark.driver.extraJavaOptions and 
spark.driver.extraLibraryPath) in the sparkEnvir parameter of sparkR.init() 
take effect.

Would you like to give it a try? Note that the change is only on the master 
branch, so you have to build Spark from source before using it.
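For illustration, a minimal sketch assuming a build that includes SPARK-11340 
(the 5g value is taken from your original mail; the MaxPermSize flag is only an 
example of the other options listed above):

library(SparkR)
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "5g",
                                    spark.driver.extraJavaOptions = "-XX:MaxPermSize=512M"))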


From: Sun, Rui [mailto:rui....@intel.com]
Sent: Monday, October 26, 2015 10:24 AM
To: Dirceu Semighini Filho
Cc: user
Subject: RE: How to set memory for SparkR with master="local[*]"

As documented in 
http://spark.apache.org/docs/latest/configuration.html#available-properties,
the note for “spark.driver.memory” says:
“In client mode, this config must not be set through the SparkConf directly in 
your application, because the driver JVM has already started at that point. 
Instead, please set this through the --driver-memory command line option or in 
your default properties file.”

If you start a SparkR shell using bin/sparkR, then you can use 
bin/sparkR --driver-memory. There is no way to change the driver memory size 
after the R shell has been launched via bin/sparkR.
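For completeness, the “default properties file” mentioned in the docs is 
conf/spark-defaults.conf; adding a line like the following (the value is just 
an example) before launching bin/sparkR has the same effect as --driver-memory:

spark.driver.memory    5g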

But if you start a SparkR shell manually without using bin/sparkR (for 
example, in RStudio), you can:
library(SparkR)
Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g sparkr-shell")
sc <- sparkR.init()
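An equivalent form that should also work (an untested sketch; --driver-memory 
is simply forwarded to spark-submit, which launches the backend) is:

library(SparkR)
Sys.setenv("SPARKR_SUBMIT_ARGS" = "--driver-memory 2g sparkr-shell")
sc <- sparkR.init()

Either way, the Storage Memory figure in the web UI should reflect the larger 
heap, as Matej checked below.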

From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
Sent: Friday, October 23, 2015 7:53 PM
Cc: user
Subject: Re: How to set memory for SparkR with master="local[*]"

Hi Matej,
I'm also using this and I'm seeing the same behavior here: my driver has only 
530 MB, which is the default value.

Maybe this is a bug.

2015-10-23 9:43 GMT-02:00 Matej Holec <hol...@gmail.com>:
Hello!

How to adjust the memory settings properly for SparkR with master="local[*]"
in R?


*When running from  R -- SparkR doesn't accept memory settings :(*

I use the following commands:

R>  library(SparkR)
R>  sc <- sparkR.init(master = "local[*]", sparkEnvir =
list(spark.driver.memory = "5g"))

Although the variable spark.driver.memory is correctly set (checked at
http://node:4040/environment/), the driver has only the default amount of
memory allocated (Storage Memory 530.3 MB).

*But when running from  spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*

The following command:

]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g

creates a SparkR session with properly adjusted driver memory (Storage Memory
2.6 GB).


Any suggestion?

Thanks
Matej



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-memory-for-SparkR-with-master-local-tp25178.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



--- End Message ---
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
