AFAIK, there are two places where you can specify the driver memory. One is via spark-submit --driver-memory and the other is via spark.driver.memory in spark-defaults.conf. Please try these approaches and see whether they work. You can find detailed instructions at http://spark.apache.org/docs/latest/configuration.html and http://spark.apache.org/docs/latest/submitting-applications.html.

-Xiangrui
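For concreteness, the two options mentioned above might look roughly like the sketch below. The helper names, the script name scale_features.py and the 5g value are made up for illustration; only the --driver-memory flag and the spark.driver.memory property are taken from the Spark docs linked above. Any other flags your deployment needs would go in extra_args.

```python
import shlex


def build_submit_command(script, driver_memory="5g", extra_args=()):
    """Return a spark-submit argv list that sets the driver heap on the command line.

    `script` and `extra_args` are placeholders for your own application and flags.
    """
    return ["spark-submit", "--driver-memory", driver_memory, *extra_args, script]


def defaults_conf_line(driver_memory="5g"):
    """Return the equivalent property line for conf/spark-defaults.conf."""
    return "spark.driver.memory {}".format(driver_memory)


if __name__ == "__main__":
    # Option 1: pass the driver memory on the spark-submit command line.
    cmd = build_submit_command("scale_features.py")  # hypothetical script name
    print(" ".join(shlex.quote(part) for part in cmd))

    # Option 2: put the property in spark-defaults.conf.
    print(defaults_conf_line())
```

As far as the configuration docs describe it, in client mode spark.driver.memory should not be set through SparkConf inside the application, because the driver JVM has already started by then. That is why both options here take effect before the JVM is launched.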
On Tue, Apr 28, 2015 at 4:06 AM, Rok Roskar <rokros...@gmail.com> wrote:
> That's exactly what I'm saying -- I specify the memory options using spark
> options, but this is not reflected in how the JVM is created. No matter
> which memory settings I specify, the JVM for the driver is always made with
> 512Mb of memory. So I'm not sure if this is a feature or a bug?
>
> rok
>
> On Mon, Apr 27, 2015 at 6:54 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> You might need to specify driver memory in spark-submit instead of
>> passing JVM options. spark-submit is designed to handle different
>> deployments correctly. -Xiangrui
>>
>> On Thu, Apr 23, 2015 at 4:58 AM, Rok Roskar <rokros...@gmail.com> wrote:
>> > ok yes, I think I have narrowed it down to being a problem with driver
>> > memory settings. It looks like the application master/driver is not being
>> > launched with the settings specified:
>> >
>> > For the driver process on the main node I see "-XX:MaxPermSize=128m
>> > -Xms512m -Xmx512m" as options used to start the JVM, even though I
>> > specified
>> >
>> > 'spark.yarn.am.memory', '5g'
>> > 'spark.yarn.am.memoryOverhead', '2000'
>> >
>> > The info shows that these options were read:
>> >
>> > 15/04/23 13:47:47 INFO yarn.Client: Will allocate AM container, with
>> > 7120 MB memory including 2000 MB overhead
>> >
>> > Is there some reason why these options are being ignored and instead
>> > starting the driver with just 512Mb of heap?
>> >
>> > On Thu, Apr 23, 2015 at 8:06 AM, Rok Roskar <rokros...@gmail.com> wrote:
>> >>
>> >> the feature dimension is 800k.
>> >>
>> >> yes, I believe the driver memory is likely the problem since it doesn't
>> >> crash until the very last part of the tree aggregation.
>> >>
>> >> I'm running it via pyspark through YARN -- I have to run in client mode
>> >> so I can't set spark.driver.memory -- I've tried setting the
>> >> spark.yarn.am.memory and overhead parameters but it doesn't seem to
>> >> have an effect.
>> >>
>> >> Thanks,
>> >>
>> >> Rok
>> >>
>> >> On Apr 23, 2015, at 7:47 AM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> > What is the feature dimension? Did you set the driver memory?
>> >> > -Xiangrui
>> >> >
>> >> > On Tue, Apr 21, 2015 at 6:59 AM, rok <rokros...@gmail.com> wrote:
>> >> >> I'm trying to use the StandardScaler in pyspark on a relatively small
>> >> >> (a few hundred Mb) dataset of sparse vectors with 800k features. The
>> >> >> fit method of StandardScaler crashes with Java heap space or Direct
>> >> >> buffer memory errors. There should be plenty of memory around -- 10
>> >> >> executors with 2 cores each and 8 Gb per core. I'm giving the
>> >> >> executors 9g of memory and have also tried lots of overhead (3g),
>> >> >> thinking it might be the array creation in the aggregators that's
>> >> >> causing issues.
>> >> >>
>> >> >> The bizarre thing is that this isn't always reproducible -- sometimes
>> >> >> it actually works without problems. Should I be setting up executors
>> >> >> differently?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Rok
>> >> >>
>> >> >> --
>> >> >> View this message in context:
>> >> >> http://apache-spark-user-list.1001560.n3.nabble.com/StandardScaler-failing-with-OOM-errors-in-PySpark-tp22593.html
>> >> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> >> For additional commands, e-mail: user-h...@spark.apache.org