You might need to specify the driver memory via spark-submit instead of passing JVM options; spark-submit is designed to handle the different deployment modes correctly.
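In yarn-client mode the driver runs in the local client JVM, which is already up before your SparkConf is read, so spark.driver.memory has no effect when set from code, and the spark.yarn.am.* settings only size the separate application master container, not the driver itself. Something along these lines should work (a sketch; your_script.py stands in for your application, and 8g is just a guess at a sufficient heap):

    spark-submit \
      --master yarn-client \
      --driver-memory 8g \
      --conf spark.yarn.am.memory=5g \
      --conf spark.yarn.am.memoryOverhead=2000 \
      your_script.py

-Xiangrui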
On Thu, Apr 23, 2015 at 4:58 AM, Rok Roskar <rokros...@gmail.com> wrote:
> ok yes, I think I have narrowed it down to being a problem with the driver
> memory settings. It looks like the application master/driver is not being
> launched with the settings specified:
>
> For the driver process on the main node I see "-XX:MaxPermSize=128m
> -Xms512m -Xmx512m" as the options used to start the JVM, even though I
> specified
>
> 'spark.yarn.am.memory', '5g'
> 'spark.yarn.am.memoryOverhead', '2000'
>
> The info log shows that these options were read:
>
> 15/04/23 13:47:47 INFO yarn.Client: Will allocate AM container, with 7120 MB
> memory including 2000 MB overhead
>
> Is there some reason why these options are being ignored and the driver is
> instead started with just 512 MB of heap?
>
> On Thu, Apr 23, 2015 at 8:06 AM, Rok Roskar <rokros...@gmail.com> wrote:
>>
>> the feature dimension is 800k.
>>
>> yes, I believe the driver memory is likely the problem, since it doesn't
>> crash until the very last part of the tree aggregation.
>>
>> I'm running it via pyspark through YARN -- I have to run in client mode,
>> so I can't set spark.driver.memory -- I've tried setting the
>> spark.yarn.am.memory and overhead parameters, but they don't seem to have
>> any effect.
>>
>> Thanks,
>>
>> Rok
>>
>> On Apr 23, 2015, at 7:47 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> > What is the feature dimension? Did you set the driver memory? -Xiangrui
>> >
>> > On Tue, Apr 21, 2015 at 6:59 AM, rok <rokros...@gmail.com> wrote:
>> >> I'm trying to use the StandardScaler in pyspark on a relatively small
>> >> (a few hundred MB) dataset of sparse vectors with 800k features. The
>> >> fit method of StandardScaler crashes with Java heap space or Direct
>> >> buffer memory errors. There should be plenty of memory around -- 10
>> >> executors with 2 cores each and 8 GB per core. I'm giving the
>> >> executors 9g of memory and have also tried lots of overhead (3g),
>> >> thinking it might be the array creation in the aggregators that's
>> >> causing issues.
>> >>
>> >> The bizarre thing is that this isn't always reproducible -- sometimes
>> >> it actually works without problems. Should I be setting up the
>> >> executors differently?
>> >>
>> >> Thanks,
>> >>
>> >> Rok
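P.S. A rough back-of-envelope on why the driver heap is the bottleneck here (assuming the scaler's online summarizer keeps dense per-feature buffers, which I believe it does): 800,000 features * 8 bytes is ~6.4 MB per double array, and the summarizer holds several such arrays (mean, M2, counts, etc.), so each partial result is on the order of tens of MB. In the last level of treeAggregate the driver has to deserialize and merge a number of these partials at once, which can easily exhaust a 512 MB heap; that would also match the crash happening only at the very end of the aggregation.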