What is the feature dimension? Did you set the driver memory? -Xiangrui
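To expand on the driver-memory question, a rough sketch of the driver-side angle (the 800k figure comes from the quoted post; the script name below is just a placeholder): as far as I recall, the summarizer behind StandardScaler.fit keeps several dense double arrays of length numFeatures per partition and merges them on the driver, so an 800k-dimensional fit needs driver headroom as well as executor memory. spark.driver.memory has to be set before the driver JVM starts, e.g.

    spark-submit --driver-memory 8g --executor-memory 9g your_script.py

or via spark.driver.memory in spark-defaults.conf; setting it in SparkConf from an already-running PySpark shell in client mode is too late.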

On Tue, Apr 21, 2015 at 6:59 AM, rok <rokros...@gmail.com> wrote:
> I'm trying to use the StandardScaler in pyspark on a relatively small (a few
> hundred MB) dataset of sparse vectors with 800k features. The fit method of
> StandardScaler crashes with Java heap space or Direct buffer memory errors.
> There should be plenty of memory available -- 10 executors with 2 cores each
> and 8 GB per core. I'm giving the executors 9g of memory and have also tried
> a large memory overhead (3g), thinking it might be the array creation in the
> aggregators that's causing issues.
>
> The bizarre thing is that this isn't always reproducible -- sometimes it
> actually works without problems. Should I be setting up executors
> differently?
>
> Thanks,
>
> Rok
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/StandardScaler-failing-with-OOM-errors-in-PySpark-tp22593.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
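For reference, a rough repro sketch of the setup quoted above, assuming the pyspark.mllib API of that era (the vectors are made up; only the 800k feature dimension comes from the original post):

    from pyspark import SparkContext
    from pyspark.mllib.feature import StandardScaler
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="standard-scaler-repro")

    num_features = 800000  # feature dimension reported in the thread

    # A toy RDD of sparse vectors with only a few non-zeros each.
    data = sc.parallelize([
        Vectors.sparse(num_features, {0: 1.0, 10: 2.0}),
        Vectors.sparse(num_features, {5: 3.0, 799999: 4.0}),
    ])

    # withMean=False keeps the input sparse; withMean=True densifies every
    # vector to length 800k on transform, which is an easy way to run out
    # of memory at this dimensionality.
    scaler = StandardScaler(withMean=False, withStd=True)
    model = scaler.fit(data)        # fit() aggregates per-feature summary arrays
    scaled = model.transform(data)
    print(scaled.take(2))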

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
