What is the feature dimension? Did you set the driver memory? -Xiangrui

On Tue, Apr 21, 2015 at 6:59 AM, rok <rokros...@gmail.com> wrote:
> I'm trying to use the StandardScaler in pyspark on a relatively small (a few
> hundred MB) dataset of sparse vectors with 800k features. The fit method of
> StandardScaler crashes with Java heap space or Direct buffer memory errors.
> There should be plenty of memory available -- 10 executors with 2 cores each
> and 8 GB per core. I'm giving the executors 9g of memory and have also tried
> a large overhead (3g), thinking it might be the array creation in the
> aggregators that's causing issues.
>
> The bizarre thing is that this isn't always reproducible -- sometimes it
> actually works without problems. Should I be setting up the executors
> differently?
>
> Thanks,
>
> Rok
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/StandardScaler-failing-with-OOM-errors-in-PySpark-tp22593.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
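
A minimal sketch of the kind of setup under discussion -- launching PySpark with explicit driver/executor memory and fitting StandardScaler on an RDD of high-dimensional sparse vectors. The memory values, script name, and the tiny stand-in dataset below are illustrative assumptions, not the actual job:

# Sketch only; values are illustrative, not the poster's exact configuration.
# Driver memory has to be set at launch time (it cannot be changed after the
# driver JVM starts), e.g.:
#   spark-submit --driver-memory 4g --executor-memory 9g \
#     --conf spark.yarn.executor.memoryOverhead=3072 scaler_example.py

from pyspark import SparkContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.feature import StandardScaler

sc = SparkContext(appName="scaler-example")

# Tiny stand-in for the real data: SparseVectors with a large dimension.
dim = 800000
data = sc.parallelize([
    Vectors.sparse(dim, [0, 10, 999], [1.0, 2.0, 3.0]),
    Vectors.sparse(dim, [5, 10, 500000], [4.0, 5.0, 6.0]),
])

# withMean=False keeps the transformed vectors sparse; withMean=True would
# densify them, which is far more memory-hungry at this dimension. Note that
# fit() still aggregates dense summary arrays of length `dim`.
scaler = StandardScaler(withMean=False, withStd=True).fit(data)
scaled = scaler.transform(data)
print(scaled.first())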
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org