I am also surprised that I am facing these problems with a fairly small
dataset on 14 M4.2xlarge machines.  Could you please let me know on which
dataset you can run 100 iterations of rank 30 on your laptop?

I am currently just running the default example code shipped with Spark
for ALS on the MovieLens dataset. I did not change anything in the
code.  However, I am running this example on the Netflix dataset (1.5 GB).

Thanks,
Roshani

On Friday, September 16, 2016, Sean Owen <so...@cloudera.com> wrote:

> You may have to decrease the checkpoint interval to say 5 if you're
> getting StackOverflowError. You may have a particularly deep lineage
> being created during iterations.
>
> No space left on device means you don't have enough local disk to
> accommodate the big shuffles in some stage. You can add more disk or
> maybe look at tuning shuffle params to do more in memory and maybe
> avoid spilling to disk as much.
>
> However, given the small data size, I'm surprised that you see either
> problem.
>
> 10-20 iterations is usually where the model stops improving much anyway.
>
> I can run 100 iterations of rank 30 on my *laptop* so something is
> fairly wrong in your setup or maybe in other parts of your user code.
>
> On Thu, Sep 15, 2016 at 10:00 PM, Roshani Nagmote
> <roshaninagmo...@gmail.com> wrote:
> > Hi,
> >
> > I need help to run matrix factorization ALS algorithm in Spark MLlib.
> >
> > I am using a dataset (1.5 GB) with 480189 users and 17770 items,
> > formatted in a similar way to the MovieLens dataset.
> > I am trying to run the MovieLensALS example jar on this dataset on an
> > AWS Spark EMR cluster having 14 M4.2xlarge slaves.
> >
> > Command run:
> > /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn \
> >   --class org.apache.spark.examples.mllib.MovieLensALS \
> >   --jars /usr/lib/spark/examples/jars/scopt_2.11-3.3.0.jar \
> >   /usr/lib/spark/examples/jars/spark-examples_2.11-2.0.0.jar \
> >   --rank 32 --numIterations 50 --kryo s3://dataset/input_dataset
> >
> > Issues I get:
> > If I increase rank to 70 or more and numIterations to 15 or more, I get
> > the following errors:
> > 1) stack overflow error
> > 2) No space left on device - shuffle phase
> >
> > Could you please let me know if there are any parameters I should tune to
> > make this algorithm work on this dataset?
> >
> > For better RMSE, I want to increase iterations. Am I missing something
> > very trivial? Could anyone help me run this algorithm on this specific
> > dataset with more iterations?
> >
> > Was anyone able to run ALS on Spark with more than 100 iterations and
> > rank more than 30?
> >
> > Any help will be greatly appreciated.
> >
> > Thanks and Regards,
> > Roshani
>
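
The checkpoint-interval suggestion quoted above could look roughly like the
sketch below. It is only an illustration under stated assumptions: it uses
the DataFrame-based pyspark.ml ALS rather than the MovieLensALS example from
the thread, the helper names als_settings and fit_als are hypothetical, and
the column names are assumed. The interval of 5 is the value suggested in
the quoted reply.

```python
# Sketch only: swaps in the DataFrame-based pyspark.ml ALS, which is NOT the
# MovieLensALS example used in the thread. Helper names are hypothetical.


def als_settings(rank=30, max_iter=100, checkpoint_interval=5):
    """Build keyword arguments for pyspark.ml.recommendation.ALS.

    The userCol/itemCol/ratingCol values are assumed column names.
    """
    return {
        "rank": rank,
        "maxIter": max_iter,
        # Checkpoint every N iterations to truncate the RDD lineage.
        "checkpointInterval": checkpoint_interval,
        "userCol": "user",
        "itemCol": "item",
        "ratingCol": "rating",
    }


def fit_als(spark, ratings, checkpoint_dir, **overrides):
    """Fit ALS on a ratings DataFrame with periodic checkpointing."""
    # Imported lazily so the settings helper works without PySpark installed.
    from pyspark.ml.recommendation import ALS

    # The checkpoint interval is ignored unless a checkpoint directory is set.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return ALS(**als_settings(**overrides)).fit(ratings)
```

Assuming an existing SparkSession named spark and a ratings DataFrame with
those columns, a call might look like
fit_als(spark, ratings, "hdfs:///tmp/als-checkpoints", rank=70, max_iter=15),
where the checkpoint path is a placeholder.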
