Hi Roberto,

1. How do they differ in terms of performance?

They both use alternating least squares matrix factorization. The main difference is that ml.recommendation.ALS takes DataFrames as input, which have built-in optimizations and should give better performance.

2. Am I correct to assume ml.recommendation.ALS (unlike mllib) does not support key-value RDDs? If so, what is the reason?

mllib.recommendation.ALS expects an RDD of Rating objects as input, while ml.recommendation.ALS expects a DataFrame with user, item and rating columns. I'm not sure if that is what you mean by key-value RDDs.

On Mon, Dec 14, 2015 at 3:22 PM, Roberto Pagliari <roberto.pagli...@asos.com> wrote:

> Currently, there are two implementations of ALS available:
> ml.recommendation.ALS
> <http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation>
> and mllib.recommendation.ALS
> <http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#module-pyspark.mllib.recommendation>
>
> 1. How do they differ in terms of performance?
> 2. Am I correct to assume ml.recommendation.ALS (unlike mllib) does
>    not support key-value RDDs? If so, what is the reason?
>
> Thank you,
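
To illustrate the input difference described under question 2, here is a rough sketch rather than tested code. It assumes your key-value RDD holds pairs shaped like (user, (item, rating)), which is only my guess at what you have. The names flatten_pair and train_both are made up for this example, "session" stands for whatever SQLContext or SparkSession you already use, and the rank and iteration values are arbitrary. I have not run the Spark part against your version.

```python
def flatten_pair(pair):
    """Turn an assumed (user, (item, rating)) pair into (user, item, rating)."""
    user, (item, rating) = pair
    return (user, item, rating)


def train_both(session, pairs_rdd):
    """Fit both ALS implementations on the same key-value RDD.

    session   : an existing SQLContext or SparkSession (assumption)
    pairs_rdd : RDD of (user, (item, rating)) pairs (assumed layout)
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.recommendation import ALS as MlALS
    from pyspark.mllib.recommendation import ALS as MllibALS, Rating

    triples = pairs_rdd.map(flatten_pair)

    # mllib: takes an RDD of Rating(user, product, rating) objects.
    mllib_model = MllibALS.train(
        triples.map(lambda t: Rating(*t)), rank=10, iterations=5
    )

    # ml: takes a DataFrame and is told which columns to read.
    df = session.createDataFrame(triples, ["user", "item", "rating"])
    ml_model = MlALS(
        rank=10, maxIter=5,
        userCol="user", itemCol="item", ratingCol="rating",
    ).fit(df)

    return mllib_model, ml_model
```

So in both cases a key-value RDD needs one map step first: to Rating objects for mllib, or to plain rows for the DataFrame that ml expects.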