Hi Roberto,

1. How do they differ in terms of performance?
Both use alternating least squares (ALS) matrix factorization. The main
difference is that ml.recommendation.ALS takes a DataFrame as input, which
benefits from Spark's built-in DataFrame optimizations and should therefore
give better performance.
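
For intuition about the shared algorithm, here is a toy rank-1 sketch of
alternating least squares in plain Python. It is not how Spark implements
ALS; the function name, the dict-of-ratings layout, and the simple `reg`
term are my own assumptions for illustration.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=20):
    """Toy rank-1 ALS: approximate ratings[(i, j)] by u[i] * v[j].

    `ratings` is a dict {(user, item): rating} holding observed entries only.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold v fixed; each u[i] then has a closed-form least-squares solution.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j), r in ratings.items() if a == i)
            den = reg + sum(v[j] ** 2 for (a, j) in ratings if a == i)
            u[i] = num / den
        # Hold u fixed; solve for each v[j] the same way.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b), r in ratings.items() if b == j)
            den = reg + sum(u[i] ** 2 for (i, b) in ratings if b == j)
            v[j] = num / den
    return u, v


# Hypothetical example data: an exactly rank-1 ratings table.
example = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
           (1, 0): 2.0, (1, 1): 4.0, (1, 2): 6.0}
u, v = als_rank1(example, n_users=2, n_items=3)
```

After fitting, u[i] * v[j] is the predicted rating for user i and item j.
Both Spark implementations do the same alternation with rank-k factors,
distributed across the cluster.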

2.  Am I correct to assume ml.recommendation.ALS (unlike mllib) does not
support key-value RDDs? If so, what is the reason?
mllib.recommendation.ALS expects an RDD of Rating objects as input, while
ml.recommendation.ALS expects a DataFrame with user, item and rating
columns.  I'm not sure if that is what you mean by key-value RDDs.
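
If by key-value RDD you mean pairs shaped like ((user, item), rating), a
rough sketch of feeding them to either API could look like the following.
The pair layout, the helper names, and the `sql_ctx` argument (a SQLContext
or SparkSession, whichever your Spark version provides) are assumptions on
my part, and I have not run this against your data.

```python
def flatten_pair(kv):
    """Turn an assumed ((user, item), rating) pair into (user, item, rating)."""
    (user, item), rating = kv
    return (int(user), int(item), float(rating))


def train_mllib(pair_rdd, rank=10, iterations=10):
    # RDD-based API: wants an RDD of Rating(user, product, rating).
    from pyspark.mllib.recommendation import ALS, Rating
    ratings = pair_rdd.map(flatten_pair).map(lambda t: Rating(*t))
    return ALS.train(ratings, rank, iterations=iterations)


def train_ml(sql_ctx, pair_rdd, rank=10, max_iter=10):
    # DataFrame-based API: wants a DataFrame with user, item and rating columns.
    from pyspark.ml.recommendation import ALS
    df = sql_ctx.createDataFrame(pair_rdd.map(flatten_pair),
                                 ["user", "item", "rating"])
    als = ALS(rank=rank, maxIter=max_iter,
              userCol="user", itemCol="item", ratingCol="rating")
    return als.fit(df)
```

So the ml version can still start from an RDD; it just needs one extra step
to flatten the pairs and convert them to a DataFrame first.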

On Mon, Dec 14, 2015 at 3:22 PM, Roberto Pagliari <roberto.pagli...@asos.com
> wrote:

> Currently, there are two implementations of ALS available:
> ml.recommendation.ALS
> <http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation>
>  and mllib.recommendation.ALS
> <http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#module-pyspark.mllib.recommendation>
>
>
>
>    1. How do they differ in terms of performance?
>    2. Am I correct to assume ml.recommendation.ALS (unlike mllib) does
>    not support key-value RDDs? If so, what is the reason?
>
>
>
> Thank you,
>
>
