Hello, I have been working with the Spark MLlib ALS matrix factorization algorithm and came across the following blog post:
https://databricks.com/blog/2014/07/23/scalable-collaborative-filtering-with-spark-mllib.html

Can anyone help me understand what the "s" scaling factor does, and whether it really gives better performance? What is its significance? If we convert the input data to scaled data with the help of "s", will it speed up the algorithm?

Scaled data usage: *(For each user, we create pseudo-users that have the same ratings. That is, for every rating (userId, productId, rating), we generate (userId+i, productId, rating) where 0 <= i < s and s is the scaling factor)*

Also, the blog post is for Spark 1.1, and I am currently using 2.0.

Any help will be greatly appreciated.

Thanks,
Roshani
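P.S. To check that I'm reading the quoted description correctly, here is a small sketch of that pseudo-user expansion in plain Python (the function name and sample data are my own, and it assumes the original user ids are spaced far enough apart that the shifted ids do not collide):

```python
def scale_ratings(ratings, s):
    """Expand each (userId, productId, rating) into s pseudo-user copies.

    For every rating we emit (userId + i, productId, rating) for
    0 <= i < s, as described in the blog post's scaled-data setup.
    """
    return [(user_id + i, product_id, rating)
            for (user_id, product_id, rating) in ratings
            for i in range(s)]

# Two original ratings, scaling factor s = 3 -> 6 scaled ratings.
ratings = [(0, 10, 4.0), (100, 20, 3.5)]
scaled = scale_ratings(ratings, 3)
```

Is this the transformation the benchmark applies before running ALS?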