Hello,

I was working with Spark MLlib's ALS matrix factorization algorithm and came
across the following blog post:

https://databricks.com/blog/2014/07/23/scalable-collaborative-filtering-with-spark-mllib.html

Can anyone help me understand what the "s" scaling factor does, and whether
it really improves performance? What is its significance?
If we convert the input data to scaled data with the help of "s", will it
speed up the algorithm?

Scaled data usage:
*(For each user, we create pseudo-users that have the same ratings. That
is, for every rating (userId, productId, rating), we generate (userId+i,
productId, rating) where 0 <= i < s and s is the scaling factor)*
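For what it's worth, the expansion described in that quote can be sketched in plain Python (not Spark; the function name `scale_ratings` is my own, and I am ignoring how the blog avoids collisions between shifted user ids):

```python
# Sketch of the pseudo-user expansion quoted above: each rating
# (userId, productId, rating) is duplicated s times with shifted user ids,
# producing an s-times larger dataset with the same rating structure.
def scale_ratings(ratings, s):
    """ratings: iterable of (userId, productId, rating); s: scaling factor."""
    return [(user_id + i, product_id, rating)
            for (user_id, product_id, rating) in ratings
            for i in range(s)]

# Example: one rating expanded with s = 3
print(scale_ratings([(0, 42, 5.0)], 3))
# -> [(0, 42, 5.0), (1, 42, 5.0), (2, 42, 5.0)]
```

As I understand it, this only inflates the dataset size (e.g. for benchmarking scalability); the pseudo-users carry no new information, so I would not expect it to speed anything up.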

Also, this blog post is for Spark 1.1, while I am currently using Spark 2.0.

Any help will be greatly appreciated.

Thanks,
Roshani
