Repository: spark
Updated Branches:
  refs/heads/branch-1.1 1af68caf6 -> eba399b3c
[SPARK-2843][MLLIB] add a section about regularization parameter in ALS

atalwalkar srowen

Author: Xiangrui Meng <m...@databricks.com>

Closes #2064 from mengxr/als-doc and squashes the following commits:

b2e20ab [Xiangrui Meng] introduced -> discussed
98abdd7 [Xiangrui Meng] add reference
339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS

(cherry picked from commit e0f946265b9ea5bc48849cf7794c2c03d5e29fba)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eba399b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eba399b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eba399b3

Branch: refs/heads/branch-1.1
Commit: eba399b3c6768f5106cbc17752630fa81d9cdce4
Parents: 1af68ca
Author: Xiangrui Meng <m...@databricks.com>
Authored: Wed Aug 20 17:47:39 2014 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Wed Aug 20 17:47:58 2014 -0700

----------------------------------------------------------------------
 docs/mllib-collaborative-filtering.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/eba399b3/docs/mllib-collaborative-filtering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index ab10b2f..d5c539d 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -43,6 +43,17 @@ level of confidence in observed user preferences, rather than explicit ratings g
 model then tries to find latent factors that can be used to predict the expected preference of a
 user for an item.
 
+### Scaling of the regularization parameter
+
+Since v1.1, we scale the regularization parameter `lambda` in solving each least squares problem by
+the number of ratings the user generated in updating user factors,
+or the number of ratings the product received in updating product factors.
+This approach is named "ALS-WR" and discussed in the paper
+"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+It makes `lambda` less dependent on the scale of the dataset.
+So we can apply the best parameter learned from a sampled subset to the full dataset
+and expect similar performance.
+
 ## Examples
 
 <div class="codetabs">
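
A note on the scaling described in the patch: the effect of multiplying `lambda` by the rating count can be seen in a toy setting. The sketch below is not Spark's implementation; it assumes a rank-1 model (one latent factor per user and item), for which each per-user least squares problem has a closed-form solution. The function name `solve_user_factor` and the `weighted` flag are hypothetical and exist only for this illustration.

```python
def solve_user_factor(item_factors, ratings, lam, weighted=True):
    """Rank-1 regularized least squares for a single user's factor.

    Minimizes sum_i (r_i - x * y_i)^2 + effective_lam * x^2 over x,
    where y_i are the (fixed) item factors and r_i the user's ratings.
    With weighted=True the penalty is lam * n (the scaling the patch
    describes); with weighted=False it is plain lam.
    Hypothetical helper for illustration only.
    """
    n = len(ratings)
    effective_lam = lam * n if weighted else lam
    numerator = sum(y * r for y, r in zip(item_factors, ratings))
    denominator = sum(y * y for y in item_factors) + effective_lam
    return numerator / denominator


if __name__ == "__main__":
    lam = 0.1
    # Same user behaviour observed on a small sample and on a larger dataset.
    for n in (10, 1000):
        items = [1.0] * n
        ratings = [4.0] * n
        scaled = solve_user_factor(items, ratings, lam, weighted=True)
        unscaled = solve_user_factor(items, ratings, lam, weighted=False)
        print(n, scaled, unscaled)
```

In this toy case the scaled solution reduces to `4 / (1 + lam)` regardless of how many ratings the user has, while the unscaled solution drifts toward the unregularized value as the rating count grows. That is the sense in which a `lambda` tuned on a sampled subset can be expected to carry over to the full dataset.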