[ https://issues.apache.org/jira/browse/SPARK-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuo Xiang updated SPARK-2085:
------------------------------

    Description: 
The current implementation of ALS takes a single regularization parameter and 
applies it to both the user factors and the product factors. This kind of 
regularization can be less effective when the number of users is significantly 
larger than the number of products (and vice versa). For example, if we have 
10M users and 1K products, regularization on the user factors will dominate. 
Following the discussion in [this 
thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704),
the implementation in this PR regularizes each factor vector by #ratings * 
lambda.

Link to PR: https://github.com/apache/spark/pull/1026
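
For illustration only (this is not the code in the PR; the function name, the use
of Breeze, and the variable names are assumptions), a minimal sketch of one
user's least-squares subproblem shows how the weighted-lambda scheme scales the
regularizer by that user's rating count:

```scala
import breeze.linalg._

// Sketch: solve one user's ALS subproblem with weighted-lambda regularization.
// Y: factors of the items this user rated (nRatings x k)
// r: this user's ratings (length nRatings)
def solveUserFactor(Y: DenseMatrix[Double],
                    r: DenseVector[Double],
                    lambda: Double): DenseVector[Double] = {
  val k = Y.cols
  val nRatings = Y.rows.toDouble
  // Uniform regularization would add lambda * I to the Gram matrix; the
  // weighted scheme adds (nRatings * lambda) * I, so heavily-rated users or
  // items are penalized in proportion to how much data they contribute.
  val gram = (Y.t * Y) + (DenseMatrix.eye[Double](k) * (nRatings * lambda))
  gram \ (Y.t * r)
}
```

With a uniform lambda, the `nRatings` factor above would simply be dropped,
which is what the current implementation effectively does for every user and
product regardless of their rating counts.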

  was:
The current implementation of ALS takes a single regularization parameter and 
applies it to both the user factors and the product factors. This kind of 
regularization can be less effective when the number of users is significantly 
larger than the number of products (and vice versa). For example, if we have 
10M users and 1K products, regularization on the user factors will dominate. 
Following the discussion in [this 
thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704),
the implementation in this PR regularizes each factor vector by #ratings * 
lambda.

Link to PR: https://github.com/apache/spark/pull/1026


> Apply user-specific regularization instead of uniform regularization in 
> Alternating Least Squares (ALS)
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2085
>                 URL: https://issues.apache.org/jira/browse/SPARK-2085
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Shuo Xiang
>            Priority: Minor
>
> The current implementation of ALS takes a single regularization parameter and 
> applies it to both the user factors and the product factors. This kind of 
> regularization can be less effective when the number of users is significantly 
> larger than the number of products (and vice versa). For example, if we have 
> 10M users and 1K products, regularization on the user factors will dominate. 
> Following the discussion in [this 
> thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704),
> the implementation in this PR regularizes each factor vector by #ratings * 
> lambda.
> Link to PR: https://github.com/apache/spark/pull/1026



--
This message was sent by Atlassian JIRA
(v6.2#6252)
