[ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360679#comment-14360679
 ] 

Xiangrui Meng edited comment on SPARK-6323 at 3/13/15 5:13 PM:
---------------------------------------------------------------

[~debasish83] Please help me understand some details.

> The problems that we can solve are in the normal equation/quadratic form: 
> 0.5x'Hx + c'x + g(z)

You call `g(z)` "constraints" but it looks like the regularization term in the 
objective function, and you didn't mention what `z` is.
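
My guess (not confirmed by the description) is that this is the standard ADMM/consensus splitting, where z is just a copy of x constrained to agree with it:

{code}
minimize_{x, z}   0.5 x'Hx + c'x + g(z)
subject to        x = z
{code}

If that is the intent, it would help to state it explicitly.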

> ALM will be capable of solving the following problems: min f(x) + g(z)

Are these sub-problems of the matrix factorization? If yes, could you also state 
the global objective? For example, in ALS, the global objective is

{code}
minimize \frac{1}{2} \sum_{(i,j) \in \Omega} (r_{ij} - u_i^T v_j)^2
{code}

and if we take alternating directions, the problem in each step is decoupled 
into many sub-problems (least squares).

{code}
minimize \frac{1}{2} \sum_{j : (i,j) \in \Omega} (r_{ij} - u_i^T v_j)^2
(sub-problem for u_i)
{code}
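
To make the normal-equation structure concrete, here is a rough Breeze sketch of 
the per-user sub-problem solve (illustrative only; the helper name and shapes are 
made up and are not the actual ALS code, and the L2 term lambda is optional):

{code}
import breeze.linalg.{DenseMatrix, DenseVector}

// Sub-problem for u_i: solve the rank x rank system
// (V^T V + lambda * I) u_i = V^T r_i, where the rows of V are the factors v_j
// of the items rated by user i and r_i holds the corresponding ratings r_ij.
def solveLeastSquares(
    itemFactors: DenseMatrix[Double],   // |Omega_i| x rank
    ratings: DenseVector[Double],       // length |Omega_i|
    lambda: Double): DenseVector[Double] = {
  val rank = itemFactors.cols
  val gram = itemFactors.t * itemFactors + DenseMatrix.eye[Double](rank) * lambda
  gram \ (itemFactors.t * ratings)      // dense solve of the normal equations
}
{code}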

We can add the nonnegativity constraint to the global objective, and then the 
sub-problems receive the same constraint. I can see that other losses may work, 
but I cannot clearly see the benefit of using them, since they usually make the 
problem much harder to solve. Any papers for reference?
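
For concreteness, the nonnegativity constraint carries over to each quadratic 
sub-problem 0.5 x'Hx + c'x s.t. x >= 0, which could be handled with something as 
simple as projected gradient (a toy sketch, not the Breeze proximal solvers):

{code}
import breeze.linalg.{DenseMatrix, DenseVector}

// Toy projected-gradient solver for: minimize 0.5 x'Hx + c'x subject to x >= 0.
// Each iteration takes a gradient step (H x + c) and projects back onto the
// nonnegative orthant by clamping negative entries to zero.
def projectedGradient(h: DenseMatrix[Double], c: DenseVector[Double],
                      step: Double = 1e-3, iters: Int = 500): DenseVector[Double] = {
  var x = DenseVector.zeros[Double](c.length)
  for (_ <- 0 until iters) {
    val grad = h * x + c
    x = (x - grad * step).map(math.max(_, 0.0))
  }
  x
}
{code}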

Another issue is dealing with very frequent items 
(https://issues.apache.org/jira/browse/SPARK-3735). We plan to assemble and 
send partial AtA directly, but that only works if the sub-problems can be 
expressed via the normal equations, which I think is the case only for the 
squared loss.
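
Roughly what I mean by partial AtA (a sketch under my own naming, not the 
SPARK-3735 code): each rating contributes a rank-1 term to V^T V and V^T r, so 
partial sums can be built wherever the item factors live and merged before the 
solve. There is no analogous decomposition for non-quadratic losses, whose 
gradients depend on the current value of u_i.

{code}
import breeze.linalg.{DenseMatrix, DenseVector}

// Sketch: the squared-loss sub-problem is fully summarized by (A^T A, A^T b),
// which can be accumulated one rating at a time and merged across partitions.
class PartialNormalEquation(rank: Int) {
  val ata = DenseMatrix.zeros[Double](rank, rank)
  val atb = DenseVector.zeros[Double](rank)

  def add(v: DenseVector[Double], rating: Double): this.type = {
    ata += v * v.t        // rank-1 outer product v_j v_j^T
    atb += v * rating
    this
  }

  def merge(other: PartialNormalEquation): this.type = {
    ata += other.ata
    atb += other.atb
    this
  }
}
{code}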

> As we do scaling experiments, we will understand which flow is more suited as 
> ratings get denser (my understanding is that since we already scaled ALS to 2 
> billion ratings and we will keep sparsity in check, the same 2 billion flow 
> will scale to 10K ranks as well)...

Any papers using rank ~10K?


> Large rank matrix factorization with Nonlinear loss and constraints
> -------------------------------------------------------------------
>
>                 Key: SPARK-6323
>                 URL: https://issues.apache.org/jira/browse/SPARK-6323
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Debasish Das
>             Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for Gram matrix generation, which 
> scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use the ml.recommendation.ALS design and come up with 
> ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent 
> changes, it's straightforward to do now!
> ALM will be capable of solving the following problems: min f(x) + g(z)
> 1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and 
> HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
> and implement LoglikelihoodLoss.
> 2. Constraints g(z) supported are the same as above, except that we don't 
> support affine + bounds (Aeq x = beq, lb <= x <= ub) yet. Most likely we 
> don't need that for ML applications.
> 3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, 
> which in turn uses a projection-based solver (SPG) or proximal solvers (ADMM) 
> based on convergence speed.
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
> 4. The factors will be SparseVector so that we keep shuffle size in check. 
> For example, we will run with a rank of 10K but force the factors to be 
> 100-sparse.
> This is closely related to Sparse LDA 
> https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
> are not using graph representation here.
> As we do scaling experiments, we will understand which flow is more suited as 
> ratings get denser (my understanding is that since we already scaled ALS to 2 
> billion ratings and we will keep sparsity in check, the same 2 billion flow 
> will scale to 10K ranks as well)...
> This JIRA is intended to extend the capabilities of ml.recommendation to 
> generalized loss functions.


