[ https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956 ]
Debasish Das commented on SPARK-6323:
-------------------------------------

g(z) is not regularization; it is the constraint term. For now we support constraints such as z >= 0; lb <= z <= ub; 1'z = s with z >= 0; and L1(z). These are the same constraints I supported through QuadraticMinimizer for SPARK-2426. I have already migrated ALS to use QuadraticMinimizer (default) and NNLS (positive), and I am waiting for the next Breeze release. I call the variable z because we are using splitting algorithms for the solve (projection-based, or ADMM + proximal).

For papers on the global objective, refer to any PLSA paper with matrix factorization. I personally like these two and am focused on them:
1. Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization, equations (2) and (3)
2. The original PLSA paper from Hofmann et al.

For large-rank matrix factorization, I think the requirements now come from sparse topics, which can easily range to ~10K.

> Large rank matrix factorization with Nonlinear loss and constraints
> -------------------------------------------------------------------
>
>                 Key: SPARK-6323
>                 URL: https://issues.apache.org/jira/browse/SPARK-6323
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Debasish Das
>             Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation, which scales to modest ranks. The problems that we can solve are in the normal-equation/quadratic form: 0.5x'Hx + c'x + g(z).
> g(z) can be one of the constraints from the Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use the ml.recommendation.ALS design and come up with ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent changes, it's straightforward to do it now!
> ALM will be capable of solving the following problem: min f(x) + g(z)
> 1. The loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss, or HingeLoss. Most likely we will re-use the Gradient interfaces already defined and implement LoglikelihoodLoss.
> 2. The constraints g(z) supported are the same as above, except that we don't yet support affine + bounds (Aeq x = beq, lb <= x <= ub). Most likely we don't need that for ML applications.
> 3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, which in turn uses a projection-based solver (SPG) or proximal solvers (ADMM) based on convergence speed.
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
> 4. The factors will be SparseVector so that we keep shuffle size in check. For example, we will run with 10K ranks but force the factors to be 100-sparse.
> This is closely related to Sparse LDA (https://issues.apache.org/jira/browse/SPARK-5564), with the difference that we are not using a graph representation here.
> As we do scaling experiments, we will understand which flow is better suited as ratings get denser (my understanding is that since we already scaled ALS to 2 billion ratings and we will keep sparsity in check, the same 2-billion flow will scale to 10K ranks as well).
> This JIRA is intended to extend the capabilities of ml recommendation to generalized loss functions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
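To make the constraint term g(z) concrete: each supported constraint corresponds to a proximal (or projection) operator that maps a point to the nearest feasible/penalized point. Below is a minimal Python sketch of three of the operators named above (nonnegativity, box bounds, and L1); this is an illustration only, not the Breeze implementation, and the function names are invented for the example.

```python
def project_nonneg(z):
    """Projection onto {z : z >= 0}: clamp negative entries to zero."""
    return [max(zi, 0.0) for zi in z]

def project_box(z, lb, ub):
    """Projection onto {z : lb <= z <= ub}: componentwise clamp."""
    return [min(max(zi, lb), ub) for zi in z]

def prox_l1(z, lam):
    """Proximal operator of lam * ||z||_1: soft-thresholding.
    Shrinks each entry toward zero by lam; entries within lam of zero vanish."""
    out = []
    for zi in z:
        if zi > lam:
            out.append(zi - lam)
        elif zi < -lam:
            out.append(zi + lam)
        else:
            out.append(0.0)
    return out
```

The projections are exact and closed-form, which is what makes splitting methods like ADMM and SPG attractive here: the hard part of the constraint is handled by a cheap elementwise operator.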
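The splitting structure min f(x) + g(z) that ALM targets can be sketched with a toy ADMM loop. The sketch below minimizes the quadratic 0.5x'Hx + c'x subject to x >= 0; to stay dependency-free it assumes H is diagonal (given as a vector h), so the x-update is componentwise, whereas a real solver would factor H + rho*I. All names and the fixed iteration count are illustrative, not the NonlinearMinimizer API.

```python
def admm_qp_nonneg(h, c, rho=1.0, iters=200):
    """ADMM for min 0.5*x'Hx + c'x + g(z), g = indicator of {z >= 0},
    with the consensus constraint x = z. H is diagonal, passed as vector h."""
    n = len(h)
    x = [0.0] * n  # primal variable handled by the smooth loss f
    z = [0.0] * n  # primal variable handled by the constraint g
    u = [0.0] * n  # scaled dual variable for x = z
    for _ in range(iters):
        # x-update: argmin_x f(x) + (rho/2)||x - z + u||^2 (closed form, H diagonal)
        x = [(rho * (z[i] - u[i]) - c[i]) / (h[i] + rho) for i in range(n)]
        # z-update: prox of g at x + u, i.e. projection onto {z >= 0}
        z = [max(x[i] + u[i], 0.0) for i in range(n)]
        # dual update: accumulate the consensus residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For h = [2, 2] and c = [-4, 4], the unconstrained minimizers are x = 2 and x = -2; the nonnegativity constraint leaves the first at 2 and clamps the second to 0, which is what the loop converges to.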
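On point 4 (10K ranks with 100-sparse factors): one simple way to keep a factor k-sparse is to retain only its k largest-magnitude entries as (index, value) pairs, which is what bounds the shuffle size. A hedged sketch, not the proposed implementation:

```python
def top_k_sparse(factor, k):
    """Keep the k largest-magnitude entries of a dense factor vector,
    returning sorted (index, value) pairs as a sparse representation.
    E.g. a rank-10K factor truncated with k=100 ships 100 pairs, not 10K floats."""
    order = sorted(range(len(factor)), key=lambda i: abs(factor[i]), reverse=True)
    keep = sorted(order[:k])  # sort surviving indices for a canonical layout
    return [(i, factor[i]) for i in keep]
```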