[ https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956 ]
Debasish Das commented on SPARK-6323:
-------------------------------------

g(z) is not regularization; it is the constraint term. For now we support constraints such as z >= 0; lb <= z <= ub; 1'z = s with z >= 0; and L1(z). These are the same constraints I supported through QuadraticMinimizer for SPARK-2426. I have already migrated ALS to use QuadraticMinimizer (default) and NNLS (positive), and I am waiting for the next Breeze release. I call the variable z because we are using splitting algorithms for the solve (projection-based, or ADMM + proximal).

For papers on the global objective, refer to any PLSA paper with matrix factorization. I personally like these two and am focused on them:
1. Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization, equations (2) and (3)
2. The original PLSA paper from Hofmann et al.

For large-rank matrix factorization, I think the requirements now come from sparse topics, which can easily range to ~10K.

> Large rank matrix factorization with Nonlinear loss and constraints
> -------------------------------------------------------------------
>
>                 Key: SPARK-6323
>                 URL: https://issues.apache.org/jira/browse/SPARK-6323
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Debasish Das
>             Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation, which scales to modest ranks. The problems that we can solve are in the normal-equation/quadratic form: 0.5x'Hx + c'x + g(z).
> g(z) can be one of the constraints from the Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use the ml.recommendation.ALS design and come up with ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent changes, it's straightforward to do it now!
> ALM will be capable of solving the following problem: min f(x) + g(z)
> 1. The loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss, or HingeLoss. Most likely we will re-use the Gradient interfaces already defined and implement LoglikelihoodLoss.
> 2. The constraints g(z) supported are the same as above, except that we don't yet support affine + bounds (Aeq x = beq, lb <= x <= ub). Most likely we don't need that for ML applications.
> 3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, which in turn uses a projection-based solver (SPG) or proximal solvers (ADMM) based on convergence speed.
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
> 4. The factors will be SparseVector so that we keep shuffle size in check. For example, we will run with 10K ranks but force the factors to be 100-sparse.
> This is closely related to Sparse LDA (https://issues.apache.org/jira/browse/SPARK-5564), with the difference that we are not using a graph representation here.
> As we do scaling experiments, we will understand which flow is better suited as ratings get denser (my understanding is that since we already scaled ALS to 2 billion ratings and we will keep sparsity in check, the same 2-billion flow will scale to 10K ranks as well).
> This JIRA is intended to extend the capabilities of ml recommendation to generalized loss functions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
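To make the constraint term g(z) concrete: each supported constraint corresponds to a proximal (or projection) operator that maps a point to the nearest feasible/penalized point. Below is a minimal Python sketch of three of the operators named above (nonnegativity, box bounds, and L1); this is an illustration only, not the Breeze implementation, and the function names are invented for the example.

```python
def project_nonneg(z):
    """Projection onto {z : z >= 0}: clamp negative entries to zero."""
    return [max(zi, 0.0) for zi in z]

def project_box(z, lb, ub):
    """Projection onto {z : lb <= z <= ub}: componentwise clamp."""
    return [min(max(zi, lb), ub) for zi in z]

def prox_l1(z, lam):
    """Proximal operator of lam * ||z||_1: soft-thresholding.
    Shrinks each entry toward zero by lam; entries within lam of zero vanish."""
    out = []
    for zi in z:
        if zi > lam:
            out.append(zi - lam)
        elif zi < -lam:
            out.append(zi + lam)
        else:
            out.append(0.0)
    return out
```

The projections are exact and closed-form, which is what makes splitting methods like ADMM and SPG attractive here: the hard part of the constraint is handled by a cheap elementwise operator.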
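The splitting structure min f(x) + g(z) that ALM targets can be sketched with a toy ADMM loop. The sketch below minimizes the quadratic 0.5x'Hx + c'x subject to x >= 0; to stay dependency-free it assumes H is diagonal (given as a vector h), so the x-update is componentwise, whereas a real solver would factor H + rho*I. All names and the fixed iteration count are illustrative, not the NonlinearMinimizer API.

```python
def admm_qp_nonneg(h, c, rho=1.0, iters=200):
    """ADMM for min 0.5*x'Hx + c'x + g(z), g = indicator of {z >= 0},
    with the consensus constraint x = z. H is diagonal, passed as vector h."""
    n = len(h)
    x = [0.0] * n  # primal variable handled by the smooth loss f
    z = [0.0] * n  # primal variable handled by the constraint g
    u = [0.0] * n  # scaled dual variable for x = z
    for _ in range(iters):
        # x-update: argmin_x f(x) + (rho/2)||x - z + u||^2 (closed form, H diagonal)
        x = [(rho * (z[i] - u[i]) - c[i]) / (h[i] + rho) for i in range(n)]
        # z-update: prox of g at x + u, i.e. projection onto {z >= 0}
        z = [max(x[i] + u[i], 0.0) for i in range(n)]
        # dual update: accumulate the consensus residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For h = [2, 2] and c = [-4, 4], the unconstrained minimizers are x = 2 and x = -2; the nonnegativity constraint leaves the first at 2 and clamps the second to 0, which is what the loop converges to.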
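On point 4 (10K ranks with 100-sparse factors): one simple way to keep a factor k-sparse is to retain only its k largest-magnitude entries as (index, value) pairs, which is what bounds the shuffle size. A hedged sketch, not the proposed implementation:

```python
def top_k_sparse(factor, k):
    """Keep the k largest-magnitude entries of a dense factor vector,
    returning sorted (index, value) pairs as a sparse representation.
    E.g. a rank-10K factor truncated with k=100 ships 100 pairs, not 10K floats."""
    order = sorted(range(len(factor)), key=lambda i: abs(factor[i]), reverse=True)
    keep = sorted(order[:k])  # sort surviving indices for a canonical layout
    return [(i, factor[i]) for i in keep]
```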