Debasish Das created SPARK-6323:
-----------------------------------

             Summary: Large rank matrix factorization with Nonlinear loss and 
constraints
                 Key: SPARK-6323
                 URL: https://issues.apache.org/jira/browse/SPARK-6323
             Project: Spark
          Issue Type: New Feature
          Components: ML, MLlib
    Affects Versions: 1.4.0
            Reporter: Debasish Das
             Fix For: 1.4.0


Currently ml.recommendation.ALS is optimized for Gram matrix generation, which 
only scales to modest ranks. The problems we can solve are in the normal 
equation/quadratic form: min 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from the Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
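
To make the g(z) term concrete, here is a minimal, self-contained sketch of one 
such proximal operator: soft-thresholding, the proximal map of 
g(z) = lambda * ||z||_1. This is illustrative only and does not use the Breeze 
API; the object and method names are hypothetical.

```scala
// Illustrative sketch: the proximal operator of the L1 penalty,
// prox_{lambda ||.||_1}(v)_i = sign(v_i) * max(|v_i| - lambda, 0).
// Not the Breeze implementation; names are hypothetical.
object ProxL1Sketch {
  def softThreshold(v: Array[Double], lambda: Double): Array[Double] =
    v.map(x => math.signum(x) * math.max(math.abs(x) - lambda, 0.0))
}
```

Entries whose magnitude falls below lambda are zeroed out, which is what makes 
L1-regularized factors sparse.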

In this PR we will re-use the ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent 
changes, it's straightforward to do now!

ALM will be capable of solving problems of the following form: min f(x) + g(z)

1. The loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss, or 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss
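
As a sketch of the loss/gradient shape such an interface would expose, here is 
the hinge loss f(x) = max(0, 1 - y*x) with its (sub)gradient with respect to 
the margin score. This is an illustration under assumed names, not the existing 
Gradient interface.

```scala
// Illustrative sketch (hypothetical names): hinge loss and its subgradient
// with respect to the margin score, for a label in {-1, +1}.
object HingeSketch {
  // returns (loss, derivative w.r.t. score)
  def lossAndGrad(score: Double, label: Double): (Double, Double) = {
    val margin = 1.0 - label * score
    if (margin > 0) (margin, -label) else (0.0, 0.0)
  }
}
```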

2. The supported constraints g(z) are the same as above, except that we don't 
yet support the affine constraint Aeq x = beq, lb <= x <= ub. Most likely we 
don't need it for ML applications

3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, 
which in turn uses a projection-based solver (SPG) or proximal solvers (ADMM) 
depending on convergence speed.
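
To illustrate the projection-based idea behind SPG, here is a hand-rolled 
projected-gradient loop for min 0.5x'Hx + c'x subject to x >= 0. This is a 
sketch under assumed names, not the Breeze solver; a fixed step size stands in 
for SPG's spectral step selection.

```scala
// Illustrative sketch: projected gradient for a nonnegativity-constrained
// quadratic. Gradient step on 0.5x'Hx + c'x, then projection onto x >= 0.
object PgSketch {
  def solve(h: Array[Array[Double]], c: Array[Double],
            step: Double, iters: Int): Array[Double] = {
    val n = c.length
    var x = Array.fill(n)(0.0)
    for (_ <- 0 until iters) {
      // gradient of the quadratic: Hx + c
      val g = Array.tabulate(n)(i => (0 until n).map(j => h(i)(j) * x(j)).sum + c(i))
      // gradient step followed by projection onto the nonnegative orthant
      x = Array.tabulate(n)(i => math.max(x(i) - step * g(i), 0.0))
    }
    x
  }
}
```

The real solver replaces the fixed step with spectral step lengths plus a line 
search, and ADMM handles the proximal (non-projection) constraints.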

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVectors so that we keep the shuffle size in check. 
For example, we will run with rank 10K but force the factors to be 100-sparse.
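
One simple way to force a dense factor to be k-sparse is to keep only the k 
largest-magnitude entries, as in this sketch (hypothetical names; in practice 
the kept indices/values would back an ml.linalg SparseVector):

```scala
// Illustrative sketch: truncate a dense factor to its k largest-magnitude
// entries, returning (indices, values) sorted by index.
object TopKSketch {
  def topK(v: Array[Double], k: Int): (Array[Int], Array[Double]) = {
    val kept = v.zipWithIndex
      .sortBy { case (value, _) => -math.abs(value) }
      .take(k)
      .sortBy { case (_, index) => index }
    (kept.map(_._2), kept.map(_._1))
  }
}
```

At rank 10K with 100-sparse factors, each factor ships 100 index/value pairs 
instead of 10K doubles, which is what keeps the shuffle size in check.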

This is closely related to Sparse LDA 
(https://issues.apache.org/jira/browse/SPARK-5564), with the difference that we 
are not using a graph representation here.

As we run scaling experiments, we will better understand the underlying 
architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
