[
https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021138#comment-13021138
]
Jonathan Traupman commented on MAHOUT-672:
------------------------------------------
Ted-
I'd have to dig a little deeper to get a full understanding of all that you're
doing with the SGD regression stuff. (BTW, I think you mean MAHOUT-529?)
Broadly speaking, though, I'd say the two patches are complementary. Conjugate
gradient is just an iterative method for solving linear systems. Regression is
one obvious application, but linear systems come up a lot in a whole range of
algorithms, making CG a fairly general building block.
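To make the "building block" point concrete, here is a minimal CG sketch in Python/NumPy (purely illustrative -- not the patch's Java API). Note that the algorithm touches A only through matrix-vector products, which is exactly what makes it easy to adapt to map/reduce:

```python
# Minimal conjugate gradient sketch for Ax = b with A symmetric positive
# definite. The solver only needs a matvec callback: each iteration does
# one A*v product plus a few dot products and vector updates, so the
# expensive part parallelizes naturally over rows of A.
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy SPD system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b)
```

In exact arithmetic CG converges in at most n iterations for an n x n system, which is where the operation counts below come from.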
As a linear system solver, the big advantages of CG over, e.g., the Cholesky(*)
decomposition are that a) being iterative, it's very easy to adapt to
map/reduce for very large datasets, and b) for matrices of the form (cI + A),
where A is of rank k, CG will typically run in O(kn^2) time instead of O(n^3).
CG also has a few disadvantages, namely that for full-rank matrices it requires
n^3 multiplies compared to (IIRC) n^3/3 for Cholesky -- the same asymptotic
performance, but that constant-factor difference can add up in the real world.
Another large disadvantage is that if you are solving a collection of linear
systems, i.e. AX = B, where X and B are both matrices instead of vectors, you
have to run a separate CG solver for each of the k columns of X, for a total of
O(kn^3) runtime. Traditional matrix decomposition methods are usually O(n^3) to
do the decomposition, but only O(n^2) to solve the system using the decomposed
matrix, so you can solve a collection of k systems in O(n^3 + kn^2).
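A hedged sketch of the multiple right-hand-side issue: with CG you run one independent solve per column of B, whereas a single factorization could be reused across all columns. (Python/NumPy for illustration; the helper name is made up, not the patch's API.)

```python
# Solving AX = B with k right-hand sides via CG: one full solver run per
# column of B. A factorization method would pay the O(n^3) cost once and
# then back-substitute each column in O(n^2).
import numpy as np

def cg_solve(A, b, tol=1e-10, max_iter=100):
    # Plain conjugate gradient for SPD A (illustrative, not Mahout code).
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD
B = np.array([[1.0, 0.0], [2.0, 1.0]])   # k = 2 right-hand sides
X = np.column_stack([cg_solve(A, B[:, j]) for j in range(B.shape[1])])
```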
As for a linear regression implementation using CG compared to one using SGD,
it would be hard for me to reach any conclusions without comparing the two
approaches head to head on the same data. CG would probably gain some benefit
from being easily parallelizable, but the individual updates in SGD seem very
fast and lightweight, so any speed advantage to CG would probably only come up
for truly massive datasets. The SGD implementation in your patch also has a lot
of regularization support that a simple CG implementation of LMS would lack
(ridge regression, i.e. L2 regularization, comes for free, but L1 is
considerably harder). I'm also not sure how one would do the automatic
validation/hyperparameter tuning with CG that your SGD implementation does.
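For what it's worth, the "L2 comes for free" observation can be sketched like this: ridge regression is just CG applied to the operator v -> A'Av + lambda*v, without ever forming A'A explicitly. (Python/NumPy for illustration; all names here are assumed, not from the patch.)

```python
# Ridge regression via CG: solve (A'A + lambda*I)x = A'b. The matvec
# applies A, then A', then adds the lambda*v ridge term -- A'A is never
# materialized, which matters when A is large and sparse.
import numpy as np

def cg(matvec, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))    # design matrix (toy data)
b = rng.standard_normal(20)         # targets (toy data)
lam = 0.1                           # ridge penalty

ridge_matvec = lambda v: A.T @ (A @ v) + lam * v   # (A'A + lam*I) v
x = cg(ridge_matvec, A.T @ b)
```

Since A'A + lambda*I is always positive definite for lambda > 0, CG applies even when A itself is rank-deficient, which is the same point the patch description makes about the (A'A + lambda*I)x = b form.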
(*) FWIW, I also have an implementation of the Cholesky decomposition, which
I've been meaning to Mahout-ize and submit when I can find the time to do it.
> Implementation of Conjugate Gradient for solving large linear systems
> ---------------------------------------------------------------------
>
> Key: MAHOUT-672
> URL: https://issues.apache.org/jira/browse/MAHOUT-672
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.5
> Reporter: Jonathan Traupman
> Priority: Minor
> Attachments: MAHOUT-672.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> This patch contains an implementation of conjugate gradient, an iterative
> algorithm for solving large linear systems. In particular, it is well suited
> for large sparse systems where a traditional QR or Cholesky decomposition is
> infeasible. Conjugate gradient only works for matrices that are square,
> symmetric, and positive definite (basically the same types where Cholesky
> decomposition is applicable). Systems like these commonly occur in statistics
> and machine learning problems (e.g. regression).
> Both a standard (in memory) solver and a distributed hadoop-based solver
> (basically the standard solver run using a DistributedRowMatrix a la
> DistributedLanczosSolver) are included.
> There is already a version of this algorithm in the taste package, but it
> doesn't operate on standard Mahout matrix/vector objects, nor does it
> implement a distributed version. I believe this implementation will be more
> generically useful to the community than the specialized one in taste.
> This implementation solves the following types of systems:
> Ax = b, where A is square, symmetric, and positive definite
> A'Ax = b where A is arbitrary but A'A is positive definite. Directly solving
> this system is more efficient than computing A'A explicitly then solving.
> (A + lambda * I)x = b and (A'A + lambda * I)x = b, for systems where A or A'A
> is singular and/or not full rank. This occurs commonly if A is large and
> sparse. Solving a system of this form is used, for example, in ridge
> regression.
> In addition to the normal conjugate gradient solver, this implementation also
> handles preconditioning, and has a sample Jacobi preconditioner included as
> an example. More work will be needed to build more advanced preconditioners
> if desired.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira