port Hadoop-ified Lanczos SVD implementation from decomposer
------------------------------------------------------------

                 Key: MAHOUT-180
                 URL: https://issues.apache.org/jira/browse/MAHOUT-180
             Project: Mahout
          Issue Type: New Feature
          Components: Matrix
    Affects Versions: 0.2
            Reporter: Jake Mannix
            Priority: Minor


I wrote up a Hadoop version of the Lanczos algorithm for performing SVD on 
sparse matrices, available at http://decomposer.googlecode.com/, which is 
Apache-licensed, and I'm willing to donate it.  I'll have to port the 
implementation over to use Mahout vectors, or else contribute those vector 
classes as well.

Current issues with the decomposer implementation: if your matrix is really 
big, you need to re-normalize before decomposition.  Find the largest 
eigenvalue first and divide all of your rows by that value, then decompose; 
otherwise you'll overflow Double.MAX_VALUE once you've run too many 
iterations (the L^2 norm of the intermediate vectors grows roughly as 
(largest-eigenvalue)^(num-eigenvalues-found-so-far), so losing precision at 
the low end is better than overflowing MAX_VALUE).  When this is ported to 
Mahout, we should add the capability to do this automatically, as sketched 
below: run a couple of iterations to find the largest eigenvalue, save it, 
then iterate while scaling the vectors by 1/max_eigenvalue.
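
For illustration, here is a minimal, self-contained sketch of what that 
automatic pre-scaling could look like, in plain Java with dense double[][] 
arrays rather than the decomposer or Mahout vector APIs (all class and 
method names here, like PreScaleSketch, are hypothetical): a few power 
iterations against A^T * A estimate the largest eigenvalue, and every row 
is then scaled by its reciprocal before handing the matrix to Lanczos.

{code}
import java.util.Random;

/**
 * Hypothetical sketch of the proposed pre-scaling step, using plain
 * double[][] arrays instead of decomposer/Mahout vectors.
 */
public class PreScaleSketch {

  /** Computes y = A^T * (A * x), the operator Lanczos iterates on. */
  static double[] timesSquared(double[][] a, double[] x) {
    double[] ax = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      double sum = 0.0;
      for (int j = 0; j < x.length; j++) {
        sum += a[i][j] * x[j];
      }
      ax[i] = sum;
    }
    double[] y = new double[x.length];
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < x.length; j++) {
        y[j] += a[i][j] * ax[i];
      }
    }
    return y;
  }

  static double norm(double[] v) {
    double sum = 0.0;
    for (double d : v) {
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Runs a couple of power iterations to estimate the largest eigenvalue. */
  static double estimateMaxEigenvalue(double[][] a, int iterations) {
    Random random = new Random(1234L);
    double[] x = new double[a[0].length];
    for (int j = 0; j < x.length; j++) {
      x[j] = random.nextDouble();
    }
    double lambda = 0.0;
    for (int iter = 0; iter < iterations; iter++) {
      double[] y = timesSquared(a, x);
      lambda = norm(y) / norm(x);          // current eigenvalue estimate
      double n = norm(y);
      for (int j = 0; j < y.length; j++) { // re-normalize so x itself
        y[j] /= n;                         // never overflows
      }
      x = y;
    }
    return lambda;
  }

  /**
   * Scales every row in place by 1/max_eigenvalue (the factor the ticket
   * proposes) before running the full decomposition.
   */
  static void preScale(double[][] a) {
    double maxEigenvalue = estimateMaxEigenvalue(a, 5);
    double scale = 1.0 / maxEigenvalue;
    for (double[] row : a) {
      for (int j = 0; j < row.length; j++) {
        row[j] *= scale;
      }
    }
  }
}
{code}

With the rows scaled this way, the largest eigenvalue is driven down toward 
1, so the (largest-eigenvalue)^(num-eigenvalues-found-so-far) growth above 
stays bounded; whatever precision is lost in the smallest entries is exactly 
the trade-off described above.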
