First test results of this method, including QR blocking as per the document. The test matrix is 800x200, k+p=20.
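For reference, the single-machine prototype being tested follows the basic scheme from the Halko et al. paper (random Gaussian Omega, QR of the sample Y = A*Omega, then a small SVD of B = Q'A). This is only a minimal numpy sketch of that scheme, not the actual Java code; the `q` parameter for power iterations is included since the paper recommends them for ill-conditioned input:

```python
import numpy as np

def stochastic_svd(a, k, p=10, q=0, seed=0):
    """Sketch of the basic randomized SVD from Halko et al.

    a : (m, n) input matrix
    k : number of singular values wanted
    p : oversampling (k + p columns are sampled)
    q : number of power iterations (0 = the basic scheme tested here)
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Step 1: random Gaussian test matrix Omega, n x (k+p)
    omega = rng.standard_normal((n, k + p))
    # Step 2: sample the range of a and orthonormalize
    y = a @ omega                      # m x (k+p)
    q_mat, _ = np.linalg.qr(y)
    # Optional power iterations, re-orthonormalizing each half-step
    for _ in range(q):
        z, _ = np.linalg.qr(a.T @ q_mat)
        q_mat, _ = np.linalg.qr(a @ z)
    # Step 3: project down and take the small SVD
    b = q_mat.T @ a                    # (k+p) x n
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q_mat @ u_b)[:, :k], s[:k], vt[:k]

# Same shape as the test above: 800 x 200 random source, k+p = 20
a = np.random.default_rng(42).standard_normal((800, 200))
exact = np.linalg.svd(a, compute_uv=False)[:20]
_, approx0, _ = stochastic_svd(a, k=20, p=0, q=0)  # no power iterations
_, approx2, _ = stochastic_svd(a, k=20, p=0, q=2)  # two power iterations
print("largest sv, exact:", exact[0])
print("largest sv, q=0  :", approx0[0])
print("largest sv, q=2  :", approx2[0])
```

On a flat-spectrum random matrix like this one, q=0 visibly underestimates the singular values (consistent with the numbers below), while a couple of power iterations pull the estimates much closer to the exact values.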
Colt library results (singular values):
1.204e+04 1.196e+04 1.191e+04 1.181e+04 1.175e+04 1.170e+04 1.159e+04 1.151e+04 1.146e+04 1.143e+04
1.138e+04 1.132e+04 1.129e+04 1.118e+04 1.112e+04 1.106e+04 1.101e+04 1.097e+04 1.090e+04 1.087e+04

Singular values, stochastic SVD, 1 QR block:
1.019e+04 9.964e+03 9.842e+03 9.638e+03 9.574e+03 9.457e+03 9.275e+03 9.190e+03 9.057e+03 9.022e+03
8.957e+03 8.689e+03 8.673e+03 8.588e+03 8.415e+03 8.300e+03 8.214e+03 8.120e+03 8.007e+03 7.929e+03

Singular values, 20 Givens-based QR blocks:
9.263e+03 6.941e+03 6.413e+03 6.241e+03 5.985e+03 5.898e+03 5.650e+03 5.246e+03 5.021e+03 4.898e+03
4.836e+03 4.594e+03 4.412e+03 4.191e+03 4.079e+03 3.949e+03 3.786e+03 3.728e+03 3.565e+03 3.408e+03

Notes: the source matrix is the same in all runs (as is the seed for the random matrix Omega). Omega is filled with random Gaussian numbers, as the paper suggests for the first step. The source matrix is itself random, which perhaps means it is ill-conditioned; the authors of the paper warn that in such situations the results may go astray quite a lot and power iterations are required.

With blocking we pick up almost another 8% of error on the largest singular value (and we would probably want k+p far larger than 20). Obviously not all of that 8% can be attributed to QR deficiencies themselves, since the error would be magnified by further calculations in the pipeline. I use the eigensolver from Apache commons-math, which may use slightly different approaches, who knows. It seems, though, that either way in my case, regardless of the number of blocks, stochastic SVD consistently underestimates the singular values of my randomly generated source. Perhaps there are better tests (better-conditioned matrices to begin with) for which this method should presumably work better, but I am not sure how to generate them or what margin of error to expect. Orthogonality tests for Q do pass with epsilon set to 1e-10, though.
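To make the blocking step concrete: the idea is that each mapper QR-factors its own row block of Y, and the per-block R factors are then merged. The sketch below is only an illustration of that structure in numpy, assuming a TSQR-style merge via a dense Householder QR of the stacked R factors; it is a stand-in for the Givens-based merge described in the attached doc, not that scheme itself. It also reproduces the Q orthogonality test with epsilon 1e-10:

```python
import numpy as np

def blockwise_qr(y, n_blocks):
    """TSQR-style blockwise QR of a tall-skinny matrix.

    Each row block is factored independently (these QRs could run in
    separate mappers); the n x n R factors are stacked and merged with
    one more QR, whose Q is pushed back into each block's Q factor.
    Requires each block to have at least n rows.
    """
    n = y.shape[1]
    blocks = np.array_split(y, n_blocks, axis=0)
    qs, rs = zip(*(np.linalg.qr(b) for b in blocks))
    q2, r = np.linalg.qr(np.vstack(rs))            # merge step
    q = np.vstack([qi @ q2[i * n:(i + 1) * n] for i, qi in enumerate(qs)])
    return q, r

rng = np.random.default_rng(0)
a = rng.standard_normal((800, 200))
omega = rng.standard_normal((200, 20))
y = a @ omega                                      # Y is 800 x 20, k+p = 20
q, r = blockwise_qr(y, n_blocks=20)

# Orthogonality test for Q with epsilon 1e-10, plus the factorization check
err_orth = np.max(np.abs(q.T @ q - np.eye(20)))
err_fact = np.max(np.abs(q @ r - y))
print("max |Q'Q - I| =", err_orth)
print("max |QR - Y|  =", err_fact)
```

In double precision the merged Q stays orthonormal to roughly machine precision regardless of the number of blocks, so blocking by itself should not account for a loss of the size seen above; that points at the downstream steps (or at the flat spectrum of the random source) rather than at the QR merge.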
I was worried about numerical stability; I wonder how significantly the errors may grow on gigantic matrices. Please kindly let me know whether this is the kind of precision to expect, or whether it actually should be better than that with this method. If this is acceptable (well, Halko et al. call it an 'approximate' method anyway), I will put those subroutines into the map-reduce code. I can certainly run more tests per your suggestions if needed, in order to figure out more details about the limitations of the method.

Thanks.
-Dmitriy

On Mon, Nov 15, 2010 at 9:27 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Dmitriy Lyubimov updated MAHOUT-376:
> ------------------------------------
>
>     Attachment: QR decomposition for Map.pdf
>
> QR step doc is ready for review.
>
> Especially sections 3.2 and on, which I still haven't worked up in the
> actual code. These reflect my original ideas for dealing with blockwise QR
> in a MapReduce setting. I am probably retracing some block computations
> already found elsewhere (such as in LAPACK), but I think it may actually
> work out ok.
>
> > Implement Map-reduce version of stochastic SVD
> > ----------------------------------------------
> >
> >            Key: MAHOUT-376
> >            URL: https://issues.apache.org/jira/browse/MAHOUT-376
> >        Project: Mahout
> >     Issue Type: Improvement
> >     Components: Math
> >       Reporter: Ted Dunning
> >       Assignee: Ted Dunning
> >        Fix For: 0.5
> >
> >    Attachments: MAHOUT-376.patch, Modified stochastic svd algorithm
> > for mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for
> > Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf, sd.tex, sd.tex,
> > sd.tex, sd.tex, Stochastic SVD using eigensolver trick.pdf
> >
> > See attached pdf for outline of proposed method.
> > All comments are welcome.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
