I would suggest low-rank test matrices at first. With k+p = 20, you can build rank 10, 20 and 30 input matrices. These can be built in R with this:
a.10 = matrix(rnorm(10*800), nrow=800) %*% matrix(rnorm(10*800), ncol=800) a.20 = matrix(rnorm(20*800), nrow=800) %*% matrix(rnorm(20*800), ncol=800) a.30 = matrix(rnorm(30*800), nrow=800) %*% matrix(rnorm(30*800), ncol=800) You should get essentially perfect results with a.10, very good results with a.20 and ok results with a.30. With block QR and rank(A) \le k, p = 5, I didn't see any important difference between block QR, stochastic projection or svd. On Thu, Nov 18, 2010 at 12:32 AM, Dmitriy Lyubimov <[email protected]>wrote: > First test results of this method including QR blocking as per document: > > Test matrix is 800x200, k+p=20. > > Colt library results (singular values) > > 1.204e+04 1.196e+04 1.191e+04 1.181e+04 1.175e+04 1.170e+04 1.159e+04 > 1.151e+04 1.146e+04 1.143e+04 1.138e+04 1.132e+04 1.129e+04 1.118e+04 > 1.112e+04 1.106e+04 1.101e+04 1.097e+04 1.090e+04 1.087e+04 > > Singular values, stochastic SVD, 1 QR block > > 1.019e+04 9.964e+03 9.842e+03 9.638e+03 9.574e+03 9.457e+03 9.275e+03 > 9.190e+03 9.057e+03 9.022e+03 8.957e+03 8.689e+03 8.673e+03 8.588e+03 > 8.415e+03 8.300e+03 8.214e+03 8.120e+03 8.007e+03 7.929e+03 > > Singular values, 20 Givens-based QR blocks > > 9.263e+03 6.941e+03 6.413e+03 6.241e+03 5.985e+03 5.898e+03 5.650e+03 > 5.246e+03 5.021e+03 4.898e+03 4.836e+03 4.594e+03 4.412e+03 4.191e+03 > 4.079e+03 3.949e+03 3.786e+03 3.728e+03 3.565e+03 3.408e+03 > > Notes: > > the source matrix is the same (as well as seed for random matrix Omega). > Omega is random Gaussian numbers, as the paper suggests as a first step. > Source matrix is random which perhaps means it is ill conditioned. The > authors in the paper warned that in such sitations results may go astray > quite a lot and power iterations are required. With blocking we get almost > another 8% of error on the largest singular value (but we probably want to > go with k+p far more than 20). Obviously not all of that 8% can be > attribute > to QR definciencies themselves, for error would be magnified by further > calculations in the pipeline. I use eigensolver from the apache > math-commons, which may use slightly different approaches, who knows. > it seesm though that either way in my case regardless of # of blocks > stochastic svd seems to consistently underestimate singular values in my > randomly generated source. Perhaps there are better tests (better > conditioned matrices to begin with) in which case this method should > presumably work better, but i am not sure how to generate them and what > margin of error to expect. > > Orthogonality tests for Q pass with epsilon set to 1e-10 though. > > I was worried about numerical stability, i wonder how significantly errors > may grow on gigantic matrices. > Please kindly let me know if you think this kind of precision to expect or > it actually should be better than that with this method. > > If this is acceptable (well Halko et al call it 'approximate' method > anyway) > i will put those subroutine into map-reduce code. I certainly can run more > tests per your suggestions if needed, in order to figure more details about > the limitations of the method. > > Thanks. > -Dmitriy > > On Mon, Nov 15, 2010 at 9:27 PM, Dmitriy Lyubimov (JIRA) <[email protected] > >wrote: > > > > > [ > > > https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > > > Dmitriy Lyubimov updated MAHOUT-376: > > ------------------------------------ > > > > Attachment: QR decomposition for Map.pdf > > > > QR step doc is ready for review. > > > > Especially sections 3.2 and on which i still haven't worked up in the > > actual code. These reflect my original ideas to deal with blockwise QR in > a > > MapReduce settings. I am probably retracing some block computations > already > > found elsewhere (such as in LAPack) but i think it may actually work out > ok. > > > > > Implement Map-reduce version of stochastic SVD > > > ---------------------------------------------- > > > > > > Key: MAHOUT-376 > > > URL: https://issues.apache.org/jira/browse/MAHOUT-376 > > > Project: Mahout > > > Issue Type: Improvement > > > Components: Math > > > Reporter: Ted Dunning > > > Assignee: Ted Dunning > > > Fix For: 0.5 > > > > > > Attachments: MAHOUT-376.patch, Modified stochastic svd > algorithm > > for mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for > > Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf, sd.tex, sd.tex, > sd.tex, > > sd.tex, Stochastic SVD using eigensolver trick.pdf > > > > > > > > > See attached pdf for outline of proposed method. > > > All comments are welcome. > > > > -- > > This message is automatically generated by JIRA. > > - > > You can reply to this email to add a comment to the issue online. > > > > >
