I would suggest low-rank test matrices at first.  With k+p = 20, you can
build rank 10, 20 and 30 input matrices.  These
can be built in R with this:

a.10 = matrix(rnorm(10*800), nrow=800) %*% matrix(rnorm(10*800), ncol=800)
a.20 = matrix(rnorm(20*800), nrow=800) %*% matrix(rnorm(20*800), ncol=800)
a.30 = matrix(rnorm(30*800), nrow=800) %*% matrix(rnorm(30*800), ncol=800)

You should get essentially perfect results with a.10, very good results with
a.20 and ok results with a.30.

With block QR and rank(A) \le k, p = 5, I didn't see any important
difference between block QR, stochastic projection or svd.

On Thu, Nov 18, 2010 at 12:32 AM, Dmitriy Lyubimov <[email protected]>wrote:

> First test results of this method including QR blocking as per document:
>
> Test matrix is 800x200, k+p=20.
>
> Colt library results (singular values)
>
> 1.204e+04 1.196e+04 1.191e+04 1.181e+04 1.175e+04 1.170e+04 1.159e+04
> 1.151e+04 1.146e+04 1.143e+04 1.138e+04 1.132e+04 1.129e+04 1.118e+04
> 1.112e+04 1.106e+04 1.101e+04 1.097e+04 1.090e+04 1.087e+04
>
> Singular values, stochastic SVD, 1 QR block
>
> 1.019e+04 9.964e+03 9.842e+03 9.638e+03 9.574e+03 9.457e+03 9.275e+03
> 9.190e+03 9.057e+03 9.022e+03 8.957e+03 8.689e+03 8.673e+03 8.588e+03
> 8.415e+03 8.300e+03 8.214e+03 8.120e+03 8.007e+03 7.929e+03
>
> Singular values, 20 Givens-based QR blocks
>
> 9.263e+03 6.941e+03 6.413e+03 6.241e+03 5.985e+03 5.898e+03 5.650e+03
> 5.246e+03 5.021e+03 4.898e+03 4.836e+03 4.594e+03 4.412e+03 4.191e+03
> 4.079e+03 3.949e+03 3.786e+03 3.728e+03 3.565e+03 3.408e+03
>
> Notes:
>
> the source matrix is the same (as well as seed for random matrix Omega).
> Omega is random Gaussian numbers, as the paper suggests as a first step.
> Source matrix is random which perhaps means it is ill conditioned. The
> authors in the paper warned that in such sitations results may go astray
> quite a lot and power iterations are required. With blocking we get almost
> another 8% of error on the largest singular value (but we probably want to
> go with k+p far more than 20). Obviously not all of that 8% can be
> attribute
> to QR definciencies themselves, for error would be magnified by further
> calculations in the pipeline. I use eigensolver from the apache
> math-commons, which may use slightly different approaches, who knows.
> it seesm though that either way in my case regardless of # of blocks
> stochastic svd seems to consistently underestimate singular values in my
> randomly generated source. Perhaps there are better tests (better
> conditioned matrices to begin with) in which case this method should
> presumably work better, but i am not sure how to generate them and what
> margin of error to expect.
>
> Orthogonality tests for Q pass with epsilon set to 1e-10 though.
>
> I was worried about numerical stability, i wonder how significantly errors
> may grow on gigantic matrices.
> Please kindly let me know if you think this kind of precision to expect or
> it actually should be better than that with this method.
>
> If this is acceptable (well Halko et al call it 'approximate' method
> anyway)
> i will put those subroutine into map-reduce code. I certainly can run more
> tests per your suggestions if needed, in order to figure more details about
> the limitations of the method.
>
> Thanks.
> -Dmitriy
>
> On Mon, Nov 15, 2010 at 9:27 PM, Dmitriy Lyubimov (JIRA) <[email protected]
> >wrote:
>
> >
> >     [
> >
> https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
> >
> > Dmitriy Lyubimov updated MAHOUT-376:
> > ------------------------------------
> >
> >    Attachment: QR decomposition for Map.pdf
> >
> > QR step doc is ready for review.
> >
> > Especially sections 3.2 and on which i still haven't worked up in the
> > actual code. These reflect my original ideas to deal with blockwise QR in
> a
> > MapReduce settings. I am probably retracing some block computations
> already
> > found elsewhere (such as in LAPack) but i think it may actually work out
> ok.
> >
> > > Implement Map-reduce version of stochastic SVD
> > > ----------------------------------------------
> > >
> > >                 Key: MAHOUT-376
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-376
> > >             Project: Mahout
> > >          Issue Type: Improvement
> > >          Components: Math
> > >            Reporter: Ted Dunning
> > >            Assignee: Ted Dunning
> > >             Fix For: 0.5
> > >
> > >         Attachments: MAHOUT-376.patch, Modified stochastic svd
> algorithm
> > for mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for
> > Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf, sd.tex, sd.tex,
> sd.tex,
> > sd.tex, Stochastic SVD using eigensolver trick.pdf
> > >
> > >
> > > See attached pdf for outline of proposed method.
> > > All comments are welcome.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>

Reply via email to