Thank you, Ted. 

I guess I have a little trouble visualizing all of it. The original work goes 
to great lengths to preserve 'the action' of the original matrix, which I can 
understand. The way I visualize it: say we project onto a hyperplane with the 
help of a nearly orthonormal R'. But since R' is not fully orthogonal, it is 
not an isometry, which means some of the left eigenvectors may get 'squashed' 
more than others, and so the 'order of principality' may be lost. That's how I 
tried to visualize it before, and this is what I found confirmation for in the 
paper, with all the talk about aligning the columns of Q with the left 
eigenvectors of A (not that I've gotten to the point of how they actually 
achieve that).

Well, I guess I don't really have to understand all of it. Thank you very much 
for taking the time to explain.

-Dmitriy

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Saturday, March 27, 2010 11:59 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Stochastic SVD

Dmitriy,

The current approach that Jake is working on is to form R' A' A R in a
single pass over A where R is a suitable random matrix.  This is a
relatively small symmetric matrix so we can find V and S^2 such that R' A' A
R = V S^2 V'.  From this we can recover the SVD of AR via U = (AR) V S^-1.  U is
the desired orthonormal basis for the column space of AR and can be used as Q in the
algorithms.  Jake is talking about storing AR as a separate output of the
first pass, but a second pass over A would work just as well.
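
To make the arithmetic concrete, here is a rough NumPy sketch of that
computation (illustrative only; the Gaussian R and the amount of oversampling
are placeholder choices here, not necessarily what the actual implementation
will use):

import numpy as np

rng = np.random.default_rng(0)
m, n, k, p = 1000, 200, 10, 15        # want k singular values, use k+p random columns
A = rng.standard_normal((m, n))       # stand-in for the input matrix
R = rng.standard_normal((n, k + p))   # a suitable random matrix

Y = A @ R                             # AR, computable in one pass over A
G = Y.T @ Y                           # R' A' A R, small (k+p) x (k+p), symmetric
s2, V = np.linalg.eigh(G)             # G = V S^2 V'
idx = np.argsort(s2)[::-1]            # order eigenvalues largest first
s2, V = s2[idx], V[:, idx]

U = (Y @ V) / np.sqrt(s2)             # U = (AR) V S^-1, orthonormal columns
# U can serve as Q: np.allclose(U.T @ U, np.eye(k + p)) should hold.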

Since R has more columns than the number of singular values we want to extract
from A, we can be pretty sure that AR spans the eigenvectors of interest.

The problem that I see with the Gram-Schmidt approach is that it assumes
that you have the y(i) one at a time while the practical situation is that
you get rows of Y rather than columns.  My guess is that you can do some
trickiness here, and that the Gram-Schmidt approach would give better
accuracy, but that the simplicity of the R' A' A R approach may compensate in
practice.
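
For reference, this is roughly what column-at-a-time Gram-Schmidt with a
reorthogonalization pass (the 'double orthogonalization' you mention below)
looks like; each step consumes an entire column of Y, which is exactly what is
awkward when Y arrives as rows. This is the textbook procedure, not necessarily
the paper's exact variant:

import numpy as np

def gram_schmidt_reorth(Y):
    """Orthonormalize the columns of Y, visiting one column y(i) at a time."""
    Q = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[1]):
        q = Y[:, i].astype(float)
        for _ in range(2):                   # second sweep = reorthogonalization
            q -= Q[:, :i] @ (Q[:, :i].T @ q)
        Q[:, i] = q / np.linalg.norm(q)
    return Q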


On Sat, Mar 27, 2010 at 10:14 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Hi,
>
> so I am trying to read a little deeper into this and was wondering if you
> can give me some of your insight into this.
>
> so we construct a random projection; that's clear, and that will require one
> pass over the original matrix (or the covariance matrix in the case of PCA).
>
> But... it goes on to say that before we project onto Q, we need to make sure
> its span captures the principal left eigenvectors, which in turn suggests,
> per alg. 4.1, another pass over A and then running a Gram-Schmidt procedure.
>
> so what's your insight about this --
>
> 1) do we really need to do 2 passes over A to get to the projection? if not,
> why not?
>
> 2) how do we orthonormalize Q efficiently? They seem to have mentioned (so
> far) that they themselves used Gram-Schmidt with double orthogonalization
> for that purpose. I am not familiar with this particular variant; can we
> do it in a non-iterative way? I am only familiar with 'standard' Gram-Schmidt,
> which is strictly iterative. But iterativeness is averse to MR (and even if it
> weren't, that probably would mean we would need to run another MR job here
> just to orthonormalize Q, unless again we can figure out how to combine that
> with the projection job).
>
> I guess it may still be simplified further; I haven't finished reading all
> of it, but if there are answers further in the paper, do you think you could
> point me to the section that talks about the variation that is most suitable
> for MR?
>
>
> Thank you very much.
>
> -Dmitriy
>
> On Mon, Mar 22, 2010 at 8:12 PM, Jake Mannix <jake.man...@gmail.com>
> wrote:
>
> > Actually, maybe what you were thinking (at least, what *I* am thinking) is
> > that you can indeed do it in one pass through the *original* data (i.e. you
> > can get away with never keeping a handle on the original data itself),
> > because on the "one pass" through that data, you spit out MultipleOutputs -
> > one SequenceFile of the randomly projected data, which doesn't hit a reducer
> > at all, and a second output which is the outer product of those vectors
> > with themselves, which hits a summing reducer.
> >
> > In this sense, while you need to pass over the original data's *size*
> > (in terms of number of rows) a second time, if you want to consider
> > it data to be played with (instead of just "training" data for use on a
> > smaller subset or even totally different set), you don't need to pass
> > over the original entire data *set* ever again.
> >
> >  -jake
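
For illustration, the one-pass structure Jake describes, sketched in plain
Python rather than the actual Mahout/Hadoop job (the names here are made up):

import numpy as np

def one_pass(rows_of_A, R):
    """One sweep over A: emit each projected row (the 'SequenceFile' output)
    and accumulate the outer products (what the summing reducer adds up)."""
    kp = R.shape[1]
    projected = []                  # randomly projected data, Y = AR, row by row
    gram = np.zeros((kp, kp))       # running sum of outer products -> Y'Y = R'A'AR
    for a in rows_of_A:             # single pass over the original data
        y = a @ R
        projected.append(y)
        gram += np.outer(y, y)
    return np.array(projected), gram
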
> >
> > On Mon, Mar 22, 2010 at 6:35 PM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > You are probably right.  I had a wild hare tromp through my thoughts the
> > > other day saying that one pass should be possible, but I can't reconstruct
> > > the details just now.
> > >
> > > On Mon, Mar 22, 2010 at 6:00 PM, Jake Mannix <jake.man...@gmail.com>
> > > wrote:
> > >
> > > > I guess if you mean just do a random projection on the original data, you
> > > > can certainly do that in one pass, but that's random projection, not a
> > > > stochastic decomposition.
> > > >
> > >
> >
>
