Actually, maybe what you were thinking (at least, what *I* am thinking) is that you can indeed do it in one pass through the *original* data (i.e., you can get away with never keeping a handle on the original data itself): on that one pass, you spit out MultipleOutputs - one SequenceFile of the randomly projected data, which doesn't hit a reducer at all, and a second output consisting of the outer products of those projected vectors with themselves, which goes to a summing reducer.
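
Something like this mapper is what I mean - a rough sketch against the old-style mapred API plus Mahout's math classes, where the conf keys, the "projected" named output, and the class name are all just made up for illustration:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;
import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.MatrixWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class ProjectAndOuterProductMapper extends MapReduceBase
    implements Mapper<IntWritable, VectorWritable, NullWritable, MatrixWritable> {

  private MultipleOutputs mos;
  private Matrix omega;  // k x n Gaussian matrix, same seed on every mapper
  private int k;

  @Override
  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
    // "projection.rank", "input.dimension", "projection.seed" are
    // hypothetical conf keys; the driver would set them and also call
    // MultipleOutputs.addNamedOutput(conf, "projected",
    //     SequenceFileOutputFormat.class, IntWritable.class, VectorWritable.class)
    k = conf.getInt("projection.rank", 100);
    int n = conf.getInt("input.dimension", -1);
    Random rng = new Random(conf.getLong("projection.seed", 42L));
    omega = new DenseMatrix(k, n);
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < n; j++) {
        omega.set(i, j, rng.nextGaussian());
      }
    }
  }

  @Override
  public void map(IntWritable row, VectorWritable a,
                  OutputCollector<NullWritable, MatrixWritable> out,
                  Reporter reporter) throws IOException {
    // y_i = Omega * a_i : the randomly projected row, written straight to
    // a SequenceFile via the named output - this side never sees a reducer.
    Vector y = omega.times(a.get());
    mos.getCollector("projected", reporter).collect(row, new VectorWritable(y));

    // y_i y_i^T : a k x k rank-one outer product, all under one key so a
    // single summing reducer accumulates the full Gram matrix Y^T Y.
    out.collect(NullWritable.get(), new MatrixWritable(y.cross(y)));
  }

  @Override
  public void close() throws IOException {
    mos.close();
  }
}

The reducer side is then just a trivial k x k matrix sum (and since everything is keyed identically, a combiner can do most of that work mapper-side).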
In this sense, while you do need a second pass over something of the original data's *size* (in terms of number of rows) if you want to treat it as data to be played with (instead of just "training" data for use on a smaller subset, or even a totally different set), you never need to pass over the original data *set* itself again.

-jake

On Mon, Mar 22, 2010 at 6:35 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> You are probably right. I had a wild hare tromp through my thoughts the
> other day saying that one pass should be possible, but I can't reconstruct
> the details just now.
>
> On Mon, Mar 22, 2010 at 6:00 PM, Jake Mannix <jake.man...@gmail.com>
> wrote:
>
> > I guess if you mean just do a random projection on the original data, you
> > can certainly do that in one pass, but that's random projection, not a
> > stochastic decomposition.
> >