On Mon, Sep 28, 2009 at 5:25 PM, Ted Dunning <[email protected]> wrote:
> Yes, but y probably has about as much data as A had. A is sparse, y is
> dense, but they have about the same number of rows. Y will have dozens to
> hundreds of non-zero elements per row which is probably comparable to A.
>
This is true, but is recomputing Y on the second pass really worth what we save by not storing it?
> This means that dealing with y'y implicitly at the cost of passing over y is
> a major loss. Better to go ahead and form y' y on the fly as y is formed,
> do an in-memory eigenvector computation and then pass over y to get Q (in
> the original algorithm notation). Q is also quite large.
>
Both your technique and mine seem to require all the different mappers
to build pieces of Y'Y, so the "in-memory pass" to get the small
decomposition of Y'Y can only be done after one full pass through
A. If we're not storing Y anywhere, what are we going to do with the
k-by-k decomposition of Y'Y, other than use it to get Q, which requires
another pass over Y (computing Y on the fly, I guess you're suggesting,
by actually doing another pass over A)?
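(To make sure I'm picturing the same flow, here's a toy single-machine
numpy sketch of it; the names, toy sizes, and dense A are all mine,
just to pin down the linear algebra, not anything resembling the
Mahout code:)

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 500, 20          # toy sizes
A = rng.random((n, m))           # dense stand-in for the big sparse A
R = rng.standard_normal((m, k))  # the random projection matrix

# Pass 1 over A: each mapper would emit its rows of Y = A*R plus a
# partial k-by-k Y'Y; the partials simply sum to the full Y'Y.
Y = A @ R
YtY = Y.T @ Y

# The small in-memory decomposition: Y'Y = V S^2 V'
s2, V = np.linalg.eigh(YtY)           # eigenvalues come back ascending
S, V = np.sqrt(s2[::-1]), V[:, ::-1]  # reorder to descending

# Pass 2: Q = Y (V S^-1), with Y either read back from storage or
# recomputed row by row as y_i = a_i * R; that recompute is the
# cost in question.
Q = Y @ (V / S)                       # V / S == V * diag(1/S)
assert np.allclose(Q.T @ Q, np.eye(k))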
Maybe I'm just tired (6am flight this morning and why am I still up?), but
the tradeoff of not storing Y anywhere means that when you need to
compute Q, you have to recompute Y, which costs another factor
of k * (avg. density of A) flops, right? If you've got the
decomposition Y'Y = V S^2 V', and have distributed V S^-1 to all the
mappers for the Q computation, then to get Q = Y (V S^-1) you need
to recompute y_i = a_i * R (requiring k sparse dot products, each taking
j operations if the average density of A is j), and then post-multiply
by the k x k matrix V S^-1. If you were instead doing a pass over a
stored Y, you'd skip that entire multiplicative factor of k*j at the
expense of basically doubling your temporary HDFS usage. Maybe I'm being
dense (hah! man, I should really get to bed, with humor like that);
what am I messing up here?
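Just to put rough numbers on that tradeoff (my numbers, purely for
scale): with k = 100 and j = 100 nonzeros per row, recomputing
y_i = a_i * R costs about k*j = 10,000 multiply-adds per row, versus
reading back the k = 100 doubles you'd stored. Over n = 10^7 rows
that's on the order of 10^11 extra flops, while storing Y costs
roughly n * k * 8 bytes, about 8GB of temporary HDFS, which seems
cheap by comparison.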
Either way, at this point you've got one half of the decomposition of A,
and for many use cases you don't need to go any further: you already
have the singular values from your Y'Y decomposition, so the only
thing left to do is transpose your q_i to put them in the right
form for doing projections or folding in new data.
If you really do want Q'A, you've got it in a reasonable form to do this:
both Q and A are presented in the same "wrong" way (Q as n k-vectors:
y_i * (V S^-1), and A as a set of n sparse m-vectors: a_i), so the
outer-product trick works here too:
Q'A = matrixSum_{i=1..n} ( q_i (x) a_i )
(where (x) is my lovely ascii rendition of a tensor product)
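(Same caveat as before: a toy numpy/scipy sketch with names of my own
invention, just to spell out the accumulation. Each mapper would hold
a k x m accumulator, add q_i (x) a_i for each of its rows, and the
reducer would sum the partial accumulators:)

import numpy as np
from scipy.sparse import random as sparse_random

n, m, k = 1000, 500, 20
A = sparse_random(n, m, density=0.02, format="csr", random_state=0)
Q = np.linalg.qr(np.random.default_rng(0).standard_normal((n, k)))[0]

# Accumulate q_i (x) a_i over the rows; a_i is sparse, so each outer
# product only touches a_i's nonzero columns (the k*j work per row).
QtA = np.zeros((k, m))
for i in range(n):
    a_i = A.getrow(i)            # sparse 1 x m row of A
    QtA[:, a_i.indices] += np.outer(Q[i], a_i.data)

assert np.allclose(QtA, Q.T @ A.toarray())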
Total work done - three passes over the dataset. Now I just need to
read further into these papers and see how that gets turned into only
*one* pass through! But I'll save that for another day.
-jake