Hmm... in my experience, the degree to which I've found SVD useful lies mostly in the extent to which the metric is *not* preserved... that's the whole point, or else you get very little out of it: you trade a high-dimensional sparse computation for a low-dimensional dense one, and if you exactly preserved the metric you'd basically gain nothing.
When you notice that for text, ngrams like "software engineer" are now considerably closer to "c++ developer" than to other ngrams, this gives you information. You don't get that information from a random projection. You'll get some of that information from A'AR, because you get second-order correlations, but then you're still losing all correlations beyond second-order (and a true eigenvector is getting you the full infinite series of correlations, properly weighted).

I mean, I guess you can use SVD purely for dimensional reduction, but like you say, reduction can be done in lots of other, more efficient ways. Doing a reduction which enhances co-occurrence relationships and distorts the metric to produce better clusters than you started with is what SVD, NMF, and LDA were designed for. Maybe I'm missing your point? There's a toy sketch of the contrast below the quoted thread.

  -jake

On Mon, Jan 4, 2010 at 2:44 PM, Ted Dunning <[email protected]> wrote:

> SVD is (approximately) metric-preserving while also dimensionality
> reducing. If you use A'AR instead of the actual term eigenvectors you
> should get similar results.
>
> On Mon, Jan 4, 2010 at 2:21 PM, Jake Mannix <[email protected]> wrote:
>
> > Ted, how would just doing a random projection do the right thing? It's a
> > basically metric-preserving technique, and one of the primary reasons to
> > *do* LSA is to use a *different* metric (one in which "similar" terms are
> > nearer to each other than would be otherwise imagined).
>
> --
> Ted Dunning, CTO
> DeepDyve
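To make that concrete, here's a rough numpy sketch (not Mahout code; the tiny term-document matrix, the term labels, and the choice of k=2 are all made up for illustration). With terms as rows, Ted's A'AR becomes (AA')R here. It compares the cosine between two terms that never co-occur directly under a plain random projection AR, the second-order (AA')R, and a rank-2 SVD: the random projection should stay near the raw cosine up to noise, the second-order projection should pick up some similarity, and the SVD representation should pull the two terms essentially on top of each other.

```python
# Rough illustrative sketch (not Mahout code): contrast three reductions of a
# toy term-document matrix A.
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document matrix: rows are terms, columns are documents.
# "software engineer" and "c++ developer" never co-occur in the same
# document, but both co-occur with "programming".
terms = ["software engineer", "c++ developer", "programming", "cooking", "recipe"]
A = np.array([
    [1, 1, 0, 0, 0, 0],   # software engineer
    [0, 0, 1, 1, 0, 0],   # c++ developer
    [1, 1, 1, 1, 0, 0],   # programming
    [0, 0, 0, 0, 1, 1],   # cooking
    [0, 0, 0, 0, 1, 1],   # recipe
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

k = 2

# (a) Random projection of the document axis: roughly metric-preserving, so
#     the cosine stays near its raw value (0 here) up to projection noise.
R_docs = rng.standard_normal((A.shape[1], k)) / np.sqrt(k)
rp = A @ R_docs

# (b) Second-order projection (A A')R: the term-term co-occurrence matrix
#     times a random matrix, which captures second-order correlations only.
R_terms = rng.standard_normal((A.shape[0], k)) / np.sqrt(k)
second_order = (A @ A.T) @ R_terms

# (c) Rank-k SVD: each term is represented by its loadings on the top
#     singular vectors, folding in the whole series of correlations.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
lsa = U[:, :k] * s[:k]

for name, X in [("raw", A), ("random projection", rp),
                ("(A A')R", second_order), ("rank-2 SVD", lsa)]:
    print(f"{name:>18}: cos(software engineer, c++ developer) = {cosine(X[0], X[1]):+.2f}")
```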
