The machinery of SVD is almost always described in terms of least-squares matrix approximation, without mentioning the probabilistic underpinnings of why least squares is a good idea. The connection, however, goes all the way back to Gauss' reduction of planetary position observations (this is *why* the normal distribution is often called a Gaussian). Gauss provided such a compelling rationale for both the normal distribution (which I call a Gaussian below) and the resulting least-squared-error formulation of the estimation problem that everybody has since just assumed that least-squared-error estimation is the way to go. Generally this is a pretty good approximation. Occasionally it is not at all good. One place where it is a really bad approximation is very sparse count data. Netflix data is a great example; text represented as word counts per document is another.
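To make the sparse-count point concrete, here is a small numerical sketch. It is not part of any SVD or LDA implementation. It assumes a Poisson distribution as a simple stand-in for a single count cell, and it compares the exact count log-likelihood with that of a Gaussian whose mean and variance both equal the expected count. The function names are made up for this illustration.

```python
import math

def poisson_loglik(k, mu):
    """Exact Poisson log-likelihood log P(k | mu) for an integer count k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def gaussian_loglik(k, mu):
    """Log-density at k of a Gaussian with mean mu and variance mu.

    Assumption for this sketch: the variance is matched to the Poisson
    variance, so the only k-dependent term is a scaled squared deviation.
    """
    return -0.5 * math.log(2 * math.pi * mu) - (k - mu) ** 2 / (2 * mu)

def approximation_gap(mu, n_sd=2.0):
    """Largest |Poisson - Gaussian| log-likelihood difference over the
    integer counts within roughly n_sd standard deviations of mu."""
    sd = math.sqrt(mu)
    lo = max(0, math.floor(mu - n_sd * sd))
    hi = math.ceil(mu + n_sd * sd)
    return max(
        abs(poisson_loglik(k, mu) - gaussian_loglik(k, mu))
        for k in range(lo, hi + 1)
    )

# Sparse cells (tiny expected counts) versus dense cells (large expected counts).
for mu in (0.1, 1.0, 10.0, 1000.0):
    print(f"mean count {mu:>7}: max gap = {approximation_gap(mu):.4f}")
```

Under these assumptions the gap shrinks as the expected count grows and is large when the expected count is well below one, which is the regime that sparse ratings or word-count matrices mostly live in.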
To fill in more detail, here is a relatively jargon-filled explanation of the connection. I apologize for not being able to express this more lucidly.

A more general view of both SVD and LDA is that they find probabilistic mixture models to describe data. SVD finds a single mixture of Gaussian distributions that all have the same variance and uses maximum likelihood to find this mixture. LDA finds a multi-level mixture of multinomial models and gives you a distribution over possible models given your data and explicit assumptions.

Gaussian distributions and multinomials look quite different, but for relatively large observed counts their log-likelihood functions become very similar. For Gaussians with a fixed variance, the log-likelihood is, up to an additive constant and a negative scale factor, just the sum of squared deviations from the mean, so maximizing likelihood is the same as minimizing squared error. For large counts, the log-likelihood for multinomials is also approximately quadratic in the deviations from the mean.

On Tue, Mar 31, 2009 at 11:43 PM, Atul Kulkarni <atulskulka...@gmail.com> wrote:

> I do not understand the relation in LDA and SVD. In my limited understanding
> I understand LDA transforms data points in to a coordinate system where
> they can be easily discriminated/classified. SVD on the other hand is used
> for dimension reduction, can you help me bridge the gap by providing
> something to read on?

--
Ted Dunning, CTO
DeepDyve