The machinery of SVD is almost always described in terms of least-squares
matrix approximation without mentioning the probabilistic underpinnings of
why least squares is a good idea.  The connection, however, goes all the way
back to Gauss' reduction of planetary position observations (this is *why*
the normal distribution is often called a Gaussian).  Gauss provided such a
compelling rationale for both the normal distribution (what I call a
Gaussian below) and the resulting least-squared-error formulation of the
estimation problem that everybody has simply assumed that least-squared-error
estimation is the way to go.  Generally this is a pretty good
approximation.  Occasionally it is not good at all.  One place where it is a
really bad approximation is with very sparse count data.  The Netflix data
are a great example; text represented as word counts per document is another.

To fill in more detail, here is a relatively jargon-filled explanation of
the connection.  I apologize for not being able to express this more
lucidly.

A more general view of both SVD and LDA is that they find probabilistic
mixture models to describe data.   SVD finds a single mixture of Gaussian
distributions that all have the same variance, using maximum likelihood to
find this mixture.  LDA finds a multi-level mixture of multinomial models
and gives you a distribution over models that represents the range of
possible models given your data and explicit prior assumptions.
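To make the SVD half of that concrete, here is a minimal sketch in Python
(numpy assumed; the matrix and the rank k are made up for illustration).
The truncated SVD is exactly the rank-k matrix that minimizes squared
reconstruction error, which is the maximum-likelihood fit when the
residuals are modeled as equal-variance Gaussian noise:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 10))      # stand-in data matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 3                              # illustrative target rank
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # By the Eckart-Young theorem, no rank-3 matrix has smaller squared
    # error, and that error equals the sum of the discarded squared
    # singular values -- minimizing it is Gaussian maximum likelihood.
    print(np.sum((A - A_k) ** 2), np.sum(s[k:] ** 2))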

Gaussian distributions and multinomials look quite different, but for
relatively large observed counts their log-likelihood functions become very
similar.  For Gaussians, the log-likelihood is (up to constants) just the
negative sum of squared deviations from the mean.  For large counts, the
multinomial log-likelihood approaches the same kind of weighted sum of
squared deviations from the expected counts.
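Here is a small numeric check of that last claim, again in Python (scipy
assumed; the probabilities and counts are made up):

    import numpy as np
    from scipy.stats import multinomial

    p = np.array([0.5, 0.3, 0.2])       # made-up cell probabilities
    N = 10_000                          # large total count
    n = np.array([5040, 2970, 1990])    # observed counts, summing to N

    expected = N * p
    # Exact multinomial log-likelihood, measured relative to the
    # saturated fit so the combinatorial constant cancels.
    ll = multinomial.logpmf(n, N, p) - multinomial.logpmf(n, N, n / N)
    # Gaussian-style quadratic: -1/2 times the chi-squared statistic.
    quad = -0.5 * np.sum((n - expected) ** 2 / expected)
    print(ll, quad)                     # nearly identical for large N

With very sparse data the counts are mostly 0 or 1, this quadratic
approximation breaks down, and least squares stops being a good surrogate
for the multinomial likelihood, which is the point made at the top.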


On Tue, Mar 31, 2009 at 11:43 PM, Atul Kulkarni <atulskulka...@gmail.com> wrote:

> I do not understand the relation between LDA and SVD. In my limited
> understanding, LDA transforms data points into a coordinate system where
> they can be easily discriminated/classified. SVD on the other hand is used
> for dimension reduction. Can you help me bridge the gap by providing
> something to read on?

-- 
Ted Dunning, CTO
DeepDyve
