Hi all, I have a question that's a bit more rooted in theory. My clustering algorithm depends on the Markovian properties of the graph representation of the data, using the eigenvectors/eigenvalues of the Markov transition matrix to determine where the cluster boundaries lie. The transition matrix is dense, since it represents the full distribution of a random walk over the entire data set.
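To make the setup concrete, here's a small local NumPy sketch of what I mean (not Mahout code, and not the EigenCuts implementation itself) — synthetic two-cluster data, a Gaussian affinity kernel, row-normalization into a Markov transition matrix, and a full dense eigendecomposition; the kernel width and data are my own assumed parameters:

```python
import numpy as np

# Synthetic toy data: two well-separated clusters (assumed, for illustration).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.1, (5, 2)),
                    rng.normal(3, 0.1, (5, 2))])

# Dense affinity matrix from a Gaussian kernel (sigma is an assumed parameter).
sigma = 1.0
sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
affinity = np.exp(-sq_dists / (2 * sigma ** 2))

# Row-normalize to get the Markov transition matrix P: each row is the
# one-step distribution of a random walk starting at that point.
P = affinity / affinity.sum(axis=1, keepdims=True)

# Full eigendecomposition of the dense matrix. The leading eigenvalue is 1;
# the gap after the first few eigenvalues and the sign structure of the
# corresponding eigenvectors are what reveal the cluster boundaries.
eigvals, eigvecs = np.linalg.eig(P)
order = np.argsort(-eigvals.real)
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
```

With two well-separated clusters, the second eigenvalue also sits very close to 1, which is the spectral signal the clustering exploits.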
In the thesis behind this algorithm (link: http://dl.dropbox.com/u/1377610/eigencuts_thesis.pdf ), the author dedicates an entire chapter (Chapter 8) to using successively lower-rank sparse approximations of the full Markov transition matrix to speed up the eigendecomposition, which is a bottleneck because the decomposition is performed at every iterative step of the clustering process.

My question is whether this optimization is necessary on a framework like Mahout. I don't doubt it would improve performance, but would the improvement be noticeable across a distributed framework? If the Markov transition matrix is already a DistributedRowMatrix, and I'm already using tools like the DistributedLanczosSolver, is an extra map/reduce step for building a lower-rank Markov matrix really necessary? I'm very eager to hear other folks' thoughts on this.

Thanks!
Regards,
Shannon
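For anyone who wants to see the trade-off in miniature, here is a rough local sketch of the Chapter 8 idea — not the thesis's exact construction, and the threshold is my own assumption. It drops near-zero transition probabilities to make the matrix sparse and checks how much the leading eigenvalues (the ones the clustering depends on) actually move; in a real pipeline a Lanczos-type solver would then compute only the top-k eigenpairs of the sparse matrix:

```python
import numpy as np

# Synthetic two-cluster data (assumed, for illustration only).
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 0.1, (20, 2)),
                    rng.normal(3, 0.1, (20, 2))])
sq = np.sum((points[:, None] - points[None, :]) ** 2, axis=-1)
A = np.exp(-sq / 2.0)
P = A / A.sum(axis=1, keepdims=True)             # dense Markov matrix

# Sparsify: zero out tiny transition probabilities (1e-3 is an assumed
# threshold), then re-normalize rows so the matrix stays stochastic.
P_sparse = np.where(P < 1e-3, 0.0, P)
P_sparse /= P_sparse.sum(axis=1, keepdims=True)

# Compare the top-k eigenvalues of the dense and sparsified matrices.
def top(M, k):
    return np.sort(np.linalg.eigvals(M).real)[::-1][:k]

print(top(P, 2))         # leading eigenvalues of the dense matrix
print(top(P_sparse, 2))  # leading eigenvalues after sparsification
```

On data like this the leading eigenvalues barely move, which is exactly why the thesis can get away with the approximation — my question is whether the wall-clock win survives once everything is already distributed.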
