On Tue, Jul 13, 2010 at 1:38 AM, Shannon Quinn <[email protected]> wrote:
> Hi all,
>
> I have a question that's a bit more rooted in theory. In my clustering
> algorithm, I depend on the Markovian properties of the graph representation
> of the data, using the eigenvectors/eigenvalues to determine where the
> cluster boundaries are. The Markov transition matrix is dense, as it
> represents the full distribution of a random walk through the entire data
> set.

Hey Shannon,

I really wish I'd had the time to read more on the full theory of this
algorithm - is it really the case that you are always finding eigenvectors
of a fully *dense* matrix? The DRM and Lanczos impls in Mahout may really
not be terribly suited to this use case, if so...

Can the algorithm work if you "sparsify" your transition matrix, so that it
only keeps the relatively significant entries, setting the smaller values
to zero?

-jake
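[For illustration only: the sparsification Jake suggests could be sketched roughly as below. This is a minimal NumPy toy, not Mahout code; the matrix values and the 0.05 threshold are arbitrary assumptions, and the row renormalization keeps the result row-stochastic so it is still a valid Markov transition matrix.]

```python
import numpy as np

# Toy dense, row-stochastic Markov transition matrix (assumed example data).
P = np.array([
    [0.70, 0.25, 0.03, 0.02],
    [0.20, 0.70, 0.05, 0.05],
    [0.02, 0.03, 0.60, 0.35],
    [0.01, 0.04, 0.35, 0.60],
])

def sparsify(P, threshold=0.05):
    """Zero entries smaller than `threshold`, then renormalize each row
    so that rows still sum to 1 (the matrix stays row-stochastic)."""
    S = np.where(P >= threshold, P, 0.0)
    row_sums = S.sum(axis=1, keepdims=True)
    return S / row_sums

S = sparsify(P)
print(S)
print(S.sum(axis=1))  # every row still sums to 1
```

Whether the spectral properties that the clustering relies on survive this thresholding is exactly the theoretical question raised above; the sketch only shows the mechanical transformation.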
