Author: squinn
Date: Fri May 2 20:33:12 2014
New Revision: 1592031

URL: http://svn.apache.org/r1592031
Log:
Fixed one last URL syntax error, and also added information on DistributedRowMatrix use in spectral clustering.

Modified:
    mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext?rev=1592031&r1=1592030&r2=1592031&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext Fri May 2 20:33:12 2014
@@ -24,12 +24,14 @@ As of Mahout 0.3, spectral clustering ha
 
 ## Input
 
-The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix, in text form. Each line of the text file specifies a single element of the affinity matrix: the row index `\(i\)`, the column index `\(j\)`, and the value:
+The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix in the form of text files. Each line of the text file specifies a single element of the affinity matrix: the row index `\(i\)`, the column index `\(j\)`, and the value:
 
 `i, j, value`
 
 The affinity matrix is symmetric, and any unspecified `\(i, j\)` pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified.
 
+The matrix elements specified in the text files are collected into a Mahout `DistributedRowMatrix`.
+
 **([MAHOUT-1539](https://issues.apache.org/jira/browse/MAHOUT-1539) will allow for the creation of the affinity matrix to occur as part of the core spectral clustering algorithm, as opposed to the current requirement that the user create this matrix themselves and provide it, rather than the original data, to the algorithm)**
 
 ## Running spectral clustering
@@ -45,7 +47,7 @@ Spectral clustering can be invoked with
     -k <number of clusters AND number of top eigenvectors to use> \
     -x <maximum number of k-means iterations>
 
-The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files (per (MAHOUT-978)[https://issues.apache.org/jira/browse/MAHOUT-978], do not prefix text files with a leading underscore '_' or period '.'). The `-d` flag is required for the algorithm to know the dimensions of the affinity matrix. `-k` is the number of top eigenvectors from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.
+The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files [per (MAHOUT-978](https://issues.apache.org/jira/browse/MAHOUT-978), do not prefix text files with a leading underscore '_' or period '.'). The `-d` flag is required for the algorithm to know the dimensions of the affinity matrix. `-k` is the number of top eigenvectors from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.
 
 ## Example
 