spectral-clustering.mdtext

squinn Fri, 02 May 2014 13:33:59 -0700

Author: squinn
Date: Fri May  2 20:33:12 2014
New Revision: 1592031

URL: http://svn.apache.org/r1592031
Log:
Fixed one last URL syntax error, and also added information on 
DistributedRowMatrix use in spectral clustering.


Modified:
    mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext?rev=1592031&r1=1592030&r2=1592031&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext Fri May  2 20:33:12 2014
@@ -24,12 +24,14 @@ As of Mahout 0.3, spectral clustering ha
 
 ## Input
 
-The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix, in text form. Each line of the text file specifies a single element of the affinity matrix: the row index `\(i\)`, the column index `\(j\)`, and the value:
+The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix in the form of text files. Each line of the text file specifies a single element of the affinity matrix: the row index `\(i\)`, the column index `\(j\)`, and the value:
 
 `i, j, value`
 
 The affinity matrix is symmetric, and any unspecified `\(i, j\)` pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified.
 
+The matrix elements specified in the text files are collected into a Mahout `DistributedRowMatrix`.
+
 **([MAHOUT-1539](https://issues.apache.org/jira/browse/MAHOUT-1539) will allow for the creation of the affinity matrix to occur as part of the core spectral clustering algorithm, as opposed to the current requirement that the user create this matrix themselves and provide it, rather than the original data, to the algorithm)**
 
 ## Running spectral clustering
@@ -45,7 +47,7 @@ Spectral clustering can be invoked with 
         -k <number of clusters AND number of top eigenvectors to use> \
         -x <maximum number of k-means iterations>
 
-The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files (per (MAHOUT-978)[https://issues.apache.org/jira/browse/MAHOUT-978], do not prefix text files with a leading underscore '_' or period '.'). The `-d` flag is required for the algorithm to know the dimensions of the affinity matrix. `-k` is the number of top eigenvectors from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.
+The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files [per (MAHOUT-978](https://issues.apache.org/jira/browse/MAHOUT-978), do not prefix text files with a leading underscore '_' or period '.'). The `-d` flag is required for the algorithm to know the dimensions of the affinity matrix. `-k` is the number of top eigenvectors from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.
 
 ## Example
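
Illustration (not part of the commit): the input format described in the diff above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Mahout code. The helper names `to_affinity_lines` and `from_affinity_lines` are hypothetical, and the bare `,` separator is an assumption, since the page only shows the pattern `i, j, value` and does not say whether whitespace after the commas is accepted. The sketch writes only the non-zero upper-triangular entries with 0-based indices, then rebuilds the full symmetric matrix from them.

```python
def to_affinity_lines(matrix, sep=","):
    """Emit one 'i<sep>j<sep>value' line per non-zero upper-triangular entry.

    Hypothetical helper: the separator is an assumption, not confirmed by
    the documentation being changed in this commit.
    """
    n = len(matrix)
    lines = []
    for i in range(n):
        for j in range(i, n):  # upper triangle, diagonal included
            value = matrix[i][j]
            if value != 0:  # unspecified pairs are assumed to be 0
                lines.append(f"{i}{sep}{j}{sep}{value}")
    return lines


def from_affinity_lines(lines, dim):
    """Rebuild the dense symmetric matrix from one-entry-per-line text."""
    matrix = [[0.0] * dim for _ in range(dim)]
    for line in lines:
        i, j, value = (field.strip() for field in line.split(","))
        i, j, value = int(i), int(j), float(value)
        matrix[i][j] = value
        matrix[j][i] = value  # symmetry: mirror each entry
    return matrix


# Small symmetric affinity matrix with 0-based indices.
affinity = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]

lines = to_affinity_lines(affinity)
restored = from_affinity_lines(lines, dim=len(affinity))
```

Here `dim` plays the role that the page assigns to the `-d` flag: the sparse text form alone does not reveal the matrix dimensions, so they have to be supplied separately.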

svn commit: r1592031 - /mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext
