spectral-clustering.html

buildbot Fri, 02 May 2014 13:27:20 -0700

Author: buildbot
Date: Fri May  2 20:26:03 2014
New Revision: 907804

Log:
Staging update by buildbot for mahout


Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri May  2 20:26:03 2014
@@ -1 +1 @@
-1592026
+1592028

Modified: websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html Fri May  2 20:26:03 2014
@@ -240,16 +240,16 @@
 <p>At its simplest, spectral clustering relies on the following four steps:</p>
 <ol>
 <li>
-<p>Computing a similarity (or <em>affinity</em>) matrix (\mathbf{A}) from the 
data. This involves determining a pairwise distance function (f) that takes a 
pair of data points and returns a scalar.</p>
+<p>Computing a similarity (or <em>affinity</em>) matrix 
<code>\(\mathbf{A}\)</code> from the data. This involves determining a pairwise 
distance function <code>\(f\)</code> that takes a pair of data points and 
returns a scalar.</p>
 </li>
 <li>
-<p>Computing a graph Laplacian (\mathbf{L}) from the affinity matrix. There 
are several types of graph Laplacians; which is used will often depends on the 
situation.</p>
+<p>Computing a graph Laplacian <code>\(\mathbf{L}\)</code> from the affinity 
matrix. There are several types of graph Laplacians; which is used will often 
depends on the situation.</p>
 </li>
 <li>
-<p>Computing the eigenvectors and eigenvalues of (\mathbf{L}). The degree of 
this decomposition is often modulated by (k), or the number of clusters. Put 
another way, (k) eigenvectors and eigenvalues are computed.</p>
+<p>Computing the eigenvectors and eigenvalues of <code>\(\mathbf{L}\)</code>. 
The degree of this decomposition is often modulated by <code>\(k\)</code>, or 
the number of clusters. Put another way, <code>\(k\)</code> eigenvectors and 
eigenvalues are computed.</p>
 </li>
 <li>
-<p>The (k) eigenvectors are used as "proxy" data for the original dataset, and 
fed into k-means clustering. The resulting cluster assignments are 
transparently passed back to the original data.</p>
+<p>The <code>\(k\)</code> eigenvectors are used as "proxy" data for the 
original dataset, and fed into k-means clustering. The resulting cluster 
assignments are transparently passed back to the original data.</p>
 </li>
 </ol>
 <p>For more theoretical background on spectral clustering, such as how 
affinity matrices are computed, the different types of graph Laplacians, and 
whether the top or bottom eigenvectors and eigenvalues are computed, please 
read <a 
href="http://link.springer.com/article/10.1007/s11222-007-9033-z">Ulrike von 
Luxburg's article in <em>Statistics and Computing</em> from December 2007</a>. 
It provides an excellent description of the linear algebra operations behind 
spectral clustering, and imbues a thorough understanding of the types of 
situations in which it can be used.</p>
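
Editor's sketch (not part of the commit): the four steps listed in the hunk above can be illustrated with a small, standard-library-only Python example. This is only a sketch under stated assumptions, not Mahout's implementation. It assumes a Gaussian kernel for the pairwise function f, the unnormalized Laplacian L = D - A, and k = 2. For step 3 it swaps in plain power iteration (Mahout uses SSVD) to find the eigenvector of the second-smallest eigenvalue; the bottom eigenvector of this Laplacian is constant and carries no cluster information, so only that second one is used. For step 4 it uses a tiny one-dimensional 2-means rather than Mahout's k-means job. All function names and the sigma parameter are hypothetical.

```python
import math

def affinity_matrix(points, sigma=1.0):
    """Step 1: Gaussian-kernel affinities (an assumed choice of f)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                A[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return A

def laplacian(A):
    """Step 2: unnormalized graph Laplacian L = D - A (one of several types)."""
    n = len(A)
    degree = [sum(row) for row in A]
    return [[(degree[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def fiedler_vector(L, iters=200):
    """Step 3 (k = 2 only): eigenvector of the second-smallest eigenvalue.

    Power iteration on c*I - L, projecting out the constant vector each
    round. c = 2 * max degree bounds the eigenvalues of L from above.
    """
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic, sums to zero
    for _ in range(iters):
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]            # stay orthogonal to constants
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def two_means_1d(values, iters=20):
    """Step 4: a tiny 1-D 2-means on the eigenvector entries."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in values]
        g0 = [x for x, l in zip(values, labels) if l == 0]
        g1 = [x for x, l in zip(values, labels) if l == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels

def spectral_two_clusters(points, sigma=1.0):
    """Run steps 1-4; labels map back to the original points by index."""
    L = laplacian(affinity_matrix(points, sigma))
    return two_means_1d(fiedler_vector(L))

# Two well-separated groups of three points each (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = spectral_two_clusters(points)
print(labels)
```

The dense lists and the O(n^2) loops are for readability only; a distributed implementation would keep the affinity matrix sparse.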
@@ -257,9 +257,9 @@
 <p>As of Mahout 0.3, spectral clustering has been implemented to take 
advantage of the MapReduce framework. It uses <a 
href="http://mahout.apache.org/users/dim-reduction/ssvd.html">SSVD</a> for 
dimensionality reduction of the input data set, and <a 
href="http://mahout.apache.org/users/clustering/k-means-clustering.html">k-means</a>
 to perform the final clustering.</p>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1538">MAHOUT-1538</a> will 
port the existing Hadoop MapReduce implementation to Mahout DSL, allowing for 
one of several distinct distributed back-ends to conduct the 
computation)</strong></p>
 <h2 id="input">Input</h2>
-<p>The input format for the algorithm currently takes the form of a 
Hadoop-backed affinity matrix, in text form. Each line of the text file 
specifies a single element of the affinity matrix: the row index (i), the 
column index (j), and the value:</p>
+<p>The input format for the algorithm currently takes the form of a 
Hadoop-backed affinity matrix, in text form. Each line of the text file 
specifies a single element of the affinity matrix: the row index 
<code>\(i\)</code>, the column index <code>\(j\)</code>, and the value:</p>
 <p><code>i, j, value</code></p>
-<p>The affinity matrix is symmetric, and any unspecified (i, j) pairs are 
assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, 
only the non-zero entries of either the upper or lower triangular need be 
specified.</p>
+<p>The affinity matrix is symmetric, and any unspecified <code>\(i, j\)</code> 
pairs are assumed to be 0 for sparsity. The row and column indices are 
0-indexed. Thus, only the non-zero entries of either the upper or lower 
triangular need be specified.</p>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1539">MAHOUT-1539</a> will 
allow for the creation of the affinity matrix to occur as part of the core 
spectral clustering algorithm, as opposed to the current requirement that the 
user create this matrix themselves and provide it, rather than the original 
data, to the algorithm)</strong></p>
 <h2 id="running-spectral-clustering">Running spectral clustering</h2>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1540">MAHOUT-1540</a> will 
provide a running example of this algorithm and this section will be updated to 
show how to run the example and what the expected output should be; until then, 
this section provides a how-to for simply running the algorithm on arbitrary 
input)</strong></p>
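
Editor's sketch (not part of the commit): the "i, j, value" input format described in the hunk above can be read with a few lines of standard-library Python. This is a minimal sketch, assuming comma-separated fields with optional whitespace and one entry per line. The names read_affinity and entry are hypothetical, and Mahout itself reads this file through Hadoop rather than through anything like this. Each parsed entry is mirrored to (j, i) because the matrix is symmetric, and missing pairs read back as 0.

```python
def read_affinity(lines):
    """Parse 'i, j, value' lines into a symmetric sparse matrix (a dict)."""
    A = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        i_str, j_str, v_str = line.split(",")
        i, j, v = int(i_str), int(j_str), float(v_str)
        A[(i, j)] = v
        A[(j, i)] = v  # symmetry: only one triangle needs to be listed
    return A

def entry(A, i, j):
    """Unspecified (i, j) pairs are treated as 0."""
    return A.get((i, j), 0.0)

# Made-up example: three points, upper triangle only, 0-indexed.
A = read_affinity(["0, 1, 0.5", "1, 2, 0.25"])
print(entry(A, 1, 0), entry(A, 0, 2))
```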
