Author: squinn Date: Fri May 2 20:21:23 2014 New Revision: 1592026 URL: http://svn.apache.org/r1592026 Log: Fixed the latex syntax to match mathjax.
Modified: mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext Modified: mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext?rev=1592026&r1=1592025&r2=1592026&view=diff ============================================================================== --- mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext (original) +++ mahout/site/mahout_cms/trunk/content/users/clustering/spectral-clustering.mdtext Fri May 2 20:21:23 2014 @@ -6,13 +6,13 @@ Spectral clustering, as its name implies At its simplest, spectral clustering relies on the following four steps: - 1. Computing a similarity (or _affinity_) matrix $\mathbf{A}$ from the data. This involves determining a pairwise distance function $f$ that takes a pair of data points and returns a scalar. + 1. Computing a similarity (or _affinity_) matrix \(\mathbf{A}\) from the data. This involves determining a pairwise distance function \(f\) that takes a pair of data points and returns a scalar. - 2. Computing a graph Laplacian $\mathbf{L}$ from the affinity matrix. There are several types of graph Laplacians; which is used will often depends on the situation. + 2. Computing a graph Laplacian \(\mathbf{L}\) from the affinity matrix. There are several types of graph Laplacians; which is used will often depends on the situation. - 3. Computing the eigenvectors and eigenvalues of $\mathbf{L}$. The degree of this decomposition is often modulated by $k$, or the number of clusters. Put another way, $k$ eigenvectors and eigenvalues are computed. + 3. Computing the eigenvectors and eigenvalues of \(\mathbf{L}\). The degree of this decomposition is often modulated by \(k\), or the number of clusters. Put another way, \(k\) eigenvectors and eigenvalues are computed. - 4. The $k$ eigenvectors are used as "proxy" data for the original dataset, and fed into k-means clustering. The resulting cluster assignments are transparently passed back to the original data. + 4. The \(k\) eigenvectors are used as "proxy" data for the original dataset, and fed into k-means clustering. The resulting cluster assignments are transparently passed back to the original data. For more theoretical background on spectral clustering, such as how affinity matrices are computed, the different types of graph Laplacians, and whether the top or bottom eigenvectors and eigenvalues are computed, please read [Ulrike von Luxburg's article in _Statistics and Computing_ from December 2007](http://link.springer.com/article/10.1007/s11222-007-9033-z). It provides an excellent description of the linear algebra operations behind spectral clustering, and imbues a thorough understanding of the types of situations in which it can be used. @@ -24,11 +24,11 @@ As of Mahout 0.3, spectral clustering ha ## Input -The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix, in text form. Each line of the text file specifies a single element of the affinity matrix: the row index $i$, the column index $j$, and the value: +The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix, in text form. Each line of the text file specifies a single element of the affinity matrix: the row index \(i\), the column index \(j\), and the value: `i, j, value` -The affinity matrix is symmetric, and any unspecified $i, j$ pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified. +The affinity matrix is symmetric, and any unspecified \(i, j\) pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified. **([MAHOUT-1539](https://issues.apache.org/jira/browse/MAHOUT-1539) will allow for the creation of the affinity matrix to occur as part of the core spectral clustering algorithm, as opposed to the current requirement that the user create this matrix themselves and provide it, rather than the original data, to the algorithm)**