[ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-4259: --------------------------------- Shepherd: Xiangrui Meng > Add Spectral Clustering Algorithm with Gaussian Similarity Function > ------------------------------------------------------------------- > > Key: SPARK-4259 > URL: https://issues.apache.org/jira/browse/SPARK-4259 > Project: Spark > Issue Type: New Feature > Components: MLlib > Reporter: Fan Jiang > Assignee: Fan Jiang > Labels: features > > In recent years, spectral clustering has become one of the most popular > modern clustering algorithms. It is simple to implement, can be solved > efficiently by standard linear algebra software, and very often outperforms > traditional clustering algorithms such as the k-means algorithm. > We implemented the unnormalized graph Laplacian matrix by Gaussian similarity > function. A brief design looks like below: > Unnormalized spectral clustering > Input: raw data points, number k of clusters to construct: > • Comupte Similarity matrix S ∈ Rn×n, . > • Construct a similarity graph. Let W be its weighted adjacency matrix. > • Compute the unnormalized Laplacian L = D - W. where D is the Degree > diagonal matrix > • Compute the first k eigenvectors u1, . . . , uk of L. > • Let U ∈ Rn×k be the matrix containing the vectors u1, . . . , uk as columns. > • For i = 1, . . . , n, let yi ∈ Rk be the vector corresponding to the i-th > row of U. > • Cluster the points (yi)i=1,...,n in Rk with the k-means algorithm into > clusters C1, . . . , Ck. > Output: Clusters A1, . . . , Ak with Ai = { j | yj ∈ Ci }. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org