[ 
https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298030#comment-14298030
 ] 

Xiangrui Meng commented on SPARK-4259:
--------------------------------------

[~andrew.musselman] PIC is more or less a spectral clustering algorithm. It 
should produce similar results when there is a significant gap between the 
second and third eigenvalues. If there is no such gap, it creates a weighted 
combination of eigenvectors, which should work well in practice. Feel free to create a 
new JIRA for the original spectral clustering algorithm. But note that our goal 
is not to provide reference machine learning implementations. If PIC is an 
alternative to the original spectral clustering and it is more scalable, we 
don't want to maintain two implementations.

> Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4259
>                 URL: https://issues.apache.org/jira/browse/SPARK-4259
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: features
>
> In recent years, power iteration clustering has become one of the most 
> popular modern clustering algorithms. It is simple to implement, can be 
> solved efficiently by standard linear algebra software, and very often 
> outperforms traditional clustering algorithms such as the k-means algorithm.
> Power iteration clustering is a scalable and efficient algorithm for 
> clustering points given their pairwise affinity values. Internally, the 
> algorithm:
> 1. computes the Gaussian similarity between all pairs of points and stores 
>    these values in an affinity matrix
> 2. calculates a normalized affinity matrix
> 3. calculates the principal eigenvalue and eigenvector
> 4. clusters each of the input points according to its principal eigenvector 
>    component value
> Details of this algorithm are found in "Power Iteration Clustering" by Lin 
> and Cohen: www.icml2010.org/papers/387.pdf
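
The steps in the description above can be sketched in a few lines of plain
Python. This is only an illustration under stated assumptions, not the code
proposed for MLlib: the function names, the `sigma` bandwidth, the stopping
tolerance, and the tiny 1-D k-means are all hypothetical choices made for the
example. Following the Lin and Cohen paper, the power iteration is stopped
early (when the change between iterations levels off) rather than run to full
convergence, since the fully converged principal eigenvector of a
row-normalized matrix is constant and carries no cluster information.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian similarity exp(-d^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def power_iteration_embedding(A, max_iter=50, tol=1e-5):
    """Truncated power iteration on the row-normalized affinity matrix.

    Assumes every point has a nonzero degree (row sum).
    """
    degrees = [sum(row) for row in A]
    W = [[a / d for a in row] for row, d in zip(A, degrees)]
    total = sum(degrees)
    v = [d / total for d in degrees]  # degree-based start vector
    prev_delta = float("inf")
    for _ in range(max_iter):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]
        norm = sum(abs(x) for x in u)  # L1 normalization
        u = [x / norm for x in u]
        delta = sum(abs(a - b) for a, b in zip(u, v))
        v = u
        # Stop when the change between iterations stops shrinking quickly.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k=2, max_iter=100):
    """Minimal 1-D k-means (k >= 2), centers spread between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: abs(x - centers[c]))
                  for x in values]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def pic_cluster(points, k=2, sigma=1.0):
    """Affinity matrix -> normalized matrix -> embedding -> 1-D clustering."""
    A = gaussian_affinity(points, sigma)
    embedding = power_iteration_embedding(A)
    return kmeans_1d(embedding, k)
```

For example, calling `pic_cluster` on three points near the origin and four
points near (5, 5) with `k=2` should place the two groups in different
clusters. Groups of different sizes are used on purpose: with the degree-based
start vector, two perfectly mirror-symmetric groups would receive identical
embedding values, in which case a random start vector would be the safer
choice.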


