[ https://issues.apache.org/jira/browse/SPARK-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326951#comment-14326951 ]
Liang-Chi Hsieh commented on SPARK-5832:
----------------------------------------

For clustering algorithms, O(N^2) complexity is not rare, so I suggest we also weigh a clustering algorithm's usability, scalability, and features in addition to its complexity. One feature of AP is that it does not require users to specify the number of clusters. As far as I know, no clustering algorithm currently in MLlib offers a similar feature.

O(N^2) time/memory complexity is the worst case for AP. The worst case arises when we cluster a complete digraph (a directed graph in which every pair of distinct vertices is connected by a pair of unique edges) or a complete graph (an undirected graph in which every pair of distinct vertices is connected by a unique edge). As we know, that is very rare in practical use cases; in most cases the graph we process is sparse.

Unlike K-means and DBSCAN, AP is not a metric-space clustering technique. K-means and DBSCAN do not distinguish between dense and sparse graphs, because they need to calculate the distances between the candidate cluster centers and all other points. For AP, since the similarity graph is given as input, a sparse graph can significantly reduce the time/memory complexity required to perform clustering.

I have no strong opinion on whether we should include AP in MLlib. As Xiangrui Meng suggested, it is also a good idea to just maintain the implementation outside Spark as a third-party package.
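For reference, AP's message-passing updates (responsibilities and availabilities) can be sketched densely in a few lines of NumPy. This is only an illustrative sketch of the standard formulation, not the proposed MLlib implementation; a Spark version would instead pass these messages along the edges of a (possibly sparse) similarity graph.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Dense Affinity Propagation on a similarity matrix S.

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal of S holds each point's "preference" to be an exemplar.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    ind = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[ind, top]
        AS[ind, top] = -np.inf          # mask the max to find the runner-up
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[ind, top] = S[ind, top] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())  # keep r(k,k) even if negative
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()   # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # points with positive net self-responsibility become exemplars
    exemplars = np.flatnonzero((R + A).diagonal() > 0)
    c = S[:, exemplars].argmax(axis=1)
    c[exemplars] = np.arange(len(exemplars))  # exemplars label themselves
    return exemplars, exemplars[c]
```

Note how the number of clusters is not a parameter: it emerges from the preferences on the diagonal of S (a common default is the median of the off-diagonal similarities). That is exactly the feature discussed above, and also why a sparse similarity graph pays off, since each message only touches existing edges.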
> Add Affinity Propagation clustering algorithm
> ---------------------------------------------
>
>                 Key: SPARK-5832
>                 URL: https://issues.apache.org/jira/browse/SPARK-5832
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh

--
This message was sent by Atlassian JIRA (v6.3.4#6332)