[ 
https://issues.apache.org/jira/browse/SPARK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357367#comment-14357367
 ] 

RJ Nowling commented on SPARK-2429:
-----------------------------------

Hi [~yuu.ishik...@gmail.com]

I think the new implementation is great.  Did you change the algorithm?

I've spoken with [~srowen].  Hierarchical clustering would be valuable to 
the community -- a couple of people have actually reached out to me about it. 
However, Spark is currently transitioning to the new ML API, so there is 
concern about accepting code into the older MLlib library.  With the 
announcement of Spark packages, there is also a move to encourage external 
libraries instead of large commits into Spark itself.

Would you be interested in publishing your hierarchical clustering 
implementation as an external library, as [~derrickburns] did for the [KMeans 
Mini Batch 
implementation|https://github.com/derrickburns/generalized-kmeans-clustering]? 
It could be listed in the [Spark packages index|http://spark-packages.org/] 
along with two other clustering packages so users can find it.

> Hierarchical Implementation of KMeans
> -------------------------------------
>
>                 Key: SPARK-2429
>                 URL: https://issues.apache.org/jira/browse/SPARK-2429
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: RJ Nowling
>            Assignee: Yu Ishikawa
>            Priority: Minor
>              Labels: clustering
>         Attachments: 2014-10-20_divisive-hierarchical-clustering.pdf, The 
> Result of Benchmarking a Hierarchical Clustering.pdf, 
> benchmark-result.2014-10-29.html, benchmark2.html
>
>
> Hierarchical clustering algorithms are widely used and would make a nice 
> addition to MLlib.  They are useful for determining relationships between 
> clusters and can also offer faster assignment. 
> Discussion on the dev list suggested the following possible approaches:
> * Top down, recursive application of KMeans
> * Reuse DecisionTree implementation with different objective function
> * Hierarchical SVD
> It was also suggested that support for distance metrics other than Euclidean, 
> such as negative dot product or cosine, is necessary.
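
For readers unfamiliar with the first approach in the description above (top 
down, recursive application of KMeans), here is a minimal sketch in plain 
Python. It is not the implementation attached to this issue and does not use 
any Spark API; every name in it (two_means, build_tree, assign, and their 
parameters) is hypothetical, and the centroid is assumed to be the arithmetic 
mean even when a non-Euclidean distance is plugged in. The distance function 
is passed as a parameter to illustrate the request for metrics other than 
Euclidean.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity; defined as 1.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def mean(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, distance, iterations=20, seed=0):
    """Split points into two groups with a basic k-means loop (k = 2)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer_first = distance(p, centers[0]) <= distance(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. both initial centers identical
        centers = [mean(groups[0]), mean(groups[1])]
    return groups


def build_tree(points, distance=euclidean, min_size=2, max_depth=5, depth=0):
    """Recursively bisect the data, producing a binary cluster tree."""
    node = {"center": mean(points), "size": len(points), "children": []}
    if len(points) < max(min_size, 2) or depth >= max_depth:
        return node
    left, right = two_means(points, distance)
    if left and right:
        node["children"] = [
            build_tree(g, distance, min_size, max_depth, depth + 1)
            for g in (left, right)
        ]
    return node


def assign(tree, point, distance=euclidean):
    """Walk down the tree, following the nearest child center at each level."""
    node = tree
    while node["children"]:
        node = min(node["children"], key=lambda c: distance(point, c["center"]))
    return node["center"]
```

Assignment in this sketch compares a point against two centers per level 
instead of against every leaf cluster, which is one way a hierarchy could 
offer the faster assignment mentioned in the description.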



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
