[ 
https://issues.apache.org/jira/browse/SPARK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119422#comment-14119422
 ] 

Derrick Burns commented on SPARK-3219:
--------------------------------------

The current implementation supports one concrete representation of points, 
centers, and centroids. However, in order for arbitrary distance functions to 
be supported efficiently, the clusterer needs abstractions for point, center, 
and centroid that are actually defined by along with the distance function. 

> K-Means clusterer should support Bregman distance functions
> -----------------------------------------------------------
>
>                 Key: SPARK-3219
>                 URL: https://issues.apache.org/jira/browse/SPARK-3219
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Derrick Burns
>            Assignee: Derrick Burns
>
> The K-Means clusterer supports the Euclidean distance metric.  However, it is 
> rather straightforward to support Bregman 
> (http://machinelearning.wustl.edu/mlpapers/paper_files/BanerjeeMDG05.pdf) 
> distance functions which would increase the utility of the clusterer 
> tremendously.
> I have modified the clusterer to support pluggable distance functions.  
> However, I notice that there are hundreds of outstanding pull requests.  If 
> someone is willing to work with me to sponsor the work through the process, I 
> will create a pull request.  Otherwise, I will just keep my own fork.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to