Re: Distance metrics in KMeans

2015-09-26 Thread Robineast
There is a Spark Package that provides some alternative distance metrics: http://spark-packages.org/package/derrickburns/generalized-kmeans-clustering
I have not used it myself.

- Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
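
To make concrete what swapping the distance metric changes, here is a minimal sketch of the k-means assignment step with a pluggable distance function. It is plain Python, not the MLlib API and not the API of the package linked above; the names `euclidean`, `cosine_distance`, and `assign`, and the sample data, are hypothetical and chosen only for illustration.

```python
import math

def euclidean(a, b):
    # L2 distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def assign(points, centers, distance):
    # For each point, return the index of the closest center
    # under the supplied distance function.
    return [
        min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        for p in points
    ]

# Illustrative data only.
points = [(1.0, 0.1), (10.0, 1.0), (5.0, 5.0)]
centers = [(1.0, 0.0), (6.0, 6.0)]

print(assign(points, centers, euclidean))        # [0, 1, 1]
print(assign(points, centers, cosine_distance))  # [0, 0, 1]
```

The point (10.0, 1.0) is closer to the second center in Euclidean terms but points in nearly the same direction as the first, so the two metrics assign it differently. Only the assignment step is shown here; with a non-Euclidean metric the center update step may also need to change, since the arithmetic mean is the minimizer for squared Euclidean distance specifically.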

Re: Distance metrics in KMeans

2015-09-25 Thread sethah
It looks like the distance metric is hard-coded to the L2 norm (Euclidean distance) in MLlib. As you might expect, you are not the first person to want other metrics, and there has been some prior effort. Please see this PR: https://github.com/apache/spark/pull/2634 And corresponding JIRA: