Re: Distance metrics in KMeans
There is a Spark Package that gives some alternative distance metrics:
http://spark-packages.org/package/derrickburns/generalized-kmeans-clustering
Not used it myself.

- Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Distance-metrics-in-KMeans-tp24823p24829.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: Distance metrics in KMeans
It looks like the distance metric is hard-coded to the L2 norm (Euclidean distance) in MLlib. As you might expect, you are not the first person to want other metrics, and there has been some prior effort.

See this PR: https://github.com/apache/spark/pull/2634
And the corresponding JIRA: https://issues.apache.org/jira/browse/SPARK-3219

It seems that adding arbitrary distance metrics is non-trivial given the current implementation in MLlib. I am not aware of any current work on this issue.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Distance-metrics-in-KMeans-tp24823p24826.html
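
To illustrate what a pluggable metric would change, here is a minimal plain-Python sketch of the k-means assignment step with the distance passed in as a function. This is not MLlib code and does not use Spark; the names euclidean, cosine_distance and assign are hypothetical and exist only for this example. One general caveat: the usual "centroid = mean of the assigned points" update minimizes squared Euclidean distance specifically, so swapping the metric in the assignment step alone does not give a complete algorithm for other metrics.

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def assign(points, centers, distance):
    """Return, for each point, the index of its nearest center
    under the supplied distance function (hypothetical helper)."""
    return [
        min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        for p in points
    ]

# Illustrative data only, chosen so the two metrics disagree.
centers = [(1.0, 0.0), (10.0, 10.0)]
points = [(4.0, 4.0), (6.0, 0.5)]

euclidean_labels = assign(points, centers, euclidean)
cosine_labels = assign(points, centers, cosine_distance)
```

With this data the point (4.0, 4.0) is closer to the first center in Euclidean terms but points in the same direction as the second center, so the two metrics assign it to different clusters.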