Hi all, In my limited understanding of MLlib, it would be a good idea to support various distance functions in some of its machine learning algorithms. For example, KMeans currently supports only the Euclidean distance metric. I am also working on contributing hierarchical clustering to MLlib (https://issues.apache.org/jira/browse/SPARK-2429), and I would like it to support various distance functions.
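To make the idea concrete, here is a minimal sketch of what a pluggable distance function could look like. It is plain Python for illustration only, not MLlib or Breeze code, and the names `DistanceFn`, `euclidean`, `manhattan`, and `closest_center` are hypothetical. It assumes dense vectors of equal length.

```python
import math
from typing import Callable, Sequence

# Hypothetical type aliases for this sketch; not part of any Spark API.
Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_center(point: Vector,
                   centers: Sequence[Vector],
                   distance: DistanceFn = euclidean) -> int:
    """Return the index of the center nearest to `point`.

    The metric is a parameter, so a clustering step written against
    this function does not hard-code Euclidean distance.
    """
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


if __name__ == "__main__":
    centers = [[2.0, 2.0], [0.0, 3.5]]
    point = [0.0, 0.0]
    # The chosen metric can change which center is considered nearest.
    print(closest_center(point, centers))
    print(closest_center(point, centers, distance=manhattan))
```

The point of the sketch is only that the assignment step takes the metric as an argument; whether that argument is backed by code in MLlib or by a wrapper around Breeze is a separate question.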
Should MLlib support standardized distance functions or not? Spark already depends on Breeze, so I think there are two approaches to using distance functions in MLlib. One is to implement the distance functions in MLlib itself. The other is to wrap the functions that Breeze provides. I am a bit worried about using Breeze directly in Spark; for example, we do not have full control over Breeze's releases. I sent a PR earlier, but it has stalled. I'd like to hear the community's thoughts on it.
https://github.com/apache/spark/pull/1964#issuecomment-54953348

Best,
Yu Ishikawa