Hi all, 

In my limited understanding of MLlib, it would be useful to be able to
choose among various distance functions in some machine learning algorithms.
For example, KMeans currently supports only the Euclidean distance metric.
I am working on contributing hierarchical clustering to MLlib
(https://issues.apache.org/jira/browse/SPARK-2429), and I would like it to
support various distance functions.

Should MLlib provide standardized distance functions or not?
As you know, Spark depends on Breeze, so I think there are two ways to make
distance functions available in MLlib. One is to implement the distance
functions in MLlib itself. The other is to wrap the functions that Breeze
provides. I am a bit worried about using Breeze directly in Spark. For
example, we cannot fully control Breeze's releases.
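
To make the first approach concrete, here is a minimal sketch of what
pluggable distance functions could look like. It is plain Python for brevity
and is not MLlib or Breeze code. The names euclidean, manhattan,
cosine_distance, and closest_center are hypothetical and chosen only for
illustration.

```python
import math
from typing import Callable, Sequence

# Hypothetical type aliases, for illustration only (not an MLlib API).
Vector = Sequence[float]
DistanceMeasure = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus the cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def closest_center(point: Vector,
                   centers: Sequence[Vector],
                   distance: DistanceMeasure = euclidean) -> int:
    """Index of the nearest center under the given distance function.

    This mirrors the assignment step of a clustering algorithm; the
    distance function is a parameter instead of being hard-coded.
    """
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))
```

With the point (1, 0) and the centers (0, 5) and (10, 0), the Euclidean
distance assigns the point to the first center, while the cosine distance
assigns it to the second. That difference is why I would like the metric to
be selectable per algorithm.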

I sent a PR for this earlier, but it has stalled. I'd like to hear the
community's thoughts on it:
https://github.com/apache/spark/pull/1964#issuecomment-54953348

Best,
Yu Ishikawa
