[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504671#comment-14504671 ]
Raghav Chalapathy commented on FLINK-1745:
------------------------------------------

Hi Chiwan, Till,

I totally agree with Chiwan and Till's idea of making the distance measure a generic trait, so that various distance metrics such as Euclidean, Manhattan, etc. can be supported.

Approach 1: Going by the literature shared:
https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf
We should start off with the basics:
Exact approaches: H-BNLJ, H-BRJ (compare the cost)
Approximate approach: zkNN

As part of my implementation homework I was going through some of the existing implementations and stumbled across this one:
https://github.com/codeAshu/SparkAlgorithms/tree/master/mllib/src/main/scala/org/sparkalgos/mllib/join
The issue has been considered here:
https://issues.apache.org/jira/browse/SPARK-2335

Approach 2: The paper they have referred to is:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5447837&tag=1&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5447837%26tag%3D1

My questions are: Have we done a comparison of Approach 1 and Approach 2? Should we perform a comparison study going forward?

Raghav

> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Chiwan Park
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial
> it is still used as a means to classify data and to do regression.
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
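
For illustration only, the generic distance measure idea from the comment could look roughly like the sketch below. It is written in Python for brevity; in Flink's Scala ML library the abstract class would presumably be a trait. All names here (DistanceMeasure, EuclideanDistance, ManhattanDistance, knn) are hypothetical and not taken from Flink. The search is a plain single-machine brute-force baseline, not an implementation of H-BNLJ, H-BRJ or zkNN.

```python
import heapq
import math
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class DistanceMeasure(ABC):
    """Hypothetical interface: any metric the kNN search can plug in."""

    @abstractmethod
    def distance(self, a: Vector, b: Vector) -> float:
        ...


class EuclideanDistance(DistanceMeasure):
    def distance(self, a: Vector, b: Vector) -> float:
        # Square root of the sum of squared coordinate differences.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ManhattanDistance(DistanceMeasure):
    def distance(self, a: Vector, b: Vector) -> float:
        # Sum of absolute coordinate differences.
        return sum(abs(x - y) for x, y in zip(a, b))


def knn(query: Vector, points: List[Vector], k: int,
        metric: DistanceMeasure) -> List[Tuple[float, Vector]]:
    """Exact brute-force kNN: the k (distance, point) pairs nearest to query."""
    scored = ((metric.distance(query, p), p) for p in points)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])
```

Because knn only depends on the DistanceMeasure interface, swapping EuclideanDistance() for ManhattanDistance() changes the metric without touching the search itself, which is the point of keeping the measure generic.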