[jira] [Commented] (FLINK-1745) Add k-nearest-neighbours algorithm to machine learning library

Raghav Chalapathy (JIRA) Tue, 21 Apr 2015 02:32:39 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504671#comment-14504671
 ]


Raghav Chalapathy commented on FLINK-1745:
------------------------------------------

Hi Chiwan, Till,

I totally agree with Chiwan and Till's idea of making the distance measure a 
generic trait, so that we can support various distance metrics such as 
Euclidean, Manhattan, etc.
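
To make the generic distance idea concrete, here is a minimal sketch. It is 
written in Python only for brevity and is not Flink code; the names 
DistanceMeasure, EuclideanDistance and ManhattanDistance are hypothetical 
placeholders, not an existing API.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import Sequence


class DistanceMeasure(ABC):
    """Hypothetical generic interface: any metric only has to provide distance()."""

    @abstractmethod
    def distance(self, a: Sequence[float], b: Sequence[float]) -> float:
        """Return the distance between two vectors of equal length."""


class EuclideanDistance(DistanceMeasure):
    def distance(self, a, b):
        # Square root of the sum of squared coordinate differences.
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ManhattanDistance(DistanceMeasure):
    def distance(self, a, b):
        # Sum of absolute coordinate differences.
        return sum(abs(x - y) for x, y in zip(a, b))
```

A kNN implementation would then take a DistanceMeasure as a parameter, so 
that adding a new metric does not require touching the algorithm itself.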

Going by the literature shared in the issue:
Approach 1: https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf

We should start with the basics:
Exact approaches: H-BNLJ, H-BRJ (compare the cost)
Approximate approach: zkNN
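
As a rough illustration of the exact block-nested-loop idea, here is a 
single-process sketch, again in Python and not Flink code. It assumes that 
both data sets are split into blocks, that every pair of blocks is joined 
locally, and that the local candidates are then merged per query point. The 
function names and the round-robin split are my own placeholders, and points 
are assumed to be distinct tuples.

```python
import heapq
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def split(points, n_blocks):
    """Round-robin split of a list of points into n_blocks blocks."""
    return [points[i::n_blocks] for i in range(n_blocks)]


def local_knn(r_block, s_block, k):
    """Nested loop over one (R block, S block) pair: k closest s for each r."""
    return {
        r: heapq.nsmallest(k, ((dist(r, s), s) for s in s_block))
        for r in r_block
    }


def block_nested_loop_knn_join(R, S, k, n_blocks=2):
    """Exact kNN join: join every pair of blocks, then merge per query point."""
    merged = {}
    for r_block, s_block in product(split(R, n_blocks), split(S, n_blocks)):
        for r, candidates in local_knn(r_block, s_block, k).items():
            merged.setdefault(r, []).extend(candidates)
    # The global k nearest neighbours are the k best of all local candidates.
    return {r: heapq.nsmallest(k, cands) for r, cands in merged.items()}
```

Each value in the result is a list of (distance, point) pairs sorted by 
distance. The number of block pairs grows quadratically with n_blocks, which 
is the cost we would compare against the index-based and approximate 
variants.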

As part of my implementation homework, I was going through some existing 
implementations and stumbled across this one:
https://github.com/codeAshu/SparkAlgorithms/tree/master/mllib/src/main/scala/org/sparkalgos/mllib/join

The issue has been considered here:
https://issues.apache.org/jira/browse/SPARK-2335

Approach 2: The paper they refer to is:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5447837&tag=1&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5447837%26tag%3D1

My question is: have we done a comparison of Approach 1 and Approach 2? 
Should we perform a comparison study going forward?

Raghav



> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Chiwan Park
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression.
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
