[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544997#comment-14544997 ]
Chiwan Park commented on FLINK-1745: ------------------------------------ Hello, All. I pushed a draft of exact kNN Join in my github repository. (https://github.com/chiwanpark/flink/tree/FLINK-1745) There are some issues in this draft: # Because the {{Vector}} trait doesn't implement {{Comparable}} interface, we cannot use {{Vector}} as a key. But we need to group data by {{Vector}}, so I added a UUID string with each given vector. (I cannot add a {{java.util.UUID}} object because of lack of non-args constructor in {{java.util.UUID}}.) I'm concerned about overhead of the UUID string. # Currently, the implementation is block nested loop kNN join. (called H-BNLJ in the given paper) We can use R-Tree to improve performance in nested loop stage. But I cannot find a R-Tree implementation with multi-dimension in scala. The implementation of H-BRJ in the paper uses ELKI Framework (http://elki.dbs.ifi.lmu.de), but I think we cannot use it because it is under AGPLv3 license. How can I solve this issues? If you tell your opinion to me, I really appreciate it. Regards Chiwan Park > Add exact k-nearest-neighbours algorithm to machine learning library > -------------------------------------------------------------------- > > Key: FLINK-1745 > URL: https://issues.apache.org/jira/browse/FLINK-1745 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library > Reporter: Till Rohrmann > Assignee: Chiwan Park > Labels: ML, Starter > > Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial > it is still used as a mean to classify data and to do regression. This issue > focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as > proposed in [2]. > Could be a starter task. > Resources: > [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] > [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)