[ https://issues.apache.org/jira/browse/FLINK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227980#comment-15227980 ]

Till Rohrmann commented on FLINK-1934:
--------------------------------------

Hi Daniel,

this is good news. I think your proposed plan to clean up your code, open a
PR, and then add optimizations later via new PRs sounds good.

What we could do for small input data sets is to broadcast them to the
training data set instead of using the cross operation. This should be more
efficient, since the small set is shipped once to each parallel instance
rather than being paired up through a generic cross. Moreover, I would expect
the prediction data set to often be smaller than the training data set, given
that one typically calculates predictions for incoming events.
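
To make the broadcast idea concrete, here is a rough sketch in plain Python
(not Flink code). The function names and the list-of-partitions layout are
made up for illustration: every training partition gets the full, small set
of query points, computes its local k nearest candidates, and the partial
results are merged afterwards. In Flink the copy would presumably be shipped
as a broadcast set, but that part is not shown here.

```python
import heapq
import math


def knn_partition(train_partition, test_points, k):
    """Local step: k nearest candidates within one training partition.

    `test_points` plays the role of the broadcast (small) data set.
    Returns {query index: [(distance, training point), ...]}.
    """
    result = {}
    for i, query in enumerate(test_points):
        scored = ((math.dist(query, p), p) for p in train_partition)
        result[i] = heapq.nsmallest(k, scored)
    return result


def knn_broadcast(train_partitions, test_points, k):
    """Merge step: combine the local candidates into the global k nearest."""
    # Every partition sees the complete list of query points.
    partials = [knn_partition(part, test_points, k) for part in train_partitions]
    merged = {}
    for i in range(len(test_points)):
        candidates = [c for partial in partials for c in partial[i]]
        merged[i] = [point for _, point in heapq.nsmallest(k, candidates)]
    return merged


# Hypothetical toy data: two training partitions, one query point.
train_partitions = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (0.5, 0.5)],
]
neighbours = knn_broadcast(train_partitions, [(0.0, 0.0)], k=2)
```

Each partition only ever emits k candidates per query point, so the merge
step stays small as long as the broadcast set is small.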

Have you already compared the accuracy of the exact and the approximative kNN
implementations? This could be interesting.
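
One simple way to do such a comparison, sketched in plain Python, would be
the average recall: for each query point, the fraction of the exact k
neighbours that the approximative result also returns. The function name and
the input layout (one list of neighbour ids per query point) are assumptions
for this sketch, not part of the existing code.

```python
def recall_at_k(exact_neighbours, approx_neighbours):
    """Average share of exact neighbours recovered by the approximation.

    Both arguments are lists with one entry per query point; each entry is
    a non-empty list of neighbour ids.
    """
    if not exact_neighbours or len(exact_neighbours) != len(approx_neighbours):
        raise ValueError("need one non-empty neighbour list per query point")
    total = 0.0
    for exact, approx in zip(exact_neighbours, approx_neighbours):
        exact_set = set(exact)
        total += len(exact_set & set(approx)) / len(exact_set)
    return total / len(exact_neighbours)


# Hypothetical neighbour ids for two query points and k = 3.
exact = [[1, 2, 3], [4, 5, 6]]
approx = [[1, 2, 9], [4, 5, 6]]
score = recall_at_k(exact, approx)
```

A value of 1.0 would mean the approximative variant returns exactly the same
neighbour sets as the exact one.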

On Wed, Apr 6, 2016 at 5:08 AM, Daniel Blazevski (JIRA) <[email protected]>



> Add approximative k-nearest-neighbours (kNN) algorithm to machine learning 
> library
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-1934
>                 URL: https://issues.apache.org/jira/browse/FLINK-1934
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Daniel Blazevski
>              Labels: ML
>
> kNN is still a widely used algorithm for classification and regression. 
> However, due to the computational costs of an exact implementation, it does 
> not scale well to large amounts of data. Therefore, it is worthwhile to also 
> add an approximative kNN implementation as proposed in [1,2].  Reference [3] 
> is cited a few times in [1], and gives necessary background on the z-value 
> approach.
> Resources:
> [1] https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf
> [2] http://www.computer.org/csdl/proceedings/wacv/2007/2794/00/27940028.pdf
> [3] http://cs.sjtu.edu.cn/~yaobin/papers/icde10_knn.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
