[ 
https://issues.apache.org/jira/browse/MADLIB-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762095#comment-16762095
 ] 

Frank McQuillan commented on MADLIB-1061:
-----------------------------------------

Suggest change the algorithm to identify the sorted list of region centroids 
from each point,
then search the closest number of regions as per the `leaf_node` param.  It 
means if user specified `leaf_node=6` and there are 8 total leaf nodes, then 
6/8 of the leaf nodes will be searched for each point.  Previously, we only 
searched adjacent leaf nodes which might be less than 6, depending on number of 
features.

Also, input checks to add:
- throw error if algo input is not asking for either kd-tree or brute force
- depth needs to be >0 
- leaf_nodes must be >0

Also, algo options are 'brute_force' of 'kd_tree'. Can also use partial strings 
like 'b' or 'kd' to select the algorithm. There is existing code elsewhere in 
MADlib you can use for this since we do it elsewhere

> Additional computation methods for k-NN - kd tree
> -------------------------------------------------
>
>                 Key: MADLIB-1061
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1061
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Assignee: Orhan Kislal
>            Priority: Major
>              Labels: starter
>             Fix For: v1.16
>
>         Attachments: KNN-chart-data.pdf, KNN-charts.pdf, KNN-raw.pdf, 
> KNN-w-KD-tree-leaf-node-only.pdf, Sheet1-KNN-perf-num-features.pdf, 
> Sheet2-KNN-tree-construction.pdf, Sheet3-KNN-tree-depth.pdf
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> which uses brute force.
> Determine other k-NN algos to implement.  From 
> http://scikit-learn.org/stable/modules/neighbors.html
> candidates are:
> * K-D Tree
> * Ball Tree
> * Other?
> This JIRA is to implement K-D tree.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to