[ https://issues.apache.org/jira/browse/MADLIB-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762095#comment-16762095 ]
Frank McQuillan commented on MADLIB-1061: ----------------------------------------- Suggest change the algorithm to identify the sorted list of region centroids from each point, then search the closest number of regions as per the `leaf_node` param. It means if user specified `leaf_node=6` and there are 8 total leaf nodes, then 6/8 of the leaf nodes will be searched for each point. Previously, we only searched adjacent leaf nodes which might be less than 6, depending on number of features. Also, input checks to add: - throw error if algo input is not asking for either kd-tree or brute force - depth needs to be >0 - leaf_nodes must be >0 Also, algo options are 'brute_force' of 'kd_tree'. Can also use partial strings like 'b' or 'kd' to select the algorithm. There is existing code elsewhere in MADlib you can use for this since we do it elsewhere > Additional computation methods for k-NN - kd tree > ------------------------------------------------- > > Key: MADLIB-1061 > URL: https://issues.apache.org/jira/browse/MADLIB-1061 > Project: Apache MADlib > Issue Type: New Feature > Components: k-NN > Reporter: Frank McQuillan > Assignee: Orhan Kislal > Priority: Major > Labels: starter > Fix For: v1.16 > > Attachments: KNN-chart-data.pdf, KNN-charts.pdf, KNN-raw.pdf, > KNN-w-KD-tree-leaf-node-only.pdf, Sheet1-KNN-perf-num-features.pdf, > Sheet2-KNN-tree-construction.pdf, Sheet3-KNN-tree-depth.pdf > > > Follow on to > https://issues.apache.org/jira/browse/MADLIB-927 > which uses brute force. > Determine other k-NN algos to implement. From > http://scikit-learn.org/stable/modules/neighbors.html > candidates are: > * K-D Tree > * Ball Tree > * Other? > This JIRA is to implement K-D tree. -- This message was sent by Atlassian JIRA (v7.6.3#76005)