[ https://issues.apache.org/jira/browse/MADLIB-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1370:
------------------------------------
    Description:
In unsupervised mode of knn
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, nearest neighbors does not reliably return the 0 distance point as a nearest neighbor.

Could there be a small negative value issue here?

Also, please assess whether we can add a vector of distances to the output table
{code}
Output Format
The output of the KNN module is a table with the following columns:
id                      INTEGER. The ids of test data points.
test_column_name        DOUBLE PRECISION[]. The test data points.
prediction              INTEGER. Label in case of classification, average value in case of regression.
k_nearest_neighbours    INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
{code}
which could help users troubleshoot this in the future

  was:
In unsupervised mode of knn
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, nearest neighbors does not reliably return the 0 distance point as a nearest neighbor.

Could there be a small negative value issue here?

Also, please assess whether we can add a vector of distances to the output table
{code}
Output Format
The output of the KNN module is a table with the following columns:
id                      INTEGER. The ids of test data points.
test_column_name        DOUBLE PRECISION[]. The test data points.
prediction              INTEGER. Label in case of classification, average value in case of regression.
k_nearest_neighbours    INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
{code}
which could help troubleshoot this


> Knn in unsupervised mode not producing consistent results
> ---------------------------------------------------------
>
>                 Key: MADLIB-1370
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1370
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17
>
>
> In unsupervised mode of knn
> http://madlib.apache.org/docs/latest/group__grp__knn.html
> when `point_source` and `test_source` are the same data set, nearest
> neighbors does not reliably return the 0 distance point as a nearest
> neighbor.
> Could there be a small negative value issue here?
> Also, please assess whether we can add a vector of distances to the
> output table
> {code}
> Output Format
> The output of the KNN module is a table with the following columns:
> id                    INTEGER. The ids of test data points.
> test_column_name      DOUBLE PRECISION[]. The test data points.
> prediction            INTEGER. Label in case of classification, average value in case
> of regression.
> k_nearest_neighbours  INTEGER[]. List of nearest neighbors, sorted closest to
> furthest from the corresponding test point.
> {code}
> which could help users troubleshoot this in the future



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
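
To illustrate the behavior the description expects, here is a minimal brute-force sketch in Python. It is not MADlib's implementation: the function name `knn_with_distances`, the dictionary layout, and the sample data are all hypothetical, and squared Euclidean distance is assumed for simplicity. It shows that when the same data set is used as both point source and test source, each point should come back as its own first neighbor at distance 0, and that returning a distance vector alongside the neighbor ids makes this easy to check.

```python
from typing import Dict, List, Sequence, Tuple


def knn_with_distances(
    points: Dict[int, Sequence[float]],
    tests: Dict[int, Sequence[float]],
    k: int,
) -> Dict[int, Tuple[List[int], List[float]]]:
    """Brute-force k-NN returning neighbor ids and their matching distances.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    result = {}
    for test_id, test_vec in tests.items():
        scored = []
        for point_id, point_vec in points.items():
            # A sum of squared differences is never negative, so an identical
            # point always gets a distance of exactly 0.0.
            dist = sum((a - b) ** 2 for a, b in zip(test_vec, point_vec))
            scored.append((dist, point_id))
        # Sort by distance, breaking ties by id so the output is deterministic.
        scored.sort()
        nearest = scored[:k]
        result[test_id] = ([pid for _, pid in nearest], [d for d, _ in nearest])
    return result


# Hypothetical data; the same set serves as point source and test source.
data = {1: [1.0, 1.0], 2: [2.0, 2.0], 3: [5.0, 5.0]}
neighbors = knn_with_distances(data, data, k=2)
for test_id, (ids, dists) in neighbors.items():
    print(test_id, ids, dists)
```

With a distance column like this in the output, a user could confirm at a glance whether the first neighbor of each test point is the point itself with distance 0.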