Frank McQuillan created MADLIB-1370:
---------------------------------------

             Summary: Knn in unsupervised mode not producing consistent results
                 Key: MADLIB-1370
                 URL: https://issues.apache.org/jira/browse/MADLIB-1370
             Project: Apache MADlib
          Issue Type: Bug
          Components: k-NN
            Reporter: Frank McQuillan
             Fix For: v1.17


In unsupervised mode of knn 
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, knn does not 
reliably return the 0-distance point (the test point itself) as a nearest 
neighbor.

Could there be a small negative number issue here?

Also, please assess whether we can add a vector of distances to the output table

{code}
Output Format
The output of the KNN module is a table with the following columns:

id      INTEGER. The ids of test data points.
test_column_name        DOUBLE PRECISION[]. The test data points.
prediction      INTEGER. Label in case of classification, average value in case of regression.
k_nearest_neighbours    INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
{code}
which could help troubleshoot this.
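
To make the expected behavior concrete, here is a minimal brute-force sketch in plain Python. It is not MADlib code and says nothing about how MADlib computes distances; the function name `knn_with_distances` and the toy data are hypothetical. It returns a distance vector next to the neighbor ids, which is the kind of extra output requested above, and checks that every point is its own nearest neighbor when the same data is used as both `point_source` and `test_source`.

```python
import math

def knn_with_distances(points, queries, k):
    """Brute-force k-NN: for each query id, return (neighbor_ids, distances)."""
    results = {}
    for qid, q in queries.items():
        # Euclidean distance from the query to every source point.
        scored = [(math.dist(q, p), pid) for pid, p in points.items()]
        # Sort by distance, then by id, so ties are broken deterministically.
        scored.sort()
        nearest = scored[:k]
        results[qid] = ([pid for _, pid in nearest], [d for d, _ in nearest])
    return results

# Hypothetical toy data; the same dict plays both point_source and test_source.
data = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 2.0], 4: [5.0, 5.0]}
result = knn_with_distances(data, data, k=2)

for qid, (ids, dists) in sorted(result.items()):
    # Each point should be its own nearest neighbor at distance 0.
    assert ids[0] == qid and dists[0] == 0.0
    print(qid, ids, dists)
```

With distances in the output, a missing self-match would show up directly as a first distance that is not 0. Note that if the data set contains duplicate points, another id can legitimately tie at distance 0, so a check like this should compare distances rather than ids in that case.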




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
