[ 
https://issues.apache.org/jira/browse/MADLIB-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1370:
------------------------------------
       Priority: Minor  (was: Major)
    Description: 
In unsupervised mode of knn 
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, the nearest-neighbor 
search does not reliably return the 0-distance point as a nearest neighbor.

Could there be a small negative-value issue here, where a distance that is 
effectively 0 shows up as a negative epsilon?
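
As an illustration of the kind of zero check being asked about, here is a minimal Python sketch. It assumes (this is not confirmed for MADlib) that the squared distance is computed with the expanded form ||a||^2 + ||b||^2 - 2*a.b, which can round to a tiny negative number for near-identical points. The function names and the `eps` tolerance are hypothetical and are not part of the MADlib API.

```python
import math


def squared_distance_expanded(a, b):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2*a.b.

    Assumption for illustration only: with floating-point rounding this
    form may come out slightly below zero for (near-)identical points.
    """
    aa = math.fsum(x * x for x in a)
    bb = math.fsum(y * y for y in b)
    ab = math.fsum(x * y for x, y in zip(a, b))
    return aa + bb - 2.0 * ab


def clamp_to_zero(value, eps=1e-9):
    """Map values within eps of zero (either sign) to exactly 0.0.

    eps is a hypothetical tolerance; a real fix would need to choose it
    relative to the scale of the data.
    """
    return 0.0 if abs(value) <= eps else value


def safe_distance(a, b):
    """Euclidean distance with the zero check applied before the sqrt.

    A value more negative than -eps is left alone, so math.sqrt raises
    ValueError and a genuine bug is not silently hidden.
    """
    return math.sqrt(clamp_to_zero(squared_distance_expanded(a, b)))
```

With a check like this, a point compared against itself always yields a distance of exactly 0.0, so it sorts first among its own nearest neighbors.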

Also, please assess whether we can add a vector of distances to the output table:

{code}
Output Format
The output of the KNN module is a table with the following columns:

id      INTEGER. The ids of test data points.
test_column_name        DOUBLE PRECISION[]. The test data points.
prediction      INTEGER. Label in case of classification, average value in case 
of regression.
k_nearest_neighbours    INTEGER[]. List of nearest neighbors, sorted closest to 
furthest from the corresponding test point.
distance        DOUBLE PRECISION[]. Distances to the nearest neighbors, sorted in 
the same order as the 'k_nearest_neighbours' array.
{code}
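
To make the proposed output concrete, the following Python sketch returns the neighbor ids and their distances as two arrays in the same order, closest first. It is an illustration only, not MADlib's implementation. The function name `knn_with_distances` and the dict-based input are hypothetical.

```python
import math


def knn_with_distances(points, test_point, k):
    """Return (ids, distances) of the k nearest points, closest first.

    points: dict mapping an integer id to a list of coordinates.
    The two returned lists are aligned index by index, mirroring the
    proposed 'k_nearest_neighbours' and 'distance' columns.
    """
    # Sort by (distance, id) so ties are broken deterministically by id.
    scored = sorted(
        (math.dist(coords, test_point), point_id)
        for point_id, coords in points.items()
    )
    top = scored[:k]
    ids = [point_id for _, point_id in top]
    distances = [d for d, _ in top]
    return ids, distances


if __name__ == "__main__":
    source = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 0.0]}
    # The test point is also in the source set, so id 1 should come
    # first with a distance of 0.0.
    print(knn_with_distances(source, [0.0, 0.0], k=2))
```

When `point_source` and `test_source` are the same data set, the first entry of the distance array should be 0.0. Exposing the array makes that easy to verify.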



  was:
In unsupervised mode of knn 
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, nearest neighbors 
is not reliably returning the 0 distance point as a nearest neighbor.

Could there be a small negative-value issue here, where a distance that is 
effectively 0 shows up as a negative epsilon?

Also, please assess if we can add a vector of distances to the output file

{code}
Output Format
The output of the KNN module is a table with the following columns:

id      INTEGER. The ids of test data points.
test_column_name        DOUBLE PRECISION[]. The test data points.
prediction      INTEGER. Label in case of classification, average value in case 
of regression.
k_nearest_neighbours    INTEGER[]. List of nearest neighbors, sorted closest to 
furthest from the corresponding test point.
{code}
which could help users troubleshoot this in the future


     Issue Type: Improvement  (was: Bug)
        Summary: Knn - add zero check and output distance array  (was: Knn in 
unsupervised mode not producing consistent results)

> Knn - add zero check and output distance array
> ----------------------------------------------
>
>                 Key: MADLIB-1370
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1370
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.17
>
>
> In unsupervised mode of knn 
> http://madlib.apache.org/docs/latest/group__grp__knn.html
> when `point_source` and `test_source` are the same data set, the 
> nearest-neighbor search does not reliably return the 0-distance point as a 
> nearest neighbor.
> Could there be a small negative-value issue here, where a distance that is 
> effectively 0 shows up as a negative epsilon?
> Also, please assess whether we can add a vector of distances to the output table:
> {code}
> Output Format
> The output of the KNN module is a table with the following columns:
> id    INTEGER. The ids of test data points.
> test_column_name      DOUBLE PRECISION[]. The test data points.
> prediction    INTEGER. Label in case of classification, average value in case 
> of regression.
> k_nearest_neighbours  INTEGER[]. List of nearest neighbors, sorted closest to 
> furthest from the corresponding test point.
> distance      DOUBLE PRECISION[]. Distances to the nearest neighbors, sorted in 
> the same order as the 'k_nearest_neighbours' array.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
