Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315

I'm not sure what this is doing:

```
%%sql
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
                'knn_train_data',                  -- Table of training data
                'array[99.]::int[] || array[99]',  -- Col name of training data
                'id',                              -- Col name of id in train data
                'label',                           -- Training labels
                'knn_test_data',                   -- Table of test data
                'data',                            -- Col name of test data
                'id',                              -- Col name of id in test data
                'knn_result_classification',       -- Output table
                 1,                                -- Number of nearest neighbors
                 True,                             -- True to list nearest-neighbors by id
                 'madlib.squared_dist_norm2'       -- Distance function
                );
SELECT * from knn_result_classification ORDER BY id;
```

produces

```
 id |  data   | prediction | k_nearest_neighbours
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```

I get the same result if I do:

```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
                'knn_train_data',                  -- Table of training data
                'array[0.]::int[] || array[0]',    -- Col name of training data
                'id',                              -- Col name of id in train data
                'label',                           -- Training labels
                'knn_test_data',                   -- Table of test data
                'data',                            -- Col name of test data
                'id',                              -- Col name of id in test data
                'knn_result_classification',       -- Output table
                 1,                                -- Number of nearest neighbors
                 True,                             -- True to list nearest-neighbors by id
                 'madlib.squared_dist_norm2'       -- Distance function
                );
SELECT * from knn_result_classification ORDER BY id;
```

gives

```
 id |  data   | prediction | k_nearest_neighbours
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```
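
One possible reading, sketched outside MADlib in plain Python: if the expression passed as the training column is evaluated once per training row, a constant such as `{99,99}` or `{0,0}` gives every training row the same feature vector. Each test point is then equally far from all training rows, so the single nearest neighbour is decided only by tie-breaking, and the constant itself would not matter. This is an assumption about how the expression is applied, not a confirmed description of `madlib.knn`. The training ids below are hypothetical, the test points are taken from the output above, and `squared_dist_norm2` here is a local stand-in for the MADlib function of the same name.

```python
def squared_dist_norm2(a, b):
    """Squared Euclidean distance; a local stand-in, not the MADlib function."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distances_to_training_rows(train_ids, constant_point, test_point):
    """Distance from one test point to each training row, assuming every
    training row carries the same constant feature vector."""
    return {
        train_id: squared_dist_norm2(constant_point, test_point)
        for train_id in train_ids
    }


# Hypothetical training ids; the real contents of knn_train_data are not shown.
train_ids = [1, 2, 3, 8]

# Two of the test points from the result table above, keyed by test id.
test_points = {1: [2, 1], 5: [2, 90]}

for constant_point in ([99, 99], [0, 0]):
    for test_id, point in test_points.items():
        dists = distances_to_training_rows(train_ids, constant_point, point)
        all_tied = len(set(dists.values())) == 1
        print(constant_point, test_id, dists, "all tied:", all_tied)
```

Under this assumption, whichever training id comes back for `k = 1` reflects how ties are ordered rather than anything about the data.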
---