[
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763121#comment-15763121
]
ASF GitHub Bot commented on MADLIB-927:
---
GitHub user auonhaidar opened a pull request:
https://github.com/apache/incubator-madlib/pull/81
JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc
KNN Added
Usage:
select * from madlib.knn()
select * from madlib.knn('help')
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','c',3)
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','r',3)
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','c')
You need to enter the following arguments, in order:
Argument 1: Training data table containing the training features (as a
vector column) and labels
Argument 2: Name of the column containing feature vectors in the training
data table
Argument 3: Name of the column containing the actual label/value for the
corresponding feature vector in the training data table
Argument 4: Test data table containing features as a vector column. An Id
for each feature vector is mandatory
Argument 5: Name of the column containing feature vectors in the test data
table
Argument 6: Name of the column containing feature vector Ids in the test
data table
Argument 7: Name of the output table
Argument 8: c for classification task, r for regression task
Argument 9: Value of k. Defaults to 1
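The argument list above does not say how the labels of the k neighbours are combined for the 'c' and 'r' tasks. A minimal Python sketch of the usual textbook convention, assuming a majority vote for classification and an arithmetic mean for regression (the function name combine_neighbor_labels is hypothetical and is not part of MADlib or this pull request):

```python
from collections import Counter


def combine_neighbor_labels(labels, task):
    """Combine the labels of the k nearest neighbours into one prediction.

    task is 'c' (classification) or 'r' (regression), mirroring Argument 8.
    Majority vote and mean are assumed conventions, not taken from the PR.
    """
    if not labels:
        raise ValueError("need at least one neighbour label")
    if task == 'c':
        # Classification: return the most frequent label among the neighbours.
        return Counter(labels).most_common(1)[0][0]
    if task == 'r':
        # Regression: return the arithmetic mean of the neighbours' values.
        return sum(labels) / len(labels)
    raise ValueError("task must be 'c' or 'r'")
```

With k left at its default of 1, both tasks reduce to returning the single nearest neighbour's label or value.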
Test file added.
Changes made in the main SQL file and the Python file.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/auonhaidar/incubator-madlib features/knn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/81.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #81
commit b1a8d103cf617d0332b6a3289460a4ef5de09df6
Author: auonhaidar
Date: 2016-12-13T02:09:12Z
KNN Added
commit 22db2e1a6f75826c3966771bb90a4f4607c29bb8
Author: auonhaidar
Date: 2016-12-20T03:36:40Z
JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc
> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k
> nearest neighbors by going through all points.
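
The naive approach described in the issue can be sketched in a few lines of Python. This is an illustration only, assuming squared Euclidean distance as the distance function (the issue does not name one); the function name k_nearest is hypothetical and is not the code from the pull request:

```python
def k_nearest(train_points, query, k=1):
    """Return the indices of the k training points closest to query.

    Brute force: the distance from the query to every training point is
    computed, so each query costs time proportional to the training set size.
    Squared Euclidean distance is an assumption made for this sketch.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance; it ranks points the same as Euclidean.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Rank all training indices by distance to the query, nearest first.
    order = sorted(range(len(train_points)),
                   key=lambda i: sq_dist(train_points[i], query))
    return order[:k]
```

There is no training step here: all of the work happens at prediction time, which is the "lazy" behaviour the issue description refers to.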
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)