[jira] [Comment Edited] (MADLIB-927) Initial implementation of k-NN

2016-02-29 Thread Tianwei Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173187#comment-15173187
 ] 

Tianwei Shen edited comment on MADLIB-927 at 3/1/16 3:44 AM:
-

Hi Sir,
I am Tianwei, a second-year Ph.D. student at HKUST. I am interested in this 
proposal and have implemented a prototype of naive k-NN in one of my projects, 
libvot (https://github.com/hlzz/libvot). The source code for my k-NN 
implementation is here 
(https://github.com/hlzz/libvot/blob/master/src/vocab_tree/clustering.cpp); 
it supports multi-threaded processing using native C++11 support. The project 
is an implementation of the vocabulary tree, a widely used image retrieval 
algorithm. I think this issue suits my skill set well, so I would like to 
discuss it with you in greater depth, including questions such as "where 
should I put it" and "how should I adapt to the interface of MADlib". Thanks.



was (Author: tianwei37):
Hi Sir,
I am Tianwei, a second-year Ph.D. student at HKUST. I am interested in this 
proposal and have implemented a prototype of naive k-NN in one of my projects, 
libvot (https://github.com/hlzz/libvot). The source code for my k-NN 
implementation is here 
(https://github.com/hlzz/libvot/blob/master/src/vocab_tree/clustering.cpp); 
it supports multi-threaded processing using native C++11 support. The project 
is an implementation of the vocabulary tree, a widely used image retrieval 
algorithm. I think this issue suits my skill set well, so I would like to 
discuss it with you in greater depth. Thanks.


> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a very simple algorithm that is based on finding 
> nearest neighbors of data points in a metric feature space according to a 
> specified distance function. It is considered one of the canonical algorithms 
> of data science. It is a nonparametric method, which makes it applicable to a 
> lot of real-world problems, where the data doesn’t satisfy particular 
> distribution assumptions. Also, it can be implemented as a lazy algorithm, 
> which means there is no training phase where information in the data is 
> condensed into coefficients, but there is a costly testing phase where all 
> data is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.
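
The naive approach described in the issue can be sketched roughly as follows. 
This is only an illustration in plain Python, not MADlib code and not the 
libvot implementation: the function names (euclidean, knn_naive, 
knn_classify), the choice of Euclidean distance as the default, and the 
majority-vote prediction step are all assumptions made for the example.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_naive(train_points, query, k, dist=euclidean):
    """Return indices of the k training points closest to `query`.

    Hypothetical helper: every training point is visited once, which is
    the brute-force approach (no index structure, no training phase).
    Indices are returned in order of increasing distance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Pair each distance with the point's index so ties break by index.
    scored = ((dist(p, query), i) for i, p in enumerate(train_points))
    return [i for _, i in heapq.nsmallest(k, scored)]


def knn_classify(train_points, train_labels, query, k, dist=euclidean):
    """Predict a label by majority vote among the k nearest neighbors.

    Majority voting is an assumption of this sketch; the issue text only
    asks for the neighbor search itself.
    """
    neighbors = knn_naive(train_points, query, k, dist)
    votes = Counter(train_labels[i] for i in neighbors)
    return votes.most_common(1)[0][0]
```

All of the work happens at query time: each call to knn_naive scans the full 
training set, which is the "costly testing phase" the description mentions. 
Any other distance function with the same two-argument shape can be passed 
in through the dist parameter.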



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-02-29 Thread Tianwei Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173187#comment-15173187
 ] 

Tianwei Shen commented on MADLIB-927:
-

Hi Sir,
I am Tianwei, a second-year Ph.D. student at HKUST. I am interested in this 
proposal and have implemented a prototype of naive k-NN in one of my projects, 
libvot (https://github.com/hlzz/libvot). The source code for my k-NN 
implementation is here 
(https://github.com/hlzz/libvot/blob/master/src/vocab_tree/clustering.cpp); 
it supports multi-threaded processing using native C++11 support. The project 
is an implementation of the vocabulary tree, a widely used image retrieval 
algorithm. I think this issue suits my skill set well, so I would like to 
discuss it with you in greater depth. Thanks.


