[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847838#comment-15847838
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user njayaram2 commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Go ahead and make the commit. I have a couple of changes to make and will 
open a PR on your branch for them.


> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Rahul Iyer
>  Labels: starter
> Fix For: v1.10
>
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors 
> of data points in a metric feature space according to a specified distance 
> function. It is considered one of the canonical algorithms of data science. 
> It is a nonparametric method, which makes it applicable to a lot of 
> real-world problems where the data doesn’t satisfy particular distribution 
> assumptions. It can also be implemented as a lazy algorithm, which means 
> there is no training phase where information in the data is condensed into 
> coefficients, but there is a costly testing phase where all data (or some 
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.
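
For illustration, the naive approach described in the issue can be sketched in Python. This is a minimal sketch and not the MADlib implementation: the function name `knn_predict`, the in-memory list of (features, label) pairs, the Euclidean distance, and the majority vote are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, test_point, k):
    """Return the majority label among the k training points nearest to test_point.

    train is a list of (features, label) pairs. Euclidean distance and
    majority voting are assumptions of this sketch.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training rows")
    # Naive approach: measure the distance to every training point, then sort.
    by_distance = sorted(train, key=lambda row: math.dist(row[0], test_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train = [
    ([1.0, 1.0], 0),
    ([1.0, 2.0], 0),
    ([5.0, 5.0], 1),
    ([6.0, 5.0], 1),
    ([5.0, 6.0], 1),
]
print(knn_predict(train, [1.5, 1.5], k=3))
```

Because every test point is compared against every training point, the cost grows with the product of the two table sizes, which is the "costly testing phase" the description mentions.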



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847815#comment-15847815
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi NJ, Orhan,
I am done adding the following validation checks:

- Whether the train and test tables are valid
- Whether the specified columns are present in these tables
- Whether k > 0
- Whether k <= the number of rows in the train table
- Whether the feature columns are of array type
- Whether NULL values are present in the feature columns
- Whether the id column of the test table is an integer
- Whether the label is valid (float, integer, boolean)


I will commit these changes tomorrow.
Please let me know if I am missing anything.



Auon
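
The data-level checks in the list above could look roughly like the following. This is only a sketch under stated assumptions: the function name `validate_knn_inputs` and the row layout (plain Python dicts instead of database tables) are hypothetical, and the actual checks in the pull request run against the database rather than in-memory rows.

```python
def validate_knn_inputs(train_rows, test_rows, k):
    """Raise ValueError on the first failed check; return None when all pass.

    Hypothetical layout: each train row is {"features": [...], "label": ...},
    each test row is {"id": ..., "features": [...]}.
    """
    if not train_rows or not test_rows:
        raise ValueError("train and test tables must be non-empty")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(k, int) or isinstance(k, bool) or k <= 0:
        raise ValueError("k must be a positive integer")
    if k > len(train_rows):
        raise ValueError("k must not exceed the number of training rows")
    for row in train_rows + test_rows:
        features = row.get("features")
        if not isinstance(features, (list, tuple)):
            raise ValueError("feature column must be an array")
        if any(value is None for value in features):
            raise ValueError("feature arrays must not contain NULL values")
    for row in test_rows:
        row_id = row.get("id")
        if not isinstance(row_id, int) or isinstance(row_id, bool):
            raise ValueError("test id column must be an integer")
    for row in train_rows:
        if not isinstance(row.get("label"), (float, int, bool)):
            raise ValueError("label must be float, integer, or boolean")


train_rows = [{"features": [1.0, 2.0], "label": 1}, {"features": [3.0, 4.0], "label": 0}]
test_rows = [{"id": 1, "features": [1.0, 1.0]}]
validate_knn_inputs(train_rows, test_rows, k=1)  # passes silently
```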







[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843758#comment-15843758
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hey NJ,
I don't think the rebase is happening the way we want. I first pulled the 
changes from the Apache repo into my local master.
Output:

haidar@haidar-XPS-L501X:~/MADLIB-AUON/GIT/Madlib/incubator-madlib$ git log --graph --decorate --oneline --all
*   c069a42 (origin/features/knn) Merge pull request #1 from orhankislal/features/knn
|\
| * d9fb5c0 KNN: Documentation updates
|/
* 9a01440 JIRA: MADLIB-927 Documentation Added
* 29969c2 License added:Assertions added
* 573edc4 changes in knn function of knn_sql.in:distance calculation optimized:error messages
* 22db2e1 JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc
* b1a8d10 KNN Added
| * 0e00a27 (HEAD, origin/master, origin/HEAD, master) Include boost::format in MathToolkit_impl.hpp.
| * f7cb980 Madpack: Add password into connection args
| * 29acc53 Documentation: Fix misc errors
| * faec6be Reverses the changes to the madlib.mode function to maintain backwards compatibility
| * 13203ba Update dateformat in multiple install-checks
| * 9d04b7d Minor fixes
| * 8e5da2f Association Rules: Add rule counts and limit itemset size feature
| * e384c1f RF: Fixes the online help and example
| * 498c559 Graph: SSSP
| * 02a7ef4 PCA: Add grouping support to PCA
| * e0439ed Madpack: Disable psqlrc when executing queries
| * c564e31 Build: Update madpack versioning to include _ and +
| * 3cf3f67 Build: Exclude AggCheckCallContext for GPDB5
| * e75a944 Elastic Net: Add CV examples, clean user docs
| * 6f12264 CV: Fix order of validation output table columns
| * e1f37bb Utilities: Fix incorrect flag for distribution
| * 02f4602 DT and RF: Adds verbose option for the dot output format.
| * c56b209 Build: Correct madlib version in gppkg spec file
| * e43b449 New module: Encode categorical variables
| * d2289b0 Fixes the kmeans_state related bug
| * 6021f67 Minor error message corrections
| * b045f7e Adds cluster variance to kmeans for PivotalR support.
| * 6939fd6 Elastic net: Add cross validation
| * 38d1e87 Fix post process for gppkg to link to hyphenated directories
|/
* 6138b00 Elastic Net: Add grouping support
* 21bec82 Build: Ensure gppkg version does not contain hyphen
* 82e56a4 Build: Fix version used in rpm installation
* 150459d Madpack: Disable unittest flag
* 39efdb9 Build: Fix madpack revision parsing
* ac1bcfa Assoc rules: Clean + elaborate documentation



I then checked out my features/knn branch and ran 'git rebase master', but it showed:
git rebase master
First, rewinding head to replay your work on top of it...
Applying: KNN Added
Using index info to reconstruct a base tree...
M   src/config/Modules.yml
<stdin>:135: space before tab in indent.
DROP TABLE IF EXISTS pg_temp.knn_label;
<stdin>:136: space before tab in indent.
CREATE TABLE pg_temp.knn_label(pid integer, predlabel float);
<stdin>:138: trailing whitespace.

<stdin>:142: trailing whitespace.

<stdin>:159: trailing whitespace.

warning: squelched 4 whitespace errors
warning: 9 lines add whitespace errors.
Falling back to patching base and 3-way merge...
Auto-merging src/config/Modules.yml
Applying: JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc
Applying: changes in knn function of knn_sql.in:distance calculation optimized:error messages
Applying: License added:Assertions added
Applying: JIRA: MADLIB-927 Documentation Added
Applying: KNN: Documentation updates


And after that my repo looks like:

git log --graph --decorate --oneline --all
* 9cc0b0a (HEAD, features/knn) KNN: Documentation updates
* 8be68b9 JIRA: MADLIB-927 Documentation Added
* 35d976d License added:Assertions added
* 67b466f changes in knn function of knn_sql.in:distance calculation optimized:error messages
* a718a1e JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc
* 6922da1 KNN Added
* 0e00a27 (origin/master, origin/HEAD, master) Include boost::format in MathToolkit_impl.hpp.
* f7cb980 Madpack: Add password into connection args
* 29acc53 Documentation: Fix misc errors
* faec6be Reverses the changes to the madlib.mode function to maintain backwards compatibility
* 13203ba Update dateformat in multiple install-checks
* 9d04b7d Minor fixes
* 8e5da2f Association Rules: Add rule counts and limit itemset size feature
* e384c1f RF: Fixes the online help and example
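
Comparing the two logs, the KNN commits appear with new hashes after the rebase (for example, `b1a8d10 KNN Added` becomes `6922da1 KNN Added`). That is expected whenever commits are replayed onto a different parent, because a commit's hash covers its parent. The toy model below illustrates the effect. It is a deliberate simplification and an assumption of this note: real Git also hashes the tree, author, committer, and timestamps, so the printed ids will not match the hashes in the logs.

```python
import hashlib


def commit_id(parent_id, message):
    """Toy commit hash that covers only the parent id and the message."""
    return hashlib.sha1(f"{parent_id}\n{message}".encode()).hexdigest()[:7]


def replay(base_id, messages):
    """Replay commit messages on top of base_id, as a rebase does, and return the new ids."""
    ids, parent = [], base_id
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


messages = ["KNN Added", "License added:Assertions added", "KNN: Documentation updates"]
before = replay("6138b00", messages)  # branch forked from the old base
after = replay("0e00a27", messages)   # same commits replayed on the new master tip
for old, new, message in zip(before, after, messages):
    print(old, "->", new, message)
```

The messages stay the same while every id changes, which matches the pattern in the second log.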
   

[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843611#comment-15843611
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Cool. I will have a look and start with the implementations.
Thanks NJ!


> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Rahul Iyer
>  Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors 
> of data points in a metric feature space according to a specified distance 
> function. It is considered one of the canonical algorithms of data science. 
> It is a nonparametric method, which makes it applicable to a lot of 
> real-world problems where the data doesn’t satisfy particular distribution 
> assumptions. It can also be implemented as a lazy algorithm, which means 
> there is no training phase where information in the data is condensed into 
> coefficients, but there is a costly testing phase where all data (or some 
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843371#comment-15843371
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
I think you have already covered a lot of validation cases, @njayaram2. I 
will work on that, and if I get stuck somewhere I will let you know. Meanwhile, 
could you please point me to the Python files that have examples of the 
functions you were talking about? That will save me a lot of time.
Thanks!




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840874#comment-15840874
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Sure NJ, but I will only be free from work after 5 tomorrow. Would that work 
for you?




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832385#comment-15832385
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Sure NJ, Orhan,
Thanks!




Auon




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832143#comment-15832143
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi Auon,
My suggestion is to give them a try and if you agree with the content, 
merge them. 
Here is a small list of validations (I know you covered some of them in the 
code):
- Every input should be checked for null
- Every string should be checked for empty string ''
- Columns should exist in their respective tables
- Input tables should not be empty
- Output tables should not exist
Thanks
Orhan
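
The argument-level checks in the list above can be sketched as follows. This is an illustration only: the function name `validate_args` and the toy `catalog` dict that stands in for the database catalog are hypothetical, and MADlib's own validation would query the database instead.

```python
def validate_args(catalog, source_table, columns, output_table):
    """Run the checks listed above against a toy catalog.

    catalog is a stand-in for the database: {table_name: {"columns": [...], "rows": n}}.
    Raises ValueError on the first failed check and returns None otherwise.
    """
    named = {"source_table": source_table, "output_table": output_table}
    for name, value in named.items():
        if value is None:
            raise ValueError(f"{name} must not be NULL")
        if value.strip() == "":
            raise ValueError(f"{name} must not be an empty string")
    if not columns:
        raise ValueError("at least one column name is required")
    if source_table not in catalog:
        raise ValueError(f"table {source_table} does not exist")
    for column in columns:
        if column is None or column.strip() == "":
            raise ValueError("column names must not be NULL or empty")
        if column not in catalog[source_table]["columns"]:
            raise ValueError(f"column {column} not found in {source_table}")
    if catalog[source_table]["rows"] == 0:
        raise ValueError(f"table {source_table} is empty")
    if output_table in catalog:
        raise ValueError(f"output table {output_table} already exists")


catalog = {
    "knn_train": {"columns": ["id", "data", "label"], "rows": 9},
    "empty_tbl": {"columns": ["id"], "rows": 0},
}
validate_args(catalog, "knn_train", ["data", "label"], "knn_out")  # passes silently
```

Raising on the first failure keeps each error message specific, which is the point of the request to display a proper error for an invalid column name.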




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831093#comment-15831093
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi Orhan,
Thanks!
Should I merge these changes?
I will try to look for the validations you were talking about. Could you 
tell me specifically what kinds of checks I need to add?

Regards
Auon




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831034#comment-15831034
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi Auon,

I created a pull request for your branch that alters the docs as well as 
the online help. We will have to improve the input validation a little. If 
the user gives an invalid column name, we should be able to display a proper 
error. You might want to take a look at the `validate_pivot_coding` function in 
`pivot.py_in` for various cases to test.

Thanks

Orhan




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829113#comment-15829113
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi Orhan,
I have added the documentation. Please have a look. I did not compile it 
because of issues with my system.


Regards




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819825#comment-15819825
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
I ran this command inside build:
$ du -h doc/


4.0K    doc/design/figures
4.0K    doc/design/modules
20K     doc/design/CMakeFiles/auxclean.dir
44K     doc/design/CMakeFiles/design_ps.dir
20K     doc/design/CMakeFiles/html.dir
20K     doc/design/CMakeFiles/design_html.dir
20K     doc/design/CMakeFiles/design.dir
28K     doc/design/CMakeFiles/design_auxclean.dir
40K     doc/design/CMakeFiles/design_dvi.dir
20K     doc/design/CMakeFiles/pdf.dir
20K     doc/design/CMakeFiles/safepdf.dir
20K     doc/design/CMakeFiles/ps.dir
20K     doc/design/CMakeFiles/design_safepdf.dir
40K     doc/design/CMakeFiles/design_pdf.dir
20K     doc/design/CMakeFiles/dvi.dir
344K    doc/design/CMakeFiles
4.0K    doc/design/other-chapters
380K    doc/design
12K     doc/bin/CMakeFiles
36K     doc/bin
8.0K    doc/imgs
20K     doc/CMakeFiles/update_mathjax.dir
40K     doc/CMakeFiles/doxysql.dir
20K     doc/CMakeFiles/devdoc.dir
20K     doc/CMakeFiles/doc.dir
112K    doc/CMakeFiles
12K     doc/etc/CMakeFiles
152K    doc/etc
720K    doc/




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819810#comment-15819810
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Not sure how to tackle this. It is interesting that you don't get any actual 
errors, just a simple confirmation. It seems the makefile (generated by cmake) 
doesn't even try to build anything. Could you paste the results of `du -h 
doc/`? Maybe the folder sizes will point us somewhere. For reference, here 
is my output (taken right after the cmake):
```
du -h doc/
8.0K    doc//bin/CMakeFiles
 28K    doc//bin
 16K    doc//CMakeFiles/devdoc.dir
 16K    doc//CMakeFiles/doc.dir
 36K    doc//CMakeFiles/doxysql.dir
 16K    doc//CMakeFiles/update_mathjax.dir
 92K    doc//CMakeFiles
 16K    doc//design/CMakeFiles/auxclean.dir
 16K    doc//design/CMakeFiles/design.dir
 20K    doc//design/CMakeFiles/design_auxclean.dir
 36K    doc//design/CMakeFiles/design_dvi.dir
 16K    doc//design/CMakeFiles/design_html.dir
 36K    doc//design/CMakeFiles/design_pdf.dir
 36K    doc//design/CMakeFiles/design_ps.dir
 16K    doc//design/CMakeFiles/design_safepdf.dir
 16K    doc//design/CMakeFiles/dvi.dir
 16K    doc//design/CMakeFiles/html.dir
 16K    doc//design/CMakeFiles/pdf.dir
 16K    doc//design/CMakeFiles/ps.dir
 16K    doc//design/CMakeFiles/safepdf.dir
280K    doc//design/CMakeFiles
  0B    doc//design/figures
  0B    doc//design/modules
  0B    doc//design/other-chapters
300K    doc//design
8.0K    doc//etc/CMakeFiles
144K    doc//etc
4.0K    doc//imgs
596K    doc/
```




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819771#comment-15819771
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Yes.
Then I ran make and then make doc. It says 'up to date'.




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819759#comment-15819759
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
And the output of `make doc` is still the same?




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819746#comment-15819746
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
If you start with a completely empty folder, what is the output of `cmake 
../`?




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819738#comment-15819738
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
I installed doxygen and latex2html, then ran 'make' and 'make doc', but I 
still couldn't see the folder /doc/user/html/





[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819342#comment-15819342
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Okay.
Then I will try installing Doxygen and let you know.

Thanks!


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819328#comment-15819328
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
You'll need doxygen in addition to latex to compile the docs. 


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819305#comment-15819305
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
It runs with the following output:

_"cmake version 2.8.12.2
Usage

  cmake [options] <path-to-source>
  cmake [options] <path-to-existing-build>

Options
  -C <initial-cache>          = Pre-load a script to populate the cache.
..."_




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819242#comment-15819242
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
It is under the `doc/user/html` folder. Make sure to compile the code itself 
with `make` before running `make doc`.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819224#comment-15819224
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Oh sorry, I meant run it in the build folder where you run `make`.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819191#comment-15819191
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Yes, the section that starts with `@addtogroup` is the documentation that 
will be reflected on the website when the PR is merged into the repo. To build 
it you will need LaTeX installed on your machine as well as GNU gcc (Apple's 
compiler doesn't work). You can start by copy-pasting from an existing module 
and replacing the content as needed. The docs are compiled with the `make doc` 
command, and the output HTML files will be in the build folder for inspection. 
If the command doesn't work, you can still submit the changes so that I can 
compile and alter them if needed. 
I really appreciate your contribution in this regard. I know writing the 
docs is a boring job, but it is very important for the usability of MADlib. 


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819116#comment-15819116
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi,
What counts as the documentation in pivot.sql_in? Is it the lines written as 
comments after m4_include(`SQLCommon.m4')?
How is it compiled, and how can I see how it will look on the website?




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813253#comment-15813253
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Yes, I just pulled them and can see the licenses you added. I see there is a 
MADlib aggregate called mode (in utilities.sql_in). That and an altered search 
path on my end might be the issue.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813245#comment-15813245
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on the issue:

https://github.com/apache/incubator-madlib/pull/81
  
Hi Orhan Kislal,
No, it should work. I am also using Postgres 9.4. I pushed some more 
changes 11 days ago. Are you using that version?


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783820#comment-15783820
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r94083157
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* 
-
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --

I got the license, thanks!
For the assertions, I tried doing that yesterday, but it was not working.
For example, I ran 
SELECT assert(3 = 3, 'Wrong output in pivoting');
at the postgres prompt
and it says "HINT:  No function matches the given name and argument types. 
You might need to add explicit type casts."
Can you tell me what is happening here? I am using Postgres 9.4.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783786#comment-15783786
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r94081753
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* 
-
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --

You can take a look at the pivot function in the utilities folder for an 
example of assertion as well as the necessary license text for sql and py files.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781225#comment-15781225
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r93969163
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* 
-
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --

You mean to say that I should include assert statements in this 
test/knn.sql_in file in order to validate results, right?


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781217#comment-15781217
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user auonhaidar commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r93968700
  
--- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
@@ -0,0 +1,165 @@
+/* --- 
*//**
+ *
+ * @file knn.sql_in
+ *
+ * @brief Set of functions for k-nearest neighbors.
+ *
+ *
+ *//* 
--- */
+
+m4_include(`SQLCommon.m4')
+
+DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
+CREATE TYPE MADLIB_SCHEMA.knn_result AS (
+prediction float
+);
+DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
+CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
+id integer,
+vector DOUBLE PRECISION[]
+);
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
+rel_source VARCHAR
+) RETURNS VOID AS $$
+PythonFunction(knn, knn, knn_validate_src)
+$$ LANGUAGE plpythonu
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+arg1 VARCHAR
+) RETURNS VOID AS $$
+BEGIN
+IF arg1 = 'help' THEN
+   RAISE NOTICE 'You need to enter following arguments in order:
+   Argument 1: Training data table having training features as vector 
column and labels
+   Argument 2: Name of column having feature vectors in training data table
+   Argument 3: Name of column having actual label/vlaue for corresponding 
feature vector in training data table
+   Argument 4: Test data table having features as vector column. Id of 
features is mandatory
+   Argument 5: Name of column having feature vectors in test data table
+   Argument 6: Name of column having feature vector Ids in test data table
+   Argument 7: Name of output table
+   Argument 8: c for classification task, r for regression task
+   Argument 9: value of k. Default will go as 1';
+END IF;
+END;
+$$ LANGUAGE plpgsql VOLATILE
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+) RETURNS VOID AS $$
+BEGIN
+EXECUTE $sql$ select * from MADLIB_SCHEMA.knn('help') $sql$;
+END;
+$$ LANGUAGE plpgsql VOLATILE
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+point_source VARCHAR,
+point_column_name VARCHAR,
+label_column_name VARCHAR,
+test_source VARCHAR,
+test_column_name VARCHAR,
+id_column_name VARCHAR,
+output_table VARCHAR,
+operation VARCHAR,
+k INTEGER
+) RETURNS VARCHAR AS $$
+DECLARE
+class_test_source REGCLASS;
+class_point_source REGCLASS;
+l FLOAT;
+id INTEGER;
+vector DOUBLE PRECISION[];
+cur_pid integer;
+theResult MADLIB_SCHEMA.knn_result;
+r MADLIB_SCHEMA.test_table_spec;
+oldClientMinMessages VARCHAR;
+returnstring VARCHAR;
+BEGIN
+oldClientMinMessages :=
+(SELECT setting FROM pg_settings WHERE name = 
'client_min_messages');
+EXECUTE 'SET client_min_messages TO warning';
+PERFORM MADLIB_SCHEMA.__knn_validate_src(test_source);
+PERFORM MADLIB_SCHEMA.__knn_validate_src(point_source);
+class_test_source := test_source;
+class_point_source := point_source;
+--checks
+IF (k <= 0) THEN
+RAISE EXCEPTION 'KNN error: Number of neighbors k must be a 
positive integer.';
+END IF;
+IF (operation != 'c' AND operation != 'r') THEN
+RAISE EXCEPTION 'KNN error: put r for regression OR c for 
classification.';
+END IF;
+PERFORM MADLIB_SCHEMA.create_schema_pg_temp();
+
+EXECUTE format('DROP TABLE IF EXISTS %I',output_table);
+EXECUTE format('CREATE TABLE %I(%I integer, %I DOUBLE PRECISION[], 
predlabel float)',output_table,id_column_name,test_column_name);
+   
+
+FOR r IN EXECUTE format('SELECT %I,%I FROM %I', id_column_name, 
test_column_name, test_source)
+LOOP
+   cur_pid := r.id;
+   vector := r.vector;
+   EXECUTE
+$sql$
+   DROP TABLE IF EXISTS pg_temp.knn_vector;
--- End diff --

Oh. Thanks! I get it now.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773398#comment-15773398
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r93790128
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* 
-
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --

This file is used when you run the install-check. Since the dataset is 
small, you can calculate the correct results by hand (or with another k-NN 
implementation in Python, R, etc.) and then call an assertion function to 
ensure the result is correct. 

Because many functions are interconnected, an install-check helps us 
identify problems faster. Suppose somebody changes the `squared_dist_norm2` 
implementation for some reason and it starts giving incorrect 
results. The knn install-check will then fail and prompt us to 
investigate further.
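
One way to get reference values for those assertions is a few lines of plain Python. The sketch below is only an illustration: the function names `squared_dist` and `knn_predict` and the sample points are made up here (they are not the PR's code or the table in test/knn.sql_in), and it assumes squared Euclidean distance with a majority vote for `c` and a mean for `r`.

```python
from collections import Counter


def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, test_point, k, operation):
    """Brute-force k-NN over a list of (vector, label) pairs.

    operation: 'c' = majority vote, 'r' = mean of the neighbour labels.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if operation not in ("c", "r"):
        raise ValueError("operation must be 'c' or 'r'")
    # Rank every training point by distance to the test point, keep the k closest.
    neighbours = sorted(train, key=lambda row: squared_dist(row[0], test_point))[:k]
    labels = [label for _, label in neighbours]
    if operation == "c":
        return Counter(labels).most_common(1)[0][0]
    return sum(labels) / len(labels)


# Hypothetical data for illustration only.
train = [
    ([1.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 2.0], 1),
    ([8.0, 8.0], 0), ([9.0, 8.0], 0), ([9.0, 9.0], 0),
]
test = {1: [2.0, 1.0], 2: [8.0, 9.0]}

expected = {tid: knn_predict(train, vec, k=3, operation="c")
            for tid, vec in test.items()}
print(expected)  # {1: 1, 2: 0}
```

The values in `expected` are what the assertions in the install-check file would then compare against. Note that a tied vote is resolved here by whichever label is seen first, which may differ from how the SQL implementation breaks ties, so it is safer to pick test points without ties.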


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768622#comment-15768622
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r93548578
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* 
-
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --

We can use the assert function for verifying the results.


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768623#comment-15768623
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/81#discussion_r93548165
  
--- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
@@ -0,0 +1,165 @@
+/* --- 
*//**
+ *
+ * @file knn.sql_in
+ *
+ * @brief Set of functions for k-nearest neighbors.
+ *
+ *
+ *//* 
--- */
+
+m4_include(`SQLCommon.m4')
+
+DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
+CREATE TYPE MADLIB_SCHEMA.knn_result AS (
+prediction float
+);
+DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
+CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
+id integer,
+vector DOUBLE PRECISION[]
+);
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
+rel_source VARCHAR
+) RETURNS VOID AS $$
+PythonFunction(knn, knn, knn_validate_src)
+$$ LANGUAGE plpythonu
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+arg1 VARCHAR
+) RETURNS VOID AS $$
+BEGIN
+IF arg1 = 'help' THEN
+   RAISE NOTICE 'You need to enter following arguments in order:
+   Argument 1: Training data table having training features as vector 
column and labels
+   Argument 2: Name of column having feature vectors in training data table
+   Argument 3: Name of column having actual label/vlaue for corresponding 
feature vector in training data table
+   Argument 4: Test data table having features as vector column. Id of 
features is mandatory
+   Argument 5: Name of column having feature vectors in test data table
+   Argument 6: Name of column having feature vector Ids in test data table
+   Argument 7: Name of output table
+   Argument 8: c for classification task, r for regression task
+   Argument 9: value of k. Default will go as 1';
+END IF;
+END;
+$$ LANGUAGE plpgsql VOLATILE
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+) RETURNS VOID AS $$
+BEGIN
+EXECUTE $sql$ select * from MADLIB_SCHEMA.knn('help') $sql$;
+END;
+$$ LANGUAGE plpgsql VOLATILE
+m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
+
+
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
+point_source VARCHAR,
+point_column_name VARCHAR,
+label_column_name VARCHAR,
+test_source VARCHAR,
+test_column_name VARCHAR,
+id_column_name VARCHAR,
+output_table VARCHAR,
+operation VARCHAR,
+k INTEGER
+) RETURNS VARCHAR AS $$
+DECLARE
+class_test_source REGCLASS;
+class_point_source REGCLASS;
+l FLOAT;
+id INTEGER;
+vector DOUBLE PRECISION[];
+cur_pid integer;
+theResult MADLIB_SCHEMA.knn_result;
+r MADLIB_SCHEMA.test_table_spec;
+oldClientMinMessages VARCHAR;
+returnstring VARCHAR;
+BEGIN
+oldClientMinMessages :=
+(SELECT setting FROM pg_settings WHERE name = 'client_min_messages');
+EXECUTE 'SET client_min_messages TO warning';
+PERFORM MADLIB_SCHEMA.__knn_validate_src(test_source);
+PERFORM MADLIB_SCHEMA.__knn_validate_src(point_source);
+class_test_source := test_source;
+class_point_source := point_source;
+--checks
+IF (k <= 0) THEN
+RAISE EXCEPTION 'KNN error: Number of neighbors k must be a positive integer.';
+END IF;
+IF (operation != 'c' AND operation != 'r') THEN
+RAISE EXCEPTION 'KNN error: put r for regression OR c for classification.';
+END IF;
+PERFORM MADLIB_SCHEMA.create_schema_pg_temp();
+
+EXECUTE format('DROP TABLE IF EXISTS %I',output_table);
+EXECUTE format('CREATE TABLE %I(%I integer, %I DOUBLE PRECISION[], predlabel float)',output_table,id_column_name,test_column_name);
+   
+
+FOR r IN EXECUTE format('SELECT %I,%I FROM %I', id_column_name, test_column_name, test_source)
+LOOP
--- End diff --

This loop forces us to scan the table multiple times, which is very costly. 
We might be able to collapse it into a single level of SQL calls. For 
example, here is a query that finds the 2 closest points (ids and distances) for 
every test point (assuming you are using the tables from the test code):
`
select * from (
select row_number() over (partition by test_id order by 
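
For illustration only, the single-pass idea can be sketched outside MADlib. The snippet below is neither the query from this comment (which is cut off above) nor MADlib code. It assumes hypothetical `train` and `test` tables holding 2-D points, and it uses Python's bundled `sqlite3` module (SQLite 3.25 or later, for window functions) rather than PostgreSQL. The function name `k_nearest` is also made up for the example. It ranks every training point per test point with `row_number()` in one statement:

```python
import sqlite3


def k_nearest(train, test, k):
    """Return {test_id: [train_id, ...]} with the k closest training ids.

    train and test are lists of (id, x, y) rows; the table layout and the
    2-D points are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (id INTEGER, x REAL, y REAL)")
    conn.execute("CREATE TABLE test (id INTEGER, x REAL, y REAL)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?)", train)
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", test)

    # One statement: cross join, rank by squared distance, keep rank <= k.
    # Ties on distance are broken by the training id.
    query = """
        SELECT test_id, train_id FROM (
            SELECT t.id AS test_id,
                   p.id AS train_id,
                   row_number() OVER (
                       PARTITION BY t.id
                       ORDER BY (t.x - p.x) * (t.x - p.x)
                              + (t.y - p.y) * (t.y - p.y), p.id
                   ) AS r
            FROM test AS t CROSS JOIN train AS p
        ) AS ranked
        WHERE r <= ?
        ORDER BY test_id, r
    """
    neighbours = {}
    for test_id, train_id in conn.execute(query, (k,)):
        neighbours.setdefault(test_id, []).append(train_id)
    conn.close()
    return neighbours
```

The cross join still compares every test point with every training point; the possible gain is that the database plans and runs one statement instead of one query per test row.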

[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763121#comment-15763121
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

GitHub user auonhaidar opened a pull request:

https://github.com/apache/incubator-madlib/pull/81

JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc

KNN Added
Usage: 

select * from madlib.knn()
select * from madlib.knn('help')
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','c',3)
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','r',3)
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','knn_results','c')

You need to enter the following arguments in order:
Argument 1: Training data table having training features as vector column and labels
Argument 2: Name of column having feature vectors in training data table
Argument 3: Name of column having actual label/value for corresponding feature vector in training data table
Argument 4: Test data table having features as vector column. Id of features is mandatory
Argument 5: Name of column having feature vectors in test data table
Argument 6: Name of column having feature vector Ids in test data table
Argument 7: Name of output table
Argument 8: c for classification task, r for regression task
Argument 9: value of k. Default will go as 1
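
To make the `c`/`r` and `k` arguments concrete, here is a minimal pure-Python sketch of what a brute-force k-NN prediction computes. It is an illustration under assumptions, not the code in this pull request: the function name `knn_predict`, the squared Euclidean distance, the majority vote for `c` and the mean for `r` are all choices made for the example.

```python
from collections import Counter


def knn_predict(train, labels, point, operation="c", k=1):
    """Predict a label for `point` from its k nearest training vectors.

    Hypothetical helper: 'c' returns the most common neighbour label,
    'r' returns the mean of the neighbour labels.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if operation not in ("c", "r"):
        raise ValueError("operation must be 'c' or 'r'")

    # Naive approach: squared Euclidean distance to every training vector.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(vec, point)), idx)
        for idx, vec in enumerate(train)
    ]
    # Sort by (distance, index) and keep the labels of the k closest.
    nearest = [labels[idx] for _, idx in sorted(dists)[:k]]

    if operation == "r":
        return sum(nearest) / len(nearest)
    return Counter(nearest).most_common(1)[0][0]
```

For example, `knn_predict(train, labels, [0.2, 0.0], "c", 3)` votes among the three closest labels, while passing `"r"` averages them instead.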

test file added
changes made in main sql file and python file.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/auonhaidar/incubator-madlib features/knn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit b1a8d103cf617d0332b6a3289460a4ef5de09df6
Author: auonhaidar 
Date:   2016-12-13T02:09:12Z

KNN Added

commit 22db2e1a6f75826c3966771bb90a4f4607c29bb8
Author: auonhaidar 
Date:   2016-12-20T03:36:40Z

JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc




> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Rahul Iyer
>  Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors 
> of data points in a metric feature space according to a specified distance 
> function. It is considered one of the canonical algorithms of data science. 
> It is a nonparametric method, which makes it applicable to a lot of 
> real-world problems where the data doesn’t satisfy particular distribution 
> assumptions. It can also be implemented as a lazy algorithm, which means 
> there is no training phase where information in the data is condensed into 
> coefficients, but there is a costly testing phase where all data (or some 
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752824#comment-15752824
 ] 

ASF GitHub Bot commented on MADLIB-927:
---

Github user njayaram2 commented on the issue:

https://github.com/apache/incubator-madlib/pull/80
  
This is a great start! 
I will provide some github-specific feedback here, and more knn-specific
comments in the code.
Git can be daunting to use at first, but it's great once you get the hang of 
it.
I would recommend you go through the following wonderful book if you
have not already done so:
https://git-scm.com/book/en/v2

When you work on a feature/bug, it is best if you create a branch locally
and make all changes for that feature there. You can then push that branch
into your github repo and open a pull request. This way you won't mess with
your local master branch, which should ideally be in sync with the origin's
(apache/incubator-madlib in this case) master branch. More information on
how to work with branches can be found in the following chapter:
https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell 
(especially section 3.5)

One other minor piece of feedback: try to include the corresponding JIRA id 
in the commit message. The JIRA associated with this feature is:
https://issues.apache.org/jira/browse/MADLIB-927




[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-02-29 Thread Tianwei Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173187#comment-15173187
 ] 

Tianwei Shen commented on MADLIB-927:
-

Hi Sir,
I am Tianwei, a second-year Ph.D. student at HKUST. I am interested in this 
proposal and have implemented a prototype of naive k-NN in one of my projects, 
libvot (https://github.com/hlzz/libvot). See the source code for my 
implementation of k-NN here 
(https://github.com/hlzz/libvot/blob/master/src/vocab_tree/clustering.cpp); 
it supports multi-threaded processing using native C++11 features. The project 
is an implementation of the vocabulary tree, a widely used image retrieval 
algorithm. I think this issue best suits my skill set, so I would like to 
discuss it with you in greater depth. Thanks.


> Initial implementation of k-NN
> --
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Rahul Iyer
>  Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a very simple algorithm that is based on finding 
> nearest neighbors of data points in a metric feature space according to a 
> specified distance function. It is considered one of the canonical algorithms 
> of data science. It is a nonparametric method, which makes it applicable to a 
> lot of real-world problems, where the data doesn’t satisfy particular 
> distribution assumptions. Also, it can be implemented as a lazy algorithm, 
> which means there is no training phase where information in the data is 
> condensed into coefficients, but there is a costly testing phase where all 
> data is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

2016-02-29 Thread ANISH SINGH (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173113#comment-15173113
 ] 

ANISH SINGH commented on MADLIB-927:


Hello Rahul Sir,
I'm Anish, a sophomore CSE student. Last winter, I decided to develop a share 
price prediction program and started work on it. I decided to use the Apache 
Spark ML libraries, but they did not contain a default implementation of the 
k-NN algorithm, and one has not been developed as of now. I have studied papers 
about the algorithm extensively and find myself in a suitable position to work 
on this project for the entire summer. I would like to request further guidance 
on the issue so that I can study it in more depth and draw up my proposal. 
Completing the project would also help my earlier attempts at the share 
price prediction program.
Thank You.
