[ https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819825#comment-15819825 ]
ASF GitHub Bot commented on MADLIB-927:
---------------------------------------

Github user auonhaidar commented on the issue:

    https://github.com/apache/incubator-madlib/pull/81

    I ran this command inside build:

    $ du -h doc/
    4.0K    doc/design/figures
    4.0K    doc/design/modules
    20K     doc/design/CMakeFiles/auxclean.dir
    44K     doc/design/CMakeFiles/design_ps.dir
    20K     doc/design/CMakeFiles/html.dir
    20K     doc/design/CMakeFiles/design_html.dir
    20K     doc/design/CMakeFiles/design.dir
    28K     doc/design/CMakeFiles/design_auxclean.dir
    40K     doc/design/CMakeFiles/design_dvi.dir
    20K     doc/design/CMakeFiles/pdf.dir
    20K     doc/design/CMakeFiles/safepdf.dir
    20K     doc/design/CMakeFiles/ps.dir
    20K     doc/design/CMakeFiles/design_safepdf.dir
    40K     doc/design/CMakeFiles/design_pdf.dir
    20K     doc/design/CMakeFiles/dvi.dir
    344K    doc/design/CMakeFiles
    4.0K    doc/design/other-chapters
    380K    doc/design
    12K     doc/bin/CMakeFiles
    36K     doc/bin
    8.0K    doc/imgs
    20K     doc/CMakeFiles/update_mathjax.dir
    40K     doc/CMakeFiles/doxysql.dir
    20K     doc/CMakeFiles/devdoc.dir
    20K     doc/CMakeFiles/doc.dir
    112K    doc/CMakeFiles
    12K     doc/etc/CMakeFiles
    152K    doc/etc
    720K    doc/

> Initial implementation of k-NN
> ------------------------------
>
>                 Key: MADLIB-927
>                 URL: https://issues.apache.org/jira/browse/MADLIB-927
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Rahul Iyer
>              Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k
> nearest neighbors by going through all points.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
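
For readers who want to see the naïve approach from the issue description in concrete form, here is a minimal sketch in plain Python. It is not the MADlib implementation or the code in the pull request; the function names `k_nearest` and `knn_classify`, the use of Euclidean distance as the default, and majority voting for the prediction are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def k_nearest(points, query, k, distance=math.dist):
    """Return the k (distance, index) pairs closest to `query`, nearest first.

    Hypothetical helper: `distance` defaults to Euclidean distance
    (math.dist, Python 3.8+), but any two-argument function works.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Naive approach: measure the distance from the query to every point.
    scored = ((distance(p, query), i) for i, p in enumerate(points))
    # Keep only the k smallest distances; no training step is involved.
    return heapq.nsmallest(k, scored)


def knn_classify(points, labels, query, k, distance=math.dist):
    """Predict a label for `query` by majority vote among its k neighbors."""
    neighbors = k_nearest(points, query, k, distance)
    votes = Counter(labels[i] for _, i in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented for this example only.
    train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    train_labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(train, train_labels, (0.2, 0.1), k=3))
```

Each prediction scans the whole training set, which is the "costly testing phase" the description mentions: there is nothing to fit up front, and the work grows with the number of stored points for every query.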