[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999581#comment-15999581 ]
ASF GitHub Bot commented on MAHOUT-1976:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/314

MAHOUT-1976 Add CanopyClustering

MAHOUT-1976 Add Canopy Clustering

### Purpose of PR:
1. Primarily, this PR adds CanopyClustering to the Algorithms Framework.
2. This PR introduces the "clustering" section of the Algorithms Framework.
3. This PR introduces distance metrics and ports two metrics from the old MR code base.

### Important ToDos
Please mark each with an "x"
- [x] Opening PR against `develop` NOT `master` (OR `feature-name` if this is part of an ongoing feature development). **need to delete this requirement, JIRA needed**
- [x] A JIRA ticket exists (if not, please create this first)[https://issues.apache.org/jira/browse/ZEPPELIN/]
- [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX is the JIRA number.
- [x] Created unit tests where appropriate
- [x] Added correct licenses to newly added files
- [x] Assigned JIRA to self
- [x] Added documentation in Scaladocs/Javadocs (and the website, once that is merged to dev)
- [x] Successfully built and ran all unit tests, and verified that all tests pass locally.

Oh by the way, does this change break earlier versions?
No

Is this the beginning of a larger project for which a feature branch should be made?
No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rawkintrevo/mahout mahout-1976

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/314.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #314

----
commit 7f18775afae639c1b291fb0273d92dc71de24884
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date: 2017-05-04T14:25:42Z

    MAHOUT-1976 Add CanopyClustering

    MAHOUT-1976 Add Canopy Clustering

    forgot unit tests

----

> Add Canopy Clustering Algorithm
> -------------------------------
>
> Key: MAHOUT-1976
> URL: https://issues.apache.org/jira/browse/MAHOUT-1976
> Project: Mahout
> Issue Type: Improvement
> Components: Algorithms
> Affects Versions: 0.13.2
> Reporter: Trevor Grant
> Assignee: Trevor Grant
>
> Primarily, we need to lay out the clustering section of the Algorithms
> Framework.
> The Canopy Clustering Algorithm is very simple and yet very useful as a
> preprocessing step for more advanced clustering algorithms such as KMeans and
> Hierarchical Clustering.
> https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
> The majority of the "work" on this PR will be creating the framework.
> It is also one of the Legacy MR algorithms that would be nice to port.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
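For readers unfamiliar with the canopy algorithm referenced in the issue description, here is a minimal, single-machine Scala sketch of the procedure described on the linked Wikipedia page: pick a remaining point as a canopy center, add every remaining point within the loose threshold T1 to that canopy, and drop every point within the tight threshold T2 (with T1 > T2) from further consideration. It assumes points are plain `Vector[Double]` values and uses Euclidean distance. `CanopySketch`, `canopies`, and `euclidean` are hypothetical names chosen for illustration; this is not the Mahout API introduced in PR #314.

```scala
// Hypothetical illustration of canopy clustering; not Mahout's actual API.
object CanopySketch {
  type Point = Vector[Double]

  // Hypothetical stand-in for a distance metric: plain Euclidean distance.
  def euclidean(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** Returns canopies as (center, members).
    * t1 is the loose threshold, t2 the tight threshold; requires t1 > t2. */
  def canopies(points: Seq[Point], t1: Double, t2: Double,
               dist: (Point, Point) => Double = euclidean): Seq[(Point, Seq[Point])] = {
    require(t1 > t2, "T1 (loose) must be greater than T2 (tight)")
    val result = scala.collection.mutable.ArrayBuffer.empty[(Point, Seq[Point])]
    var remaining = points.toList
    while (remaining.nonEmpty) {
      // Take the next remaining point as the center of a new canopy.
      val center = remaining.head
      val rest = remaining.tail
      // Points within the loose threshold join this canopy (they may join others too).
      val members = center +: rest.filter(p => dist(center, p) < t1)
      result += ((center, members))
      // Points within the tight threshold are removed and cannot start or join later canopies.
      remaining = rest.filterNot(p => dist(center, p) < t2)
    }
    result.toSeq
  }

  def main(args: Array[String]): Unit = {
    val pts = Seq(Vector(0.0, 0.0), Vector(0.5, 0.0), Vector(5.0, 5.0), Vector(5.2, 5.1))
    canopies(pts, t1 = 3.0, t2 = 1.0).foreach { case (c, ms) =>
      println(s"center=$c size=${ms.size}")
    }
  }
}
```

Because canopies overlap (a point within T1 but outside T2 of a center stays eligible for later canopies), the output is a cheap, coarse grouping. That is why the issue description frames it as a preprocessing step, for example to seed or partition the work of KMeans, rather than as a final clustering.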