[jira] [Updated] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Isabel Drost-Fromm updated MAHOUT-1305:
---
Attachment: MAHOUT-221213-1315-15716.pdf

Before going over pages and deleting stuff, I exported the existing wiki content to PDF/XML/HTML. The zipped HTML version is too large to upload to JIRA (19MB), but the PDF should work.

> Rework the wiki
> ---
> Key: MAHOUT-1305
> URL: https://issues.apache.org/jira/browse/MAHOUT-1305
> Project: Mahout
> Issue Type: Bug
> Components: Website
> Reporter: Sebastian Schelter
> Priority: Blocker
> Fix For: 0.9
> Attachments: MAHOUT-221213-1315-15716.pdf
>
> We should think about completely redoing our wiki. At the moment, we're listing lots of algorithms that we either never implemented or already removed. I also have the impression that a lot of stuff is outdated. It would be awesome if we had up-to-date documentation of the code with instructions on how to get into using Mahout quickly. We should also have examples for all our 3 C's.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855166#comment-13855166 ]

Isabel Drost-Fromm commented on MAHOUT-1305:

Switched the wiki layout to the documentation layout. This makes the available pages visible in a navigation bar on the left, which should make it easier to spot duplicate/bogus content as well as help users navigate the site. I haven't yet found out how to get rid of the funny centered formatting of the content.
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855170#comment-13855170 ]

Isabel Drost-Fromm commented on MAHOUT-1305:

One note: Deleting pages from the wiki will mean that several links we used on our mailing list in the past will no longer be valid (e.g. How to contribute, How to become a committer, How to release, How to get started, Powered by, etc.). People searching the archives and finding these links will be left with 404 errors.

Instead of deleting those procedural pages, should we move them underneath a common parent and edit them to at least contain a link to the new, updated page now hosted on Apache CMS?
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855202#comment-13855202 ]

Isabel Drost-Fromm commented on MAHOUT-1305:

Just received a question about open tasks. Here is what I think should be done in terms of (Confluence) wiki rework:

- Delete all pages that have bogus content (comments that are essentially questions best answered on the mailing list, old release notes, etc.).
- Delete all pages that have been migrated to Apache CMS (or have been migrated and later deleted from CMS for being outdated), with one exception:
- Move all pages that have been referred to in issues and on the user/dev lists in the past under a common parent, delete their content, and instead add a link to the new page's URL so users aren't lost.
- For the rest that remains, find a structure that is easy to understand and navigate.

The goal should be to keep stable, reliable documentation in CMS. Stuff that is in flux or only a draft is fine to remain in Confluence.
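The rework plan in this comment boils down to a per-page decision rule. Below is a minimal sketch of that rule in Python, not an actual migration tool. The `WikiPage` fields and the `Action` names are hypothetical and do not correspond to any Confluence or Apache CMS API. It also assumes that the "one exception" applies only to pages that were migrated to CMS, which is one possible reading of the list.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    STUB_WITH_LINK = "move under common parent, replace content with link"
    KEEP_AND_RESTRUCTURE = "keep in Confluence and restructure"


@dataclass
class WikiPage:
    # All fields are illustrative assumptions, filled in by a human reviewer.
    title: str
    bogus: bool = False              # stray questions, old release notes, etc.
    migrated_to_cms: bool = False    # content now lives (or lived) in Apache CMS
    linked_from_lists: bool = False  # referenced in issues or on user/dev lists


def plan(page: WikiPage) -> Action:
    """Return the action the rework plan suggests for a single page."""
    if page.bogus:
        return Action.DELETE
    if page.migrated_to_cms:
        # Assumed reading: the exception covers migrated pages that old
        # mailing list or JIRA links still point to, so they must not 404.
        if page.linked_from_lists:
            return Action.STUB_WITH_LINK
        return Action.DELETE
    # Drafts and content in flux stay in the wiki.
    return Action.KEEP_AND_RESTRUCTURE
```

For example, `plan(WikiPage("How to contribute", migrated_to_cms=True, linked_from_lists=True))` yields `Action.STUB_WITH_LINK`, which matches the concern about 404 errors raised in the earlier comment.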
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855203#comment-13855203 ]

Isabel Drost-Fromm commented on MAHOUT-1305:

Oh, and if anyone finds a way to fix the formatting, that would be great as well (at this point I'm even fine with removing any project-specific CSS stuff we may have).

Last but not least: at the latest when done, we need to add a link to the wiki back to our main homepage, potentially somewhere under developer resources.
Re: Mahout 0.9 Release Notes - First Draft
Hi,

the draft looks good overall, I have some minor comments inline:

On 22.12.2013 03:28, Suneel Marthi wrote:
> Hi All,
>
> Please see below the first draft of the release notes for Mahout 0.9. Please feel free to add/edit sections as you see fit. (This is a draft only.)
>
> Regards,
> Suneel
>
> -
>
> The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations including, but not limited to, math packages for statistics, linear algebra and others as well as Java primitive collections, local and distributed vector and matrix classes and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more. The 0.9 release is mainly a clean-up release in preparation for an upcoming 1.0 release targeted for the first half of 2014, but there are a few significant new features, which are highlighted below.
>
> To get started with Apache Mahout 0.9, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository. In addition to the release highlights and artifacts, please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout.
>
> As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.
>
> GETTING STARTED
>
> In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out.
> Most examples do not need a Hadoop cluster in order to run.
>
> RELEASE HIGHLIGHTS
>
> The highlights of the Apache Mahout 0.9 release include, but are not limited to, the list below. For further information, see the included CHANGELOG file.
>
> - Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297). See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
> - New Multilayer Perceptron Classifier (MAHOUT-1265)
> - Recommenders as a Search (MAHOUT-1288). See https://github.com/pferrel/solr-recommender
> - MAHOUT-1364: Upgrade Mahout to be Lucene 4.6.0 compliant
> - MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering. See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf for the details.
> - Removed deprecated algorithms.
> - The usual bug fixes. See JIRA [?] for more information on the 0.9 release.
>
> A total of 91 separate JIRA issues were addressed in this release.
>
> The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:
>
> - From Clustering:
>   Dirichlet - replaced by Collapsible Variational Bayes (CVB)

I think we switched our LDA implementation to use CVB and removed Dirichlet clustering, those are two different things, right?
>   Meanshift
>   MinHash - removed due to poor performance and lack of usage
>   EigenCuts -
> - From Classification (both are sequential implementations)
>   Winnow - lack of actual usage
>   Perceptron - lack of actual usage
> - Frequent Pattern Mining
> - Collaborative Filtering
>   All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
>   SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
>   Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
>   TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

We should be careful, because the package name knn could make people think we removed our item-based recommenders (this already caused confusion on Twitter). I think it would be sufficient to say we removed a couple of rarely used recommenders, in particular SlopeOne.

> - Mahout Math
>   Lanczos in favour of SSVD

IIRC, we agreed not to remove Lanczos, although it was initially deprecated. We should undeprecate it.

>   Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
>
> If you are interested in supporting one or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting evidence as to their effectiveness for you in production.
>
> CONTRIBUTING
>
> Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page,
Re: Mahout 0.9 Release Notes - First Draft
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter ssc.o...@googlemail.com wrote:
>> - From Clustering: Dirichlet - replaced by Collapsible Variational Bayes (CVB)
>
> I think we switched our LDA implementation to use CVB and removed Dirichlet clustering, those are two different things, right?

Correct.
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855306#comment-13855306 ]

Isabel Drost-Fromm commented on MAHOUT-1305:

The stylesheet currently used in our Confluence wiki is broken in that all content is centered. It looks like I lack permission to change it. Can someone else try, please?

https://cwiki.apache.org/confluence/spaces/viewstylesheet.action?key=MAHOUT

(In the admin section, click on Look and Feel, and from there navigate to Stylesheet to see what is currently deployed.)

> Rework the wiki
> ---
> Key: MAHOUT-1305
> URL: https://issues.apache.org/jira/browse/MAHOUT-1305
> Project: Mahout
> Issue Type: Bug
> Components: Documentation
> Reporter: Sebastian Schelter
> Priority: Blocker
> Fix For: 0.9
> Attachments: MAHOUT-221213-1315-15716.pdf