[jira] [Updated] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost-Fromm updated MAHOUT-1305:
---

Attachment: MAHOUT-221213-1315-15716.pdf

Before going over pages and deleting stuff I exported the existing Wiki content 
to pdf/xml/html. The zipped HTML version is too large for upload to JIRA (19MB) 
but the pdf should work.

 Rework the wiki
 ---

 Key: MAHOUT-1305
 URL: https://issues.apache.org/jira/browse/MAHOUT-1305
 Project: Mahout
  Issue Type: Bug
  Components: Website
Reporter: Sebastian Schelter
Priority: Blocker
 Fix For: 0.9

 Attachments: MAHOUT-221213-1315-15716.pdf


 We should think about completely redoing our wiki. At the moment, we're 
 listing lots of algorithms that we either never implemented or already 
 removed. I also have the impression that a lot of stuff is outdated.
 It would be awesome if we had an up-to-date documentation of the code with 
 instructions on how to get into using mahout quickly.
 We should also have examples for all our 3 C's.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855166#comment-13855166
 ] 

Isabel Drost-Fromm commented on MAHOUT-1305:


Switched the wiki layout to the documentation layout - this makes the available 
pages visible in a navigation bar on the left, which should make it easier to 
spot duplicate/bogus content and help users navigate the site.

I haven't yet found out how to get rid of the odd centered formatting of the 
content.



[jira] [Commented] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855170#comment-13855170
 ] 

Isabel Drost-Fromm commented on MAHOUT-1305:


One note: Deleting pages from the wiki will mean that several links we used in 
our mailing list in the past will no longer be valid (e.g. How to contribute, 
How to become a committer, How to release, How to get started, Powered by 
etc.). People searching the archives and finding these links will be left with 
404 errors. 

Instead of deleting those procedural pages, should we move them underneath a 
common parent and edit them to at least contain a link to the new, updated page 
now hosted on Apache CMS?



[jira] [Commented] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855202#comment-13855202
 ] 

Isabel Drost-Fromm commented on MAHOUT-1305:


Just received a question about open tasks. Here is what I think should be done 
in terms of (Confluence) wiki rework:

- Delete all pages that have bogus content (comments that essentially are 
questions best answered on the mailing list only, old release notes, etc.)
- Delete all pages that have been migrated to Apache CMS (or have been migrated 
and later deleted from CMS for being outdated) - with one exception:
- Move all pages that in the past have been referred to in issues and on the 
user/dev lists under a common parent, delete their content and instead add a 
link to the new page's URL so users aren't lost.
- For whatever remains, find a structure that is easy to understand and 
navigate.

The goal should be to keep stable, reliable documentation in CMS. Content that 
is in flux or only a draft can remain in Confluence.
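
The triage rules in the list above could be sketched roughly as follows. This 
is only an illustration, not an existing tool: the WikiPage fields, the Action 
names and the example.invalid URL are all hypothetical, and the rule order 
(bogus pages first, then migrated pages with the "stub" exception) is an 
assumption about how the bullets are meant to combine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    DELETE = "delete"  # remove the page entirely
    STUB = "stub"      # move under a common parent, replace content with a link
    KEEP = "keep"      # stays in Confluence and gets restructured


@dataclass
class WikiPage:
    # All fields are hypothetical; they only mirror the criteria in the list.
    title: str
    bogus: bool = False                 # e.g. question-only comments, old release notes
    migrated_to_cms: bool = False       # migrated, even if later deleted from CMS
    linked_from_archives: bool = False  # referenced in issues or on the user/dev lists
    new_url: Optional[str] = None       # location of the updated page, if any


def triage(page: WikiPage) -> Action:
    """Decide what to do with a page (assumed rule order, see lead-in)."""
    if page.bogus:
        return Action.DELETE
    if page.migrated_to_cms:
        # Exception: keep a stub so old links from the archives do not 404.
        if page.linked_from_archives and page.new_url:
            return Action.STUB
        return Action.DELETE
    return Action.KEEP


if __name__ == "__main__":
    pages = [
        WikiPage("Old release notes", bogus=True),
        WikiPage("How to contribute", migrated_to_cms=True,
                 linked_from_archives=True,
                 new_url="https://example.invalid/how-to-contribute"),
        WikiPage("Draft design notes"),
    ]
    for page in pages:
        print(page.title, "->", triage(page).value)
```

A migrated page with no known new URL falls through to deletion here; whether 
such pages should instead get a stub without a link is left open.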



[jira] [Commented] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855203#comment-13855203
 ] 

Isabel Drost-Fromm commented on MAHOUT-1305:


Oh - and if anyone finds a way to fix the formatting (at this point I'm even 
fine with removing any project specific css stuff we may have) - that would be 
great as well.

Last but not least - at the latest when done, we need to add a link to the wiki 
back onto our main homepage - potentially somewhere under developer resources.



Re: Mahout 0.9 Release Notes - First Draft

2013-12-22 Thread Sebastian Schelter
Hi,

the draft looks good overall, I have some minor comments inline:

On 22.12.2013 03:28, Suneel Marthi wrote:
 Hi All,
 
 Please see below the first draft of Release notes for Mahout 0.9. Please feel 
 free to add/edit sections as u see fit.
 (This is a draft only).
 
 Regards,
 Suneel
 
 
 -
 
 
 The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. 
 Mahout's goal is to build scalable machine learning libraries focused 
 primarily in the areas of collaborative filtering (recommenders), 
 clustering and classification (known collectively as the 3Cs), as well as 
 the necessary infrastructure to support those implementations including, but
 not limited to, math packages for statistics, linear algebra and others
 as well as Java primitive collections, local and distributed vector and
 matrix classes and a variety of integrative code to work with popular 
 packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache 
 Cassandra and much more. The 0.9 release is mainly a clean up release in
 preparation for an upcoming 1.0 release targeted for the first half of 2014,
 but there are a few significant new features, which are highlighted below.
 
 To get started with Apache Mahout 0.9, download the release artifacts and 
 signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the 
 central Maven repository. 
 
 In addition to the release highlights and artifacts, please pay attention 
 to the section labelled FUTURE PLANS below for more information about 
 upcoming releases of Mahout.
 
 As with any release, we wish to thank all of the users and contributors 
 to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for 
 individual credits, as there are too many to list here.
 
 GETTING STARTED
 
 In the release package, the examples directory contains several working 
 examples of the core functionality available in Mahout. These can be run 
 via scripts in the examples/bin directory and will prompt you for more 
 information to help you try things out. Most examples do not need a Hadoop 
 cluster in order to run.
 
 RELEASE HIGHLIGHTS
 
 The highlights of the Apache Mahout 0.9 release include, but are not 
 limited to the list below. For further information, see the included 
 CHANGELOG file.
 
 - Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297). See 
   http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
 - New Multilayer Perceptron Classifier (MAHOUT-1265) 
 - Recommenders as a Search (MAHOUT-1288). See 
   https://github.com/pferrel/solr-recommender
 - MAHOUT-1364: Upgrade Mahout to be Lucene 4.6.0 compliant
 - MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 
   1-dimensional Clustering. See 
   https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
   for the details.
 - Removed Deprecated algorithms.
 - The usual bug fixes. See JIRA [?} for more information on the 0.9 release.
 
 
 A total of 91 separate JIRA issues were addressed in this release.
 
 The following algorithms that were marked deprecated in 0.8 have been removed 
 in 0.9:
 
 - From Clustering:
   Dirichlet - replaced by Collapsible Variational Bayes (CVB)

I think we switched our LDA implementation to use CVB and removed
Dirichlet clustering, those are two different things, right?

 
   Meanshift 
 
   MinHash - removed due to poor performance and lack of usage
 
   EigenCuts -
 
 
 - From Classification (both are sequential implementations)
 
   Winnow - lack of actual usage
 
   Perceptron - lack of actual usage 
 
 
 - Frequent Pattern Mining
 
 - Collaborative Filtering
 All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
 SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone 
 and org.apache.mahout.cf.taste.impl.recommender.slopeone
 Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
 TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

We should be careful, because the package knn could make people think we
removed our item-based recommenders (this already caused confusion on Twitter).

I think it would be sufficient to say we removed a couple of rarely used
recommenders, in particular SlopeOne.

 
 - Mahout Math
 Lanczos in favour of SSVD

IIRC, we agreed to not remove Lanczos, although it was initially
deprecated. We should undeprecate it.

 Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
 
 If you are interested in supporting 1 or more of these algorithms, please 
 make it known on dev@mahout.apache.org and via JIRA issues that fix and/or 
 improve them. Please also provide 
 supporting evidence as to their effectiveness for you in production.
 
 
 CONTRIBUTING
 
 Mahout is always looking for contributions focused on the 3Cs. If you are 
 interested in contributing, please see our contribution page, 
 

Re: Mahout 0.9 Release Notes - First Draft

2013-12-22 Thread Ted Dunning
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter 
ssc.o...@googlemail.com wrote:

  - From Clustering:
Dirichlet - replaced by Collapsible Variational Bayes (CVB)

 I think we switched our LDA implementation to use CVB and removed
 Dirichlet clustering, those are two different things, right?


Correct.


[jira] [Commented] (MAHOUT-1305) Rework the wiki

2013-12-22 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855306#comment-13855306
 ] 

Isabel Drost-Fromm commented on MAHOUT-1305:


The stylesheet currently used in our confluence wiki is broken in that all 
content is centered. Looks like I lack permission to change it. Can someone 
else try please?

https://cwiki.apache.org/confluence/spaces/viewstylesheet.action?key=MAHOUT

(In the admin section click on Look and Feel - from there navigate to 
Stylesheet to see what is currently deployed.)

 Rework the wiki
 ---

 Key: MAHOUT-1305
 URL: https://issues.apache.org/jira/browse/MAHOUT-1305
 Project: Mahout
  Issue Type: Bug
  Components: Documentation
Reporter: Sebastian Schelter
Priority: Blocker
 Fix For: 0.9

 Attachments: MAHOUT-221213-1315-15716.pdf


 We should think about completely redoing our wiki. At the moment, we're 
 listing lots of algorithms that we either never implemented or already 
 removed. I also have the impression that a lot of stuff is outdated.
 It would be awesome if we had an up-to-date documentation of the code with 
 instructions on how to get into using mahout quickly.
 We should also have examples for all our 3 C's.


