Re: Mahout 0.9 Release Notes - First Draft
Could someone please point me to the URL for adding Mahout release notes?
Re: Mahout 0.9 Release Notes - First Draft
Below are the release notes. I'm not sure where they should go on the website; if someone could point me to the location, I will go ahead and update it.
=
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily on the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations, including, but not limited to, math packages for statistics, linear algebra and others, as well as Java primitive collections, local and distributed vector and matrix classes, and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more.

The 0.9 release is mainly a cleanup release in preparation for an upcoming 1.0 release targeted for the first half of 2014, but there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not limited to, the list below. For further information, see the included CHANGELOG [1] file.
- MAHOUT-1245: A new and improved Mahout website based on Apache CMS
- MAHOUT-1265: MultiLayer Perceptron (MLP) classifier. This is an early implementation of MLP intended to solicit user feedback; it still needs to be integrated into Mahout’s processing pipeline to work with Mahout’s vectors.
- MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra. See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- MAHOUT-1288: Recommenders as a Search. See https://github.com/pferrel/solr-recommender
- MAHOUT-1300: Support for easy functional Matrix views and derivatives
- MAHOUT-1343: JSON output format for ClusterDumper
- MAHOUT-1345: Enable randomised testing for all Mahout modules using Carrot RandomizedRunner.
- MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering. See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf for the details.
- MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1
- Removed deprecated algorithms, which have either been replaced by better-performing algorithms or lacked user support and maintenance.
- The usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.
The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:

- From Clustering:
  Switched LDA implementation from using Gibbs Sampling to Collapsed Variational Bayes (CVB)
  Meanshift
  MinHash - removed due to poor performance, lack of support and lack of usage
- From Classification (both are sequential implementations):
  Winnow - lack of actual usage and support
  Perceptron - lack of actual usage and support
- From Collaborative Filtering:
  SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
  Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
  TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- From Mahout Math:
  Hadoop entropy code in org.apache.mahout.math.stats.entropy

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page, http://mahout.apache.org/developers/how-to-contribute.html, or contact us via email at dev@mahout.apache.org.

As the project moves towards a 1.0 release, the community will be focused on key algorithms that are proven to scale in production and have seen widespread adoption.

[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22
Re: Mahout 0.9 Release Notes - First Draft
On Tue, Feb 11, 2014 at 10:06 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:
Switched LDA implementation from using Dirichlet to Collapsed Variational Bayes (CVB)

This line should read:
Switched LDA implementation from using Gibbs sampling to Collapsed Variational Bayes (CVB)

Otherwise, it looks pretty good.
Re: Mahout 0.9 Release Notes - First Draft
Hi Suneel,

Thanks for the notes. I'm inquiring about the status of the notes and the website update announcing 0.9: Ted has reviewed the release notes. Were you waiting for additional input, or are they ready to go on the website? Are you the one who updates the site? I've been asked to write a short blog post on the release but wanted to wait until the site is updated.

Thanks much,
Ellen
Re: Mahout 0.9 Release Notes - First Draft
Here's a draft of the release notes for Mahout 0.9. Please review.
--
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily on the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations, including, but not limited to, math packages for statistics, linear algebra and others, as well as Java primitive collections, local and distributed vector and matrix classes, and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more.

The 0.9 release is mainly a cleanup release in preparation for an upcoming 1.0 release targeted for the first half of 2014, but there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not limited to, the list below. For further information, see the included CHANGELOG [1] file.

- MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra. See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- MAHOUT-1288: Recommenders as a Search. See https://github.com/pferrel/solr-recommender
- MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1
- MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering. See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf for the details.
- MAHOUT-1265: MultiLayer Perceptron (MLP) classifier. This is an early implementation of MLP intended to solicit user feedback; it still needs to be integrated into Mahout’s processing pipeline to work with Mahout’s vectors.
- Removed deprecated algorithms, which have either been replaced by better-performing algorithms or lacked user support and maintenance.
- The usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:

- From Clustering:
  Switched LDA implementation from using Dirichlet to Collapsed Variational Bayes (CVB)
  Meanshift
  MinHash - removed due to poor performance, lack of support and lack of usage
- From Classification (both are sequential implementations):
  Winnow - lack of actual usage and support
  Perceptron - lack of actual usage and support
- From Collaborative Filtering:
  SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
  Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
  TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- From Mahout Math:
  Hadoop entropy code in org.apache.mahout.math.stats.entropy

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page, http://mahout.apache.org/developers/how-to-contribute.html, or contact us via email at dev@mahout.apache.org.
As the project moves towards a 1.0 release, the community will be focused on key algorithms that are proven to scale in production and have seen widespread adoption.

[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22
Re: Mahout 0.9 Release Notes - First Draft
Hi,

One thing I forgot: you once mentioned running into issues with the new k-means. Are those fixed or tracked in JIRA? In the latter case, we should include a "known issues / call for helping hands" section.

Isabel
Re: Mahout 0.9 Release Notes - First Draft
On Sat, Dec 21, 2013 at 6:28 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
- From Clustering: Dirichlet - replaced by Collapsible Variational Bayes (CVB)

I think the name of the method I commonly hear is Collapsed Variational Bayes.
Re: Mahout 0.9 Release Notes - First Draft
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter ssc.o...@googlemail.com wrote:
- Mahout Math Lanczos in favour of SSVD
IIRC, we agreed to not remove Lanczos, although it was initially deprecated. We should undeprecate it.

Some folks like Lanczos in Mahout (for reasons not really clear to me; aside from accuracy when computing the SVD of random noise, there are actually zero reasons to use Lanczos instead). I agree we don't necessarily want to cull it out, but IMO there should be a clear steer posted in favor of SSVD in the docs/javadocs.
Re: Mahout 0.9 Release Notes - First Draft
Suneel ran into some issues this weekend; I'm going to try it out and see if I can repro.
Re: Mahout 0.9 Release Notes - First Draft
Hi, the draft looks good overall; I have some minor comments inline:

On 22.12.2013 03:28, Suneel Marthi wrote:
- From Clustering: Dirichlet - replaced by Collapsible Variational Bayes (CVB)

I think we switched our LDA implementation to use CVB and removed Dirichlet clustering; those are two different things, right?

- Collaborative Filtering
All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

We should be careful, because the package name knn could make people think we removed our item-based recommenders (this has already caused confusion on Twitter). I think it would be sufficient to say we removed a couple of rarely used recommenders, in particular SlopeOne.

- Mahout Math Lanczos in favour of SSVD

IIRC, we agreed to not remove Lanczos, although it was initially deprecated. We should undeprecate it.
Re: Mahout 0.9 Release Notes - First Draft
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter ssc.o...@googlemail.com wrote:
- From Clustering: Dirichlet - replaced by Collapsible Variational Bayes (CVB)
I think we switched our LDA implementation to use CVB and removed Dirichlet clustering, those are two different things, right?

Correct.
Mahout 0.9 Release Notes - First Draft
Hi All,

Please see below the first draft of the release notes for Mahout 0.9. Please feel free to add/edit sections as you see fit. (This is a draft only.)

Regards,
Suneel
-
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily on the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations, including, but not limited to, math packages for statistics, linear algebra and others, as well as Java primitive collections, local and distributed vector and matrix classes, and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more.

The 0.9 release is mainly a cleanup release in preparation for an upcoming 1.0 release targeted for the first half of 2014, but there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository.

In addition to the release highlights and artifacts, please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in order to run.
RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not limited to, the list below. For further information, see the included CHANGELOG file.

- Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297). See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- New Multilayer Perceptron Classifier (MAHOUT-1265)
- Recommenders as a Search (MAHOUT-1288). See https://github.com/pferrel/solr-recommender
- MAHOUT-1364: Upgrade Mahout to be Lucene 4.6.0 compliant
- MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering. See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf for the details.
- Removed deprecated algorithms.
- The usual bug fixes. See JIRA [?] for more information on the 0.9 release.

A total of 91 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:

- From Clustering:
  Dirichlet - replaced by Collapsible Variational Bayes (CVB)
  Meanshift
  MinHash - removed due to poor performance and lack of usage
  EigenCuts -
- From Classification (both are sequential implementations):
  Winnow - lack of actual usage
  Perceptron - lack of actual usage
- Frequent Pattern Mining
- Collaborative Filtering:
  All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
  SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
  Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
  TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- Mahout Math:
  Lanczos in favour of SSVD
  Hadoop entropy code in org.apache.mahout.math.stats.entropy

If you are interested in supporting one or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting evidence as to their effectiveness for you in production.

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page, https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout wiki or contact us via email at dev@mahout.apache.org.

FUTURE PLANS

1.0 Plans
- New Downpour SGD classifier
- Support for Finite State Transducers (FST) as a Dictionary Type
- Support for Hadoop 2.x
- Port Mahout's recommenders to Spark (??)
- Support for Java 7
- Better API interfaces for Clustering
- (what else???)

As the project moves towards a 1.0 release, the community will be focused on key algorithms that are proven to scale in production and have seen widespread adoption. Our plans as a community are to focus 1.0 on the support of algorithms and features listed above. The support for the algorithms packaged in 1.0 for at least two minor