Re: Mahout 0.9 Release Notes - First Draft

2014-02-18 Thread Suneel Marthi
Could someone please point me to the URL for adding Mahout release notes?  




On Monday, February 17, 2014 3:27 PM, Ellen Friedman 
b.ellen.fried...@gmail.com wrote:
 

Hi Suneel,

Thanks for notes. I'm inquiring about status of the notes and update to the 
website to announce 0.9: Ted has reviewed the release notes - were you waiting 
for additional input or are they ready to go on the website? Are you the one 
who updates the site?

I've been asked to write a short blog on the release but wanted to wait until 
the site is updated.

Thanks much
Ellen





On Tue, Feb 11, 2014 at 10:06 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

Here's a draft of the Release Notes for Mahout 0.9, Please review the same.

--



The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the 3Cs), as well as the
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
Cassandra and much more. The 0.9 release is mainly a clean up release in
preparation for an upcoming 1.0 release targeted for first half of 2014, but 
there are a few
significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and 
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central 
Maven repository.


As with any release, we wish to thank all of the users and contributors
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working 
examples of the core
functionality available in Mahout. These can be run via scripts in the 
examples/bin
directory and will prompt you for more information to help you try things out.
Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not
limited to the list below. For further information, see the included 
CHANGELOG[1] file.

-  MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra.
   See 
http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
-  MAHOUT-1288: Recommenders as a Search.  See 
https://github.com/pferrel/solr-recommender
-  MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1

-  MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 
1-dimensional Clustering
  See 
https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
 for the details.
-  MAHOUT-1265: MultiLayer Perceptron (MLP) classifier
   This is an early implementation of MLP to solicit user feedback, needs to 
be integrated into Mahout’s processing pipeline to work with Mahout’s vectors.

- Removed Deprecated algorithms as they have been either replaced by better 
performing algorithms or lacked user support and maintenance.

- the usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.


The following algorithms that were marked deprecated in 0.8 have been removed 
in 0.9:

- From Clustering:
   Switched LDA implementation from using Dirichlet to Collapsed Variational 
Bayes (CVB)

  Meanshift

  MinHash - removed due to poor performance,  lack of support and lack of usage


- From Classification (both are sequential implementations)

  Winnow - lack of actual usage and support

  Perceptron - lack of actual usage and support

- Collaborative Filtering

SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and 
org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- Mahout Math

    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy


CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are
interested in contributing, please see our contribution page 
http://mahout.apache.org/developers/how-to-contribute.html or contact us via 
email at dev@mahout.apache.org.


As the project moves towards a 1.0 release, the community will be focused on 
key algorithms that are proven to scale in production and have seen 
wide-spread adoption.

[1] 
http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] 

Re: Mahout 0.9 Release Notes - First Draft

2014-02-18 Thread Suneel Marthi
Below are the release notes; I'm not sure where they should go on the website.
If someone could point me to a location, I will go ahead and update the same.

=

The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the 3Cs), as well as the
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
Cassandra and much more. The 0.9 release is mainly a clean up release in
preparation for an upcoming 1.0 release targeted for the first half of 2014,
but there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and 
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central 
Maven repository.

As with any release, we wish to thank all of the users and contributors
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working
examples of the core functionality available in Mahout. These can be run via
scripts in the examples/bin directory and will prompt you for more information
to help you try things out. Most examples do not need a Hadoop cluster in
order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not
limited to, the list below. For further information, see the included
CHANGELOG [1] file.

-  MAHOUT-1245: A new and improved Mahout website based on Apache CMS
-  MAHOUT-1265: MultiLayer Perceptron (MLP) classifier.
   This is an early implementation of MLP to solicit user feedback; it still
   needs to be integrated into Mahout’s processing pipeline to work with
   Mahout’s vectors.
-  MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra.  See 
http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
-  MAHOUT-1288: Recommenders as a Search.  See 
https://github.com/pferrel/solr-recommender
-  MAHOUT-1300: Support for easy functional Matrix views and derivatives
-  MAHOUT-1343: JSON output format for ClusterDumper
-  MAHOUT-1345: Enable randomised testing for all Mahout modules using Carrot 
RandomizedRunner. 
-  MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 
1-dimensional Clustering.  See 
https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
 for the details.
-  MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1


- Removed deprecated algorithms, as they have either been replaced by
  better-performing algorithms or lacked user support and maintenance.

- The usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed 
in 0.9:

- From Clustering:
    Switched LDA implementation from using Gibbs Sampling to Collapsed
    Variational Bayes (CVB)
    Meanshift
    MinHash - removed due to poor performance, lack of support and lack of usage

- From Classification (both are sequential implementations):
    Winnow - lack of actual usage and support
    Perceptron - lack of actual usage and support

- Collaborative Filtering:
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and
    org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- Mahout Math:
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy


CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are
interested in contributing, please see our contribution page 
http://mahout.apache.org/developers/how-to-contribute.html or contact us via 
email at dev@mahout.apache.org.


As the project moves towards a 1.0 release, the community will be focused on 
key algorithms that are proven to scale in production and have seen wide-spread 
adoption. 

[1] 
http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] 
https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22





On Monday, February 17, 2014 3:27 PM, Ellen Friedman 
b.ellen.fried...@gmail.com wrote:
 

Hi Suneel,

Thanks for notes. I'm inquiring about status of the notes and update to the 
website to announce 0.9: Ted has reviewed the release notes - were you waiting 

Re: Mahout 0.9 Release Notes - First Draft

2014-02-17 Thread Ted Dunning
On Tue, Feb 11, 2014 at 10:06 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

Switched LDA implementation from using Dirichlet to Collapsed
 Variational Bayes (CVB)


This line should read:

Switched LDA implementation from using Gibbs sampling to Collapsed
Variational Bayes (CVB)


Otherwise, it looks pretty good.


Re: Mahout 0.9 Release Notes - First Draft

2014-02-17 Thread Ellen Friedman
Hi Suneel,

Thanks for notes. I'm inquiring about status of the notes and update to the
website to announce 0.9: Ted has reviewed the release notes - were you
waiting for additional input or are they ready to go on the website? Are
you the one who updates the site?

I've been asked to write a short blog on the release but wanted to wait
until the site is updated.

Thanks much
Ellen



On Tue, Feb 11, 2014 at 10:06 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Here's a draft of the Release Notes for Mahout 0.9, Please review the same.

 --


 The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
 Mahout's goal is to build scalable machine learning libraries focused
 primarily in the areas of collaborative filtering (recommenders),
 clustering and classification (known collectively as the 3Cs), as well
 as the
 necessary infrastructure to support those implementations including, but
 not limited to, math packages for statistics, linear algebra and others
 as well as Java primitive collections, local and distributed vector and
 matrix classes and a variety of integrative code to work with popular
 packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
 Cassandra and much more. The 0.9 release is mainly a clean up release in
 preparation for an upcoming 1.0 release targeted for first half of 2014,
 but there are a few
 significant new features, which are highlighted below.

 To get started with Apache Mahout 0.9, download the release artifacts and
 signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the
 central Maven repository.

 As with any release, we wish to thank all of the users and contributors
 to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
 individual credits, as there are too many to list here.

 GETTING STARTED

 In the release package, the examples directory contains several working
 examples of the core
 functionality available in Mahout. These can be run via scripts in the
 examples/bin
 directory and will prompt you for more information to help you try things
 out.
 Most examples do not need a Hadoop cluster in order to run.

 RELEASE HIGHLIGHTS

 The highlights of the Apache Mahout 0.9 release include, but are not
 limited to the list below. For further information, see the included
 CHANGELOG[1] file.

 -  MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra.
See
 http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
 -  MAHOUT-1288: Recommenders as a Search.  See
 https://github.com/pferrel/solr-recommender
 -  MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1
 -  MAHOUT-1361: Online Algorithm for computing accurate Quantiles using
 1-dimensional Clustering
   See
 https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
  for the details.
 -  MAHOUT-1265: MultiLayer Perceptron (MLP) classifier
This is an early implementation of MLP to solicit user feedback, needs
 to be integrated into Mahout's processing pipeline to work with Mahout's
 vectors.

 - Removed Deprecated algorithms as they have been either replaced by
 better performing algorithms or lacked user support and maintenance.

 - the usual bug fixes. See [2] for more information on the 0.9 release.

 A total of 113 separate JIRA issues were addressed in this release.

 The following algorithms that were marked deprecated in 0.8 have been
 removed in 0.9:

 - From Clustering:
Switched LDA implementation from using Dirichlet to Collapsed
 Variational Bayes (CVB)

   Meanshift

   MinHash - removed due to poor performance,  lack of support and lack of
 usage

 - From Classification (both are sequential implementations)

   Winnow - lack of actual usage and support

   Perceptron - lack of actual usage and support

 - Collaborative Filtering
 SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone
 and org.apache.mahout.cf.taste.impl.recommender.slopeone
 Distributed pseudo recommender in
 org.apache.mahout.cf.taste.hadoop.pseudo
 TreeClusteringRecommender in
 org.apache.mahout.cf.taste.impl.recommender

 - Mahout Math
 Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

 CONTRIBUTING

 Mahout is always looking for contributions focused on the 3Cs. If you are
 interested in contributing, please see our contribution page
 http://mahout.apache.org/developers/how-to-contribute.html or contact us
 via email at dev@mahout.apache.org.

 As the project moves towards a 1.0 release, the community will be focused
 on key algorithms that are proven to scale in production and have seen
 wide-spread adoption.

 [1]
 http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
 [2]
 https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22








 On Monday, December 23, 2013 7:41 PM, Dmitriy Lyubimov dlie...@gmail.com
 wrote:

 On Sun, Dec 22, 2013 at 

Re: Mahout 0.9 Release Notes - First Draft

2014-02-11 Thread Suneel Marthi
Here's a draft of the Release Notes for Mahout 0.9, Please review the same.

--


The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the 3Cs), as well as the
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
Cassandra and much more. The 0.9 release is mainly a clean up release in
preparation for an upcoming 1.0 release targeted for first half of 2014, but 
there are a few
significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and 
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central 
Maven repository.

As with any release, we wish to thank all of the users and contributors
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working 
examples of the core
functionality available in Mahout. These can be run via scripts in the 
examples/bin
directory and will prompt you for more information to help you try things out. 
Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not
limited to the list below. For further information, see the included 
CHANGELOG[1] file.

-  MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra.
   See 
http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
-  MAHOUT-1288: Recommenders as a Search.  See 
https://github.com/pferrel/solr-recommender
-  MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1
-  MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 
1-dimensional Clustering
  See 
https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
 for the details.
-  MAHOUT-1265: MultiLayer Perceptron (MLP) classifier 
   This is an early implementation of MLP to solicit user feedback, needs to be 
integrated into Mahout’s processing pipeline to work with Mahout’s vectors.

- Removed Deprecated algorithms as they have been either replaced by better 
performing algorithms or lacked user support and maintenance.

- the usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed 
in 0.9:

- From Clustering:
   Switched LDA implementation from using Dirichlet to Collapsed Variational 
Bayes (CVB)

  Meanshift

  MinHash - removed due to poor performance,  lack of support and lack of usage

- From Classification (both are sequential implementations)

  Winnow - lack of actual usage and support

  Perceptron - lack of actual usage and support

- Collaborative Filtering
SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and 
org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- Mahout Math
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are
interested in contributing, please see our contribution page 
http://mahout.apache.org/developers/how-to-contribute.html or contact us via 
email at dev@mahout.apache.org.

As the project moves towards a 1.0 release, the community will be focused on 
key algorithms that are proven to scale in production and have seen wide-spread 
adoption. 

[1] 
http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] 
https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22








On Monday, December 23, 2013 7:41 PM, Dmitriy Lyubimov dlie...@gmail.com 
wrote:
 
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter 

ssc.o...@googlemail.com wrote:


 
  - Mahout Math
      Lanczos in favour of SSVD

 IIRC, we agreed to not remove Lanczos, although it was initially
 deprecated. We should undeprecate it.


Some folks like Lanczos in Mahout (for reasons not really clear to me,
aside from accuracy when computing the SVD of random noise, there are
actually 0 reasons to use Lanczos instead). I agree we don't  necessarily
want to cull it out -- but IMO there should be a clear steer posted in
favor of SSVD in the 

Re: Mahout 0.9 Release Notes - First Draft

2013-12-23 Thread Isabel Drost-Fromm
Hi,

one thing I forgot: you once mentioned running into issues with the new kmeans 
- are those fixed or tracked in jira? In case of the latter we should include a 
known issues/ call for helping hands section.

Isabel


Re: Mahout 0.9 Release Notes - First Draft

2013-12-23 Thread Dmitriy Lyubimov
On Sat, Dec 21, 2013 at 6:28 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Hi All,

 Please see below the first draft of Release notes for Mahout 0.9. Please
 feel free to add/edit sections as you see fit.
 (This is a draft only).

 Regards,
 Suneel


 -


 The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
 Mahout's goal is to build scalable machine learning libraries focused
 primarily in the areas of collaborative filtering (recommenders),
 clustering and classification (known collectively as the 3Cs), as well
 as the
 necessary infrastructure to support those implementations including, but
 not limited to, math packages for statistics, linear algebra and others
 as well as Java primitive collections, local and distributed vector and
 matrix classes and a variety of integrative code to work with popular
 packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
 Cassandra and much more. The 0.9 release is mainly a clean up release in
 preparation for an upcoming 1.0 release targeted for first half of 2014,
 but there are a few
 significant new features, which are highlighted below.

 To get started with Apache Mahout 0.9,
  download the release artifacts and signatures at
 http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven
 repository.

 In
  addition to the release highlights and artifacts, please pay attention
 to the section labelled FUTURE PLANS below for more information about
 upcoming releases of Mahout.

 As with any release, we wish to thank all of the users and contributors
 to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
 individual credits, as there are too many to list here.

 GETTING STARTED

 In the release package, the examples directory contains several working
 examples of the core
 functionality available in Mahout. These can be run via scripts in the
 examples/bin
  directory and will prompt you for more information to help you try
 things out. Most examples do not need a Hadoop cluster in
 order to run.

 RELEASE HIGHLIGHTS

 The highlights of the Apache Mahout 0.9 release include, but are not
 limited to the list below. For further information, see the included
 CHANGELOG file.

 - Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297).
See
 http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
 - New Multilayer Perceptron Classifier (MAHOUT-1265)
 - Recommenders as a Search (MAHOUT-1288).  See
 https://github.com/pferrel/solr-recommender
 - MAHOUT-1364: Upgrade Mahout to be Lucene 4.6.0 compliant
 - MAHOUT-1361: Online Algorithm for computing accurate Quantiles using
 1-dimensional Clustering
   See
 https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
  for the details.

 - Removed Deprecated algorithms.

 - the usual bug fixes. See JIRA [?} for more information on the 0.9
 release.


 A total of 91 separate JIRA issues were addressed in this release.

 The following algorithms that were marked deprecated in 0.8 have been
 removed in 0.9:

 - From Clustering:
   Dirichlet - replaced by Collapsible Variational Bayes (CVB)


I think the name of the method I commonly hear is Collapsed Variational
Bayes


   Meanshift

   MinHash - removed due to poor performance and lack of usage

   EigenCuts -


 - From Classification (both are sequential implementations)

   Winnow - lack of actual usage

   Perceptron - lack of actual usage


 - Frequent Pattern Mining

 - Collaborative Filtering
 All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
 SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone
 and org.apache.mahout.cf.taste.impl.recommender.slopeone
 Distributed pseudo recommender in
 org.apache.mahout.cf.taste.hadoop.pseudo
 TreeClusteringRecommender in
 org.apache.mahout.cf.taste.impl.recommender

 - Mahout Math
 Lanczos in favour of SSVD
 Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

 If you are interested in supporting 1 or more of these algorithms, please
 make it known on dev@mahout.apache.org and via JIRA issues that fix
 and/or improve them. Please also provide
 supporting evidence as to their effectiveness for you in production.


 CONTRIBUTING

 Mahout
  is always looking for contributions focused on the 3Cs. If you are
 interested in contributing, please see our contribution page,
 https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout
 wiki or contact us via email at dev@mahout.apache.org.

 FUTURE PLANS

 1.0 Plans
 


 - New Downpour SGD classifier

 - Support for Finite State Transducers (FST) as a Dictionary Type.
 - Support for Hadoop 2.x
 - Port Mahout's recommenders to Spark (??)
 - Support for Java 7
 - Better API interfaces for Clustering
 - (what else???)


 As the project moves towards a 1.0 release, the community will be focused
 on
 key algorithms that are proven to scale in 

Re: Mahout 0.9 Release Notes - First Draft

2013-12-23 Thread Dmitriy Lyubimov
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter 
ssc.o...@googlemail.com wrote:


 
  - Mahout Math
  Lanczos in favour of SSVD

 IIRC, we agreed to not remove Lanczos, although it was initially
 deprecated. We should undeprecate it.


Some folks like Lanczos in Mahout (for reasons not really clear to me,
aside from accuracy when computing the SVD of random noise, there are
actually 0 reasons to use Lanczos instead). I agree we don't  necessarily
want to cull it out -- but IMO there should be a clear steer posted in
favor of SSVD in the docs/javadocs.


Re: Mahout 0.9 Release Notes - First Draft

2013-12-23 Thread Andrew Musselman
Suneel ran into some issues this weekend; I'm going to try it out and see if I 
can repro.

 On Dec 23, 2013, at 1:02 AM, Isabel Drost-Fromm isa...@apache.org wrote:
 
 Hi,
 
 one thing I forgot: you once mentioned running into issues with the new 
 kmeans - are those fixed or tracked in jira? In case of the latter we should 
 include a known issues/ call for helping hands section.
 
 Isabel


Re: Mahout 0.9 Release Notes - First Draft

2013-12-22 Thread Sebastian Schelter
Hi,

the draft looks good overall, I have some minor comments inline:

On 22.12.2013 03:28, Suneel Marthi wrote:
 Hi All,
 
 Please see below the first draft of Release notes for Mahout 0.9. Please feel 
 free to add/edit sections as you see fit.
 (This is a draft only).
 
 Regards,
 Suneel
 
 
 -
 
 
 The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. 
 Mahout's goal is to build scalable machine learning libraries focused 
 primarily in the areas of collaborative filtering (recommenders), 
 clustering and classification (known collectively as the 3Cs), as well as 
 the 
 necessary infrastructure to support those implementations including, but
 not limited to, math packages for statistics, linear algebra and others
 as well as Java primitive collections, local and distributed vector and
 matrix classes and a variety of integrative code to work with popular 
 packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache 
 Cassandra and much more. The 0.9 release is mainly a clean up release in
 preparation for an upcoming 1.0 release targeted for first half of 2014, but 
 there are a few
 significant new features, which are highlighted below.
 
 To get started with Apache Mahout 0.9,
  download the release artifacts and signatures at 
 http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven 
 repository. 
 
 In
  addition to the release highlights and artifacts, please pay attention 
 to the section labelled FUTURE PLANS below for more information about 
 upcoming releases of Mahout.
 
 As with any release, we wish to thank all of the users and contributors 
 to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for 
 individual credits, as there are too many to list here.
 
 GETTING STARTED
 
 In the release package, the examples directory contains several working 
 examples of the core 
 functionality available in Mahout. These can be run via scripts in the 
 examples/bin
  directory and will prompt you for more information to help you try 
 things out. Most examples do not need a Hadoop cluster in 
 order to run.
 
 RELEASE HIGHLIGHTS
 
 The highlights of the Apache Mahout 0.9 release include, but are not 
 limited to the list below. For further information, see the included 
 CHANGELOG file.
 
 - Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297).
See 
 http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
 - New Multilayer Perceptron Classifier (MAHOUT-1265) 
 - Recommenders as a Search (MAHOUT-1288).  See 
 https://github.com/pferrel/solr-recommender
 - MAHOUT-1364: Upgrade Mahout to be Lucene 4.6.0 compliant
 - MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 
 1-dimensional Clustering
   See 
 https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
  for the details.
 
 - Removed Deprecated algorithms.
 
 - the usual bug fixes. See JIRA [?} for more information on the 0.9 release.
 
 
 A total of 91 separate JIRA issues were addressed in this release.
 
 The following algorithms that were marked deprecated in 0.8 have been removed 
 in 0.9:
 
 - From Clustering:
   Dirichlet - replaced by Collapsible Variational Bayes (CVB)

I think we switched our LDA implementation to use CVB and removed
Dirichlet clustering, those are two different things, right?

 
   Meanshift 
 
   MinHash - removed due to poor performance and lack of usage
 
   EigenCuts -
 
 
 - From Classification (both are sequential implementations)
 
   Winnow - lack of actual usage
 
   Perceptron - lack of actual usage 
 
 
 - Frequent Pattern Mining
 
 - Collaborative Filtering
 All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
 SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone 
 and org.apache.mahout.cf.taste.impl.recommender.slopeone
 Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
 TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

We should be careful, because the package knn could make people think we
removed our itembased recommenders (already caused confusion on twitter).

I think it would be sufficient to say we removed a couple of rarely used
recommenders, in particular SlopeOne.

 
 - Mahout Math
 Lanczos in favour of SSVD

IIRC, we agreed to not remove Lanczos, although it was initially
deprecated. We should undeprecate it.

 Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
 
 If you are interested in supporting 1 or more of these algorithms, please 
 make it known on dev@mahout.apache.org and via JIRA issues that fix and/or 
 improve them. Please also provide 
 supporting evidence as to their effectiveness for you in production.
 
 
 CONTRIBUTING
 
 Mahout
  is always looking for contributions focused on the 3Cs. If you are 
 interested in contributing, please see our contribution page, 
 

Re: Mahout 0.9 Release Notes - First Draft

2013-12-22 Thread Ted Dunning
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter 
ssc.o...@googlemail.com wrote:

  - From Clustering:
Dirichlet - replaced by Collapsible Variational Bayes (CVB)

 I think we switched our LDA implementation to use CVB and removed
 Dirichlet clustering, those are two different things, right?


Correct.


Mahout 0.9 Release Notes - First Draft

2013-12-21 Thread Suneel Marthi
Hi All,

Please see below the first draft of the release notes for Mahout 0.9. Please 
feel free to add or edit sections as you see fit.
(This is a draft only.)

Regards,
Suneel


-


The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. 
Mahout's goal is to build scalable machine learning libraries focused 
primarily in the areas of collaborative filtering (recommenders), 
clustering and classification (known collectively as the 3Cs), as well as the 
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular 
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache 
Cassandra and much more. The 0.9 release is mainly a clean up release in
preparation for an upcoming 1.0 release targeted for first half of 2014, but 
there are a few
significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and 
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central 
Maven repository.

In addition to the release highlights and artifacts, please pay attention 
to the section labelled FUTURE PLANS below for more information about 
upcoming releases of Mahout.

As with any release, we wish to thank all of the users and contributors 
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for 
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working 
examples of the core functionality available in Mahout. These can be run via 
scripts in the examples/bin directory and will prompt you for more information 
to help you try things out. Most examples do not need a Hadoop cluster in 
order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not 
limited to, the items listed below. For further information, see the included 
CHANGELOG file.

- Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297). See 
  http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- New Multilayer Perceptron Classifier (MAHOUT-1265)
- Recommenders as a Search (MAHOUT-1288). See 
  https://github.com/pferrel/solr-recommender
- Upgrade Mahout to be Lucene 4.6.0 compliant (MAHOUT-1364)
- Online Algorithm for computing accurate Quantiles using 1-dimensional 
  Clustering (MAHOUT-1361). See 
  https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
  for the details.

- Removed deprecated algorithms.

- The usual bug fixes. See JIRA [?] for more information on the 0.9 release.


A total of 91 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed 
in 0.9:

- From Clustering:
  Dirichlet - replaced by Collapsible Variational Bayes (CVB)

  Meanshift 

  MinHash - removed due to poor performance and lack of usage

  EigenCuts -


- From Classification (both are sequential implementations)

  Winnow - lack of actual usage

  Perceptron - lack of actual usage 


- Frequent Pattern Mining

- Collaborative Filtering
    All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and 
    org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- Mahout Math
    Lanczos in favour of SSVD    
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

If you are interested in supporting one or more of these algorithms, please 
make it known on dev@mahout.apache.org and via JIRA issues that fix and/or 
improve them. Please also provide supporting evidence as to their 
effectiveness for you in production.


CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are 
interested in contributing, please see our contribution page, 
https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout wiki or 
contact us via email at dev@mahout.apache.org.

FUTURE PLANS

1.0 Plans



- New Downpour SGD classifier
- Support for Finite State Transducers (FST) as a Dictionary Type.
- Support for Hadoop 2.x
- Port Mahout's recommenders to Spark (??)
- Support for Java 7
- Better API interfaces for Clustering
- (what else???)


As the project moves towards a 1.0 release, the community will be focused on
key algorithms that are proven to scale in production and have seen 
widespread adoption.

Our plans as a community are to focus 1.0 on the support of algorithms and 
features listed above.
The support for the algorithms packaged in 1.0 for at least two minor