Re: 0.2 status

Isabel Drost Thu, 12 Nov 2009 01:55:27 -0800

Adding and revising a little:

Apache Mahout 0.2 has been released and is now available for public
download at http://www.apache.org/dyn/closer.cgi/lucene/mahout


Up-to-date Maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/


Apache Mahout is a subproject of Apache Lucene with the goal
of delivering scalable machine learning algorithm implementations
under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

> Mahout is a machine learning library meant to scale to the size of
> data we manage today. Built on top of the powerful map/reduce
> paradigm of Apache Hadoop project, Mahout lets you run popular
> machine learning methods like clustering, collaborative filtering,
> classification over Terabytes of data over thousands of computers.

> <- We may want to emphasize that using Mahout makes sense also for
> those people that do not have clusters with thousands of nodes?

Mahout is a machine learning library meant to scale: Scale in terms of
community to support anyone interested in using machine learning. Scale
in terms of business by providing the library under a commercially
friendly, free software license. Scale in terms of computation to the
size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop
project, Mahout lets you solve popular machine learning problems
such as clustering, collaborative filtering and classification
over terabytes of data on thousands of computers.

Implemented with scalability in mind, the latest release brings many
performance optimizations so that even in a single-node setup the
library performs well.

> <- As mentioned earlier by Grant, we do need performance benchmarks at
> least for the next release to prove that.


The complete changelist can be found here:
http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include
 
- Major performance enhancements in Collaborative Filtering,
Classification and Clustering
- New: Latent Dirichlet Allocation (LDA) implementation for topic
modelling
- New: Frequent Itemset Mining for mining top-k patterns from a list
of transactions
- New: Decision Forests implementation for Decision Tree classification
(In Memory & Partial Data)
- New: HBase storage support for Naive Bayes model building and
classification
- New: Generation of vectors from Text documents for use with Mahout
Algorithms
- Performance improvements in various Vector implementations
- Tons of bug fixes and code cleanup

Getting started: New to Mahout? 

1) Download Mahout at http://www.apache.org/dyn/closer.cgi/lucene/mahout
2) Check out the Quick start:
http://cwiki.apache.org/MAHOUT/quickstart.html
3) Read the Mahout Wiki: http://cwiki.apache.org/MAHOUT
4) Join the community by subscribing to mahout-u...@lucene.apache.org
5) Give back: http://www.apache.org/foundation/getinvolved.html
6) Consider adding yourself to the Powered By wiki page:
http://cwiki.apache.org/MAHOUT/poweredby.html

For more information on Apache Mahout, see
http://lucene.apache.org/mahout


Additional comment: I suppose I will copy this over to my personal
blog once the release is out. I would like to invite those interested
in or using Mahout to do so as well.
