Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Monthly Progress 
(https://cwiki.apache.org/confluence/display/MAHOUT/Monthly+Progress)


Edited by Isabel Drost-Fromm:
---------------------------------------------------------------------
h2. Board Report May 2013

(template taken from [http://community.apache.org/boardreport.html])

Apache Mahout has implementations of a wide range of machine learning and data 
mining algorithms: clustering, classification, collaborative filtering and 
frequent pattern mining.

Project Status
--------------

The project continues to have a large and active user base. With the book 
_Mahout in Action_, it has become easier for beginners to get started with the 
project.

Community
---------

The beginning of 2013 was surprisingly quiet, with many committers and PMC 
members swamped with real-life obligations (see also 
<http://markmail.org/thread/ju537zfwx3mrc6no>).

In the past few weeks, activity on the dev list has started to pick up: people 
who had previously only been watching are becoming active and volunteering to 
help with documentation, tests and even code patches.

We added two new committers to the project: Suneel Marthi and Dan Filimon.

A few committers have volunteered to become GSoC mentors. As this will be their 
first year mentoring on behalf of Mahout, they will need some guidance on how 
the process works at the ASF.


Community Objectives
--------------------

...


Releases
--------

There have been no releases since the last report. The 0.8 release is still 
under discussion.


h2. Project Report for Q1 of 2013: February Summary in progress

h3. Issues:

Sean Owen wishes to leave the Mahout PMC while retaining his commit rights. This 
is the only issue that needs the Board's attention.

h3. Current Activity: How has the community developed since the last report? In February:

The 0.8 release was originally planned for March 8, but we will let that date 
slip by a few weeks.

h3. Selection of Presentations, Articles and Outreach:

* Ted Dunning on new fast streaming clustering [http://www.slideshare.net/tdunning/news-frommahout20130305]
* Fast clustering at ACM [http://www.slideshare.net/tdunning/acm-20130225]
* Real time learning [http://www.slideshare.net/tdunning/real-time-learning]
* MapR-Lucidworks on reflected intelligence [http://www.slideshare.net/tdunning/mapr-lucidworks-joint-webinar]
* Ted Dunning at Strata on Mahout [http://www.slideshare.net/tdunning/strata-newyork2012]
* Ted Dunning on fast clustering at Oxford [http://www.slideshare.net/tdunning/oxford-05oct2012]
* MapR and Amex speak about large-scale analytics with Mahout [http://www.slideshare.net/tdunning/customer-analysisatscalestrata10022012]
* Overstock and Mahout [http://www.wired.com/wiredenterprise/2012/12/mahout/]
* Advanced Analytics in Mahout [http://portfortune.wordpress.com/2012/12/05/advanced-analytics-in-hadoop-part-one/]
* London Data Science [http://datasciencelondon.org/tag/mahout/]
* Mahout Updated in CDH 4.1 [http://blog.cloudera.com/blog/2012/11/whats-new-in-cdh4-1-mahout/]

h3. Scientific publications based on Mahout

* _Sebastian Schelter, Sean Owen: Collaborative Filtering with Apache Mahout_, Recommender Systems Challenge Workshop in conjunction with ACM RecSys 2012 [pdf|http://ssc.io/wp-content/uploads/2013/02/cf-mahout.pdf]
* _Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce_, ACM Conference on Recommender Systems 2012, Dublin [ACM|http://dl.acm.org/citation.cfm?id=2365984] [pdf|http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf]

h3. Code

We attracted the developer of MyMediaLite [http://mymedialite.net/], one of the 
leading scientific recommender libraries, to port a few implementations to 
Mahout ([MAHOUT-1106|https://issues.apache.org/jira/browse/MAHOUT-1106], 
[MAHOUT-1089|https://issues.apache.org/jira/browse/MAHOUT-1089]).

However, new code contributions have slowed to a crawl. Number of commits in the 
past few months, compared with the same months in prior years:

||Period||Dec||Jan||Feb||
|Dec 2012 to Feb 2013|7|20|7|
|Dec 2011 to Feb 2012|99|27|98|
|Dec 2010 to Feb 2011|37|52|35|
|Dec 2009 to Feb 2010|135|132|207|

h4. New Commercial Integrations

* Predixion Readmission Insight, "a preventable readmission healthcare solution", announced [http://www.virtual-strategy.com/2013/03/05/predixion-software-wins-microsoft-health-users-group-innovation-award] integration with Mahout, Greenplum, Hive, and Microsoft's BI stack.
* Overstock and Mahout [http://www.wired.com/wiredenterprise/2012/12/mahout/]

h4. New Open Source Integrations

* The recommendation and advertisement network [plista|http://www.plista.com/en] has built an open source web layer for Mahout's recommenders [https://github.com/plista/kornakapi].
* Mahout seems to be the framework of choice for [PredictionIO|http://prediction.io/], an open source prediction server that lets software developers create predictive features such as personalization, recommendation and content discovery.


h3. Mailing List Summary:

User list discussions are currently focused primarily on bug reporting and 
helping new users, with very little discussion of future feature work.

h5. Developer Mailing List Postings

Number of posts to the developer mailing list 
[http://mail-archives.apache.org/mod_mbox/mahout-dev/], compared with the same 
months in previous years:

||Period||Dec||Jan||Feb||
|Dec 2012 to Feb 2013|155|213|123|
|Dec 2011 to Feb 2012|1079|545|578|
|Dec 2010 to Feb 2011|267|473|352|

Developer involvement has not been this low since the first half of 2009.

h5. User Mailing List Postings

[http://mail-archives.apache.org/mod_mbox/mahout-user/]
User list discussions primarily support new users and report bugs in released 
versions (0.6 and sometimes even 0.5), which highlights the need for the 0.8 
release.

Traffic on the user mailing list has gone down slightly compared with previous 
years. Posts in the same months of earlier years:

||Year||Jan||Feb||
|2012|367|288|
|2011|458|359|
|2010|272|497|

The decrease is not dramatic, and there is still considerable interest in the 
user community.

h3. Summary: How has the project developed since the last report:

A 1.0 release is not yet on the horizon.

h4. Milestones

# Working towards a 0.8 release
# Development of new, faster clustering code
