Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT) Page: Monthly Progress (https://cwiki.apache.org/confluence/display/MAHOUT/Monthly+Progress)
Edited by Isabel Drost-Fromm:
---------------------------------------------------------------------

h2. Board Report May 2013

(template taken from http://community.apache.org/boardreport.html )

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

Project Status
--------------

The project continues to have a large and active user base. With the book _Mahout in Action_, it has become easier for beginners to get started with the project.

Community
---------

The beginning of 2013 was surprisingly quiet, with many committers and PMC members swamped with tasks IRL (see also <http://markmail.org/thread/ju537zfwx3mrc6no>). In the past few weeks, activity on the dev list has started to pick up: people who had previously only been watching have become active, volunteering to help with documentation, tests and even code patches.

We added two new committers to the project: Suneel Marthi and Dan Filimon.

A few committers have volunteered to become GSoC mentors. As this will be their first year mentoring on behalf of Mahout, they will need some guidance on how the process works at the ASF.

Community Objectives
--------------------

...

Releases
--------

No releases since the last report. 0.8 is still under discussion.

h2. Project Report for Q1 of 2013

February summary in progress.

h3. Issues:

Sean Owen wishes to leave the Mahout PMC (while retaining his commit rights). This is the only issue that needs the Board's attention.

h3. Current Activity:

How has the community developed since the last report?

In February: the 0.8 release was originally planned for March 8, but we will let that date slip by a few weeks.

h3. Selection of Presentations, Articles and Outreach:

* Ted Dunning on new fast streaming clustering [http://www.slideshare.net/tdunning/news-frommahout20130305]
* Fast clustering at ACM [http://www.slideshare.net/tdunning/acm-20130225]
* Real time learning [http://www.slideshare.net/tdunning/real-time-learning]
* MapR-Lucidworks on reflected intelligence [http://www.slideshare.net/tdunning/mapr-lucidworks-joint-webinar]
* Ted Dunning at Strata on Mahout [http://www.slideshare.net/tdunning/strata-newyork2012]
* Ted Dunning on fast clustering at Oxford [http://www.slideshare.net/tdunning/oxford-05oct2012]
* MapR and Amex speak about large-scale analytics with Mahout [http://www.slideshare.net/tdunning/customer-analysisatscalestrata10022012]
* Overstock and Mahout [http://www.wired.com/wiredenterprise/2012/12/mahout/]
* Advanced Analytics in Mahout [http://portfortune.wordpress.com/2012/12/05/advanced-analytics-in-hadoop-part-one/]
* London Data Science [http://datasciencelondon.org/tag/mahout/]
* Mahout Updated in CDH 4.1 [http://blog.cloudera.com/blog/2012/11/whats-new-in-cdh4-1-mahout/]

h3. Scientific publications based on Mahout

* _Sebastian Schelter, Sean Owen: Collaborative Filtering with Apache Mahout_, Recommender Systems Challenge Workshop in conjunction with ACM RecSys 2012 [pdf|http://ssc.io/wp-content/uploads/2013/02/cf-mahout.pdf]
* _Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce_, ACM Conference on Recommender Systems 2012, Dublin [ACM|http://dl.acm.org/citation.cfm?id=2365984] [pdf|http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf]

h3. Code

We were able to attract the developer of one of the leading scientific recommender libraries [http://mymedialite.net/] to port a few implementations to Mahout ([MAHOUT-1106|https://issues.apache.org/jira/browse/MAHOUT-1106], [MAHOUT-1089|https://issues.apache.org/jira/browse/MAHOUT-1089]).

However, new code contributions have slowed to a crawl. Number of commits in the past few months, compared to prior years:

|| Month || Commits ||
| Feb 2013 | 7 |
| Jan 2013 | 20 |
| Dec 2012 | 7 |
| Feb 2012 | 98 |
| Jan 2012 | 27 |
| Dec 2011 | 99 |
| Feb 2011 | 35 |
| Jan 2011 | 52 |
| Dec 2010 | 37 |
| Feb 2010 | 207 |
| Jan 2010 | 132 |
| Dec 2009 | 135 |

h4. New Commercial Integrations

* Predixion Readmission Insight, "a preventable readmission healthcare solution", announced [http://www.virtual-strategy.com/2013/03/05/predixion-software-wins-microsoft-health-users-group-innovation-award] integration with Mahout, Greenplum, Hive, and Microsoft's BI stack.
* Overstock and Mahout [http://www.wired.com/wiredenterprise/2012/12/mahout/]

h4. New Open Source Integrations

* The recommendation and advertisement network [plista|http://www.plista.com/en] has built an open source web layer for Mahout's recommenders [https://github.com/plista/kornakapi]
* Mahout seems to be the framework of choice for [PredictionIO|http://prediction.io/], an open source prediction server that lets software developers create predictive features such as personalization, recommendation and content discovery.

h3. Mailing List Summary:

User list discussions are currently focused primarily on bug reporting and helping new users, with very little discussion of future feature work.

h5. Developer Mailing List Posting: [http://mail-archives.apache.org/mod_mbox/mahout-dev/]

Posts in the most recent months, compared to the same months in previous years:

|| Month || Posts ||
| Feb 2013 | 123 |
| Jan 2013 | 213 |
| Dec 2012 | 155 |
| Feb 2012 | 578 |
| Jan 2012 | 545 |
| Dec 2011 | 1079 |
| Feb 2011 | 352 |
| Jan 2011 | 473 |
| Dec 2010 | 267 |

Developer involvement has not been this low since the first half of 2009.

h5. User Mailing List Posting: [http://mail-archives.apache.org/mod_mbox/mahout-user/]

User list discussions are primarily in support of very new users, as well as bug reports on released versions (0.6 and sometimes even 0.5), highlighting the need for 0.8 to be released. Traffic on the user mailing list has gone down slightly compared to previous years:

|| Month || Posts ||
| Feb 2012 | 288 |
| Jan 2012 | 367 |
| Feb 2011 | 359 |
| Jan 2011 | 458 |
| Feb 2010 | 497 |
| Jan 2010 | 272 |

This is not a dramatic decrease; there is still considerable interest in the user community.

h3. Summary:

How has the project developed since the last report? A 1.0 release is not yet on the horizon.

h4. Milestones

# Working towards a 0.8 release
# Development on new, faster clustering code
