Mahout has, if not an identity crisis, an identity question. One model would be the 'commons' model: a compendium of algorithms with something in common. As Isabel eloquently points out, however, a 'commons' is very challenging from a support and management standpoint, because the active community has to, somehow, provide support into the indefinite future for an ever-growing body of complex algorithms.
At Apache Commons itself, this issue is addressed (in part) by a very high bar for inclusion of code, as some members of this community have discovered. Another model would be to severely tighten the focus to a set of related functionality. At the extreme, that could imply a tight focus on recommendation. Yet another approach would be to focus on the Hadoop framework for a certain area of NLP, more than an ever-growing collection of the algorithms themselves.

Mostly, I want to emphasize my support for Isabel's view that being a sort of general store of NLP-ish Hadoop-ish algorithms is going to be hard.

On Sun, Mar 24, 2013 at 1:45 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

> Hey Guys,
> I've definitely been interested in being a committer for a while now, have
> built services around a few of the algorithms, however I'm usually at a
> loss on where to start, maybe docs, I'm interested in building a neural net
> or genetic algorithm implementation as well as building out an
> infrastructure that surrounds mahout/graphlab that allows non technical
> analysts to train data and pick algorithms and make tradeoffs with results.
> My goal is to build a tool around low level frameworks that bridges the
> gap for analysts to setup a recommendations plugin that can be embedded
> into any ecommerce app. Would love to hear best places I can help that are
> of immediate need.
>
> Sent from my iPhone
>
> On Mar 24, 2013, at 10:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> > On Mar 24, 2013, at 1:31 PM, Sebastian Schelter wrote:
> >
> >> Hi Grant,
> >>
> >> how would/could such a scale back look like?
> >
> > It's a good question and I don't have a good answer. The Recommender
> > stuff always seems to be the most active (you and Sean do a ton of work!),
> > so that is one possibility, but I can't say I really like it, since I'm a
> > heavy user of both clustering and classification (but I really restrict
> > myself to what I know works).
> > I also use the colocation work and it pretty
> > much just works too, so that covers a lot of the code base for me.
> >
> > In the end, it probably doesn't make sense to scale back, but instead
> > look at getting more committers on board sooner rather than later.
> >
> > -Grant
> >
> >> Best,
> >> Sebastian
> >>
> >> On 24.03.2013 18:30, Grant Ingersoll wrote:
> >>> Personally, I think the bigger issue is that most of the committers
> >>> (me included) are not very active, so we either need to identify other
> >>> committers sooner rather than later or really scale back the project to
> >>> just those areas where we have activity.
> >>>
> >>> I know I struggle to find time to contribute, esp. in moving the ball
> >>> forward on issues that are non-trivial usually requires a significant
> >>> amount of effort to understand the math, etc.
> >>>
> >>> On Mar 24, 2013, at 6:08 AM, Isabel Drost-Fromm wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> this is to those of you using Mahout or lurking on the mailing list
> >>>> somewhere.
> >>>>
> >>>> In the current Mahout board report [1] it became apparent that Mahout
> >>>> has a large number of users. However looking at the dev list there's
> >>>> barely any activity left: Committers including myself are drowning in
> >>>> help requests that they cannot keep up with or have been surprised by
> >>>> life taking away more of their time than a few months and years ago.
> >>>> Contributors wait for long until they get feedback on patches getting
> >>>> frustrated along the way.
> >>>>
> >>>> In the software world if there are no more resources to support a
> >>>> released version that version usually is marked as “no longer
> >>>> maintained”, being subsequently retired and replaced with a new version.
> >>>>
> >>>> At Apache projects that are lacking resources, energy and support go
> >>>> through a similar process: Usually they get moved into the Attic –
> >>>> which means that mailing lists are closed though archives remain
> >>>> searchable, bug trackers are marked as read only. Honestly as a project
> >>>> founder my personal goal for Mahout always was to build a sustainable
> >>>> community that would survive core people having less time for the
> >>>> project at some point in time. I'd be distressed to see Mahout go to
> >>>> the Attic.
> >>>>
> >>>> If you are an active Mahout user and want to help – what can you do?
> >>>>
> >>>> At the current point Mahout doesn't need any new algorithms (though
> >>>> high quality contributions that come with people maintaining them
> >>>> within the project are of course welcome). What the project needs is
> >>>> much simpler even for beginners:
> >>>>
> >>>> - help answering mails on both dev and user list
> >>>>
> >>>> - help reviewing patches that come in: Having another contributor say
> >>>> “yes, this looks valuable and correct” can be a big help for
> >>>> committers – and can be the first step for you to become one yourself.
> >>>>
> >>>> - help with documentation – both for developers and users of the
> >>>> project.
> >>>>
> >>>> - help with structuring documentation to make it easier for others to
> >>>> find the relevant information.
> >>>>
> >>>> - help with making our build faster and easier: There are a few quick
> >>>> wins in terms of long running unit tests, there certainly are areas
> >>>> that lack testing.
> >>>>
> >>>> - help with code cleanup – there are areas that do not adhere to our
> >>>> coding conventions (standard Java, but with two spaces for
> >>>> indentation) – make changes in small batches
> >>>>
> >>>> - help with optimising existing implementations
> >>>>
> >>>> - if you truly believe that your algorithm or implementation is
> >>>> faster: Be bold. Prove that it really is faster for all relevant use
> >>>> cases and work with the community to replace existing code with your
> >>>> optimised version.
> >>>>
> >>>> Also help with what areas you are using and what exactly you see
> >>>> missing is welcome.
> >>>>
> >>>> It would be awesome to see Mahout gain activity. But in order to
> >>>> achieve that the project really does need your help.
> >>>>
> >>>> Isabel
> >>>>
> >>>> [1] <https://cwiki.apache.org/confluence/display/MAHOUT/Monthly+Progress>
> >>>
> >>> --------------------------------------------
> >>> Grant Ingersoll | @gsingers
> >>> http://www.lucidworks.com
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com