Hello,

first things first - I'm personally all for adding more committers. During the 
past few months I've had much the same experience that Grant 
describes: when constrained time-wise, it is really hard to move more complex 
issues forward, even if there are patches available. Bringing in more people 
who are already familiar with the code base from a user's perspective could 
bring more time to the table.


On Sunday, March 24, 2013 03:11:22 PM Benson Margulies wrote:
> Mahout has, if not an identity crisis, an identity question.

Or put another way: a good time for people who don't have the commit bit yet to 
chime in and provide their perspective.


> One model would be the 'commons' model: a compendium of algorithms with
> something in common. As Isabel eloquently points out, however, a 'commons'
> is very challenging from a support and management standpoint, because the
> active community has to, somehow, provide support into the indefinite
> future for an ever-growing body of complex algorithms.
> 
> At Apache Commons itself, this issue is addressed (in part) by a very high
> bar for inclusion of code, as some members of this community have
> discovered.

That doesn't sound too far off from one of the original ideas: splitting the 
code base into separate submodules with a common math base underneath. 

Correct me if I'm wrong, but most people seem to approach machine learning in 
general only in a use-case-driven way. The split into recommendations, 
classification and clustering (which is also what the book already covers) 
makes it approachable - though that split is currently not really reflected in the code.

I'd very much appreciate learning from the "commons" perspective here.


> Another model would be to severely tighten the focus to a set of related
> functionality. At the extreme, that could imply a tight focus on
> recommendation.

We've tried this a couple of times. I think it helped a great deal to cut 
out unused code last summer. There are still a few areas that could be removed 
or trimmed down.

I would really love to hear users' input here on what is actually being used 
and found to be useful out in the wild.

However, I'm not entirely sure we could tighten the focus to mean just 
recommendations. What do others think here?


> Yet another approach would be to focus on the Hadoop framework for a
> certain area of NLP, more than an ever-growing collection of the algorithms
> themselves.
>
> Mostly, I want to emphasize my support for Isabel's view that being a sort
> of general store of NLP-ish Hadoop-ish algorithms is going to be hard.

Currently Mahout cannot decide whether it's Hadoop-based or not. It also cannot 
decide whether it's NLP-only or not. To me that seems like a hard-to-sell 
message.

What about an experiment: if you (reading this mail) were to write a two-sentence 
vision statement for Mahout as you see it - what would that be?


Isabel

