Mahout has, if not an identity crisis, an identity question.

One model would be the 'commons' model: a compendium of algorithms with
something in common. As Isabel eloquently points out, however, a 'commons'
is very challenging from a support and management standpoint, because the
active community has to, somehow, provide support into the indefinite
future for an ever-growing body of complex algorithms.

At Apache Commons itself, this issue is addressed (in part) by a very high
bar for inclusion of code, as some members of this community have
discovered.

Another model would be to severely tighten the focus to a set of related
functionality. At the extreme, that could imply a tight focus on
recommendation.

Yet another approach would be to focus on the Hadoop framework for a
certain area of NLP, rather than on an ever-growing collection of the
algorithms themselves.

Mostly, I want to emphasize my support for Isabel's view that being a sort
of general store of NLP-ish Hadoop-ish algorithms is going to be hard.

On Sun, Mar 24, 2013 at 1:45 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

> Hey Guys,
> I've definitely been interested in becoming a committer for a while now, and
> have built services around a few of the algorithms. However, I'm usually at
> a loss on where to start; maybe docs? I'm interested in building a neural net
> or genetic algorithm implementation, as well as building out an
> infrastructure around Mahout/GraphLab that allows non-technical analysts to
> train on data, pick algorithms, and make tradeoffs with results. My goal is
> to build a tool around low-level frameworks that bridges the gap for
> analysts, so they can set up a recommendations plugin that can be embedded
> into any e-commerce app. I would love to hear about the best places I can
> help that are of immediate need.
>
> On Mar 24, 2013, at 10:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> >
> > On Mar 24, 2013, at 1:31 PM, Sebastian Schelter wrote:
> >
> >> Hi Grant,
> >>
> >> what would/could such a scale-back look like?
> >
> > It's a good question and I don't have a good answer.  The Recommender
> > stuff always seems to be the most active (you and Sean do a ton of work!),
> > so that is one possibility, but I can't say I really like it, since I'm a
> > heavy user of both clustering and classification (but I really restrict
> > myself to what I know works).  I also use the collocation work and it
> > pretty much just works too, so that covers a lot of the code base for me.
> >
> > In the end, it probably doesn't make sense to scale back; instead, we
> > should look at getting more committers on board sooner rather than later.
> >
> > -Grant
> >
> >
> >
> >>
> >> Best,
> >> Sebastian
> >>
> >>
> >> On 24.03.2013 18:30, Grant Ingersoll wrote:
> >>> Personally, I think the bigger issue is that most of the committers
> >>> (me included) are not very active, so we either need to identify other
> >>> committers sooner rather than later or really scale back the project to
> >>> just those areas where we have activity.
> >>>
> >>> I know I struggle to find time to contribute, esp. since moving the ball
> >>> forward on non-trivial issues usually requires a significant amount of
> >>> effort to understand the math, etc.
> >>>
> >>>
> >>> On Mar 24, 2013, at 6:08 AM, Isabel Drost-Fromm wrote:
> >>>
> >>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>>
> >>>> this is to those of you using Mahout or lurking on the mailing list
> >>>> somewhere.
> >>>>
> >>>>
> >>>> In the current Mahout board report [1] it became apparent that Mahout
> >>>> has a large number of users. However, looking at the dev list, there's
> >>>> barely any activity left: committers, including myself, are drowning in
> >>>> help requests that they cannot keep up with, or have been surprised by
> >>>> life taking away more of their time than a few months or years ago.
> >>>> Contributors wait a long time for feedback on patches, getting
> >>>> frustrated along the way.
> >>>>
> >>>>
> >>>> In the software world, if there are no more resources to support a
> >>>> released version, that version is usually marked as “no longer
> >>>> maintained”, and is subsequently retired and replaced with a new version.
> >>>>
> >>>>
> >>>> At Apache, projects that are lacking resources, energy and support go
> >>>> through a similar process: usually they get moved into the Attic, which
> >>>> means that mailing lists are closed (though archives remain searchable)
> >>>> and bug trackers are marked as read-only. Honestly, as a project
> >>>> founder, my personal goal for Mahout always was to build a sustainable
> >>>> community that would survive core people having less time for the
> >>>> project at some point in time. I'd be distressed to see Mahout go to
> >>>> the Attic.
> >>>>
> >>>>
> >>>> If you are an active Mahout user and want to help – what can you do?
> >>>>
> >>>>
> >>>> At the current point Mahout doesn't need any new algorithms (though
> >>>> high-quality contributions that come with people maintaining them
> >>>> within the project are of course welcome). What the project needs is
> >>>> much simpler, even for beginners:
> >>>>
> >>>>
> >>>> - help answering mails on both dev and user list
> >>>>
> >>>> - help reviewing patches that come in: having another contributor say
> >>>> “yes, this looks valuable and correct” can be a big help for
> >>>> committers, and can be the first step for you to become one yourself.
> >>>>
> >>>> - help with documentation, both for developers and users of the
> >>>> project.
> >>>>
> >>>> - help with structuring documentation to make it easier for others to
> >>>> find the relevant information.
> >>>>
> >>>> - help with making our build faster and easier: there are a few quick
> >>>> wins in terms of long-running unit tests, and there certainly are areas
> >>>> that lack testing.
> >>>>
> >>>> - help with code cleanup: there are areas that do not adhere to our
> >>>> coding conventions (standard Java, but with two spaces for
> >>>> indentation); make changes in small batches.
> >>>>
> >>>> - help with optimising existing implementations
> >>>>
> >>>> - if you truly believe that your algorithm or implementation is
> >>>> faster: be bold. Prove that it really is faster for all relevant use
> >>>> cases and work with the community to replace existing code with your
> >>>> optimised version.
> >>>>
> >>>>
> >>>> Feedback on which areas you are using and what exactly you see
> >>>> missing is also welcome.
> >>>>
> >>>>
> >>>> It would be awesome to see Mahout gain activity. But in order to
> >>>> achieve that, the project really does need your help.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Isabel
> >>>>
> >>>>
> >>>> [1] <https://cwiki.apache.org/confluence/display/MAHOUT/Monthly+Progress>
> >>>
> >>> --------------------------------------------
> >>> Grant Ingersoll | @gsingers
> >>> http://www.lucidworks.com
> >