Re: What's the plan for Mahout?

Grant Ingersoll Fri, 04 Sep 2009 14:04:51 -0700

First off, thanks for bringing this up!

On Sep 4, 2009, at 9:13 AM, Sean Owen wrote:

Guys, quick and broad question -- what's the roadmap for Mahout look
like? Even just for the next two releases?

I asked a little while back about this. I think we can put out 0.2 after Robin and Deneche get their pieces in (Random Forests, classification refactoring), which hopefully should be soon since they are now committers.

We've cleaned up a lot and made a number of improvements in the code since 0.1, it would be good to get them out to a broader audience.

After that, I don't think we particularly have to go through all the 0.X (0.3, 0.4, ...) integers on our way to 1.0. The primary goal before 1.0 is to make sure we are happy with the APIs before (to some extent) "locking them down" for 1.0, but I'm not sure we need to be that worried about locking down, since most of our code isn't public APIs anyway and we need not necessarily worry about back compatibility. I think the other primary thing we need is to get some larger scale testing in place. I believe Amazon still has in place its committers program such that committers can get access to EC2 credits for testing. Let me know if anyone needs an account.


Now, much of the project is mostly a space for tinkering, tossing
around bits of code for now, and that's OK for 0.1 or 0.2. I just
wonder what the path to a proper finished product is like. It'll take
some agreement on who exactly the audience is, what they need and
don't need, what interface it presents to those users. It takes work
to design for that, bring the project into line around that design,
document and test, etc. And -- it takes people with responsibility and
authority to make it happen.

I think what we have now goes beyond tinkering, but yes, we are exploring what works and what doesn't. We've got several active committers and some active contributors, which are all good signs and we actually have a pretty healthy base of mailing list subscribers lurking. We also have users coming in and kicking the tires, we need to capture their needs and keep them interested by responding quickly and in a helpful way. We also need to find a way to pull the lurkers out to help by providing an ever more compelling story.

Open source is always incremental and it takes time to build. It really is never done and I find O/S is often much more fluid than products.


I'm not clear we quite have those things yet. Until we do this will be
an 0.x project that nobody can really get into using for production.
It doesn't have to happen tomorrow, but, what's our path like from
here to there? Spare time from even 10 people won't get the docs
written, tidy the code, refactor / redesign / unify the lot of
copy/paste that's going on, etc. People definitely have ideas about
what the project should do -- I see lots of little bits of
functionality being thrown into the pot. But is it adding up to
something consistent and coherent? Should we talk seriously about it?
"Machine learning" is too broad a remit.

I think we are getting there. Some of the answer is above in the first part where I talk about releases. I do think the bits are adding up to real machine learning functionality. We've got utilities in place for getting data into formats that are consumable, we've got implementations that consume those formats and produce outputs. More examples, etc. will always help and of course documentation.

It took Lucene 6+ years to reach what I would call a really capable system. The early stages were promising and worked for many, but it was not until 2004-05 that it really started taking off. Not saying that Mahout will take that long, especially given how widely adopted/accepted Open Source is now as compared to the early days of Lucene, but it does take time. That being said, we certainly need to get more people looking at more parts of the code and proposing and implementing improvements.


It's not ruining my day or anything but I'm sitting on a piece of the
project that I put effort into making clearly do a few things, do them
well, and not try to do other things, designed for practical use
cases, and documented and polished and tested it. So I'll be a little
concerned if it's attached to an early-0.x tinkering project this time
next year. That's not cool for an Apache project anyway.

Agreed. Let's get what is marked for 0.2 done and look to release soon thereafter (mid October?). From there, I'd guess we could do a 0.3 (or even 0.9) in the early Jan.-March time frame and then look to make a 1.0 in early Summer. People contributing and pushing can obviously push this up. Our job as the committers is to make sure, to some extent, that their efforts don't go wasted.


It may be presumptuous but I volunteer to try to lead answers to these
questions. It's going to lead to some tough answers and more work in
some cases, no matter who drives it. Hoping to do it sooner than
later.

Not at all presumptuous. This is in fact how it works at Apache. Right or wrong, those who do get to make the decisions. That's how the meritocracy works. I personally am committed and I know several others are (obviously, including you) as well.


-Grant
