Re: Mahout 1.0 goals

Andrew Musselman Sat, 08 Mar 2014 20:00:07 -0800

Me too.

To answer the question:
>> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
>> algorithms or is Mahout a scalable, performant ML library that is targeted
>> for production environments?



I think it is important to clean up a lot of the wiring and user-experience issues
and make Mahout production-ready, while still keeping the sandbox.

Making this more formal, and trying to prevent "sandbox creep", may mean putting new
and experimental work into an internal incubator bucket wherever possible.

> On Mar 8, 2014, at 7:19 PM, Dmitriy Lyubimov wrote:
> 
> Very close to my position.
> 
> 
>> On Sat, Mar 8, 2014 at 2:40 PM, Pat Ferrel wrote:
>> 
>> Ah, now back to freely babbling on the dev list.
>> 
>> Mahout wishlist:
>> 1) scaling: I don't get the need for R integration or running without
>> Hadoop or Spark. You can run Hadoop in local mode on your native file
>> system, even under a debugger, then run the exact same code on a cluster. If
>> you don't care about scaling, there are plenty of great libraries for R already,
>> so why worry about Mahout? One project I worked on started with the in-memory
>> recommender but within months had hopelessly outgrown it. If there hadn't been at
>> least a path to scaling, we would never have started with Mahout.
>> Non-scalable code is fine and solves many applications, but I hope it's not
>> the primary design point.
>> 2) speed: see below. Hadoop now (speed means buying more computers), more
>> Spark later (buying fewer computers).
>> 3) ease of data input/output: the conversion of external IDs into Mahout's
>> sequential integer IDs is deceptively difficult and has to be re-created for
>> every project. I'm trying to submit an example that includes a mostly
>> scalable input/output pipeline. It takes delimited log files with external
>> IDs, creates Mahout input, then takes the output of Mahout and converts it
>> back to external IDs. It is not worthy of inclusion in core, but it is at
>> least a prototype or example of how to do this.
>> 
>> My $0.02 on the future of Mahout:
>> 1) The future will be in moving much of the current code to Spark, and that
>> may not be the end of it. If yet another, faster platform emerges, Mahout
>> will have to go there too. If Mahout doesn't move (pretty quickly), someone
>> will fill the gap and Mahout will be left behind.
>> 2) The future of Mahout is tied to big data, at least I hope so.
>> 
>> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
>> algorithms or is Mahout a scalable, performant ML library that is targeted
>> for production environments?
>> 
>> I hope most people think it is the latter.
>> 
>>
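Item 3 of Pat's wishlist describes translating external IDs into sequential integers for Mahout and back again. A minimal in-memory sketch of that bookkeeping in plain Java might look like the following. The class name `IdDictionary`, its methods, and the tab-delimited `userId<TAB>itemId` log format are hypothetical and are not Mahout APIs. Because the dictionary lives in one JVM's memory, it only illustrates the mapping itself; a scalable pipeline like the one Pat mentions would have to build the dictionary as a distributed job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): assigns dense, sequential int
 * indices to external string IDs and translates them back afterwards.
 * In-memory only; it does not scale beyond a single JVM.
 */
public class IdDictionary {
    // external ID -> sequential int index
    private final Map<String, Integer> toIndex = new HashMap<>();
    // int index -> external ID; the list position equals the index
    private final List<String> toExternal = new ArrayList<>();

    /** Returns the index for an external ID, assigning the next free one if unseen. */
    public int index(String externalId) {
        Integer existing = toIndex.get(externalId);
        if (existing != null) {
            return existing;
        }
        int next = toExternal.size();
        toIndex.put(externalId, next);
        toExternal.add(externalId);
        return next;
    }

    /** Translates an index found in the numeric output back to its external ID. */
    public String externalId(int index) {
        return toExternal.get(index);
    }

    /** Number of distinct external IDs seen so far. */
    public int size() {
        return toExternal.size();
    }

    public static void main(String[] args) {
        // Hypothetical tab-delimited log lines: userId \t itemId
        String[] log = {"u-alice\titem-42", "u-bob\titem-7", "u-alice\titem-7"};
        IdDictionary users = new IdDictionary();
        IdDictionary items = new IdDictionary();
        for (String line : log) {
            String[] fields = line.split("\t");
            int u = users.index(fields[0]);
            int i = items.index(fields[1]);
            System.out.println(u + "," + i); // numeric input: 0,0 then 1,1 then 0,1
        }
        // Map a numeric result back to the external ID
        System.out.println(items.externalId(1)); // item-7
    }
}
```

Keeping one dictionary per ID space (users and items separately) means each gets its own dense 0..n-1 range, which is what a row- or column-indexed matrix expects.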
