Re: Mahout 1.0 goals

Dmitriy Lyubimov Sat, 08 Mar 2014 19:20:06 -0800

very close to my position.


On Sat, Mar 8, 2014 at 2:40 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Ah, now back to freely babbling on the dev list.
>
> Mahout wishlist:
> 1) scaling:  I don't get the need for R integration or running without
> hadoop or spark. You can run hadoop in local mode on your native file
> system even using a debugger--then run the exact same code on a cluster. If
> you don't care about scaling there are plenty of great libs for R already,
> why worry about Mahout? One project I worked on started with the in-memory
> recommender but within months had hopelessly outgrown it. If there isn't at
> least a path to scaling we would never have started with Mahout.
> Non-scalable code is fine and serves many applications, but I hope it's not
> the primary design point.
> 2) speed: read below. Hadoop now (speed means buying more computers), more
> Spark later (buy fewer computers).
> 3) ease of data input/output. The conversion of external ids into Mahout
> sequential integers is deceptively difficult and has to be re-created with
> every project. I'm trying to submit an example, which includes an
> input/output pipeline that is mostly scalable. It takes delimited logfiles
> with external ids, creates Mahout input, then takes the output of Mahout
> and converts back to external ids. It is not worthy of core inclusion but
> is at least a prototype or example of how to do this.
>
> My $0.02 worth about the future of Mahout:
> 1) the future will be in moving lots of the current code to Spark and that
> may not be the end of it. If yet another faster platform emerges Mahout
> will have to go there too. If Mahout doesn't move (pretty quickly) someone
> will fill the gap and Mahout will be left behind.
> 2) the future of Mahout is tied to big data, at least I hope so.
>
> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
> algorithms or is Mahout a scalable, performant ML library that is targeted
> for production environments?
>
> I hope most people think it is the latter.
>
>
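
For readers following wishlist item 3 above, the id round trip it describes (external ids in, sequential integers for Mahout, external ids back out) might be sketched roughly as below. This is only an illustration, not Pat's example and not any Mahout API: the names IdIndex, encode_log, and decode_pairs are hypothetical, the two-column tab-delimited log layout is an assumption, and the dictionary lives in memory on one machine, so it shows the bookkeeping but not the scalable part.

```python
from typing import Dict, Iterable, List, Tuple


class IdIndex:
    """Hypothetical two-way map between external string ids and sequential ints."""

    def __init__(self) -> None:
        self._to_int: Dict[str, int] = {}
        self._to_ext: List[str] = []

    def encode(self, external_id: str) -> int:
        # Assign the next unused integer the first time an id is seen.
        if external_id not in self._to_int:
            self._to_int[external_id] = len(self._to_ext)
            self._to_ext.append(external_id)
        return self._to_int[external_id]

    def decode(self, internal_id: int) -> str:
        # Look up the external id that was assigned this integer.
        return self._to_ext[internal_id]


def encode_log(lines: Iterable[str], users: IdIndex, items: IdIndex,
               delimiter: str = "\t") -> List[Tuple[int, int]]:
    """Turn delimited 'user<delim>item' log lines into integer id pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        # Assumed layout: column 0 is the user id, column 1 is the item id.
        pairs.append((users.encode(fields[0]), items.encode(fields[1])))
    return pairs


def decode_pairs(pairs: Iterable[Tuple[int, int]], users: IdIndex,
                 items: IdIndex) -> List[Tuple[str, str]]:
    """Map integer (user, item) pairs back to their external ids."""
    return [(users.decode(u), items.decode(i)) for u, i in pairs]
```

Users and items get separate indexes so each id space stays dense and starts at zero. The same two indexes must be kept around (or persisted) between the encode and decode steps, which is the part that has to be rebuilt for every project and that becomes hard once the dictionary no longer fits on one node.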
