very close to my position.
On Sat, Mar 8, 2014 at 2:40 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Ah, now back to freely babbling on the dev list.
>
> Mahout wishlist:
> 1) scaling: I don't get the need for R integration or running without
> hadoop or spark. You can run hadoop in local mode on your native file
> system even using a debugger--then run the exact same code on a cluster. If
> you don't care about scaling there are plenty of great libs for R already,
> why worry about Mahout? One project I worked on started with the in-memory
> recommender but within months had hopelessly outgrown it. If there isn't at
> least a path to scaling we would never have started with Mahout.
> Non-scalable code is fine and solves many applications but I hope it's not
> the primary design point.
> 2) speed: read below, Hadoop now (speed means buying more computers) More
> Spark later (buy less computers)
> 3) ease of data input/output. The conversion of external ids into Mahout
> sequential integers is deceptively difficult and has to be re-created with
> every project. I'm trying to submit an example, which includes an
> input/output pipeline that is mostly scalable. It takes delimited logfiles
> with external ids, creates Mahout input, then takes the output of Mahout
> and converts back to external Ids. It is not worthy of core inclusion but
> is at least a prototype or example of how to do this.
>
> My $0.02 worth about the future of Mahout:
> 1) the future will be in moving lots of the current code to Spark and that
> may not be the end of it. If yet another faster platform emerges Mahout
> will have to go there too. If Mahout doesn't move (pretty quickly) someone
> will fill the gap and Mahout will be left behind.
> 2) the future of Mahout is tied to big data, at least I hope so.
>
> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
> algorithms or is Mahout a scalable, performant ML library that is targeted
> for production environments?
>
> I hope most people think it is the latter.
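
On wishlist item 3 (ease of data input/output), the round trip between external ids and sequential integers could be sketched roughly as below. This is only an illustration, not the example Pat mentions submitting: the `IdIndex` class, the `encode_log` and `decode_recs` helpers, and the tab-delimited user/item log layout are all assumptions made up for this sketch, and the in-memory dictionary is single-machine only, so it does not address the scaling concern.

```python
import csv


class IdIndex:
    """Hypothetical two-way map between external string ids and dense ints."""

    def __init__(self):
        self._to_int = {}   # external id -> sequential integer
        self._to_ext = []   # sequential integer -> external id

    def encode(self, external_id):
        # Assign the next unused integer the first time an id is seen.
        idx = self._to_int.get(external_id)
        if idx is None:
            idx = len(self._to_ext)
            self._to_int[external_id] = idx
            self._to_ext.append(external_id)
        return idx

    def decode(self, idx):
        return self._to_ext[idx]


def encode_log(lines, users, items, delimiter="\t"):
    """Yield (user_int, item_int) pairs from delimited log lines.

    Assumes the first two fields are the external user id and item id.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield users.encode(row[0]), items.encode(row[1])


def decode_recs(recs, users, items):
    """Map {user_int: [item_int, ...]} back to external ids."""
    return {
        users.decode(user): [items.decode(item) for item in item_ids]
        for user, item_ids in recs.items()
    }


if __name__ == "__main__":
    log = ["u-alice\tsku-9", "u-bob\tsku-3", "u-alice\tsku-3"]
    users, items = IdIndex(), IdIndex()
    pairs = list(encode_log(log, users, items))
    print(pairs)
    # Stand-in for recommender output keyed by integer ids.
    print(decode_recs({0: [1], 1: [0]}, users, items))
```

The two `IdIndex` objects have to be kept (or persisted) between the encode and decode steps, which is the part that tends to get rebuilt in every project.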