Me too. To answer the question:

>> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
>> algorithms or is Mahout a scalable, performant ML library that is targeted
>> for production environments?

I think it is important to clean up a lot of wiring and user experience issues and make it production-ready, and have the sandbox too. Making that more formal, and preventing "sandbox creep", may mean putting new and experimental things into an internal incubator bucket wherever possible.

> On Mar 8, 2014, at 7:19 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> very close to my position.
>
>> On Sat, Mar 8, 2014 at 2:40 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> Ah, now back to freely babbling on the dev list.
>>
>> Mahout wishlist:
>> 1) scaling: I don't get the need for R integration or running without
>> hadoop or spark. You can run hadoop in local mode on your native file
>> system even using a debugger--then run the exact same code on a cluster. If
>> you don't care about scaling there are plenty of great libs for R already,
>> why worry about Mahout? One project I worked on started with the in-memory
>> recommender but within months had hopelessly outgrown it. If there isn't at
>> least a path to scaling we would never have started with Mahout.
>> Non-scalable code is fine and solves many applications but I hope it's not
>> the primary design point.
>> 2) speed: read below, Hadoop now (speed means buying more computers) More
>> Spark later (buy less computers)
>> 3) ease of data input/output. The conversion of external ids into Mahout
>> sequential integers is deceptively difficult and has to be re-created with
>> every project. I'm trying to submit an example, which includes an
>> input/output pipeline that is mostly scalable. It takes delimited logfiles
>> with external ids, creates Mahout input, then takes the output of Mahout
>> and converts back to external Ids. It is not worthy of core inclusion but
>> is at least a prototype or example of how to do this.
>>
>> My $0.02 worth about the future of Mahout:
>> 1) the future will be in moving lots of the current code to Spark and that
>> may not be the end of it. If yet another faster platform emerges Mahout
>> will have to go there too. If Mahout doesn't move (pretty quickly) someone
>> will fill the gap and Mahout will be left behind.
>> 2) the future of Mahout is tied to big data, at least I hope so.
>>
>> Ask yourself this: Is Mahout a sandbox for experimentation on cutting edge
>> algorithms or is Mahout a scalable, performant ML library that is targeted
>> for production environments?
>>
>> I hope most people think it is the latter.
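
For anyone who has not hit wishlist item 3 yet, the round trip it describes (external ids in, sequential integers for Mahout, external ids back out) looks roughly like the sketch below. This is not the pipeline Pat mentions and it does not call any Mahout API. The `IdDictionary` class, the two helper functions, and the tab-delimited `user<TAB>item` log layout are all assumptions made for illustration, and the dictionaries live in memory on one machine, so it shows only the bookkeeping and not a scalable solution.

```python
import csv


class IdDictionary:
    """Hypothetical two-way map between external ids and dense integers."""

    def __init__(self):
        self._to_int = {}   # external id -> integer
        self._to_ext = []   # integer -> external id (index is the integer)

    def encode(self, external_id):
        # Assign the next sequential integer the first time an id is seen.
        if external_id not in self._to_int:
            self._to_int[external_id] = len(self._to_ext)
            self._to_ext.append(external_id)
        return self._to_int[external_id]

    def decode(self, internal_id):
        return self._to_ext[internal_id]


def encode_log(lines, users, items, delimiter="\t"):
    """Turn delimited log lines into (user_int, item_int) pairs.

    Assumes column 0 is the user id and column 1 is the item id.
    """
    pairs = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append((users.encode(row[0]), items.encode(row[1])))
    return pairs


def decode_recommendations(recs, users, items):
    """Map {user_int: [item_int, ...]} back to external ids."""
    return {
        users.decode(user): [items.decode(item) for item in item_ids]
        for user, item_ids in recs.items()
    }


# Made-up sample data, only to show the round trip.
users, items = IdDictionary(), IdDictionary()
log = ["u-alice\tsku-9", "u-bob\tsku-3", "u-alice\tsku-3"]
pairs = encode_log(log, users, items)

# Stand-in for whatever the recommender would return, keyed by integer ids.
fake_output = {1: [0]}
readable = decode_recommendations(fake_output, users, items)
```

The part that gets re-created with every project is keeping both dictionaries around between the input step and the output step. On a cluster they would have to be persisted or broadcast rather than held in a Python object.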