The last I heard from Ted, he was proposing new committers who had lots to contribute. I did not hear from him that this would mean h2o in the roadmap for all of Mahout.
This then leaves the question: *What IS in the roadmap?*

I can only vote from relative technical ignorance but with some knowledge of things like momentum, number of users and committers, and buzz in the industry. Any time I’ve had to evaluate a piece of OSS for inclusion in a project these were some of the important things we looked at. I would then place one user (non-committer) vote that Spark be #1 on the roadmap. As a user it’s Spark I’ll be looking at for other problems I need to solve, at least for now.

I’m not on the PMC so can’t really issue a call to action. Who will? Does anyone think Mahout is not in need of a rallying call?

On Mar 16, 2014, at 9:17 AM, Pat Ferrel <[email protected]> wrote:

So your Mahout DRM work was targeted for production at your company and was working well but other parts of the project fell through and it didn’t get deployed. Some of it is almost a year old and pretty mature.
--This is very good news.

You are also saying that the integration model you used for Spark would probably mostly work for other solver frameworks like Stratosphere but it doesn’t look appropriate for h2o.
--Good to know.

Your last point is that speed is not so much a deciding factor as other less tangible things. Your example is R which has 5000 packages and counting but is notoriously slow. By that I assume you are saying a speed comparison is not nearly as important as other factors, most of which have to do with attracting the largest community of users and contributors.
--Here we agree for sure.

Getting a faster regression or random forest implementation (as long as it takes Mahout formats as input) is great. But if it implies that committers move to the platform (h2o) used in these implementations then someone must make a case for why it’s in the roadmap.

On Mar 14, 2014, at 3:55 PM, Dmitriy Lyubimov <[email protected]> wrote:

Pat, sorry for offtop -- this code is actually about a year old at heart.
I was using it to run some custom methods back in my company but I had to largely reshape it to fit Mahout once I got permission to contribute. So this took a while, but the idea is certainly not new. At least parts of this code (e.g. drm serialization) used to run something real at some point. Actually, the initial materialization of this code predates the MLI talks that I was referring to (at least when I first heard of MLI).

Unfortunately our experiments with big data solvers are currently nowhere close to production due to product priorities -- so that was in part why I said, well, let's at least make it public if we don't use it.

But you can potentially develop this idea to further optimize and support basic data frame operators as well, all while independent of the back end. Unfortunately, the back end has to pass a certain programming model maturity test; right now that would be Spark, Stratosphere and other Flume-java-like models, but I don't think 0xdata in particular, as it stands, passes it.

Another thing is (also used at our office) you can simply write it as a driver script and run it in a Scala shell akin to R. The next step would be to fire up developers to write algorithms. I think R is closing in now on about 5,000 packages. I probably will not miss the truth here by much by saying this is exactly because of it being an ML environment (and certainly not because of its performance -- R is notoriously slow).

On Fri, Mar 14, 2014 at 3:39 PM, Pat Ferrel <[email protected]> wrote:

> Cool, I'm super excited to see RSJ on Spark integrated into the mainline
> with Dmitriy's work. I really really hope that it is seen as important
> and doesn't get stalled by committers being demotivated. I had no idea that
> what I consider the heart of Mahout was so close to being real on Spark.
>
> I'm also happy to hear that you are full speed ahead for this Spark work.
> I obviously got the wrong impression.
>
> As to "new contributors who have some interesting capabilities" great, as
> long as it doesn't end up defocusing people. Old committers are naturally
> going to wonder where to put their efforts with this proposal. Some may
> just give up until the dust settles. I'm sure we can agree that that would
> not be good.
>
> The question of roadmap is, more than ever, up for discussion. I would
> just plead one last time that Spark work not be stalled while this is
> worked out.
>
> On Mar 14, 2014, at 1:00 PM, Ted Dunning <[email protected]> wrote:
>
> Pat
>
> I am not suggesting that we walk away from anything.
>
> I am suggesting that we welcome new contributors who have some interesting
> capabilities.
>
> I also suggest that those efforts should be made to work well with
> existing efforts.
>
> Sent from my iPhone
>
>> On Mar 14, 2014, at 10:58, Pat Ferrel <[email protected]> wrote:
>>
>> I think people (including me) have underestimated how much you and
>> Sebastian have done on Spark. Realistically it sounds like we are talking
>> about walking away from that in favor of an unknown.
>>
>> 0xdata's community has not been solving the problems I care about. You
>> guys have.
