Pat, sorry for going off-topic: this code is actually about a year old at heart. I was using it to run some custom methods at my company, but I had to largely reshape it to fit Mahout once I got permission to contribute. So this took a while, but the idea is certainly not new. At least parts of this code (e.g. DRM serialization) used to run something real at some point. In fact, the initial version of this code predates the MLI talks I was referring to (or at least predates when I first heard of MLI). Unfortunately, our experiments with big-data solvers are currently nowhere close to production because of product priorities, which is partly why I said, well, let's at least make it public if we aren't using it.
But you could potentially develop this idea further to optimize and support basic data frame operators as well, all while staying independent of the back end. Unfortunately, the back end has to pass a certain programming-model maturity test. Right now that means Spark, Stratosphere and other FlumeJava-like models, but I don't think 0xdata in particular, as it stands, passes it. Another point (we also do this at our office) is that you can simply write a driver script and run it in a Scala shell, akin to R. The next step would be to get developers excited about writing algorithms; I think R is now closing in on about 5,000 packages. I probably won't be far from the truth in saying this is exactly because it is an ML environment (and certainly not because of its performance; R is notoriously slow).

On Fri, Mar 14, 2014 at 3:39 PM, Pat Ferrel wrote:

> Cool, I'm super excited to see RSJ on Spark integrated into the mainline
> with Dimitriy's work. I really really hope that it is seen as important
> and doesn't get stalled by committers being demotivated. I had no idea that
> what I consider the heart of Mahout was so close to being real on Spark.
>
> I'm also happy to hear that you are full speed ahead for this Spark work.
> I obviously got the wrong impression.
>
> As to "new contributors who have some interesting capabilities" great, as
> long as it doesn't end up defocusing people. Old committers are naturally
> going to wonder where to put their efforts with this proposal. Some may
> just give up until the dust settles. I'm sure we can agree that that would
> not be good.
>
> The question of roadmap is, more than ever, up for discussion. I would
> just plead one last time that Spark work not be stalled while this is
> worked out.
>
> On Mar 14, 2014, at 1:00 PM, Ted Dunning wrote:
>
>> Pat
>>
>> I am not suggesting that we walk away from anything.
>>
>> I am suggesting that we welcome new contributors who have some interesting
>> capabilities.
>>
>> I also suggest that those efforts should be made to work well with
>> existing efforts.
>>
>>> On Mar 14, 2014, at 10:58, Pat Ferrel wrote:
>>>
>>> I think people (including me) have underestimated how much you and
>>> Sebastian have done on Spark. Realistically it sounds like we are talking
>>> about walking away from that in favor of an unknown.
>>>
>>> 0xdata's community has not been solving the problems I care about. You
>>> guys have.
