+1, and I agree, though I might allow a slightly longer off-ramp for the old style. I don't see a strong need to completely revamp the map-reduce based code. Nor is the legacy stuff around the preference database worth salvaging.
It cannot reasonably be argued that usage is low and declining while simultaneously saying that perpetual support of old code is required.

Sent from my iPhone

> On Apr 7, 2014, at 4:08, Suneel Marthi <[email protected]> wrote:
>
> +1 and agree with ssc's suggestion.
>
> Sent from my iPhone
>
>> On Apr 7, 2014, at 3:30 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> I agree that the state of the MR code is something that needs to be
>> addressed. There have been several attempts to rework/refactor it, but none
>> of them had a satisfactory result unfortunately.
>>
>> I'm hearing that there is lack for a coherent vision for the future of
>> Mahout. Let me suggest a radical one.
>>
>> - call the next release 0.10 not 1.0, as the latter implies a maturity which
>> does not reflect the radical changes I'm proposing
>>
>> - move all the MR code to a new maven module, deprecate it and announce that
>> we delete it in the release after 0.11
>>
>> - make the new DSL the heart of Mahout, aim for the following algorithms to
>> be implemented in the DSL as a new basis:
>>
>> Collaborative Filtering:
>>
>> * Cooccurrence-based recommender (work started in MAHOUT-1464)
>> * ALS (work started in MAHOUT-1365)
>>
>> Clustering:
>>
>> * k-Means
>> * Streaming k-Means
>>
>> Classification:
>>
>> * NaiveBayes (work started in MAHOUT-1493)
>> * either Random Forests or an ensemble of SGD classifiers
>>
>> Dimensionality Reduction / Topic Models
>>
>> * SSVD (prototype in trunk)
>> * PCA (prototype in trunk)
>> * LDA
>>
>>
>> - integrate Stratosphere / h20 as follows:
>>
>> * the Stratosphere guys can choose to implement the physical operators of
>> the DSL to make our algos run on Stratosphere. If they do, this is great for
>> Mahout as it allows people to run code on different backends. If they don't,
>> we don't lose anything.
>>
>> * a major point in porting the algorithms to the DSL would be to make the
>> input formats of all algorithms consistent. That would allow h20 to work off
>> the same inputs the scala DSL.
>>
>> Let me know what you think.
>>
>> -s
>>
>>
>>> On 04/06/2014 05:54 PM, Sean Owen wrote:
>>> On Sun, Apr 6, 2014 at 4:16 PM, Andrew Musselman
>>> <[email protected]> wrote:
>>>> Seems to me there has been a renewed effort to eat our broccoli, along with
>>>> the other ideas people have been bringing on board.
>>>>
>>>> What are you proposing to put in the board report?
>>>
>>> I have not seen significant activity to unify or update the existing
>>> code. It's still the same different chunks with different styles,
>>> input/output, distributed/not, etc. The doc updates look very
>>> positive. To be fair the task of really addressing the technical debt
>>> is very large, so even making said dent would be a lot of work. A
>>> clean-slate reboot therefore actually seems like a good plan, but
>>> that's another question...
>>>
>>> Concretely, in a board report, I personally would not agree with
>>> representing the Spark or H2O work as an agreed future plan or
>>> roadmap, right now. Being in the board report makes that impression,
>>> as have recent articles/tweets I've seen, so it deserves care. That's
>>> why I chimed in, maybe tilting at windmills.
>>>
>>> From where I sit with customers, the overall impression is negative
>>> among those that have tried to use the code, and usage has gone from
>>> few to almost none. I doubt my sample is so different from the whole
>>> user population. Much of it is consistency/quality, but some of it's
>>> just an interest in non-M/R frameworks.
>>>
>>> So, I think that current state and set of problems is far more
>>> important to acknowledge in a board report than just mentioning some
>>> future possibilities, and the latter was the impression I got of the
>>> likely content. In fact, it makes the talk about large upcoming
>>> possible changes make so much more sense.
