First steps towards the "loving care" (in my view):

a) Address the issues that Sean has brought up. I wasn't aware of (i) in that list; otherwise I would have made sure those issues were addressed in 0.9.
b) Most of the backlog JIRAs (about 28 of them today) go all the way back to the initial stages of Mahout's evolution (pre-0.5). Some of them may just have to be closed and resolved as "Will not do" or "Times Immemorial".

c) Fix algorithms that presently have half-baked code in them. For example, in the Naive Bayes classifier, why is the thetaSummer commented out? Either we don't need it or it needs fixing. Streaming KMeans lacks adequate test coverage and still fails along different code paths, and the same goes for the other clustering algorithms too.

On Friday, February 28, 2014 3:30 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> >
> > To be constructive, here are four items that seem more important for
> > something like "1.0.0" and are even a lot less work:
> >
> > - Use Hadoop .mapreduce API consistently
> > - Standardize input output formats of all jobs
> > - Remove use of deprecated code
> > - Clear even a third of the open JIRA backlog
> >
>
> Like i said, i believe the future is in moving ahead, build on strengths
> and finding unique proposition. I agree with the above in a sense that
> out-of-core stuff that runs over MR could use some unification. I know you
> have done a lot in that department and I assume since you are writing to
> dev list, you are looking to help with that going forward. Cause if not...
> the dev lists are not exactly created to be an open forum for just giving
> lectures.
>

Can we agree that before we put an integer version on Mahout that it needs some tender-loving care, and that we can still have high hopes?