To get things moving for 1.0:
a) Address the 4 issues that Sean had raised. We have already started looking at the backlog and closing issues, and have started converting the old MapReduce code to the newer MapReduce API. If someone could start looking at standardizing the input/output formats across classifiers, clustering and recommenders, that would be great. Guess Frank S. has already started work in that direction.

b) Need a better and cleaner serialized form of Vectors to handle names and other kinda stuff; this is gonna impact everything that's presently implemented.

c) Agree with ssc to start looking at Spark-Mahout integration.

d) Need volunteers to QA/address issues with the present classifiers/clustering algorithms. I personally can vouch for how disastrous it is to deploy any of Mahout's classifiers/clustering implementations in an Operations environment. A good example of that is Sean's recent patch for RDF. The Naive Bayes code as it is now seems half-baked and is incomplete. Not every code path has been tested on Streaming KMeans.

This should go some way in addressing the technical debt that's piled up over the years.

On Monday, March 3, 2014 1:05 PM, Sebastian Schelter <s...@apache.org> wrote:

I would like to discuss whether we should start to have some Spark-related code in Mahout.

--sebastian

On 03/03/2014 06:56 PM, Suneel Marthi wrote:
> Grant had set up a Google Hangout for Mahout sometime last year before 0.8
> release. I had one set up too for 0.9 release. I definitely wouldn't want to
> have a hangout on Saturday or weekend.
>
> On Monday, March 3, 2014 12:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> Happy to organize a google hangout. That has the advantage of allowing more
> attendees and supporting YouTube archiving.
>
> Sent from my iPhone
>
>> On Mar 3, 2014, at 9:34, Giorgio Zoppi <giorgio.zo...@gmail.com> wrote:
>>
>> Hello All,
>> Dr. Dunning, could you set a meeting next Sat morning, so we can chat and
>> discuss by Skype improvements and what to do, and identify volunteers and
>> tasks.
>> Best Regards,
>> Giorgio
>>
>> 2014-03-03 18:30 GMT+01:00 peng <pc...@uowmail.edu.au>:
>>
>>> Me three
>>>
>>>> On Sun 02 Mar 2014 11:45:33 AM EST, Ted Dunning wrote:
>>>>
>>>> Ravi,
>>>>
>>>> Good points.
>>>>
>>>> On Sun, Mar 2, 2014 at 12:38 AM, Ravi Mummulla <ravi.mummu...@gmail.com>
>>>> wrote:
>>>>
>>>>> - Natively support Windows (guidance, etc. No documentation exists today,
>>>>> for instance)
>>>>
>>>> There is a bit of demand for that.
>>>>
>>>>> - Faster time to first application (from discovery to first application
>>>>> currently takes a non-trivial amount of effort; how can we lower the bar
>>>>> and reduce the friction for adoption?)
>>>>
>>>> There is huge evidence that this is important.
>>>>
>>>>> - Better documenting use cases with working samples/examples
>>>>> (Documentation on https://mahout.apache.org/users/basics/algorithms.html
>>>>> is spread out and there is too much focus on algorithms as opposed to
>>>>> use cases - this is an adoption blocker)
>>>>
>>>> This is also important.
>>>>
>>>>> - Uniformity of the API set across all algorithms (are we providing the
>>>>> same experience across all APIs?)
>>>>
>>>> And many people have been tripped up by this.
>>>>
>>>>> - Measuring/publishing scalability metrics of various algorithms (why
>>>>> would we want users to adopt Mahout vs. other frameworks for ML at scale?)
>>>>
>>>> I don't see this as important as some of your other points, but it is still
>>>> useful.
>>
>> --
>> Quiero ser el rayo de sol que cada día te despierta
>> para hacerte respirar y vivir en me.
>> "Favola -Moda".