Only 6 issues, mostly recent tickets, including the things on the list. No old bugs. So once people see that, we’ll have a common measure of doneness.

• MAHOUT-1648 Update Mahout's CMS for 0.10.0
• MAHOUT-1647 The release build is incomplete
• MAHOUT-1638 H2O bindings fail at drmParallelizeWithRowLabels(...)
• MAHOUT-1586 Downloads must have hashes
• MAHOUT-1522 Handle logging levels via log4j.xml
• MAHOUT-1512 Hadoop 2 compatibility

On Mar 18, 2015, at 9:03 AM, Andrew Palumbo <ap....@outlook.com> wrote:

Yeah makes sense- i don't think there are any Blocker legacy issues at the moment.

On 03/18/2015 11:56 AM, Andrew Musselman wrote:
> Yep
>
> On Wednesday, March 18, 2015, Andrew Palumbo <ap....@outlook.com> wrote:
>
>> Andrew- by the first block and second do you mean 1,2,3 for 0.10 and 3,4
>> for 0.10.1?
>>
>> On 03/17/2015 08:26 PM, Shannon Quinn wrote:
>>
>>> +1
>>>
>>> On 3/17/15 8:19 PM, Andrew Musselman wrote:
>>>
>>>> How about 0.10 is the first block and 0.10.1 is the second?
>>>>
>>>> On Wed, Mar 18, 2015 at 1:12 AM, Andrew Palumbo <ap....@outlook.com>
>>>> wrote:
>>>>
>>>>> I like this timeline... though mid April is coming up quickly.. Going back
>>>>> to Pat's list for 0.10.0:
>>>>>
>>>>> 1) refactor mrlegacy out of scala deps.
>>>>>
>>>>>> 2) build fixes for release.
>>>>>> 3) docs - might be good to guinea-pig the new CMS with git pubsub so we
>>>>>> don’t have to do svn, not sure when that will be ready
>>>>>>
>>>>>> I would add:
>>>>> 4) Fix any remaining legacy bugs.
>>>>>
>>>>>> 5) docs, docs, docs
>>>>>>
>>>>>> along with just some general cleanup.
>>>>> Is anything else missing?
>>>>>
>>>>> On 03/17/2015 07:16 PM, Andrew Musselman wrote:
>>>>>
>>>>> I'm good with that timing pending scope..
>>>>>> On Wed, Mar 18, 2015 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> i was thinking 0.10.0 mid-april, update 0.10.1 end of spring.
>>>>>>
>>>>>>> i would suggest feature extraction topics for 0.11.x. Esp. w.r.t.
>>>>>>> SchemaRDD aka DataFrame -- vectorizing, hashing, ML schema support,
>>>>>>> imputation of missing data, outlier cleanups etc.
>>>>>>> There's a lot.
>>>>>>>
>>>>>>> Hardware backs integration -- i will certainly be looking at those,
>>>>>>> but perhaps the easiest is to start with automatic detection and
>>>>>>> configuration of capabilities via netlib, since it is already in the
>>>>>>> path and it seems likely that it will (eventually) support cuda as
>>>>>>> well in some form. This is for 0.11 or 0.12.x, depends on
>>>>>>> availability.
>>>>>>>
>>>>>>> Higher order methods are somewhat a matter of inspiration. I think i
>>>>>>> could offer some stuff there too as I already have implemented a lot
>>>>>>> of those on top of Mahout before. I did bayesian optimization (aka
>>>>>>> "spearmint", GP-EI etc.) on Mahout algebra, line search, (L)bfgs,
>>>>>>> stats including Gaussian Process support. BFGS and line search are
>>>>>>> fairly simple methods and i will give a reference if anybody is
>>>>>>> interested. also, breeze has line search with strong wolfe
>>>>>>> conditions (if a coded reference is needed). All that is up for grabs
>>>>>>> as a fairly well understood subject.
>>>>>>>
>>>>>>> (5-6 months out) Once GP-EI is available, it becomes a fairly
>>>>>>> interesting topic to resurrect implicit feedback issue. Important
>>>>>>> insight there is that in fact feature encoding can be done by a custom
>>>>>>> scheme (not necessarily using encoding scheme done in paper; in fact,
>>>>>>> there are 2 of them there; or the way mllib encodes that as well).
>>>>>>> once custom encoding schemes are adjusted, using bayesian optimization
>>>>>>> is increasingly important, especially if there are more than just 2
>>>>>>> parameters there.