Yep

On Wednesday, March 18, 2015, Andrew Palumbo <ap....@outlook.com> wrote:
> Andrew- by the first block and second do you mean 1,2,3 for 0.10 and 3,4
> for 0.10.1?
>
> On 03/17/2015 08:26 PM, Shannon Quinn wrote:
>> +1
>>
>> On 3/17/15 8:19 PM, Andrew Musselman wrote:
>>> How about 0.10 is the first block and 0.10.1 is the second?
>>>
>>> On Wed, Mar 18, 2015 at 1:12 AM, Andrew Palumbo <ap....@outlook.com> wrote:
>>>> I like this timeline... though mid-April is coming up quickly. Going
>>>> back to Pat's list for 0.10.0:
>>>>
>>>>> 1) refactor mrlegacy out of scala deps.
>>>>> 2) build fixes for release.
>>>>> 3) docs — might be good to guinea-pig the new CMS with git pubsub so we
>>>>> don’t have to do svn; not sure when that will be ready.
>>>>
>>>> I would add:
>>>>
>>>> 4) Fix any remaining legacy bugs.
>>>> 5) docs, docs, docs
>>>>
>>>> along with just some general cleanup.
>>>>
>>>> Is anything else missing?
>>>>
>>>> On 03/17/2015 07:16 PM, Andrew Musselman wrote:
>>>>> I'm good with that timing pending scope..
>>>>>
>>>>> On Wed, Mar 18, 2015 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>> I was thinking 0.10.0 mid-April, update 0.10.1 end of spring.
>>>>>>
>>>>>> I would suggest feature extraction topics for 0.11.x, esp. w.r.t.
>>>>>> SchemaRDD aka DataFrame -- vectorizing, hashing, ML schema support,
>>>>>> imputation of missing data, outlier cleanup, etc. There's a lot.
>>>>>>
>>>>>> Hardware backend integration -- I will certainly be looking at those,
>>>>>> but perhaps the easiest is to start with automatic detection and
>>>>>> configuration of capabilities via netlib, since it is already in the
>>>>>> path and it seems likely that it will (eventually) support CUDA as
>>>>>> well in some form. This is for 0.11 or 0.12.x, depending on
>>>>>> availability.
>>>>>>
>>>>>> Higher-order methods are somewhat a matter of inspiration. I think I
>>>>>> could offer some stuff there too, as I have already implemented a lot
>>>>>> of those on top of Mahout before. I did Bayesian optimization (aka
>>>>>> "spearmint", GP-EI, etc.) on Mahout algebra, line search, (L)BFGS, and
>>>>>> stats including Gaussian Process support. BFGS and line search are
>>>>>> fairly simple methods and I will give a reference if anybody is
>>>>>> interested. Also, Breeze has line search with strong Wolfe
>>>>>> conditions (if a coded reference is needed). All that is up for grabs
>>>>>> as a fairly well-understood subject.
>>>>>>
>>>>>> (5-6 months out) Once GP-EI is available, it becomes a fairly
>>>>>> interesting topic to resurrect the implicit feedback issue. An
>>>>>> important insight there is that feature encoding can in fact be done
>>>>>> by a custom scheme (not necessarily using the encoding scheme done in
>>>>>> the paper; in fact, there are two of them there; or the way MLlib
>>>>>> encodes that as well). Once custom encoding schemes are adjusted,
>>>>>> using Bayesian optimization becomes increasingly important, especially
>>>>>> if there are more than just two parameters there.
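For readers unfamiliar with the "hashing" item in the feature-extraction list above: the hashing trick maps arbitrary tokens into a fixed-width vector without maintaining a dictionary. This is only an illustrative sketch in plain Python (the `hash_vectorize` name and signature are made up for this example, not Mahout or MLlib API):

```python
import re
import zlib

def hash_vectorize(text, dim=16):
    """Map a token stream into a fixed-width vector via the hashing trick."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash()
        h = zlib.crc32(token.encode("utf-8"))
        # signed hashing: use one hash bit as a +/- sign to offset collision bias
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
        vec[h % dim] += sign
    return vec
```

The fixed dimension means collisions are accepted by design; the signed update keeps colliding tokens from systematically inflating a bucket.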
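As a pointer for the line-search discussion: the strong Wolfe conditions require both sufficient decrease (Armijo) and a bound on the magnitude of the directional derivative at the accepted step. The following is a simplified bisection sketch for a one-dimensional problem, not the Breeze or Mahout implementation (real implementations use cubic interpolation and a more careful bracketing phase); `strong_wolfe` and its parameters are hypothetical names for this example:

```python
import math

def strong_wolfe(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection sketch of a strong-Wolfe line search along direction d (scalar case)."""
    phi0 = f(x)
    dphi0 = grad(x) * d          # directional derivative; negative for a descent direction
    lo, hi, t = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        g = grad(x + t * d) * d
        if f(x + t * d) > phi0 + c1 * t * dphi0:
            hi = t               # sufficient decrease (Armijo) fails: step too long
        elif g < c2 * dphi0:
            lo = t               # slope still steeply negative: step too short
        elif g > -c2 * dphi0:
            hi = t               # slope too positive: overshot the minimizer
        else:
            return t             # both strong Wolfe conditions hold
        t = (lo + hi) / 2 if hi < math.inf else 2 * lo
    return t
```

With a quadratic like f(x) = x², a unit step from x = 1 along d = -1 lands exactly at the minimizer, so the search accepts t = 1 immediately.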
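The GP-EI acquisition mentioned above (the core of "spearmint"-style Bayesian optimization) has a closed form: given the GP posterior mean and standard deviation at a candidate point and the best objective value seen so far, expected improvement weighs exploitation (mean below the incumbent) against exploration (high variance). A minimal sketch for minimization, using only the standard library (the function name and argument convention are assumptions for this example):

```python
import math

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization, given GP posterior mean mu and std sigma."""
    if sigma <= 0:
        # no posterior uncertainty: improvement is deterministic
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    # exploitation term + exploration term
    return (best - mu) * cdf + sigma * pdf
```

In an optimization loop, the next hyperparameter setting to evaluate is the candidate maximizing this quantity, which is what makes the approach attractive once "more than just two parameters" are being tuned.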