I'm good with that timing, pending scope.

On Wed, Mar 18, 2015 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> I was thinking 0.10.0 mid-April, update 0.10.1 end of spring.
>
> I would suggest feature extraction topics for 0.11.x, esp. w.r.t.
> SchemaRDD aka DataFrame -- vectorizing, hashing, ML schema support,
> imputation of missing data, outlier cleanups etc. There's a lot.
>
> Hardware backend integration -- I will certainly be looking at that,
> but perhaps the easiest is to start with automatic detection and
> configuration of capabilities via netlib, since it is already in the
> path and it seems likely that it will (eventually) support CUDA as
> well in some form. This is for 0.11 or 0.12.x, depending on
> availability.
>
> Higher-order methods are somewhat a matter of inspiration. I think I
> could offer some stuff there too, as I have already implemented a lot
> of those on top of Mahout before. I did Bayesian optimization (aka
> "spearmint", GP-EI etc.) on Mahout algebra, line search, (L-)BFGS,
> and stats including Gaussian Process support. BFGS and line search are
> fairly simple methods and I will give a reference if anybody is
> interested. Also, Breeze has line search with strong Wolfe
> conditions (if a coded reference is needed). All that is up for grabs
> as a fairly well understood subject.
>
> (5-6 months out) Once GP-EI is available, it becomes a fairly
> interesting topic to resurrect the implicit feedback issue. An important
> insight there is that feature encoding can in fact be done by a custom
> scheme (not necessarily the encoding scheme from the paper; in fact,
> there are 2 of them there; nor the way MLlib encodes it).
> Once custom encoding schemes are adjusted, using Bayesian optimization
> becomes increasingly important, especially if there are more than just 2
> parameters there.