Andrew, by the first block and the second, do you mean items 1, 2, and 3 for 0.10 and items 4 and 5 for 0.10.1?

On 03/17/2015 08:26 PM, Shannon Quinn wrote:
+1

On 3/17/15 8:19 PM, Andrew Musselman wrote:
How about 0.10 is the first block and 0.10.1 is the second?

On Wed, Mar 18, 2015 at 1:12 AM, Andrew Palumbo <ap....@outlook.com> wrote:

I like this timeline... though mid-April is coming up quickly. Going back
to Pat's list for 0.10.0:

1) Refactor mrlegacy out of the Scala dependencies.
2) Build fixes for the release.
3) Docs: it might be good to guinea-pig the new CMS with git pubsub so we
don't have to use svn; not sure when that will be ready.

I would add:

4) Fix any remaining legacy bugs.
5) Docs, docs, docs.

along with just some general cleanup.

Is anything else missing?




On 03/17/2015 07:16 PM, Andrew Musselman wrote:

I'm good with that timing, pending scope.

On Wed, Mar 18, 2015 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com>
wrote:

I was thinking 0.10.0 mid-April, with a 0.10.1 update at the end of spring.

I would suggest feature extraction topics for 0.11.x, especially w.r.t.
SchemaRDD aka DataFrame -- vectorizing, hashing, ML schema support,
imputation of missing data, outlier cleanup, etc. There's a lot.
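
To make the hashing piece concrete, a rough sketch of the hashed-feature
trick (hashFeatures and numBuckets are illustrative names, not an
existing Mahout API):

    import scala.util.hashing.MurmurHash3

    // Feature hashing: map arbitrary (name, value) pairs into a
    // fixed-width vector without maintaining a dictionary.
    def hashFeatures(features: Seq[(String, Double)],
                     numBuckets: Int): Array[Double] = {
      val v = new Array[Double](numBuckets)
      for ((name, value) <- features) {
        val h = MurmurHash3.stringHash(name)
        // fold the hash into a non-negative bucket index
        val bucket = ((h % numBuckets) + numBuckets) % numBuckets
        // a sign hash reduces the bias from collisions
        val sign = if (((h >> 31) & 1) == 0) 1.0 else -1.0
        v(bucket) += sign * value
      }
      v
    }

    // e.g. hashFeatures(Seq("country=US" -> 1.0, "age" -> 34.0), 16)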

Hardware backend integration -- I will certainly be looking at those,
but perhaps the easiest start is automatic detection and configuration
of capabilities via netlib, since it is already on the classpath and it
seems likely that it will (eventually) support CUDA in some form as
well. This is for 0.11 or 0.12.x, depending on availability.
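
As a minimal sketch of what that detection could look like (assuming
the com.github.fommil netlib-java artifact is on the classpath, as it
already is for Spark):

    import com.github.fommil.netlib.BLAS

    // netlib-java falls back to its pure-Java F2j implementation when
    // no native BLAS is found, so the runtime class name tells us what
    // actually loaded.
    val impl = BLAS.getInstance().getClass.getName
    val haveNativeBlas = !impl.contains("F2j")
    println(s"BLAS implementation: $impl (native: $haveNativeBlas)")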

Higher-order methods are somewhat a matter of inspiration. I think I
could offer some stuff there too, as I have already implemented a lot
of those on top of Mahout before. I did Bayesian optimization (aka
"spearmint", GP-EI, etc.) on Mahout algebra, plus line search, (L-)BFGS,
and stats including Gaussian process support. BFGS and line search are
fairly simple methods, and I will give a reference if anybody is
interested. Breeze also has a line search with strong Wolfe conditions
(if a coded reference is needed). All of that is up for grabs as a
fairly well-understood subject.
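
As a coded reference in that direction, a minimal sketch against
Breeze's optimize package (the quadratic objective is just a stand-in
for a real loss):

    import breeze.linalg.DenseVector
    import breeze.optimize.{DiffFunction, LBFGS}

    // Minimize f(x) = ||x - t||^2, whose gradient is 2 * (x - t);
    // Breeze's LBFGS runs a strong-Wolfe line search internally.
    val target = DenseVector(3.0, 3.0)
    val f = new DiffFunction[DenseVector[Double]] {
      def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) = {
        val d = x - target
        (d dot d, d * 2.0)
      }
    }
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 7)
    val xOpt = lbfgs.minimize(f, DenseVector.zeros[Double](2)) // ~ (3.0, 3.0)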

(5-6 months out) Once GP-EI is available, it becomes a fairly
interesting topic to resurrect the implicit feedback issue. The
important insight there is that feature encoding can in fact be done
with a custom scheme (not necessarily one of the encoding schemes used
in the paper; in fact, there are two of them there; nor the way MLlib
encodes it). Once custom encoding schemes are in play, using Bayesian
optimization becomes increasingly important, especially if there are
more than just two parameters involved.
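
For anyone unfamiliar with GP-EI: with best observed value f* and a GP
posterior with mean mu(x) and deviation sigma(x), the standard
expected-improvement acquisition (for minimization) is

    EI(x) = (f* - mu(x)) * Phi(z) + sigma(x) * phi(z),
    where z = (f* - mu(x)) / sigma(x),

with Phi and phi the standard normal CDF and PDF; the next trial point
is chosen by maximizing EI.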



