On Thu, Mar 13, 2014 at 1:09 PM, Dmitriy Lyubimov <[email protected]> wrote:

> > The proof is in the pudding, I think.  The 0xdata team think that they
> > can knock out a Mahout matrix and vector data type pretty quickly.  They
> > also think that the SSVD algorithm will follow from that pretty
> > straightforwardly.
>
> It is not a problem to write the algorithm. As it happens, the algorithm is
> simple and is already written in formalisms. [2] The problem is (1) can it
> be translated via physical operator layer to yet-another-engine, and (2)
> why the heck do we need a new engine as a part of the project at all? Why
> not to include MR as well, after all, majority of our solvers are written
> for it specifically? If h2o  was in open space for some time, how will its
> embedment will help either H2o or Mahout?


I think that the proposal under discussion involves adding a dependency on
a Maven-released h2o artifact plus a contribution of Mahout translation
layers.  These layers would provide subclasses of Matrix (and Vector) that
allow direct control over life span across multiple jobs but would
otherwise behave like their in-memory counterparts.

This means that the question of whether we need a new engine is moot.  It
isn't even the suggestion.

I think that it is critical for this proposal that the abstraction barrier
not be diffuse.  The current mess that our MR code finds itself in is
specifically due to the fact that you can't actually isolate the
details of writing Hadoop map-reduce.  This is completely separate from the
performance issue.
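
To make the shape of such a translation layer concrete, here is a rough
sketch.  Every name in it (SimpleMatrix, FakeEngine, EngineBackedMatrix,
Algorithms) is made up for illustration; none of it is the actual Mahout
or h2o API, and a real layer would implement Mahout's own Matrix
interface rather than this reduced stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, reduced stand-in for a matrix interface.
interface SimpleMatrix {
    int rows();
    int cols();
    double get(int row, int col);
    void set(int row, int col, double value);
}

// Hypothetical stand-in for an engine that keeps named data alive
// between jobs until it is explicitly released.
class FakeEngine {
    private final Map<String, double[][]> store = new HashMap<>();

    double[][] allocate(String key, int rows, int cols) {
        double[][] data = new double[rows][cols];
        store.put(key, data);
        return data;
    }

    void free(String key) {
        store.remove(key);
    }

    boolean holds(String key) {
        return store.containsKey(key);
    }
}

// Engine-backed matrix: same interface as any other matrix, plus
// explicit control over how long the engine keeps the data.
class EngineBackedMatrix implements SimpleMatrix, AutoCloseable {
    private final FakeEngine engine;
    private final String key;
    private final int rows;
    private final int cols;
    private double[][] data;

    EngineBackedMatrix(FakeEngine engine, String key, int rows, int cols) {
        this.engine = engine;
        this.key = key;
        this.rows = rows;
        this.cols = cols;
        this.data = engine.allocate(key, rows, cols);
    }

    public int rows() { return rows; }
    public int cols() { return cols; }

    public double get(int row, int col) {
        checkOpen();
        return data[row][col];
    }

    public void set(int row, int col, double value) {
        checkOpen();
        data[row][col] = value;
    }

    // Ends the matrix's life span; the engine drops the data.
    @Override
    public void close() {
        engine.free(key);
        data = null;
    }

    private void checkOpen() {
        if (data == null) {
            throw new IllegalStateException("matrix has been released");
        }
    }
}

// Algorithm code sees only the interface, never the engine.
class Algorithms {
    static double sumOfSquares(SimpleMatrix m) {
        double sum = 0.0;
        for (int i = 0; i < m.rows(); i++) {
            for (int j = 0; j < m.cols(); j++) {
                sum += m.get(i, j) * m.get(i, j);
            }
        }
        return sum;
    }
}
```

The point of the sketch is that Algorithms never mentions the engine.
Only the code that creates and closes the matrix knows where the data
lives, which is the kind of sharp abstraction barrier I mean.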
