On Thu, Mar 13, 2014 at 2:10 AM, Sebastian Schelter <[email protected]> wrote:

> @All
>
> I have one big question regarding h2o (maybe SriSatish can help me with
> that). I haven't been able to find a detailed writeup about the execution
> model yet, but at first sight it seems like a big aggregation tree to me:
> Data is partitioned, then operations are conducted independently on the
> partitions (e.g. gradients are computed) and the outputs are aggregated
> (e.g. summed up) and sent back to the individual machines. It also seems to
> support a lightweight version of MapReduce. I think this approach is fine
> for ML algorithms that can be efficiently formulated in the statistical
> query model [2]. A lot of other algos like SSVD or Cooccurrence Analysis or
> graph-based computations are hard to fit into this model, however.
>

I worked through cooccurrence analysis, including down-sampling, with Cliff
from 0xdata, and he showed me pretty convincingly that h2o can do these
computations.
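
To make that a bit more concrete, here is a rough sketch of why cooccurrence
counting fits a partition-then-aggregate pattern. This is plain Python, not
h2o's API, and not what Cliff actually wrote; every name in it (downsample,
map_partition, merge, cooccurrence, max_items) is made up for illustration. It
assumes each user's full interaction history lives in a single partition, so
the per-user down-sampling can happen locally.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
import random


def downsample(items, max_items, rng):
    """Keep at most max_items distinct items for one user (hypothetical cap)."""
    items = sorted(set(items))
    if len(items) <= max_items:
        return items
    return sorted(rng.sample(items, max_items))


def map_partition(partition, max_items=500, seed=0):
    """Per-partition step: count item pairs that co-occur in a user's history."""
    rng = random.Random(seed)
    counts = Counter()
    for user_items in partition:
        kept = downsample(user_items, max_items, rng)
        # Each unordered pair of kept items is one cooccurrence.
        counts.update(combinations(kept, 2))
    return counts


def merge(a, b):
    """Aggregation step: partial counts are sums, so they simply add."""
    return a + b


def cooccurrence(partitions, max_items=500):
    """Run the local step on every partition, then sum the partial counts."""
    partials = (map_partition(p, max_items) for p in partitions)
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    # Two partitions, each holding a list of per-user item histories.
    partitions = [
        [["a", "b", "c"], ["a", "b"]],
        [["b", "c"], ["a", "c"]],
    ]
    print(cooccurrence(partitions))
```

Because merge is associative and commutative, the partial counts could be
combined pairwise up a tree rather than in one sequential reduce, which is the
shape Sebastian describes above.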

The proof is in the pudding, I think.  The 0xdata team think that they can
knock out a Mahout matrix and vector data type pretty quickly.  They also
think that the SSVD algorithm will follow from that pretty
straightforwardly.
