On Thu, Mar 13, 2014 at 2:10 AM, Sebastian Schelter wrote:
> @All
>
> I have one big question regarding h2o (maybe SriSatish can help me with
> that). I haven't been able to find a detailed writeup about the execution
> model yet, but on first sight it seems like a big aggregation tree to me:
> Data is partitioned, then operations are conducted independently on the
> partitions (e.g. gradients are computed) and the outputs are aggregated
> (e.g. summed up) and sent back to the individual machines. It also seems to
> support a lightweight version of MapReduce. I think this approach is fine
> for ML algorithms that can be efficiently formulated by the statistical
> query model [2]. A lot of other algos like SSVD or Cooccurrence Analysis or
> graph-based computations are hard to fit into this model however.
>

I worked through cooccurrence analysis, including down-sampling, with Cliff from 0xdata, and he was able to show me pretty convincingly that h2o is able to do these computations. The proof is in the pudding, I think.

The 0xdata team think that they can knock out a Mahout matrix and vector data type pretty quickly. They also think that the SSVD algorithm will follow from that pretty straightforwardly.
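For readers who want the "partition, compute locally, aggregate" model from the quoted paragraph made concrete, here is a minimal sketch in plain Python. It assumes an in-process list of partitions stands in for machines. The names `partition`, `map_aggregate`, `lsq_gradient` and `cooccurrence_local` are hypothetical illustrations, not h2o or Mahout APIs, and this is not how either project implements these computations. It shows that a least-squares gradient and raw cooccurrence counts (the entries of A^T A for a binary user-item matrix A) both decompose into per-partition partial results that are simply summed. The step of sending results back to the machines is omitted, and so is the down-sampling mentioned in the reply.

```python
from collections import Counter
from functools import reduce
from itertools import product
import operator


def partition(rows, n_parts):
    """Split rows round-robin into n_parts lists, one per (hypothetical) worker."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def map_aggregate(parts, local_fn, combine):
    """Apply local_fn to each partition independently, then fold the partial
    results together with combine (an associative operation such as +).
    A cluster would merge partials up a tree of machines; this sketch just
    folds them left to right in one process."""
    partials = [local_fn(p) for p in parts]
    return reduce(combine, partials)


# Example 1: least-squares gradient, the sum over rows of (w.x - y) * x.
def lsq_gradient(w):
    """Return a local function computing one partition's gradient contribution."""
    def local(part):
        g = [0.0] * len(w)
        for x, y in part:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                g[j] += err * xj
        return g
    return local


def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


# Example 2: cooccurrence counts, i.e. the entries of A^T A for a binary
# user-item matrix A, where each row is the set of items one user touched.
def cooccurrence_local(part):
    """Count ordered item pairs (including the diagonal) within one partition."""
    counts = Counter()
    for items in part:
        for i, j in product(sorted(items), repeat=2):
            counts[(i, j)] += 1
    return counts


if __name__ == "__main__":
    data = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0), ([2.0, 0.0], 3.0)]
    w = [0.1, 0.2]
    print(map_aggregate(partition(data, 2), lsq_gradient(w), add_vectors))

    users = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
    print(map_aggregate(partition(users, 2), cooccurrence_local, operator.add))
```

Because both combine operations are associative and commutative, the partials could be merged in any tree shape and still give the single-machine answer. A computation whose steps need rows from different partitions at the same time does not decompose this simply, which is the concern the quoted paragraph raises.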
