On Thu, Mar 13, 2014 at 1:09 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > with numerical computing in Mahout because the problems are different.
> > (To my mind, the key problems for numerical computing include:
> >
> > a) efficient, very fine-grained parallelism (think microseconds)
> >
> > b) efficient in-memory mutable storage
> >
> > c) no serialization of data between steps
> >
> > These problems are not even addressed by most data-flow architectures
> > (...)
>
> -1. b) and c) are directly addressed by Spark and Stratosphere. All
> partitions are mutable, not only between fused operands but also between
> different pipelines if you instruct it to do so. There's no
> deserialization happening if the physical operator instructs the block
> manager accordingly (and as it happens, that's exactly what it instructs
> it to do by default). My implementation of, say, elementwise A*B or
> 5.0 * A is a mutable fused operand that directly updates matrix blocks.
> The reduce function looks like e.g. reduceFunc = (a, b) => a *= b here
> (retaining the modified a matrix block).[1] Yes, the block manager then
> slaps the blocks with a new RDD id once the fused sequence is finished,
> but they are not going anywhere, and the operand is de facto mutable.
> Are you sure you are familiar with the basics of these engines?

I am actually pretty sure that I am not as familiar as I need to be. At the
same time, I am pretty sure that there is no direct support for fine-grained
parallelism of the sort that h2o supports, and I am pretty sure that there
is no current code for keeping compressed forms of matrices with efficiency
comparable to the h2o code. The fine-grained parallelism in h2o is done by
capitalizing on the inherent capabilities of the JVM and by supporting a
fork/join style which (insofar as I know) is fairly different from what
Spark does.
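
By fork/join style I mean roughly the kind of decomposition below. This is
only a sketch using the plain java.util.concurrent fork/join framework;
the class and names are made up for illustration and are not h2o's actual
task classes:

    import java.util.concurrent.{ForkJoinPool, RecursiveTask}

    // Recursive fork/join sum over a slice of an array: split until the slice
    // is small enough, then run a tight loop over it. Purely illustrative,
    // not h2o code.
    class ChunkSum(xs: Array[Double], lo: Int, hi: Int) extends RecursiveTask[Double] {
      override def compute(): Double =
        if (hi - lo <= 4096) {
          var s = 0.0
          var i = lo
          while (i < hi) { s += xs(i); i += 1 }
          s
        } else {
          val mid = (lo + hi) >>> 1
          val left = new ChunkSum(xs, lo, mid)
          left.fork()                               // left half runs asynchronously
          val right = new ChunkSum(xs, mid, hi).compute()
          right + left.join()
        }
    }

    object ChunkSumExample {
      def main(args: Array[String]): Unit = {
        val data = Array.tabulate(1 << 20)(_.toDouble)
        println(new ForkJoinPool().invoke(new ChunkSum(data, 0, data.length)))
      }
    }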

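And for comparison, here is how I read the fused elementwise operand
described in the quoted text, sketched against the plain Spark RDD API. The
Block type and function names below are made up for illustration; they are
not the actual Mahout Spark bindings:

    import org.apache.spark.SparkContext._   // pair RDD functions
    import org.apache.spark.rdd.RDD

    object FusedElementwiseSketch {
      // Hypothetical block representation: a dense array standing in for a
      // real matrix block, keyed by block index.
      type Block = Array[Double]

      // Elementwise A * B as a fused, in-place operand: the left block is
      // mutated and returned, in the spirit of reduceFunc = (a, b) => a *= b,
      // so no new block storage is allocated for the result.
      def elementwiseTimes(a: RDD[(Int, Block)], b: RDD[(Int, Block)]): RDD[(Int, Block)] =
        a.join(b).mapValues { case (ab, bb) =>
          var i = 0
          while (i < ab.length) { ab(i) *= bb(i); i += 1 }
          ab
        }

      // 5.0 * A, again updating A's blocks directly rather than copying them.
      def scalarTimes(s: Double, a: RDD[(Int, Block)]): RDD[(Int, Block)] =
        a.mapValues { blk =>
          var i = 0
          while (i < blk.length) { blk(i) *= s; i += 1 }
          blk
        }
    }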