Re MBrace: very interesting work. I'm a bit surprised though that the paper
makes no mention of DryadLINQ (
http://research.microsoft.com/en-us/projects/dryadlinq/dryadlinq.pdf).

Architecturally, it's a lot easier to see an MBrace implementation
specialized to a MapReduce (or, more generically, a BSP) computation than
to have Spark implement the fully async DAG model of an MBrace/Dryad
engine.

More practically, as interesting as it might be as a side effort, I think
for the core Spark effort to attempt something like that would be "off
mission". Spark's success to date owes more to a beautiful implementation
of a known architecture than to a beautiful new architecture.
Basically, Spark does MapReduce 10-100x faster than Hadoop, and more people
by now understand how to get MapReduce to solve their problems than any
other parallel model. Spark sits natively on HDFS, which makes adoption a
lot easier to swallow. So at present, for Spark to mature quickly along
that successful trajectory, the key problems to address are more practical
"user interface" or "productivity" things like manageability,
deployability, fault-tolerance improvements, multi-user access, a bigger
library of pre-packaged algorithms, etc.

Whether MapReduce's own success is an accident of history or something more
fundamental is subject to interesting debate. I remember being constantly
amazed at Google by the number of problems that, when squinted at the right
way, become MR-soluble (starting, ironically, with PageRank itself). Yes,
apparently sometimes it does pay to see many things as a nail
when you have invested in a powerful hammer.

Along those lines, here are some interesting perspectives on the beauty of
Dryad/DryadLINQ, and at least one practical reason why it didn't succeed as
an implementation.

   - http://blogs.msdn.com/b/dryad/archive/2010/02/15/some-dryad-and-dryadlinq-history.aspx
   - http://geekswithblogs.net/johnsPerfBlog/archive/2011/12/12/rip-dryadlinq-or-long-live-linq-to-hadoop.aspx



--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Wed, Oct 23, 2013 at 2:33 PM, Alex Boisvert <[email protected]> wrote:

> (Resending to @apache list instead of old google-group)
>
> A bit of a random question but I was wondering if there were efforts
> underway to generalize / expand the Spark API towards something that would
> be similar to the MBrace [1] model ... there's certainly an overlap between
> the features of the systems already ... so I guess I'm thinking about an
> API that's less centered around RDDs (as a collection) and more towards
> distributed dataflow that would feel more like composing Promises/Futures
> ... or even generalizing to support various sorts of container/context
> monads.
>
> [1] "MBrace: Cloud Computing with Monads"
> http://plosworkshop.org/2013/preprint/dzik.pdf
>
