Re MBrace: very interesting work. I'm a bit surprised, though, that the paper makes no mention of DryadLINQ (http://research.microsoft.com/en-us/projects/dryadlinq/dryadlinq.pdf).

Architecturally, it's a lot easier to see an MBrace implementation specialized to a MapReduce (or, more generically, a BSP) computation than to have Spark implement the fully async DAG model of an MBrace/Dryad engine. More practically, as interesting as it might be as a side effort, I think it would be "off mission" for the core Spark effort to attempt something like that.

Spark's success to date has been due more to a beautiful implementation of a known architecture than to a beautiful new architecture. Basically, Spark does MapReduce 10-100x faster than Hadoop, and by now more people understand how to get MapReduce to solve their problems than any other parallel model. Spark sits natively on HDFS, which makes adoption a lot easier to swallow. So at present, for Spark to mature quickly along that successful trajectory, the key problems to address are the more practical "user interface" or "productivity" things: manageability, deployability, fault-tolerance improvements, multi-user access, a bigger library of pre-packaged algorithms, etc.

Whether MapReduce's own success is an accident of history or something more fundamental is subject to interesting debate. I remember being constantly amazed at Google by the number of problems that, when squinted at the right way, become MR-soluble problems (starting, ironically, with PageRank itself). Yes, apparently sometimes it does pay to see many things as nails when you have invested in a powerful hammer.

Along those lines, here are some interesting perspectives on the beauty of Dryad/DryadLINQ, and at least one practical reason why it didn't succeed as an implementation.

- http://blogs.msdn.com/b/dryad/archive/2010/02/15/some-dryad-and-dryadlinq-history.aspx
- http://geekswithblogs.net/johnsPerfBlog/archive/2011/12/12/rip-dryadlinq-or-long-live-linq-to-hadoop.aspx

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen

On Wed, Oct 23, 2013 at 2:33 PM, Alex Boisvert <[email protected]> wrote:

> (Resending to @apache list instead of old google-group)
>
> A bit of a random question but I was wondering if there were efforts
> underway to generalize / expand the Spark API towards something that would
> be similar to the MBrace [1] model ... there's certainly an overlap between
> the features of the systems already ... so I guess I'm thinking about an
> API that's less centered around RDDs (as a collection) and more towards
> distributed dataflow that would feel more like composing Promises/Futures
> ... or even generalizing to support various sorts of container/context
> monads.
>
> [1] "MBrace: Cloud Computing with Monads"
> http://plosworkshop.org/2013/preprint/dzik.pdf
