Re: Opinions stratosphere

Michael Malak Fri, 02 May 2014 09:51:13 -0700

"looks like Spark outperforms Stratosphere fairly consistently in the 
experiments"


There was one exception the paper noted, which was when memory resources were 
constrained. In that case, Stratosphere seemed to have degraded more gracefully 
than Spark, but the author did not explore it deeper. The author did insert 
into his conclusion section, though, "However, in our experiments, for 
iterative algorithms, the Spark programs may show the poor results in 
performance in the environment of limited memory resources."

I recently blogged a fuller list of alternatives/competitors to Spark:
http://datascienceassn.org/content/alternatives-spark-memory-distributed-computing

 
On Friday, May 2, 2014 10:39 AM, Philip Ogren <philip.og...@oracle.com> wrote:
 
Great reference!  I just skimmed through the results without reading much of 
the methodology - but it looks like Spark outperforms Stratosphere fairly 
consistently in the experiments.  It's too bad the data sources only range from 
2GB to 8GB.  Who knows if the apparent pattern would extend out to 64GB, 128GB, 
1TB, and so on...




On 05/01/2014 06:02 PM, Christopher Nguyen wrote:

>Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually attempted such a
>comparative study as a Master's thesis:
>
>http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
>
>
>
>According to this snapshot (c. 2013), Stratosphere is different from Spark in 
>not having an explicit concept of an in-memory dataset (e.g., RDD).
>
>
>In principle this could be argued to be an implementation detail; the 
>operators and execution plan/data flow are of primary concern in the API, and 
>the data representation/materializations are otherwise unspecified.
>
>
>But in practice, for long-running interactive applications, I consider RDDs to 
>be of fundamental, first-class citizen importance, and the key distinguishing 
>feature of Spark's model vs other "in-memory" approaches that treat memory 
>merely as an implicit cache.
>
>
>--
>
>Christopher T. Nguyen
>Co-founder & CEO, Adatao
>linkedin.com/in/ctnguyen
>
>
>
>
>On Tue, Nov 26, 2013 at 1:26 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>>I don’t know a lot about it except from the research side, where the team has
>>done interesting optimization stuff for these types of applications. In terms
>>of the engine, one thing I’m not sure of is whether Stratosphere allows
>>explicit caching of datasets (similar to RDD.cache()) and interactive queries
>>(similar to spark-shell). But it’s definitely an interesting project to watch.
>>
>>Matei
>> 
>>
>>On Nov 22, 2013, at 4:17 PM, Ankur Chauhan <achau...@brightcove.com> wrote:
>>
>>> Hi,
>>>
>>> That's what I thought but as per the slides on http://www.stratosphere.eu
>>> they seem to "know" about Spark and the Scala API does look similar.
>>> I found the PACT model interesting. Would like to know if Matei or other
>>> core committers have something to weigh in on.
>>>
>>> -- Ankur
>>> On 22 Nov 2013, at 16:05, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>>> I've never seen that project before, would be interesting to get a
>>>> comparison. Seems to offer a much lower level API. For instance this
>>>> is a wordcount program:
>>>>
>>>> https://github.com/stratosphere/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/wordcount/WordCount.java
>>>>
>>>> On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan <achau...@brightcove.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I was just curious about https://github.com/stratosphere/stratosphere
>>>>> and how Spark compares to it. Does anyone have any experience with it
>>>>> to make any comments?
>>>>>
>>>>> -- Ankur
>>>
>>
>>
>
