Re: Opinions stratosphere

Philip Ogren Fri, 02 May 2014 09:40:21 -0700

Great reference! I just skimmed through the results without reading much of the methodology - but it looks like Spark outperforms Stratosphere fairly consistently in the experiments. It's too bad the data sources only range from 2GB to 8GB. Who knows if the apparent pattern would extend out to 64GB, 128GB, 1TB, and so on...



On 05/01/2014 06:02 PM, Christopher Nguyen wrote:

Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually attempted such a comparative study as a Master's thesis:


http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

According to this snapshot (c. 2013), Stratosphere is different from Spark in not having an explicit concept of an in-memory dataset (e.g., RDD).

In principle this could be argued to be an implementation detail; the operators and execution plan/data flow are of primary concern in the API, and the data representation/materializations are otherwise unspecified.

But in practice, for long-running interactive applications, I consider RDDs to be of fundamental, first-class citizen importance, and the key distinguishing feature of Spark's model vs other "in-memory" approaches that treat memory merely as an implicit cache.
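
To make the distinction above concrete, here is a minimal toy sketch in plain Python. It is not Spark's or Stratosphere's API: the class LazyDataset and its methods map, cache and collect are hypothetical names, chosen only to mirror the idea of a dataset the user explicitly asks to keep in memory, as opposed to one that is recomputed from its lineage on every query.

```python
# Toy model only: NOT Spark's API. LazyDataset, map, cache and collect are
# hypothetical names that mimic the idea of an explicitly cached dataset.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute   # zero-argument function that produces the data
        self._cached = None       # materialized result, filled only after cache()
        self._keep = False        # set to True by an explicit cache() call

    def map(self, fn):
        # Derive a new dataset; its lineage re-runs this one, then applies fn.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Explicit request: keep the result in memory once it is first computed.
        self._keep = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._keep:
            self._cached = result
        return result


load_count = 0

def expensive_load():
    # Stand-in for reading a large input; counts how often it is invoked.
    global load_count
    load_count += 1
    return list(range(5))


source = LazyDataset(expensive_load)

# Without cache(): every query walks the lineage back to the source.
squares = source.map(lambda x: x * x)
squares.collect()
squares.collect()
uncached_loads = load_count

# With cache(): the second query is served from memory.
load_count = 0
cached = source.map(lambda x: x * x).cache()
cached.collect()
cached.collect()
cached_loads = load_count

print(uncached_loads, cached_loads)
```

In this sketch the uncached dataset triggers the load once per query, while the cached one triggers it only on the first query. For a long-running interactive session that repeatedly queries the same intermediate result, that is the behaviour the user controls when the in-memory dataset is a first-class concept rather than an implicit cache.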


--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen <http://linkedin.com/in/ctnguyen>

On Tue, Nov 26, 2013 at 1:26 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:


    I don’t know a lot about it except from the research side, where
    the team has done interesting optimization stuff for these types
    of applications. In terms of the engine, one thing I’m not sure of
    is whether Stratosphere allows explicit caching of datasets
    (similar to RDD.cache()) and interactive queries (similar to
    spark-shell). But it’s definitely an interesting project to watch.

    Matei

    On Nov 22, 2013, at 4:17 PM, Ankur Chauhan <achau...@brightcove.com> wrote:

    > Hi,
    >
    > That's what I thought, but as per the slides on
    > http://www.stratosphere.eu they seem to "know" about Spark and the
    > Scala API does look similar.
    > I found the PACT model interesting. Would like to know if Matei
    > or other core committers have something to weigh in on.
    >
    > -- Ankur
    > On 22 Nov 2013, at 16:05, Patrick Wendell <pwend...@gmail.com> wrote:
    >
    >> I've never seen that project before, would be interesting to get a
    >> comparison. Seems to offer a much lower level API. For instance
    >> this is a wordcount program:
    >>
    >> https://github.com/stratosphere/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/wordcount/WordCount.java
    >>
    >> On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan <achau...@brightcove.com> wrote:
    >>> Hi,
    >>>
    >>> I was just curious about https://github.com/stratosphere/stratosphere
    >>> and how Spark compares to it. Does anyone have any experience with it
    >>> to make any comments?
    >>>
    >>> -- Ankur
    >
