Re: Structured Streaming in Spark 2.0 and DStreams

Dood Mon, 16 May 2016 10:10:07 -0700

On 5/16/2016 9:53 AM, Yuval Itzchakov wrote:

AFAIK, the underlying data represented under the DataSet[T] abstraction will be formatted in Tachyon under the hood, but, as with RDDs, it will be spilled to local disk on the worker if needed.

There is another option in the case of RDDs: the Apache Ignite project, a memory grid/distributed cache that supports Spark RDDs. The nice thing about Ignite is that everything is done automatically for you. You can also duplicate caches for resiliency, load caches from disk, partition them, etc., and you also get automatic spillover to SQL (and NoSQL) capable backends via read/write-through capabilities. I think there is also an effort to support DataFrames. Ignite supports standard SQL to query the caches too.

On Mon, May 16, 2016, 19:47 Benjamin Kim <bbuil...@gmail.com> wrote:


    I have a curiosity question. These forever/unlimited
    DataFrames/DataSets will persist and be query capable. I still am
    foggy about how this data will be stored. As far as I know, memory
    is finite. Will the data be spilled to disk and be retrievable if
    the query spans data not in memory? Is Tachyon (Alluxio), HDFS
    (Parquet), NoSQL (HBase, Cassandra), RDBMS (PostgreSQL, MySQL),
    Object Store (S3, Swift), or anything else I can’t think of going to be
    the underlying near real-time storage system?

    Thanks,
    Ben


