@Tobias,

According to my understanding, your approach is to register a series of tables 
using transformWith, right? And then you get a new DStream (i.e., a 
SchemaDStream), which consists of a sequence of SchemaRDDs.

Please correct me if my understanding is wrong.

Thank you && Best Regards,
Grace (Huang Jie)

From: Jason Dai [mailto:jason....@gmail.com]
Sent: Wednesday, March 11, 2015 10:45 PM
To: Irfan Ahmad
Cc: Tobias Pfeiffer; Cheng, Hao; Mohit Anchlia; user@spark.apache.org; Shao, 
Saisai; Dai, Jason; Huang, Jie
Subject: Re: SQL with Spark Streaming

Sorry, typo; it should be https://github.com/intel-spark/stream-sql

Thanks,
-Jason

On Wed, Mar 11, 2015 at 10:19 PM, Irfan Ahmad 
<ir...@cloudphysics.com> wrote:
Got a 404 on that link: https://github.com/Intel-bigdata/spark-streamsql


Irfan Ahmad
CTO | Co-Founder | CloudPhysics <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 11, 2015 at 6:41 AM, Jason Dai 
<jason....@gmail.com> wrote:
Yes, a previous prototype is available at 
https://github.com/Intel-bigdata/spark-streamsql, and a talk on it was given at 
last year's Spark Summit 
(http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark)

We are currently porting the prototype to the latest DataFrame API, and will 
soon provide a stable version for people to try.

Thanks,
-Jason


On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer 
<t...@preferred.jp> wrote:
Hi,

On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao 
<hao.ch...@intel.com> wrote:
Intel has a prototype for doing this; SaiSai and Jason are the authors. 
You can probably ask them for some materials.

The github repository is here: https://github.com/intel-spark/stream-sql

Also, what I did was write a wrapper class SchemaDStream that internally holds 
a DStream[Row] and a DStream[StructType] (the latter having just one element in 
every RDD) and then allows you to:
- apply SchemaRDD => SchemaRDD operations using 
`rowStream.transformWith(schemaStream, ...)`
- in particular, register this stream's data as a table that way
- obtain a new stream from previously registered tables via a companion object 
with a method `fromSQL(sql: String): SchemaDStream`.

However, you are limited to batch-internal operations, i.e., you can't 
aggregate across batches.

I am not able to share the code at the moment, but will within the next few 
months. It is not very advanced code, though, and should be easy to replicate. 
Also, I have no idea about the performance of transformWith...
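
For reference, here is a minimal sketch of what such a wrapper could look like 
on the Spark 1.2-era API (SchemaRDD/applySchema, i.e., before the DataFrame 
port mentioned above). Everything here other than the Spark calls themselves 
is an assumption for illustration, not the actual code from this thread:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SchemaRDD, SQLContext, StructType}
    import org.apache.spark.streaming.dstream.DStream

    class SchemaDStream(
        sqlc: SQLContext,
        rowStream: DStream[Row],
        schemaStream: DStream[StructType]) {

      // Zip each batch's row RDD with its single-element schema RDD,
      // rebuild the SchemaRDD, and apply a SchemaRDD => SchemaRDD function.
      def transform(f: SchemaRDD => SchemaRDD): DStream[Row] =
        rowStream.transformWith(schemaStream,
          (rows: RDD[Row], schemas: RDD[StructType]) => {
            val result: RDD[Row] = f(sqlc.applySchema(rows, schemas.first()))
            result
          })

      // Register each batch's data as a temporary table. Registration is a
      // driver-side effect inside the transform; the no-op foreachRDD makes
      // Spark actually evaluate it every batch interval.
      def registerTempTable(name: String): Unit =
        transform { srdd => srdd.registerTempTable(name); srdd }
          .foreachRDD(_ => ())
    }

    object SchemaDStream {
      // Run a SQL query over previously registered tables once per batch.
      // Some trigger stream is needed to drive per-batch execution, and
      // returning a plain DStream[Row] is a simplification of the
      // fromSQL(sql: String): SchemaDStream signature described above.
      def fromSQL(sqlc: SQLContext, sql: String,
                  trigger: DStream[_]): DStream[Row] =
        trigger.transform(_ => sqlc.sql(sql): RDD[Row])
    }

Consistent with the batch-internal limitation noted above, each batch's temp 
table is replaced at the next interval, so a query issued via fromSQL only 
ever sees the current batch and cannot aggregate across batches.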

Tobias



