Got a 404 on that link: https://github.com/Intel-bigdata/spark-streamsql


*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 11, 2015 at 6:41 AM, Jason Dai <jason....@gmail.com> wrote:

> Yes, a previous prototype is available at
> https://github.com/Intel-bigdata/spark-streamsql, and a talk was given at
> last year's Spark Summit (
> http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark
> ).
>
> We are currently porting the prototype to use the latest DataFrame API,
> and will provide a stable version for people to try soon.
>
> Thanks,
> -Jason
>
>
> On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>
>> Hi,
>>
>> On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>>
>>> Intel has a prototype for doing this; SaiSai and Jason are the
>>> authors. You can probably ask them for some materials.
>>>
>>
>> The GitHub repository is here: https://github.com/intel-spark/stream-sql
>>
>> Also, what I did was write a wrapper class SchemaDStream that internally
>> holds a DStream[Row] and a DStream[StructType] (the latter having just one
>> element in every RDD) and then allows you to:
>> - run operations SchemaRDD => SchemaRDD using
>> `rowStream.transformWith(schemaStream, ...)`
>> - in particular, register this stream's data as a table this way
>> - and, via a companion object with a method `fromSQL(sql: String):
>> SchemaDStream`, get a new stream from previously registered tables
>> (see the sketch below).
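>>
>> To make this concrete, here is a rough sketch of the idea (from memory,
>> not my actual code; names like `registerStreamAsTable` and the exact
>> `fromSQL` signature are illustrative, and I sketch it against the Spark
>> 1.3 DataFrame API rather than SchemaRDD):
>>
>> import org.apache.spark.rdd.RDD
>> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
>> import org.apache.spark.sql.types.StructType
>> import org.apache.spark.streaming.dstream.DStream
>>
>> // Wraps a stream of rows plus a stream of schemas; the schema stream
>> // carries exactly one StructType per batch.
>> class SchemaDStream(
>>     sqlContext: SQLContext,
>>     val rowStream: DStream[Row],
>>     val schemaStream: DStream[StructType]) {
>>
>>   // Apply a DataFrame => DataFrame operation to every batch. The
>>   // transform function runs on the driver at job-generation time, so
>>   // ss.first() launches a small extra job per batch to fetch the schema.
>>   def transform(f: DataFrame => DataFrame): SchemaDStream = {
>>     val rows = rowStream.transformWith(schemaStream,
>>       (rs: RDD[Row], ss: RDD[StructType]) =>
>>         f(sqlContext.createDataFrame(rs, ss.first())).rdd)
>>     val schemas = rowStream.transformWith(schemaStream,
>>       (rs: RDD[Row], ss: RDD[StructType]) => {
>>         // .schema comes from plan analysis and does not run the query
>>         val schema = f(sqlContext.createDataFrame(rs, ss.first())).schema
>>         rs.sparkContext.parallelize(Seq(schema))
>>       })
>>     new SchemaDStream(sqlContext, rows, schemas)
>>   }
>>
>>   // Remember this stream under a table name so fromSQL can find it.
>>   def registerStreamAsTable(name: String): Unit =
>>     SchemaDStream.registry(name) = this
>> }
>>
>> object SchemaDStream {
>>   private val registry =
>>     scala.collection.mutable.Map.empty[String, SchemaDStream]
>>
>>   // Build a new stream by running `query` against one previously
>>   // registered stream, batch by batch (a real version would support
>>   // joins over several registered tables).
>>   def fromSQL(sqlContext: SQLContext, query: String,
>>               table: String): SchemaDStream = {
>>     val source = registry(table)
>>     source.transform { df =>
>>       df.registerTempTable(table)  // overwrites the previous batch
>>       sqlContext.sql(query)
>>     }
>>   }
>> }
>>
>> Usage would then be along the lines of
>> stream.registerStreamAsTable("events") followed by
>> SchemaDStream.fromSQL(sqlContext, "SELECT * FROM events", "events").
>> (In real code the SQLContext should come from a singleton so that the
>> transform closures stay serializable for checkpointing.)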
>>
>> However, you are limited to batch-internal operations, i.e., you can't
>> aggregate across batches.
>>
>> I am not able to share the code at the moment, but will within the next
>> few months. It is not very advanced code, though, and should be easy to
>> replicate. Also, I have no idea about the performance of transformWith.
>>
>> Tobias
>>
>>
>
