Thanks for the clarification, Michael, and good luck with Spark 2.0. It
really looks promising.

I am especially interested in the ad hoc queries aspect. That is probably
what the slides refer to as Continuous SQL. What is the timeframe for the
availability of this functionality?

regards
Sunita

On Fri, May 6, 2016 at 2:24 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> That is a forward-looking design doc, and not all of it has been
> implemented yet. With Spark 2.0 the main sources and sinks will be
> file-based, though we hope to expand that quickly now that a lot of the
> infrastructure is in place.
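>
> For what it's worth, a minimal file-in/file-out pipeline against the
> current code would look roughly like the sketch below. Treat it as
> pseudocode rather than the final API: the streaming reader/writer method
> names are still moving around on master, and the schema, paths, and
> option values here are placeholders made up for illustration.
>
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.types.StructType
>
>   val spark = SparkSession.builder().appName("file-stream-sketch").getOrCreate()
>
>   // File source: treat new JSON files landing in a directory as a stream.
>   val events = spark.read
>     .format("json")
>     .schema(new StructType().add("user", "string").add("ts", "long"))
>     .stream("/data/incoming")
>
>   // File sink: continuously append the filtered rows as Parquet.
>   val query = events.filter("ts > 0")
>     .write
>     .format("parquet")
>     .option("checkpointLocation", "/data/checkpoints/events")
>     .startStream("/data/parquet")
>
>   query.awaitTermination()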
>
> On Fri, May 6, 2016 at 2:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I was
>> reading 
>> StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf
>> attached to SPARK-8360
>>
>> On page 12 there is a mention of .format("kafka"), but I searched the
>> codebase and didn't find any occurrence.
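>>
>> For reference, the call in the doc reads roughly like the snippet below.
>> The option names are my own guesses added for illustration; the "kafka"
>> format itself isn't registered anywhere in master, which is presumably
>> why the search comes up empty.
>>
>>   // Hypothetical Kafka-backed streaming DataFrame, per the design doc sketch.
>>   // None of this resolves against the current codebase.
>>   val kafkaEvents = spark.read
>>     .format("kafka")
>>     .option("kafka.bootstrap.servers", "broker1:9092")
>>     .option("subscribe", "events")
>>     .stream()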
>>
>> FYI
>>
>> On Fri, May 6, 2016 at 1:06 PM, Michael Malak <
>> michaelma...@yahoo.com.invalid> wrote:
>>
>>> At first glance, it looks like the only streaming data sources available
>>> out of the box in the GitHub master branch are
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
>>>  and
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala
>>>  .
>>> Of the issues in the JIRA epic for Structured Streaming,
>>> https://issues.apache.org/jira/browse/SPARK-8360, it would seem the
>>> still-open https://issues.apache.org/jira/browse/SPARK-10815 "API
>>> design: data sources and sinks" is the relevant one here.
>>>
>>> In short, it would seem the code is not there yet to create a Kafka-fed
>>> DataFrame/Dataset that can be queried with Structured Streaming; or if it
>>> is, it's not obvious how to write such code.
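>>>
>>> That said, below is a rough sketch of what does appear to be wired up:
>>> feeding the in-memory source from memory.scala into the "memory" sink
>>> and then hitting the result with ordinary SQL. I haven't run this
>>> against master, so treat the method and option names as approximate;
>>> the table and column names are made up.
>>>
>>>   import org.apache.spark.sql.SparkSession
>>>   import org.apache.spark.sql.execution.streaming.MemoryStream
>>>
>>>   val spark = SparkSession.builder().appName("memory-stream-sketch").getOrCreate()
>>>   implicit val sqlContext = spark.sqlContext
>>>
>>>   // In-memory source: rows are pushed in from the driver for testing.
>>>   val source = MemoryStream[(String, Long)]
>>>   source.addData(("alice", 1L), ("bob", 2L))
>>>
>>>   // Stream the rows into the in-memory sink, exposed as a temp table.
>>>   val query = source.toDF().toDF("user", "ts")
>>>     .write
>>>     .format("memory")
>>>     .queryName("events")
>>>     .startStream()
>>>
>>>   query.processAllAvailable()
>>>
>>>   // Ad hoc queries against the continuously updated result.
>>>   spark.sql("SELECT user, count(*) AS n FROM events GROUP BY user").show()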
>>>
>>>
>>> ------------------------------
>>> *From:* Anthony May <anthony...@gmail.com>
>>> *To:* Deepak Sharma <deepakmc...@gmail.com>; Sunita Arvind <
>>> sunitarv...@gmail.com>
>>> *Cc:* "user@spark.apache.org" <user@spark.apache.org>
>>> *Sent:* Friday, May 6, 2016 11:50 AM
>>> *Subject:* Re: Adhoc queries on Spark 2.0 with Structured Streaming
>>>
>>> Yeah, there isn't even an RC yet, and no documentation, but you can work
>>> off the code base and test suites:
>>> https://github.com/apache/spark
>>> And this might help:
>>>
>>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/DataFrameReaderWriterSuite.scala
>>>
>>> On Fri, 6 May 2016 at 11:07 Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>> Spark 2.0 is yet to come out as a public release.
>>> I am waiting to get my hands on it as well.
>>> Please do let me know if I can download the source and build Spark 2.0
>>> from GitHub.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Fri, May 6, 2016 at 9:51 PM, Sunita Arvind <sunitarv...@gmail.com>
>>> wrote:
>>>
>>> Hi All,
>>>
>>> We are evaluating a few real-time streaming query engines, and Spark is
>>> my personal choice. The addition of ad hoc queries is what is getting me
>>> especially excited about it; however, the talks I have heard so far only
>>> mention it and do not provide details. I need to build a prototype to
>>> ensure it works for our use cases.
>>>
>>> Can someone point me to relevant material for this?
>>>
>>> regards
>>> Sunita
>>>
>>>
>>>
>>>
>>> --
>>> Thanks
>>> Deepak
>>> www.bigdatabig.com
>>> www.keosha.net
>>>
>>>
>>>
>>>
>>
>
