Ofir,

Thanks for the clarification. I was confused for a moment. The links will be very helpful.
> On May 15, 2016, at 2:32 PM, Ofir Manor <ofir.ma...@equalum.io> wrote:
>
> Ben,
> I'm just a Spark user - but at least at the March Spark Summit, that was the main term used.
> Taking a step back from the details, maybe this new post from Reynold is a better intro to the Spark 2.0 highlights:
> https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
>
> If you want to drill down, go to SPARK-8360 "Structured Streaming (aka Streaming DataFrames)". The design doc (written by Reynold in March) is very readable:
> https://issues.apache.org/jira/browse/SPARK-8360
>
> Regarding directly querying (with SQL) the state managed by a streaming process - I don't know if that will land in 2.0 or only later.
>
> Hope that helps,
>
> Ofir Manor
> Co-Founder & CTO | Equalum
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>
> On Sun, May 15, 2016 at 11:58 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Hi Ofir,
>
> I just recently saw the webinar with Reynold Xin. He mentioned the SparkSession unification efforts, but I don't remember the Dataset for Structured Streaming, aka Continuous Applications as he put it. He did mention streaming, or unlimited, DataFrames for Structured Streaming so one can directly query the data from them. Has something changed since then?
>
> Thanks,
> Ben
>
>> On May 15, 2016, at 1:42 PM, Ofir Manor <ofir.ma...@equalum.io> wrote:
>>
>> Hi Yuval,
>> let me share my understanding based on similar questions I had.
>> First, Spark 2.x aims to replace a whole bunch of its APIs with just two main ones - SparkSession (replacing the Hive/SQL/Spark contexts) and Dataset (a merge of Dataset and DataFrame - which is why it inherits all the Spark SQL goodness), while RDD remains a low-level API only for special cases. The new Dataset should also support both batch and streaming - eventually replacing DStream as well. See the design docs in SPARK-13485 (unified API) and SPARK-8360 (Structured Streaming) for a good intro.
>> However, as you noted, not all of this will be fully delivered in 2.0. For example, it seems that streaming from / to Kafka using Structured Streaming didn't make it (so far?) into 2.0 (which is a showstopper for me).
>> Anyway, as far as I understand, you should be able to apply stateful operators (non-RDD) on Datasets (for example, the new event-time window processing in SPARK-8360). The gap I see is mostly the limited set of streaming sources / sinks migrated to the new (richer) API and semantics.
>> Anyway, I'm pretty sure that once 2.0 gets to RC, the documentation and examples will align with the current offering...
>>
>> Ofir Manor
>> Co-Founder & CTO | Equalum
>> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>>
>> On Sun, May 15, 2016 at 1:52 PM, Yuval.Itzchakov <yuva...@gmail.com> wrote:
>> I've been reading/watching videos about the upcoming Spark 2.0 release, which brings us Structured Streaming. One thing I've yet to understand is how this relates to the current state of working with streaming in Spark with the DStream abstraction.
>>
>> All the examples I can find, in the Spark repository and in various videos, show someone streaming local JSON files or reading from HDFS/S3/SQL.
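To make the "SparkSession + streaming Dataset" model from Ofir's summary concrete, here is a minimal sketch in the shape the API eventually took in Spark 2.0 (keep in mind the API was still in flux at the time of this thread, so the exact method names are an assumption based on the design docs): a single SparkSession entry point, a streaming DataFrame that grows as data arrives, and ordinary Dataset operators applied to it. The socket source, host, and port are illustrative stand-ins for a real source.

```scala
import org.apache.spark.sql.SparkSession

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    // The single unified entry point (replacing SQLContext / HiveContext / SparkContext wrappers).
    val spark = SparkSession.builder()
      .appName("structured-streaming-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // An unbounded DataFrame: each line arriving on the socket becomes a new row.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The same Dataset operators used for batch apply to the streaming DataFrame.
    val counts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // Continuously emit the updated counts; awaitTermination() blocks until stopped.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```

Note how the batch/streaming unification shows up: `flatMap`, `groupBy`, and `count` are exactly the batch Dataset API; only `readStream`/`writeStream` mark the query as continuous.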
>> Also, when browsing the source, SparkSession seems to be defined inside org.apache.spark.sql, which gives me a hunch that this is all somehow related to SQL and the like, and not really to DStreams.
>>
>> What I'm failing to understand is: will this feature impact how we do streaming today? Will I be able to consume a Kafka source in a streaming fashion (like we do today when we open a stream using KafkaUtils)? Will we be able to do stateful operations on a Dataset[T] like we do today using MapWithStateRDD? Or will there be only a subset of operations that the Catalyst optimizer can understand, such as aggregations and the like?
>>
>> I'd be happy if anyone could shed some light on this.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Structured-Streaming-in-Spark-2-0-and-DStreams-tp26959.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
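For contrast with the sketch above, this is the "how we do streaming today" workflow Yuval describes: the Spark 1.x direct Kafka stream via KafkaUtils, plus per-key state via mapWithState. The broker address, topic name, and checkpoint path are illustrative assumptions; the build needs the spark-streaming-kafka artifact.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DStreamKafkaSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dstream-kafka-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/checkpoint") // required by stateful operations like mapWithState

    // The DStream-era direct Kafka source (no receivers; offsets tracked by Spark).
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // mapWithState: maintain a running count per word across micro-batches.
    val updateCount = (word: String, one: Option[Int], state: State[Long]) => {
      val newCount = state.getOption.getOrElse(0L) + one.getOrElse(0)
      state.update(newCount)
      (word, newCount)
    }
    val counts = stream
      .map(_._2)                 // keep the message value, drop the Kafka key
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .mapWithState(StateSpec.function(updateCount))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

This side-by-side makes Yuval's question concrete: the DStream version manages state through an explicit StateSpec and a StreamingContext, while the Structured Streaming design moves both the state and the incremental execution behind ordinary Dataset operators.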