I've been reading about and watching videos on the upcoming Spark 2.0 release,
which brings us Structured Streaming. One thing I've yet to understand is how
it relates to the current way of working with streaming in Spark via the
DStream abstraction.

All the examples I can find, in the Spark repository and in various videos,
show someone streaming local JSON files or reading from HDFS/S3/SQL. Also,
when browsing the source, SparkSession seems to be defined inside
org.apache.spark.sql, which gives me a hunch that this is all somehow related
to SQL and the like, and not really to DStreams.

What I'm failing to understand is: will this feature change how we do
streaming today? Will I be able to consume a Kafka source in a streaming
fashion (as we do today when we open a stream using KafkaUtils)? Will we be
able to do stateful operations on a Dataset[T] as we do today using
MapWithStateRDD? Or will there only be a subset of operations that the
Catalyst optimizer can understand, such as aggregations?
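For concreteness, this is roughly the pattern I mean by "today" — a sketch against the existing Spark 1.6 DStream API (Kafka 0.8 direct stream); the broker address, topic name, and counting logic are made up for illustration, and it assumes an existing SparkContext `sc`:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))

// Open a direct Kafka stream the current way, via KafkaUtils.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  Map("metadata.broker.list" -> "localhost:9092"),
  Set("events"))

// Stateful operation via mapWithState (backed internally by MapWithStateRDD):
// keep a running count of records seen per key.
def trackCount(key: String, value: Option[String], state: State[Long]): (String, Long) = {
  val count = state.getOption.getOrElse(0L) + 1L
  state.update(count)
  (key, count)
}

val runningCounts = stream.mapWithState(StateSpec.function(trackCount _))
```

My question is essentially whether this style of pipeline carries over to Structured Streaming, or is replaced by something else entirely.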

I'd be happy if anyone could shed some light on this.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Structured-Streaming-in-Spark-2-0-and-DStreams-tp26959.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
