I've been reading and watching videos about the upcoming Spark 2.0 release, which brings us Structured Streaming. One thing I have yet to understand is how this relates to the current way of working with streaming in Spark via the DStream abstraction.
All the examples I can find, in the Spark repository and in various videos, show someone streaming local JSON files or reading from HDFS/S3/SQL. Also, when browsing the source, SparkSession seems to be defined inside org.apache.spark.sql, which gives me a hunch that this is all somehow related to SQL and the like, and not really to DStreams. What I'm failing to understand is:

Will this feature change how we do streaming today?
Will I be able to consume a Kafka source in a streaming fashion, as we do today when we open a stream using KafkaUtils?
Will we be able to do stateful operations on a Dataset[T], as we do today using mapWithState?
Or will there only be a subset of operations that the Catalyst optimizer can understand, such as aggregations?

I'd be happy if anyone could shed some light on this.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Structured-Streaming-in-Spark-2-0-and-DStreams-tp26959.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
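For concreteness, here is the kind of DStream-based pipeline being referred to above: a minimal sketch of today's (Spark 1.6) approach, combining a direct Kafka stream via KafkaUtils with stateful counting via mapWithState. The topic name, broker address, and state logic are made up for illustration:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DStreamSketch")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // checkpointing is required for stateful ops

    // Open a direct stream against Kafka; "events" is a hypothetical topic.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "localhost:9092"),
      Set("events"))

    // Keep a running count per key in Spark-managed state.
    val updateState = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (key, sum)
    }

    val counts = stream
      .map { case (_, v) => (v, 1) }   // count each message payload
      .reduceByKey(_ + _)              // per-batch partial counts
      .mapWithState(StateSpec.function(updateState))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The question above is whether an equivalent of this — a Kafka source plus arbitrary user-defined state — will be expressible over a streaming Dataset[T], or only the operations Catalyst can plan.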