Hi dev,

I'd like to propose deprecating DStream in Spark 3.4, in favor of promoting Structured Streaming. (Sorry for the late proposal. If we don't make the change in 3.4, we will have to wait another 6 months.)

We have been focusing on Structured Streaming for years (across multiple major and minor versions), and during that time we haven't made any improvements to DStream. Furthermore, we recently updated the DStream doc to explicitly say that DStream is a legacy project.
https://spark.apache.org/docs/latest/streaming-programming-guide.html#note

The basis for the deprecation is that we don't see any particular use case which only DStream solves. This is a different story from GraphX and MLlib, as we don't have replacements for them.

The proposal does not mean we will remove the API soon, in line with how the Spark project has handled deprecations of public API. I don't intend to propose a target version for removal. The goal is to guide users to refrain from building new workloads on DStream. We might want to remove it in the future, but that would require a new discussion thread at that time.

What do you think?

Thanks,
Jungtaek Lim (HeartSaVioR)