Hi devs,

While Spark 2.4.0 is still going through release votes, I see that some
non-SS pull requests are being reviewed and merged into the master branch,
so I guess it's OK to start discussing the next release.

It looks like there's one major TODO left for structured streaming: allowing
stateful operations in continuous mode (watermark, stateful exactly-once).
No other major milestone has been shared. (Please let me know if I'm missing
something here!) From a structured streaming contributor's point of view,
there are other features we could discuss to see which are good to have, and
prioritize if possible (NOTE: this is just brainstorming, and some items
might not be valid for structured streaming):

* Native support for session windows (SPARK-10816 [1])
  ** patch available
* Delegation token support for Kafka (SPARK-25501 [2])
  ** patch available
* Queryable State (SPARK-16738 [3])
  ** some discussion took place, but no action has been taken yet
* End to end exactly-once with Kafka sink
  ** given that Kafka is a first-class streaming source/sink nowadays
* Custom window / custom watermark
* Physical scaling (up/down) of streaming state
* State TTL (especially for non-watermark state)
  ** "timeout" in map/flatMapGroupsWithState covers this, but it's worth
checking whether we want it for normal streaming aggregation as well
* Expose events discarded for being late via a side output or similar feature
  ** this looks tricky to me, since Spark's RDD and SQL semantics each
provide a single output
* more?

I'd like to hear others' opinions on this. Please also share if there are
ongoing efforts on other items for structured streaming. I'm happy to help
out if any of them need another hand.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-10816
2. https://issues.apache.org/jira/browse/SPARK-25501
3. https://issues.apache.org/jira/browse/SPARK-16738
