Hi devs,

While Spark 2.4.0 is still going through release votes, I see that some non-structured-streaming pull requests are being reviewed and merged into the master branch, so I guess it is OK to start discussing the next release.
Looks like there is one major TODO left for structured streaming: allowing stateful operations in continuous mode (watermark, stateful exactly-once). No other major milestone has been shared. (Please let me know if I'm missing something here!)

From a structured streaming contributor's point of view, there are other features we could discuss to see which are good to have, and prioritize if possible (NOTE: this is just brainstorming, and some items might not be valid for structured streaming):

* Native support for session windows (SPARK-10816 [1])
** patch available
* Support for delegation tokens on Kafka (SPARK-25501 [2])
** patch available
* Queryable State (SPARK-16738 [3])
** some discussion took place, but no action has been taken yet
* End-to-end exactly-once with the Kafka sink
** given that Kafka is a first-class streaming source/sink nowadays
* Custom window / custom watermark
* Physically scaling (up/down) streaming state
* State TTL (especially for non-watermark state)
** "timeout" in mapGroupsWithState/flatMapGroupsWithState fits it, but just to check whether we want to have it for normal streaming aggregation
* Providing events discarded for being late via a side output or a similar feature
** to me this looks like a tricky one, since Spark's RDD as well as SQL semantics provide a single output
* more?

I would like to hear others' opinions about this. Please also share if there are ongoing efforts on other items for structured streaming. Happy to help out if another hand is needed.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-10816
2. https://issues.apache.org/jira/browse/SPARK-25501
3. https://issues.apache.org/jira/browse/SPARK-16738