Currently, all sorting is disallowed in Structured Streaming queries. Disallowing global sorting makes sense, since you can't sort an infinite list, but could non-global sorting (i.e. sortWithinPartitions) be allowed? I'm running into this with an external source I'm using, and I'm not sure whether it would be useful for file sources as well. Right now I have to use foreachBatch just so that I can call sortWithinPartitions.
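To make the workaround concrete, here is a plain-Python model (not Spark code) of what the foreachBatch function does to each micro-batch, assuming the goal is per-key event ordering. Events are routed by key, as `repartition(key)` would do, and then each partition is sorted on its own, as `sortWithinPartitions(key, ts)` would do. `Event`, `partition_by_key`, `sort_within_partitions`, and `process_batch` are hypothetical names. In PySpark the real function would be passed to `writeStream.foreachBatch(...)` and would receive `(batch_df, batch_id)`.

```python
import zlib
from collections import namedtuple

# Hypothetical event record; the fields are illustrative, not from a real source.
Event = namedtuple("Event", ["key", "ts", "value"])


def partition_by_key(batch, num_partitions):
    """Model of repartition(key): every event with the same key lands in the
    same partition. crc32 keeps the model deterministic; Spark uses its own hash."""
    parts = [[] for _ in range(num_partitions)]
    for ev in batch:
        parts[zlib.crc32(ev.key.encode("utf-8")) % num_partitions].append(ev)
    return parts


def sort_within_partitions(parts):
    """Model of sortWithinPartitions(key, ts): each partition is sorted on its
    own and no rows move between partitions, so the work is bounded by one batch."""
    return [sorted(p, key=lambda ev: (ev.key, ev.ts)) for p in parts]


def process_batch(batch, batch_id, num_partitions=2):
    """Hypothetical stand-in for the function passed to foreachBatch.
    batch_id is unused here; a real sink might use it to deduplicate writes."""
    return sort_within_partitions(partition_by_key(batch, num_partitions))


# One micro-batch whose events arrive out of order.
batch = [Event("a", 3, "x"), Event("b", 1, "y"), Event("a", 1, "z"), Event("b", 2, "w")]
for i, part in enumerate(process_batch(batch, batch_id=0)):
    print(i, [(ev.key, ev.ts) for ev in part])
```

Because each partition is sorted independently, the work is bounded by the size of a single micro-batch, which is why a local sort avoids the "can't sort an infinite list" problem. This alone says nothing about ordering across micro-batches.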
Two main questions:

- Does a local sort cause issues with any of the exactly-once guarantees that streaming queries provide? I can't say I know or understand how these semantics work. Are there other issues it might cause that I haven't thought of?
- Is the change as simple as changing the unsupported operations check to look only for global sorts instead of all sorts?

The only other discussion on this topic I found is here: <http://apache-spark-user-list.1001560.n3.nabble.com/How-to-preserve-event-order-per-key-in-Structured-Streaming-Repartitioning-By-Key-td34096.html>. It suggested that allowing local sorts in Structured Streaming might be worth considering.

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/