HeartSaVioR commented on issue #23859: [SPARK-26956][SS] remove streaming output mode from data source v2 APIs
URL: https://github.com/apache/spark/pull/23859#issuecomment-574034418

Btw, is the concern based on a real-world workload? I cannot imagine "complete mode" working with a decent amount of traffic, especially if you're running the query for a long time. "Complete mode" means you cannot evict any state regardless of the watermark, which doesn't make sense unless you have a finite set of group keys (in which case the cardinality of the group keys determines the overall size of the state).

> If my assumption is right aren't we going back to Dstream behaviour of applying window transformation over the batch interval?

That's why "state" comes into play in Structured Streaming. The state retains values across micro-batches; for window transformations, those values are the "windows".

In fact, as previous comments in this PR have already stated, the only mode that works in production without any tweaks is append mode. In update mode you can work around this with a custom sink that upserts the output correctly, but there's no API to define "group keys" in existing sinks.

Btw, the streaming output mode is all about how to emit output for a stateful operation. If you don't do any stateful operation, the output mode is a no-op.
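To make the point about state and output modes concrete, here is a toy simulation in plain Python. It is only a sketch under stated assumptions, not Spark code: `run_query` is a hypothetical helper that models a streaming count per group key, and it covers complete and update modes only (append mode is left out because it depends on watermarks, which this sketch does not model).

```python
from collections import Counter


def run_query(batches, mode):
    """Toy model of a streaming count per group key (hypothetical helper, not a Spark API).

    Returns the rows emitted for each micro-batch and the final number of keys in state.
    """
    if mode not in ("complete", "update"):
        raise ValueError("this sketch only models 'complete' and 'update'")

    state = Counter()  # survives from one micro-batch to the next
    emitted = []
    for batch in batches:
        touched = Counter(batch)  # keys seen in this micro-batch only
        state.update(touched)     # fold the new counts into the running state
        if mode == "complete":
            # Complete mode: emit the whole result table, so no key can ever be dropped.
            emitted.append(dict(state))
        else:
            # Update mode: emit only the rows whose value changed in this micro-batch.
            emitted.append({key: state[key] for key in touched})
    return emitted, len(state)


batches = [["a", "b", "a"], ["b", "c"], ["a"]]
complete_out, complete_keys = run_query(batches, "complete")
update_out, update_keys = run_query(batches, "update")
print(complete_out)
print(update_out)
```

In this toy model the counts carry over between micro-batches, which is the role "state" plays. The complete-mode output grows with the number of distinct keys ever seen, while the update-mode output only contains the keys touched by the latest micro-batch, so the sink has to upsert by key to stay correct.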