HeartSaVioR commented on issue #23859: [SPARK-26956][SS] remove streaming 
output mode from data source v2 APIs
URL: https://github.com/apache/spark/pull/23859#issuecomment-574034418
 
 
   Btw, is the concern based on a real-world workload? I cannot imagine
"complete mode" working with a decent amount of traffic, especially if you're
running the query for a long time. "Complete mode" means you cannot evict any
state regardless of the watermark, which doesn't make sense unless you have a
finite set of group keys (in which case the cardinality of the group keys
defines the overall size of the state).
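
   To make the state-size point concrete, here is a toy model in plain Python.
It is not Spark code: `run_complete_mode` and the sample keys are made up for
illustration. It keeps a running count per group key and never evicts anything,
which is the constraint complete mode imposes.

```python
from collections import Counter


def run_complete_mode(batches):
    """Toy model: running count per group key, with no eviction at all."""
    state = Counter()
    sizes = []
    for batch in batches:
        state.update(batch)       # fold the micro-batch into the state
        sizes.append(len(state))  # every key must stay so it can be re-emitted
    return sizes


# Finite key set: the state size plateaus at the key cardinality.
bounded = run_complete_mode([["a", "b"], ["b", "c"], ["a", "c"]])

# Unbounded key set (say, one key per session id): the state keeps growing.
unbounded = run_complete_mode([[f"s{i}", f"s{i + 100}"] for i in range(3)])

print(bounded)    # [2, 3, 3]
print(unbounded)  # [2, 4, 6]
```

   With a fixed set of keys the state stops growing after the first few
batches; with fresh keys in every batch it grows for as long as the query runs.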
   
   > If my assumption is right aren't we going back to Dstream behaviour of 
applying window transformation over the batch interval?
   
   That's why "state" comes into play in Structured Streaming. The state
retains values across micro-batches; for window transformations, those values
are the "windows".
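
   A rough sketch of the difference, again as a toy model in plain Python
rather than Spark or DStream code (`WINDOW`, `per_batch_counts` and
`stateful_counts` are hypothetical names). The first function aggregates each
micro-batch in isolation; the second carries the window counts over from one
micro-batch to the next.

```python
from collections import defaultdict

WINDOW = 10  # hypothetical tumbling window length, in seconds


def window_start(ts):
    return ts - ts % WINDOW


def per_batch_counts(batches):
    """No state: every micro-batch starts from an empty aggregation."""
    out = []
    for batch in batches:
        counts = defaultdict(int)
        for ts in batch:
            counts[window_start(ts)] += 1
        out.append(dict(counts))
    return out


def stateful_counts(batches):
    """State survives across micro-batches, keyed by window start."""
    state = defaultdict(int)
    out = []
    for batch in batches:
        for ts in batch:
            state[window_start(ts)] += 1
        out.append(dict(state))
    return out


# Events at t=1, 4 arrive in the first micro-batch, t=7, 12 in the second.
batches = [[1, 4], [7, 12]]
stateless = per_batch_counts(batches)
stateful = stateful_counts(batches)
```

   In the second micro-batch the stateless version reports one event for the
window starting at 0, while the stateful version reports three, because the
two events from the first micro-batch are still in the state.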
   
   In fact, as previous comments in this PR already stated, the only mode that
works in production without any tweak is append mode. In update mode you can
write a custom sink that correctly upserts the output, but there's no API to
define "group keys" in existing sinks.
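
   To illustrate why the sink needs to know the group keys, here is a minimal
sketch in plain Python. Both classes are hypothetical and are not part of any
Spark API; `key_columns` stands in for the "group keys" information that
existing sinks have no way to receive.

```python
class AppendOnlySink:
    """Hypothetical sink that just stores whatever rows it is given."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        self.rows.extend(rows)


class UpsertSink:
    """Hypothetical sink that is told which columns form the group key."""

    def __init__(self, key_columns):
        self.key_columns = key_columns
        self.table = {}

    def write(self, rows):
        for row in rows:
            key = tuple(row[c] for c in self.key_columns)
            self.table[key] = row  # a newer row replaces the older one


# Update-mode style output: only the rows that changed in each micro-batch.
updates = [
    [{"word": "a", "count": 1}],
    [{"word": "a", "count": 2}, {"word": "b", "count": 1}],
]

append_sink = AppendOnlySink()
upsert_sink = UpsertSink(key_columns=["word"])
for batch in updates:
    append_sink.write(batch)
    upsert_sink.write(batch)
```

   The append-only sink ends up with three rows, including the stale
`a -> 1`, while the upsert sink ends up with one current row per word.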
   
   Btw, the streaming output mode is all about how to emit output for stateful
operations. If you don't do any stateful operation, the output mode is a
no-op.
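
   As a rough illustration of what "how to emit output" means, the toy model
below (plain Python, not Spark code; `run` and its parameters are made up)
runs the same windowed count under the three modes. It is deliberately
simplified: it assumes no late data, applies the watermark within the same
micro-batch, and only evicts state in append mode, none of which should be
read as Spark's exact behaviour.

```python
def run(mode, batches, window=10, delay=0):
    """Toy windowed count; returns what each micro-batch would emit."""
    state = {}   # window end -> count
    max_ts = 0
    outputs = []
    for batch in batches:
        touched = set()
        for ts in batch:
            end = ts - ts % window + window
            state[end] = state.get(end, 0) + 1
            touched.add(end)
        max_ts = max([max_ts, *batch])
        watermark = max_ts - delay
        if mode == "complete":
            out = dict(state)                     # the whole result, every time
        elif mode == "update":
            out = {w: state[w] for w in touched}  # only the rows that changed
        elif mode == "append":
            closed = [w for w in state if w <= watermark]
            out = {w: state.pop(w) for w in closed}  # finalized rows, emitted once
        else:
            raise ValueError(f"unknown mode: {mode}")
        outputs.append(out)
    return outputs


batches = [[1, 4], [7, 12], [15, 23]]
complete = run("complete", batches)
update = run("update", batches)
append = run("append", batches)
```

   All three runs see the same input and keep the same kind of state; they
differ only in which rows leave the stateful operator in each micro-batch.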
