Hi:
I am using Apache Spark Structured Streaming (2.2.1) to implement custom
sessionization for events. The processing has two steps:
1. flatMapGroupsWithState (keyed by user id), which stores the state of each
   user and emits events every minute until an expire event is received
2. An aggregation (groupBy count)

I am using outputMode - Update.

I have a few questions:
1. If I don't use a watermark at all:
   (a) Is the state for flatMapGroupsWithState stored forever?
   (b) Is the state for the groupBy count stored forever?
2. Is a watermark applicable only to cleaning up groupBy aggregates?
3. Can we use a watermark to manage state in flatMapGroupsWithState? If so,
   how?
4. Can a watermark be used for other state cleanup? Are there any examples of
   that?

Thanks
