Hi,

I am using Apache Spark Structured Streaming (2.2.1) to implement custom sessionization for events. The processing has two steps:

1. flatMapGroupsWithState (keyed by user id), which stores the state of each user and emits events every minute until an expire event is received.
2. The next step is an aggregation (groupBy count).
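
For concreteness, the two steps could be sketched roughly as below. This is only an illustrative sketch, not the actual job: the case classes, the `updateSession` function, the "expire" event type value, the one-minute processing-time timeout, and the `active` grouping column are all assumptions made for the example. The operator-level output mode is set to Append here on the assumption that Spark rejects a streaming aggregation after flatMapGroupsWithState in Update mode; the query-level output mode is still Update.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical types: the real event schema is not shown in this post.
case class Event(userId: String, eventType: String, eventTime: Timestamp)
case class SessionState(lastSeenMs: Long)
case class SessionUpdate(userId: String, active: Boolean)

object SessionizationSketch {

  // Step 1: per-user state function (hypothetical logic).
  def updateSession(
      userId: String,
      events: Iterator[Event],
      state: GroupState[SessionState]): Iterator[SessionUpdate] = {
    if (state.hasTimedOut) {
      // The timeout fired with no new data: emit a periodic update and re-arm it.
      state.setTimeoutDuration("1 minute")
      Iterator(SessionUpdate(userId, active = true))
    } else {
      val batch = events.toSeq
      if (batch.exists(_.eventType == "expire")) {
        // Expire event received: drop this user's state explicitly.
        state.remove()
        Iterator(SessionUpdate(userId, active = false))
      } else {
        // Store the state first, then arm the timeout.
        state.update(SessionState(batch.map(_.eventTime.getTime).max))
        state.setTimeoutDuration("1 minute")
        Iterator(SessionUpdate(userId, active = true))
      }
    }
  }

  // Steps 1 and 2 wired together; `events` is any streaming Dataset[Event].
  def buildCounts(events: Dataset[Event]): DataFrame = {
    import events.sparkSession.implicits._

    val updates = events
      .groupByKey(_.userId)
      .flatMapGroupsWithState[SessionState, SessionUpdate](
        OutputMode.Append(), // assumption: operator-level mode, see note above
        GroupStateTimeout.ProcessingTimeTimeout())(updateSession)

    // Step 2: groupBy count over the emitted updates (no watermark defined).
    updates.groupBy("active").count()
  }
}
```

The result would then be started with something like `buildCounts(events).writeStream.outputMode("update").format("console").start()`. Note that a processing-time timeout fires no earlier than the configured duration and only when a trigger runs, so "every minute" is approximate in this sketch.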

I am using outputMode Update. I have a few questions:

1. If I don't use a watermark at all:
   (a) Is the state for flatMapGroupsWithState stored forever?
   (b) Is the state for the groupBy count stored forever?
2. Is the watermark applicable only for cleaning up groupBy aggregates?
3. Can we use a watermark to manage state in flatMapGroupsWithState? If so, how?
4. Can a watermark be used for other state cleanup? Are there any examples of that?

Thanks