Hi Devs, For quite some time i've been looking at the structured streaming API to solve lots of use cases at my workplace, I've have some doubts I wanted to clarify regarding stateful aggregations over structured streaming.
Currently, spark provides flatMapGroupWithState (FMGWS) / mapGroupWithState (MGWS) APIs to allow custom streaming aggregations by setting/ updating intermediate `GroupedState` which may or may not expire. This GroupedState is stored in form of snapshots and the latest snapshot is entirely in memory, what might be memory consuming approach and may result in OOMs. Other than this, in my opinion, FMGWS is not very flexible in terms of usage (aggregation logic and needs to be written on Rows and spark sql inbuilt functions can be used) and the timeouts require query to progress in order expire keys. To remedy this i have contributed to this project <https://github.com/chermenin/spark-states> which basically moves the expiration logic to state store (rocks db) and the state store is no longer managed by the executor jvm allowing true expiration of state with nano sec precision. My question is, is there a specific reason FMGWS API is designed the way it is and are there any down sides to the approach I have mentioned above. Do let me know you thoughts. Thanks