HeartSaVioR opened a new pull request #31989: URL: https://github.com/apache/spark/pull/31989
Introduction: this PR is a part of SPARK-10816 (`EventTime based sessionization (session window)`). Please refer to #31937 for an overall view of the code change. (Note that the code diff may have diverged a bit.)

### What changes were proposed in this pull request?

This PR introduces a state store manager for session windows in streaming queries. Session windows in batch queries do not need a state store manager.

This PR versions the state format of the state store manager, so that we can apply further optimizations after the Spark version is released.

StreamingSessionWindowStateManager is a trait defining the methods available in the session window state store manager. StreamingSessionWindowStateManagerBaseImpl and its subclasses implement the trait with versioning.

The version 1 format uses two state stores to represent the session windows:

* key -> list of start times (in session window spec)
* key + start time in session window -> value

This structure is simpler than what we tried to implement in the past, and also less sub-optimal, as it doesn't require all values to be rewritten when any session window is added, modified, or removed.

### Why are the changes needed?

This is one of the parts required to implement SPARK-10816.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

WIP (a new test suite is expected to be added, or it can be skipped if we agree it can be skipped)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
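To make the version 1 state layout from the PR description easier to picture, here is a minimal in-memory sketch in Python. It is not Spark's API and not the PR's code: the class `SessionWindowStateSketch` and its methods `put`, `get`, `sessions`, and `remove` are hypothetical names chosen for illustration, and plain dictionaries stand in for the two state stores. It only models the idea that one store maps a key to its list of session start times while the other maps (key, start time) to the session value, so that changing one session touches one value entry.

```python
from bisect import insort
from typing import Any, Dict, Hashable, List, Tuple


class SessionWindowStateSketch:
    """Hypothetical in-memory model of the two-store layout (not Spark code)."""

    def __init__(self) -> None:
        # Store 1: grouping key -> sorted list of session start times.
        self._starts: Dict[Hashable, List[int]] = {}
        # Store 2: (grouping key, session start time) -> session value.
        self._values: Dict[Tuple[Hashable, int], Any] = {}

    def put(self, key: Hashable, start: int, value: Any) -> None:
        """Add a session, or overwrite the value of an existing one."""
        if (key, start) not in self._values:
            # New session: record its start time in the per-key list.
            insort(self._starts.setdefault(key, []), start)
        # Only this single value entry is written; other sessions are untouched.
        self._values[(key, start)] = value

    def get(self, key: Hashable, start: int) -> Any:
        """Return the value of one session, or None if it does not exist."""
        return self._values.get((key, start))

    def sessions(self, key: Hashable) -> List[Tuple[int, Any]]:
        """Return all (start time, value) pairs for a key, ordered by start time."""
        return [(s, self._values[(key, s)]) for s in self._starts.get(key, [])]

    def remove(self, key: Hashable, start: int) -> None:
        """Remove one session; drop the key entirely when no sessions remain."""
        if (key, start) not in self._values:
            return
        del self._values[(key, start)]
        starts = self._starts[key]
        starts.remove(start)
        if not starts:
            del self._starts[key]


state = SessionWindowStateSketch()
state.put("user-1", 100, {"count": 1})
state.put("user-1", 500, {"count": 3})
state.put("user-1", 100, {"count": 2})  # modifies one session only
state.remove("user-1", 500)
print(state.sessions("user-1"))
```

In this sketch, modifying the session that starts at 100 rewrites a single entry in the second store and leaves the start-time list unchanged, which mirrors the motivation given above for avoiding a full rewrite of all values.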