HeartSaVioR opened a new pull request #31989:
URL: https://github.com/apache/spark/pull/31989


   Introduction: this PR is a part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for an overall view 
of the code change. (Note that the code diffs may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces a state store manager for session windows in streaming 
queries. Session windows in batch queries don't need a state store manager.
   
   This PR versions the state format used by the state store manager, so that 
we can apply further optimizations after a Spark version is released. 
`StreamingSessionWindowStateManager` is a trait defining the methods available 
in the session window state store manager. 
`StreamingSessionWindowStateManagerBaseImpl` and its subclasses implement the 
trait with versioning.
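   
   As a rough illustration of the versioning idea only, a minimal sketch 
follows. All names here (`SessionStateManager`, `SessionStateManagerBaseImpl`, 
`SessionStateManagerImplV1`, `createStateManager`, `describeLayout`) are 
hypothetical and simplified; they are not the trait or classes in this PR, 
which operate on Spark's internal rows and state stores.

```scala
// Hypothetical, simplified sketch of a versioned state manager.
// These names are assumptions for illustration, not the classes in this PR.
trait SessionStateManager {
  // The state format version this manager reads and writes.
  def stateFormatVersion: Int

  // Human-readable description of the layout, standing in for real operations.
  def describeLayout: String
}

// Shared base class; each subclass pins one state format version.
abstract class SessionStateManagerBaseImpl(val stateFormatVersion: Int)
  extends SessionStateManager

class SessionStateManagerImplV1 extends SessionStateManagerBaseImpl(1) {
  def describeLayout: String =
    "key -> list of start times; (key, start time) -> value"
}

object SessionStateManager {
  // Picks the implementation from the version recorded for the query, so a
  // later release can add a new format without breaking existing state.
  def createStateManager(stateFormatVersion: Int): SessionStateManager =
    stateFormatVersion match {
      case 1 => new SessionStateManagerImplV1
      case v =>
        throw new IllegalArgumentException(s"Unsupported state format version: $v")
    }
}
```

   With this shape, adding a later format would mean adding another subclass 
and another `case` in the factory, while existing state keeps resolving to 
version 1.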
   
   The format of version 1 leverages two state stores to represent the session 
windows:
   
   * key -> list of start times (in session window spec)
   * key + start time in session window -> value
   
   This structure is simpler than what we tried to implement previously, and 
also less wasteful, as it doesn't require all values to be rewritten when any 
session window is added, modified, or removed.
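   
   The two-store layout of version 1 can be sketched in memory as below. This 
is an assumption-laden illustration: `TwoStoreSessionState` and its methods are 
hypothetical, and plain mutable maps stand in for the real state stores.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the version 1 layout.
// Mutable maps replace the real state stores; names are illustrative only.
final class TwoStoreSessionState[K, V] {
  // Store 1: key -> list of session start times (kept sorted).
  private val startTimes = mutable.Map.empty[K, List[Long]]
  // Store 2: (key, start time) -> value for that session window.
  private val values = mutable.Map.empty[(K, Long), V]

  // Adds or updates one session. Only the entry for (key, startTime) is
  // written; the start-time list changes only when the start time is new.
  def put(key: K, startTime: Long, value: V): Unit = {
    val existing = startTimes.getOrElse(key, Nil)
    if (!existing.contains(startTime)) {
      startTimes.update(key, (startTime :: existing).sorted)
    }
    values.update((key, startTime), value)
  }

  // Removes one session without touching the values of other sessions.
  def remove(key: K, startTime: Long): Unit = {
    val remaining = startTimes.getOrElse(key, Nil).filterNot(_ == startTime)
    if (remaining.isEmpty) {
      startTimes.remove(key)
    } else {
      startTimes.update(key, remaining)
    }
    values.remove((key, startTime))
  }

  // Returns all sessions for a key as (start time, value), ordered by start time.
  def sessions(key: K): List[(Long, V)] =
    startTimes.getOrElse(key, Nil).flatMap { st =>
      values.get((key, st)).map(v => (st, v))
    }
}
```

   In this sketch, updating an existing session writes a single 
`(key, start time)` entry, and adding or removing one rewrites only the list of 
start times for that key, never the values of the other sessions.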
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   WIP (a new test suite is expected to be added, unless we agree it can be 
skipped)




