HeartSaVioR opened a new pull request #31987:
URL: https://github.com/apache/spark/pull/31987


   Introduction: this PR is a part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for the overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces MergingSessionsIterator, which directly merges elements 
belonging to the same session.
   
   MergingSessionsIterator is a variant of SortAggregateIterator which merges 
session windows, relying on the fact that input rows are sorted by "group keys + 
the start time of session window". While merging windows, 
MergingSessionsIterator also applies aggregations to the merged window, which 
eliminates the need to buffer inputs (which requires copying rows) and to 
update the session spec for each input.
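   
   The merge-while-aggregating idea can be sketched outside of Spark as 
follows. This is only an illustrative sketch, not the code in this PR: the 
`Row` type, the `merge_sessions` function, the use of a running sum as the 
aggregation, and the rule that windows merge when the next start is less than 
or equal to the current end are all assumptions made for the example.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical input row; not a Spark type."""
    key: str    # group key
    start: int  # start of this row's session window (event time)
    end: int    # end of this row's session window (start + gap)
    value: int  # value to aggregate


def merge_sessions(rows: Iterable[Row]) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (key, session_start, session_end, sum_of_values).

    Assumes rows are sorted by (key, start). Only one running session
    is kept in memory; no input row is buffered or copied.
    """
    current = None  # [key, start, end, running_sum]
    for row in rows:
        # Assumption: a window that starts at or before the current end
        # belongs to the same session.
        if current is not None and row.key == current[0] and row.start <= current[2]:
            current[2] = max(current[2], row.end)  # extend the session
            current[3] += row.value                # aggregate in place
        else:
            if current is not None:
                yield tuple(current)               # emit the finished session
            current = [row.key, row.start, row.end, row.value]
    if current is not None:
        yield tuple(current)


rows = [Row("a", 0, 10, 1), Row("a", 5, 15, 2), Row("a", 30, 40, 3), Row("b", 2, 12, 4)]
print(list(merge_sessions(rows)))
# [('a', 0, 15, 3), ('a', 30, 40, 3), ('b', 2, 12, 4)]
```

   Because the input is sorted, a session is complete as soon as a row with 
a different key or a non-overlapping window arrives, so one output row per 
session can be emitted in a single pass.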
   
   MergingSessionsIterator performs considerably better than 
UpdatingSessionsIterator, which is brought by SPARK-34888. Note that 
MergingSessionsIterator can only be applied to cases where the aggregation can 
be applied altogether, so there is still room for UpdatingSessionsIterator to 
be used.
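   
   For contrast, the buffering approach that MergingSessionsIterator avoids 
might look roughly like the sketch below. It is inferred only from the 
description above (buffer the inputs, then update the session spec for each 
input) and is not taken from SPARK-34888; `Row`, `update_sessions`, and the 
overlap rule are hypothetical names and assumptions for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical input row; not a Spark type."""
    key: str
    start: int  # start of this row's session window
    end: int    # end of this row's session window
    value: int


def update_sessions(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield every input row with its window replaced by the merged session.

    Assumes rows are sorted by (key, start). Rows of the open session
    must be buffered until the session is known to be complete.
    """
    buffered: List[Row] = []
    session_start = session_end = 0
    for row in rows:
        # Assumption: a window starting at or before the current end
        # belongs to the same session.
        if buffered and row.key == buffered[0].key and row.start <= session_end:
            session_end = max(session_end, row.end)
        else:
            # Session closed: rewrite the session spec of each buffered row.
            for r in buffered:
                yield r._replace(start=session_start, end=session_end)
            buffered = []
            session_start, session_end = row.start, row.end
        buffered.append(row)
    for r in buffered:
        yield r._replace(start=session_start, end=session_end)


rows = [Row("a", 0, 10, 1), Row("a", 5, 15, 2), Row("a", 30, 40, 3)]
print(list(update_sessions(rows)))
```

   This variant keeps every input row, so any aggregation can run afterwards, 
at the cost of holding each session's rows in memory.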
   
   This PR also introduces MergingSessionsExec, the physical node that 
leverages MergingSessionsIterator to sort the input rows and aggregate rows 
according to the session windows.
   
   ### Why are the changes needed?
   
   This part is one of the pieces required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   WIP (a new test suite is expected to be added, or this can be skipped if we 
agree it can be skipped)

