HeartSaVioR opened a new pull request #31986:
URL: https://github.com/apache/spark/pull/31986


   Introduction: this PR is part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for an overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces UpdatingSessionsIterator, which analyzes neighboring 
elements and adjusts the session information on them.
   
   UpdatingSessionsIterator calculates and updates the session window for each 
element in the given iterator, so that all elements in the same session window 
carry the same session spec. Downstream operators can then apply aggregation to 
merge the elements bound to the same session window.
   
   UpdatingSessionsIterator requires that the given iterator be sorted by 
"group keys + start time of session window"; its output retains the same 
sort order.
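
   The behavior described above can be modeled outside Spark. The following is 
a minimal Python sketch, not the code in this PR: the `Row` type, its field 
names, and the `update_sessions` function are hypothetical, and the rule that a 
row joins the current session when its start is less than or equal to the 
current session end is an assumption about the boundary condition.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical input row: a group key plus an initial session window."""
    key: str
    start: int  # session start (e.g. event time)
    end: int    # session end (e.g. event time + gap)
    value: int


def update_sessions(rows: Iterable[Row]) -> Iterator[Row]:
    """Rewrite each row's session so rows in one session share the same bounds.

    Assumes `rows` is sorted by (key, start). Rows are buffered until the
    session closes, then emitted as copies carrying the final bounds.
    """
    buffered: List[Row] = []
    session: Optional[Tuple[str, int, int]] = None  # (key, start, end)

    for row in rows:
        if session is not None and row.key == session[0]:
            if row.start < session[1]:
                raise ValueError("input must be sorted by (key, start)")
            if row.start <= session[2]:
                # Row falls inside the current session: extend it and buffer.
                session = (session[0], session[1], max(session[2], row.end))
                buffered.append(row)
                continue
        # Key changed or a gap was found: close the current session.
        if session is not None:
            for r in buffered:
                yield replace(r, start=session[1], end=session[2])
        buffered = [row]
        session = (row.key, row.start, row.end)

    # Emit the final session, if any.
    if session is not None:
        for r in buffered:
            yield replace(r, start=session[1], end=session[2])


rows = [
    Row("a", 0, 10, 1),
    Row("a", 5, 15, 2),   # overlaps the first row, so both share one session
    Row("a", 30, 40, 3),  # starts after a gap, so it opens a new session
    Row("b", 2, 12, 4),   # a different key always opens a new session
]
updated = list(update_sessions(rows))
```

   In this sketch the output keeps one row per input row, and because every 
row in a session is rewritten to the session's start, the output is still 
ordered by key and session start. A downstream aggregation could then group 
on `(key, start, end)`.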
   
   UpdatingSessionsIterator copies elements so that it can safely update each 
one, and it buffers the elements bound to the same session window. Because of 
this overhead, MergingSessionsIterator, which will be introduced via 
SPARK-34889, should be used whenever possible.
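
   The description does not spell out why copying is needed. One plausible 
reason, stated here as an assumption, is that an upstream iterator may reuse a 
single mutable row object, so buffering references without copying would leave 
every buffered entry pointing at the last row. The Python sketch below 
illustrates that general hazard; `reusing_source`, `buffer_without_copy`, and 
`buffer_with_copy` are hypothetical names and are not part of this PR.

```python
from typing import Iterable, Iterator, List


def reusing_source(values: List[int]) -> Iterator[List[int]]:
    """Hypothetical upstream: yields the SAME list object, overwritten in place."""
    row = [0]
    for v in values:
        row[0] = v
        yield row


def buffer_without_copy(source: Iterable[List[int]]) -> List[List[int]]:
    """Buffers references only; every entry aliases the reused row."""
    return [row for row in source]


def buffer_with_copy(source: Iterable[List[int]]) -> List[List[int]]:
    """Buffers a copy of each row; costs extra allocations but stays correct."""
    return [list(row) for row in source]


aliased = buffer_without_copy(reusing_source([1, 2, 3]))
copied = buffer_with_copy(reusing_source([1, 2, 3]))
```

   Here `aliased` ends up holding three references to one object, while 
`copied` preserves each row. The copy and the buffer are the overheads in this 
simplified model.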
   
   This PR also introduces UpdatingSessionsExec, the physical node that 
leverages UpdatingSessionsIterator to sort the input rows and update the 
session information on them.
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New test suite added.




