Thilo Schneider created SPARK-27424:
---------------------------------------

             Summary: Joining of one stream against the most recent update in 
another stream
                 Key: SPARK-27424
                 URL: https://issues.apache.org/jira/browse/SPARK-27424
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.1
            Reporter: Thilo Schneider
         Attachments: join-last-update-design.pdf

Currently, adding the most recent update of a row with a given key to another 
stream is not possible. This situation arises if one wants to use the current 
state, of one object, for example when joining the room temperature with the 
current weather.

This ticket covers creation of a {{stream_lead}} and modification of the 
streaming join logic (and state store) to additionally allow joins of the form 

{code:sql}
SELECT *
FROM A, B
WHERE 
    A.key = B.key 
    AND A.time >= B.time 
    AND A.time < stream_lead(B.time)
{code}

The major aspect of this change is that we actually need a third watermark to 
cover how late updates may come. 

A rough sketch may be found in the attached document.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to