Elias Levy created FLINK-6243:
---------------------------------

             Summary: Continuous Joins:  True Sliding Window Joins
                 Key: FLINK-6243
                 URL: https://issues.apache.org/jira/browse/FLINK-6243
             Project: Flink
          Issue Type: New Feature
          Components: Streaming
    Affects Versions: 1.1.4
            Reporter: Elias Levy


Flink defines sliding window joins as the join of elements of two streams that 
share a window of time, where the windows are defined by advancing them forward 
some amount of time that is less than the window time span.  More generally, 
such windows are just overlapping hopping windows. 

Other systems, such as Kafka Streams, support a different notion of sliding 
window joins.  In these systems, elements of two streams are joined if the 
absolute time difference between them is less than or equal to the time window 
length.
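
As a minimal sketch of that condition, assuming event timestamps in 
milliseconds ({{TimeDiffJoin}} is a hypothetical helper, not a Flink or Kafka 
Streams API):

```java
// Hypothetical helper for illustration only.
final class TimeDiffJoin {
    /** True if the two timestamps are no more than windowMs apart. */
    static boolean joins(long tsA, long tsB, long windowMs) {
        return Math.abs(tsA - tsB) <= windowMs;
    }
}
```

No window boundaries are involved: only the distance between the two 
timestamps matters.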

This alternative notion of sliding window joins has several advantages over the 
current implementation in some applications.

Each pair of elements is joined exactly once.  With overlapping sliding 
windows, two elements may both fall within multiple windows, leading them to be 
joined multiple times when we only wish them to be joined once.
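
To illustrate the duplication, the sketch below counts how many sliding windows 
two timestamps share.  It assumes non-overlapping-free alignment is not needed: 
windows simply start at multiples of the slide with no offset, and each window 
covers [start, start + size).  {{SharedWindows}} is a hypothetical helper, not 
part of Flink.

```java
// Hypothetical helper for illustration only.
final class SharedWindows {
    /** Number of sliding windows [start, start + sizeMs) containing both timestamps. */
    static int count(long t1, long t2, long sizeMs, long slideMs) {
        long lo = Math.min(t1, t2);
        long hi = Math.max(t1, t2);
        int shared = 0;
        // Begin at the latest window start at or before the earlier element,
        // then walk back while the window still covers the later element.
        for (long start = Math.floorDiv(lo, slideMs) * slideMs;
             start > hi - sizeMs;
             start -= slideMs) {
            shared++;
        }
        return shared;
    }
}
```

With a 10 second window sliding by 1 second, elements at 10000 ms and 10500 ms 
share 10 windows, so a windowed join would emit that pair 10 times.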

The implementation need not instantiate window objects to keep track of stream 
elements, which becomes problematic in the current implementation if the window 
size is very large and the slide is very small.
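
A rough way to see the cost, assuming the window size is an exact multiple of 
the slide ({{WindowFanOut}} is a hypothetical helper, not part of Flink):

```java
// Hypothetical helper for illustration only.
final class WindowFanOut {
    /** Number of sliding windows each element is assigned to. */
    static long windowsPerElement(long sizeMs, long slideMs) {
        if (sizeMs <= 0 || slideMs <= 0 || sizeMs % slideMs != 0) {
            throw new IllegalArgumentException("size must be a positive multiple of slide");
        }
        return sizeMs / slideMs;
    }
}
```

A 24 hour window with a 1 second slide places every element in 86400 windows.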

It allows for asymmetric time joins, e.g. join if elements from stream A are 
no more than X time behind and Y time ahead of an element from stream B.
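
A sketch of such an asymmetric condition, with X and Y expressed in 
milliseconds ({{AsymmetricJoin}} and its parameter names are hypothetical):

```java
// Hypothetical helper for illustration only.
final class AsymmetricJoin {
    /** True if A is at most behindMs before and at most aheadMs after B. */
    static boolean joins(long tsA, long tsB, long behindMs, long aheadMs) {
        long diff = tsA - tsB; // negative when A is behind B
        return diff >= -behindMs && diff <= aheadMs;
    }
}
```

Setting both bounds to the same value gives the symmetric case.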

It is currently possible to implement a join with these semantics using 
{{CoProcessFunction}}, but the capability should be a first-class feature, as 
it is in Kafka Streams.

To perform the join, elements of each stream must be buffered for at least the 
window time length.  To allow for large window sizes and high volumes of 
elements, the state should, possibly optionally, be buffered so that it can 
spill to disk (e.g. by using RocksDB).
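
The buffering can be sketched in plain Java.  The class below is not Flink code 
and uses no Flink API: it only models the two-buffer logic that a 
{{CoProcessFunction}} based join might carry, with in-memory maps standing in 
for keyed state.  The names {{BufferedTimeJoin}}, {{onLeft}}, {{onRight}} and 
{{onWatermark}} are hypothetical, and it assumes no element arrives with a 
timestamp below the last watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a time-difference join; not a Flink operator.
final class BufferedTimeJoin {
    private final long windowMs;
    // One buffer per input, ordered by event timestamp.
    private final TreeMap<Long, List<String>> left = new TreeMap<>();
    private final TreeMap<Long, List<String>> right = new TreeMap<>();
    final List<String> output = new ArrayList<>();

    BufferedTimeJoin(long windowMs) {
        this.windowMs = windowMs;
    }

    void onLeft(long ts, String value) {
        // Probe the other side for timestamps in [ts - window, ts + window].
        for (List<String> matches : right.subMap(ts - windowMs, true, ts + windowMs, true).values()) {
            for (String r : matches) {
                output.add(value + "," + r);
            }
        }
        left.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
    }

    void onRight(long ts, String value) {
        for (List<String> matches : left.subMap(ts - windowMs, true, ts + windowMs, true).values()) {
            for (String l : matches) {
                output.add(l + "," + value);
            }
        }
        right.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
    }

    void onWatermark(long watermark) {
        // Anything older than watermark - window can no longer match.
        left.headMap(watermark - windowMs, false).clear();
        right.headMap(watermark - windowMs, false).clear();
    }

    int bufferedTimestamps() {
        return left.size() + right.size();
    }
}
```

Each arriving element probes the opposite buffer once, so every matching pair 
is emitted exactly once, and the watermark bounds how long elements are held.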

The same stream may be joined multiple times in a complex topology.  As an 
optimization, it may be wise to reuse any element buffer among colocated join 
operators.  Otherwise, there may be write amplification and increased state 
that must be snapshotted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
