Hi devs,

I’d like to start a discussion on FLIP-415: Introduce a new join operator to 
support minibatch [1].

Currently, cascading joins in Flink suffer from record amplification. Every 
record that a join operator receives triggers a join process. However, if a 
+I record and a -D record match, they can be folded together, which saves two 
join processes. In addition, with an outer join, a -U/+U pair may produce four 
output records, two of which are redundant.

To address this issue, this FLIP introduces a new 
MiniBatchStreamingJoinOperator that processes records in batches, which 
reduces the number of redundant output messages and avoids unnecessary join 
processes. A new option is added to control whether the operator is used, so 
that existing jobs are not affected.
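
As a rough illustration of the folding idea, here is a simplified sketch. It 
is not the operator code from the FLIP: the names MiniBatchFoldSketch, Kind, 
Change and fold are made up for this mail, and matching records on their full 
row content is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: fold a buffered mini-batch before it reaches the join.
class MiniBatchFoldSketch {

    // Changelog kinds, mirroring the +I / -D / -U / +U notation.
    enum Kind { INSERT, DELETE, UPDATE_BEFORE, UPDATE_AFTER }

    // A hypothetical changelog record: a kind plus the row content.
    static final class Change {
        final Kind kind;
        final String row;

        Change(Kind kind, String row) {
            this.kind = kind;
            this.row = row;
        }
    }

    // Returns the records that still need a join after folding.
    // Assumption: a retraction (-D or -U) cancels the most recent buffered
    // accumulation (+I or +U) that carries the same row content.
    static List<Change> fold(List<Change> batch) {
        List<Change> buffer = new ArrayList<>();
        for (Change c : batch) {
            boolean retract = c.kind == Kind.DELETE || c.kind == Kind.UPDATE_BEFORE;
            int match = -1;
            if (retract) {
                // Search backwards for the latest matching accumulation.
                for (int i = buffer.size() - 1; i >= 0; i--) {
                    Change b = buffer.get(i);
                    boolean accumulate =
                            b.kind == Kind.INSERT || b.kind == Kind.UPDATE_AFTER;
                    if (accumulate && b.row.equals(c.row)) {
                        match = i;
                        break;
                    }
                }
            }
            if (match >= 0) {
                buffer.remove(match); // the pair cancels out: no join for either
            } else {
                buffer.add(c);        // keep the record for the join
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<Change> batch = Arrays.asList(
                new Change(Kind.INSERT, "a"),
                new Change(Kind.INSERT, "b"),
                new Change(Kind.DELETE, "a"));
        // +I(a) and -D(a) fold away, so only +I(b) is left to join.
        System.out.println(fold(batch).size()); // prints 1
    }
}
```

In this sketch the three-record batch triggers one join process instead of 
three. Please refer to the FLIP for how the real operator buffers and folds 
records.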

Please find more details in the FLIP wiki document [1]. Looking
forward to your feedback.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch

Best,
Xu Shuai
