Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

Benchao Li Wed, 10 Jan 2024 06:56:48 -0800

Thanks shuai for driving this, mini-batch Join is a very useful
optimization, +1 for the general idea.


Regarding the configuration
"table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
necessary. The semantic of changelog emitted by the Join operator is
eventual consistency, so there is no much difference between original
Join and mini-batch Join from this aspect. Besides, introducing more
options would make it more complex for users, harder to understand and
maintain, which we should be careful about.

One thing about the implementation, could you make the new operator
share the same state definition with the original one?

shuai xu <[email protected]> 于2024年1月10日周三 21:23写道：
>
> Hi devs,
>
> I’d like to start a discussion on FLIP-415: Introduce a new join operator to 
> support minibatch[1].
>
> Currently, when performing cascading connections in Flink, there is a pain 
> point of record amplification. Every record join operator receives would 
> trigger join process. However, if records of +I and -D matches , they could 
> be folded to reduce two times of join process. Besides, records of  -U +U 
> might output 4 records in which two records are redundant when encountering 
> outer join .
>
> To address this issue, this FLIP introduces a new  
> MiniBatchStreamingJoinOperator to achieve batch processing which could reduce 
> number of outputting redundant messages and avoid unnecessary join processes.
> A new option is added to control the operator to avoid influencing existing 
> jobs.
>
> Please find more details in the FLIP wiki document [1]. Looking
> forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
>
> Best,
> Xu Shuai



-- 

Best,
Benchao Li

Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

Reply via email to