Hi all, This is a relatively large optimization that may pose a significant risk of bugs, so I like to keep it from being enabled by default for now.
Best, Jingsong On Fri, Jan 12, 2024 at 3:01 PM shuai xu <xushuai...@gmail.com> wrote: > > Suppose we currently have a job that joins two CDC sources after > de-duplicating them and the output is available for audit analysis, and the > user turns off the parameter > "table.exec.deduplicate.mini-batch.compact-changes-enabled" to ensure that it > does not lose update details. If we don't introduce this parameter, after the > user upgrades the version, some update details may be lost due to the > mini-batch connection being enabled by default, resulting in distorted audit > results. > > > 2024年1月11日 16:19,Benchao Li <libenc...@apache.org> 写道: > > > >> the change might not be supposed for the downstream of the job which > >> requires details of changelog > > > > Could you elaborate on this a bit? I've never met such kinds of > > requirements before, I'm curious what is the scenario that requires > > this. > > > > shuai xu <xushuai...@gmail.com> 于2024年1月11日周四 13:08写道: > >> > >> Thanks for your response, Benchao. > >> > >> Here is my thought on the newly added option. > >> Users' current jobs are running on a version without minibatch join. If > >> the existing option to enable minibatch join is utilized, then when users' > >> jobs are migrated to the new version, the internal behavior of the join > >> operation within the jobs will change. Although the semantic of changelog > >> emitted by the Join operator is eventual consistency, the change might not > >> be supposed for the downstream of the job which requires details of > >> changelog. This newly added option also refers to > >> 'table.exec.deduplicate.mini-batch.compact-changes-enabled'. > >> > >> As for the implementation,The new operator shares the state of the > >> original operator and it merely has an additional minibatch for storing > >> records to do some optimization. The storage remains consistent, and there > >> is minor modification to the computational logic. > >> > >> Best, > >> Xu Shuai > >> > >>> 2024年1月10日 22:56,Benchao Li <libenc...@apache.org> 写道: > >>> > >>> Thanks shuai for driving this, mini-batch Join is a very useful > >>> optimization, +1 for the general idea. > >>> > >>> Regarding the configuration > >>> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really > >>> necessary. The semantic of changelog emitted by the Join operator is > >>> eventual consistency, so there is no much difference between original > >>> Join and mini-batch Join from this aspect. Besides, introducing more > >>> options would make it more complex for users, harder to understand and > >>> maintain, which we should be careful about. > >>> > >>> One thing about the implementation, could you make the new operator > >>> share the same state definition with the original one? > >>> > >>> shuai xu <xushuai...@gmail.com> 于2024年1月10日周三 21:23写道: > >>>> > >>>> Hi devs, > >>>> > >>>> I’d like to start a discussion on FLIP-415: Introduce a new join > >>>> operator to support minibatch[1]. > >>>> > >>>> Currently, when performing cascading connections in Flink, there is a > >>>> pain point of record amplification. Every record join operator receives > >>>> would trigger join process. However, if records of +I and -D matches , > >>>> they could be folded to reduce two times of join process. Besides, > >>>> records of -U +U might output 4 records in which two records are > >>>> redundant when encountering outer join . > >>>> > >>>> To address this issue, this FLIP introduces a new > >>>> MiniBatchStreamingJoinOperator to achieve batch processing which could > >>>> reduce number of outputting redundant messages and avoid unnecessary > >>>> join processes. > >>>> A new option is added to control the operator to avoid influencing > >>>> existing jobs. > >>>> > >>>> Please find more details in the FLIP wiki document [1]. Looking > >>>> forward to your feedback. > >>>> > >>>> [1] > >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch > >>>> > >>>> Best, > >>>> Xu Shuai > >>> > >>> > >>> > >>> -- > >>> > >>> Best, > >>> Benchao Li > >> > > > > > > -- > > > > Best, > > Benchao Li > > Best, > Xu Shuai >