[
https://issues.apache.org/jira/browse/FLINK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gustavo de Morais updated FLINK-37859:
--------------------------------------
Description:
We currently always rely on a chain of binary joins operators for multiple
non-temporal regular joins in a flink streaming job. This often generates a lot
of intermediate state which considerably increases the state size and leads to
decrease performance and performance degradation.
[FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
proposes and implements an operator that performs a join accross N inputs with
zero intermediate state and better performance for pipelines with records
amplification. This is a challenging task since real-world implementations of a
MultiJoin operator for a changelog stream processor aren't well known and [the
algorithm itself is
complicated|https://issues.apache.org/jira/browse/SPARK-2211]. However, there's
a lot of potential to address one of the main stream processing issues of large
joins, which is state growth.
was:
We currently always rely on a chain of binary joins operators for multiple
non-temporal regular joins in a flink streaming job. This often generates a lot
of intermediate state which considerably increases the state size and leads to
decrease performance and performance degradation.
[FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
proposes and implements an operator that performs a join accross N inputs with
zero intermediate state and better performance for pipelines with records
amplification. This is a challenging task since real-world implementations of a
MultiJoin operator for a changelog stream processor [are not known and the
algorithm itself is
complicated.|https://issues.apache.org/jira/browse/SPARK-2215?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=14038513]
> FLIP-516: Streaming Multi-Way Join Optimization
> -----------------------------------------------
>
> Key: FLINK-37859
> URL: https://issues.apache.org/jira/browse/FLINK-37859
> Project: Flink
> Issue Type: New Feature
> Reporter: dalongliu
> Assignee: Gustavo de Morais
> Priority: Major
>
> We currently always rely on a chain of binary joins operators for multiple
> non-temporal regular joins in a flink streaming job. This often generates a
> lot of intermediate state which considerably increases the state size and
> leads to decrease performance and performance degradation.
>
> [FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
> proposes and implements an operator that performs a join accross N inputs
> with zero intermediate state and better performance for pipelines with
> records amplification. This is a challenging task since real-world
> implementations of a MultiJoin operator for a changelog stream processor
> aren't well known and [the algorithm itself is
> complicated|https://issues.apache.org/jira/browse/SPARK-2211]. However,
> there's a lot of potential to address one of the main stream processing
> issues of large joins, which is state growth.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)