[ 
https://issues.apache.org/jira/browse/FLINK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo de Morais updated FLINK-37859:
--------------------------------------
    Description: 
We currently always rely on a chain of binary joins operators for multiple 
non-temporal regular joins in a flink streaming job. This often generates a lot 
of intermediate state which considerably increases the state size and leads to 
decrease performance and performance degradation.

 

[FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
 proposes and implements an operator that performs a join accross N inputs with 
zero intermediate state and better performance for pipelines with records 
amplification. This is a challenging task since real-world implementations of a 
MultiJoin operator for a changelog stream processor aren't well known and [the 
algorithm itself is 
complicated|https://issues.apache.org/jira/browse/SPARK-2211]. However, there's 
a lot of potential to address one of the main stream processing issues of large 
joins, which is state growth.

  was:
We currently always rely on a chain of binary joins operators for multiple 
non-temporal regular joins in a flink streaming job. This often generates a lot 
of intermediate state which considerably increases the state size and leads to 
decrease performance and performance degradation.

 

[FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
 proposes and implements an operator that performs a join accross N inputs with 
zero intermediate state and better performance for pipelines with records 
amplification. This is a challenging task since real-world implementations of a 
MultiJoin operator for a changelog stream processor [are not known and the 
algorithm itself is 
complicated.|https://issues.apache.org/jira/browse/SPARK-2215?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=14038513]


> FLIP-516: Streaming Multi-Way Join Optimization
> -----------------------------------------------
>
>                 Key: FLINK-37859
>                 URL: https://issues.apache.org/jira/browse/FLINK-37859
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: dalongliu
>            Assignee: Gustavo de Morais
>            Priority: Major
>
> We currently always rely on a chain of binary joins operators for multiple 
> non-temporal regular joins in a flink streaming job. This often generates a 
> lot of intermediate state which considerably increases the state size and 
> leads to decrease performance and performance degradation.
>  
> [FLIP-516|https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator]
>  proposes and implements an operator that performs a join accross N inputs 
> with zero intermediate state and better performance for pipelines with 
> records amplification. This is a challenging task since real-world 
> implementations of a MultiJoin operator for a changelog stream processor 
> aren't well known and [the algorithm itself is 
> complicated|https://issues.apache.org/jira/browse/SPARK-2211]. However, 
> there's a lot of potential to address one of the main stream processing 
> issues of large joins, which is state growth.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to