lsyldliu commented on code in PR #26775:
URL: https://github.com/apache/flink/pull/26775#discussion_r2207322498


##########
docs/content/docs/dev/table/sql/queries/joins.md:
##########
@@ -83,6 +83,9 @@ FULL OUTER JOIN Product
 ON Orders.product_id = Product.id
 ```
 
+### Multiple Regular Joins

Review Comment:
   How about `Multiway Regular Joins` as the title? What do you think?



##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in 
the figure below.
 MiniBatch optimization is disabled by default for regular join. In order to 
enable this optimization, you should set options 
`table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and 
`table.exec.mini-batch.size`. Please see [configuration]({{< ref 
"docs/dev/table/config" >}}#execution-options) page for more details.
 
 {{< top >}}
+
+## Multiple Regular Joins
+
+{{< label Streaming >}}
+
+Streaming Flink jobs with multiple non-temporal regular joins often experience 
operational instability and performance degradation due to large state sizes. 
This is often because the intermediate state created by a chain of joins is 
much larger than the input state itself. In Flink 2.1, we introduce a new 
multi-join operator, an optimization designed to significantly reduce state 
size and improve performance for join pipelines that involve record 
amplification and large intermediate state. This new operator eliminates the 
need to store intermediate state for joins across multiple tables by processing 
joins across various input streams simultaneously. This "zero intermediate 
state" approach primarily targets state reduction, offering substantial 
benefits in resource consumption and operational stability.

Review Comment:
   I think it would be better to use the unified term `multiway join` 
instead of MultiJoin throughout this doc. What do you think?



##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in 
the figure below.
 MiniBatch optimization is disabled by default for regular join. In order to 
enable this optimization, you should set options 
`table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and 
`table.exec.mini-batch.size`. Please see [configuration]({{< ref 
"docs/dev/table/config" >}}#execution-options) page for more details.
 
 {{< top >}}
+
+## Multiple Regular Joins
+
+{{< label Streaming >}}
+
+Streaming Flink jobs with multiple non-temporal regular joins often experience 
operational instability and performance degradation due to large state sizes. 
This is often because the intermediate state created by a chain of joins is 
much larger than the input state itself. In Flink 2.1, we introduce a new 
multi-join operator, an optimization designed to significantly reduce state 
size and improve performance for join pipelines that involve record 
amplification and large intermediate state. This new operator eliminates the 
need to store intermediate state for joins across multiple tables by processing 
joins across various input streams simultaneously. This "zero intermediate 
state" approach primarily targets state reduction, offering substantial 
benefits in resource consumption and operational stability.
+
+In most joins, a significant portion of processing time is spent fetching 
records from the state. The efficiency of the MultiJoin operator largely 
depends on the size of this intermediate state. In a common scenario where a 
pipeline experiences record amplification—meaning each join produces more data 
and records than the previous one, the MultiJoin operator is more efficient. 
This is because it keeps the state on which the operator interacts much 
smaller, leading to a more stable operator. If a chain of joins actually 
produces less state than the original records, the MultiJoin operator will 
still use less state overall. However, in this specific case, binary joins 
might perform better because the state that the final joins need to operate on 
is smaller. 
+
+### The MultiJoin Operator
+The main benefits of the MultiJoin operator are:

Review Comment:
   Ditto: use `multiway join` here instead of MultiJoin?



##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in 
the figure below.
 MiniBatch optimization is disabled by default for regular join. In order to 
enable this optimization, you should set options 
`table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and 
`table.exec.mini-batch.size`. Please see [configuration]({{< ref 
"docs/dev/table/config" >}}#execution-options) page for more details.
 
 {{< top >}}
+
+## Multiple Regular Joins

Review Comment:
   Ditto: how about `Multiway Regular Joins` for this title as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
