lsyldliu commented on code in PR #26775:
URL: https://github.com/apache/flink/pull/26775#discussion_r2207322498
##########
docs/content/docs/dev/table/sql/queries/joins.md:
##########
@@ -83,6 +83,9 @@ FULL OUTER JOIN Product
ON Orders.product_id = Product.id
```
+### Multiple Regular Joins
Review Comment:
Multiway Regular Joins? What do you think about this title?
##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in
the figure below.
MiniBatch optimization is disabled by default for regular join. In order to
enable this optimization, you should set options
`table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and
`table.exec.mini-batch.size`. Please see [configuration]({{< ref
"docs/dev/table/config" >}}#execution-options) page for more details.
{{< top >}}
+
+## Multiple Regular Joins
+
+{{< label Streaming >}}
+
+Streaming Flink jobs with multiple non-temporal regular joins often experience
operational instability and performance degradation due to large state sizes.
This is often because the intermediate state created by a chain of joins is
much larger than the input state itself. In Flink 2.1, we introduce a new
multi-join operator, an optimization designed to significantly reduce state
size and improve performance for join pipelines that involve record
amplification and large intermediate state. This new operator eliminates the
need to store intermediate state for joins across multiple tables by processing
joins across various input streams simultaneously. This "zero intermediate
state" approach primarily targets state reduction, offering substantial
benefits in resource consumption and operational stability.
Review Comment:
I think it would be better if we used the unified term `multiway join`
instead of MultiJoin in these docs. What do you think?
##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in
the figure below.
+
+In most joins, a significant portion of processing time is spent fetching
records from the state. The efficiency of the MultiJoin operator largely
depends on the size of this intermediate state. In a common scenario where a
pipeline experiences record amplification (each join produces more records
than the previous one), the MultiJoin operator is more efficient, because the
state it reads from and writes to stays much smaller, which makes the operator
more stable. If a chain of joins actually
produces less state than the original records, the MultiJoin operator will
still use less state overall. However, in this specific case, binary joins
might perform better because the state that the final joins need to operate on
is smaller.
+
+### The MultiJoin Operator
+The main benefits of the MultiJoin operator are:
Review Comment:
ditto, multiway join?
##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -302,3 +302,69 @@ The execution of mini-batch join operator are as shown in
the figure below.
+
+## Multiple Regular Joins
Review Comment:
ditto
--
This is an automated message from the Apache Git Service.