[GitHub] [flink] xuyangzhong commented on a diff in pull request #20513: [FLINK-28858][docs] Add document to describe join hints for batch sql

GitBox Fri, 12 Aug 2022 03:30:02 -0700


xuyangzhong commented on code in PR #20513:
URL: https://github.com/apache/flink/pull/20513#discussion_r944079451



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 

Review Comment:
   It seems 'the same downstream operator' should be replaced with 'the same 
subtask of the downstream operator'



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select broadcast join strategy with t1, t1 will be the broadcast table.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select broadcast join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the broadcast table,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the broadcast table.

Review Comment:
   ", after that," -> ". And"



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 

Review Comment:
   I think it is necessary to unify the capitalization of 'Build side and Probe 
side' or 'build side and probe side'



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,219 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to suggest the join strategy to optimizer in order to 
get a more optimal execution plan. 
+Currently, only supported in batch mode.
+
+### Supported Join Hints
+The following Join strategies are currently supported in Flink SQL for batch 
mode:
+
+
+**BroadCast Join**
+
+In this Join strategy, the data on the build side (ideally the smaller table) 
will be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select broadcast join strategy with t1, t1 will be the broadcast table.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select broadcast join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the broadcast table,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the broadcast table.
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- BroadCast join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+
+-- BroadCast join strategy don't support full outer join type.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;
+```
+
+
+**Hash Shuffle Join**
+
+*NOTE:* `Hash Shuffle Join` strategy only supports join with equivalence join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select hash shuffle join strategy with t1, t1 will be the build side.
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select hash shuffle join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the build side,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the build side.
+SELECT /*+ SHUFFLE_HASH(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- Hash shuffle join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of hash 
shuffle join strategy.
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+**Sort Merge Join**
+
+
+*NOTE:* `Sort Merge Join` strategy only supports join with equivalence join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Sort merge join is adopted.
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Sort merge join is both adopted in these two joins.
+SELECT /*+ SHUFFLE_MERGE(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- Sort merge join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of sort merge 
join strategy.
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+
+**Nested Loop Join**
+
+*NOTE:* `Nest Loop Join` support both equivalent and non-equivalent join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select nest loop join strategy with t1, t1 will be the build side.
+SELECT /*+ NEST_LOOP(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select nest loop join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the build side,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the build side.
+SELECT /*+ NEST_LOOP(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+```
+
+**Conflict Cases In Join Hints**
+
+For some cases, join hints will conflict, we formulate the behavior of join 
hint in Flink for the following conditions:
+ - View
+ - Sub query with table alias
+ - Conflict in one same join hint strategy
+ - Conflict in different join hint strategies
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+
+
+-- View Case
+
+CREATE VIEW view1 AS SELECT t1.* FROM t1 JOIN ON t2 ON t1.id = t2.id;
+
+-- For this view case, broadCast join strategy will not be propagated into 
view1, and join inside view1 will use default join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t3 ON view1.id = t3.id;
+
+-- For this view case, the join inside view1 will use default join strategy, 
and join between view1 and t1 will use broadCast join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t1 ON view1.id = t1.id;
+
+
+
+-- Sub Query with Table Alias Case
+
+-- For this alias case, an exception that the alias_table not existent will be 
thrown.
+SELECT /*+ BROADCAST(alias_table) */ * FROM (SELECT * FROM t1) alias_table 
JOIN t2 on alias_table.id = t2.id;

Review Comment:
   currently, join hint with query-block alias is support



##########
docs/content.zh/docs/dev/table/sql/queries/hints.md:
##########
@@ -79,4 +79,209 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join Hints 特性允许用户手动的指定表 join 时使用的 join 策略来达到优化执行的目标，该特性只能在批模式(Batch mode)中执行。
+
+### Join Hints 策略
+在批模式下， Flink 现在支持以下的几种 join 策略：
+
+
+**BroadCast Join**
+
+对于这种 Join 策略， 位于 join build 端(通常为小表)的数据会被广播到每一个下游的算子中，同时，位于 join probe 
端(通常为大表) 会使用 Forward 的
+方式发送给下游的算子。然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+**Hash Shuffle Join**
+
+对于这种 Join 策略， 位于 join build 端和 位于 join probe 端的数据都会按照 join key 进行 shuffle， 
拥有相同 join key 的数据会被分配到下游的同一个算子中。
+然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+相较于 `Hash Shuffle Join`，`BroadCast Join` 这种 join 策略不需要同时 shuffle probe 
端，这样会节约很多的时间。所以，如果 build 端的表很小的话，我们通常
+会选择 `BroadCast Join` 策略来避免 shuffle 带来的时间损耗并提高 join 的性能。 相反，当 build 
端的小表的数据量较大时， 我们则不会选用`BroadCast Join` 策略，
+因为大量的数据采用广播的方式带来的成本远大于 shuffle。
+
+**Sort Merge Join**
+
+该 join 策略主要是用于两个大表进行 join 的场景或者 join 的两端的数据已完成排序的场景。该策略首先会根据 join key 来 
shuffle 两端的数据到下游的算子。
+然后，下游的算子在 join 前会提前将数据进行排序。 最后，会完成两张表的 
join。该策略消除了将一侧的所有数据加载到内存中的需要，从而减轻了计算内存的压力。
+
+**Nested Loop Join**
+
+对于这种 Join 策略，join 的 probe 端被用作 outer loop，同时，join 的 build 端被用作 inner loop， 
数据会通过两层的 loops 来 join 数据。
+
+
+### 语法
+Flink 中的 join hints 语法：
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### 示例
+
+**BroadCast Join**
+
+*注意：* `BroadCast Join` 策略只支持等值的 join 条件， 同时，也不支持 `Full Outer Join` 的 join 类型。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 指定 broadcast join 策略, 则表 t1 会被当作需 broadcast 的小表。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 broadcast join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作需 broadcast 的小表，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作需 broadcast 的小表。
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- `BroadCast Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+
+-- `BroadCast Join` 策略不支持 `Full Outer Join` 的 join 类型
+-- 对于这种情况，由于是 `Full Outer Join`，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;
+```
+
+
+**Hash Shuffle Join**
+
+*注意：* `Hash Shuffle Join` 只支持等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 使用 `Hash Shuffle Join` 策略， 则表 t1 会被当作 join 的 build 端。
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 hash shuffle join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作 join 的 build 端，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作 join 的 build 端。
+SELECT /*+ SHUFFLE_HASH(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- `Hash Shuffle Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+**Sort Merge Join**
+
+*注意：* `Sort Merge Join` 只支持等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 会使用 Sort merge join。
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Sort merge join 会使用在两次的 join 中。
+SELECT /*+ SHUFFLE_MERGE(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- `Sort Merge Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+
+**Nested Loop Join**
+
+*注意：* `Nest Loop Join` 同时支持等值的和非等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 使用 `Nest Loop Join` 策略， 则表 t1 会被当作 join 的 build 端。
+SELECT /*+ NEST_LOOP(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 hash shuffle join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作 join 的 build 端，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作 join 的 build 端。
+SELECT /*+ NEST_LOOP(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+```
+
+**Join Hints 使用中的冲突**
+
+对于一些 join hints 语法在使用时会产生冲突，我们在 Flink 中规范了冲突时会产生的行为。
+ - 视图(View)
+ - query 中存在别名(alias)
+ - 同一种 join 策略中可能的冲突
+ - 不同 join 策略间可能的冲突
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+
+
+-- 视图(View) 案例
+
+CREATE VIEW view1 AS SELECT t1.* FROM t1 JOIN ON t2 ON t1.id = t2.id;
+
+-- 对于下述视图案例，指定的 join 策略不会在 view1 内的 join 中生效，所以 view1 中的 join 会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t3 ON view1.id = t3.id;
+
+-- 对于下述的视图案例，指定的 join 策略不会在 view1 内的 join 中生效，但在 view1 与 t1 join 时会生效， t1 会被作为 
broadcast 的 join 端。
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t1 ON view1.id = t1.id;
+
+
+
+-- query 中存在别名(alias)
+
+-- 对于存在别名的例子，会抛出 alias_table 不存在的异常。
+SELECT /*+ BROADCAST(alias_table) */ * FROM (SELECT * FROM t1) alias_table 
JOIN t2 on alias_table.id = t2.id;
+
+-- 对于上述别名的例子，可以使用 view 来同等的替代，使 join hints 指定的 join 策略生效。

Review Comment:
   ditto



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select broadcast join strategy with t1, t1 will be the broadcast table.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select broadcast join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the broadcast table,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the broadcast table.
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- BroadCast join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+
+-- BroadCast join strategy don't support full outer join type.

Review Comment:
   'type' can be removed



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select broadcast join strategy with t1, t1 will be the broadcast table.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select broadcast join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the broadcast table,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the broadcast table.
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- BroadCast join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.

Review Comment:
   'broadCast' > 'BroadCast'.  And the following noun should also be rewrited



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,219 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to suggest the join strategy to optimizer in order to 
get a more optimal execution plan. 
+Currently, only supported in batch mode.
+
+### Supported Join Hints
+The following Join strategies are currently supported in Flink SQL for batch 
mode:
+
+
+**BroadCast Join**
+
+In this Join strategy, the data on the build side (ideally the smaller table) 
will be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select broadcast join strategy with t1, t1 will be the broadcast table.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select broadcast join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the broadcast table,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the broadcast table.
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- BroadCast join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+
+-- BroadCast join strategy don't support full outer join type.
+-- For this case, default join strategy will be adopted instead of broadCast 
join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;
+```
+
+
+**Hash Shuffle Join**
+
+*NOTE:* `Hash Shuffle Join` strategy only supports join with equivalence join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select hash shuffle join strategy with t1, t1 will be the build side.
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select hash shuffle join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the build side,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the build side.
+SELECT /*+ SHUFFLE_HASH(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- Hash shuffle join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of hash 
shuffle join strategy.
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+**Sort Merge Join**
+
+
+*NOTE:* `Sort Merge Join` strategy only supports join with equivalence join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Sort merge join is adopted.
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Sort merge join is both adopted in these two joins.
+SELECT /*+ SHUFFLE_MERGE(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- Sort merge join strategy don't support non-equivalent join conditions.
+-- For this case, default join strategy will be adopted instead of sort merge 
join strategy.
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+
+**Nested Loop Join**
+
+*NOTE:* `Nest Loop Join` support both equivalent and non-equivalent join 
condition.
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- Select nest loop join strategy with t1, t1 will be the build side.
+SELECT /*+ NEST_LOOP(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Select nest loop join strategy with t1 and t3.
+-- For this case, when join between t1 and t2, t1 will be the build side,
+-- after that, when join between the result after t1 joins t2 and t3, t3 will 
be the build side.
+SELECT /*+ NEST_LOOP(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+```
+
+**Conflict Cases In Join Hints**
+
+For some cases, join hints will conflict, we formulate the behavior of join 
hint in Flink for the following conditions:
+ - View
+ - Sub query with table alias
+ - Conflict in one same join hint strategy
+ - Conflict in different join hint strategies
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+
+
+-- View Case
+
+CREATE VIEW view1 AS SELECT t1.* FROM t1 JOIN ON t2 ON t1.id = t2.id;
+
+-- For this view case, broadCast join strategy will not be propagated into 
view1, and join inside view1 will use default join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t3 ON view1.id = t3.id;
+
+-- For this view case, the join inside view1 will use default join strategy, 
and join between view1 and t1 will use broadCast join strategy.
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t1 ON view1.id = t1.id;
+
+
+
+-- Sub Query with Table Alias Case
+
+-- For this alias case, an exception that the alias_table not existent will be 
thrown.
+SELECT /*+ BROADCAST(alias_table) */ * FROM (SELECT * FROM t1) alias_table 
JOIN t2 on alias_table.id = t2.id;
+
+-- To avoid the above alias case, user need use view instead of table alias. 
Solution:

Review Comment:
   this part can be removed



##########
docs/content/docs/dev/table/sql/queries/hints.md:
##########
@@ -84,4 +84,222 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join hints allow users to manually choose the join strategy in optimizer, and 
manually optimize the execution plan to improve the performance of the query. 
Now only support in batch mode.
+
+### Join Hints Strategies
+The following Join strategies are currently supported in Flink SQL for batch 
job:
+
+
+**BroadCast Join**
+
+
+In this Join strategy, the data on the build side (usually a small table) will 
be broadcast to each downstream operator, 
+and the data on the probe side (usually a large table) will be sent directly 
to the downstream operator with Forward. 
+Then the data on the build side (small table) will be built into a Hash Table 
for the probe side to query.
+
+**Hash Shuffle Join**
+
+
+In this Join strategy, the data on the Build side and the Probe side will be 
shuffled with the join key, and the data 
+with the same key will be distributed to the same downstream operator. Then 
the data on the build side (smaller table) 
+will be built into a Hash Table for the probe side to query.
+
+
+Compared with the `Hash Shuffle Join` strategy, the `BroadCast Join` strategy 
does not need to shuffle the probe side, which saves a lot of 
+shuffle time. Therefore, when a table is tiny, the `BroadCast Join` strategy 
is usually selected to avoid the shuffle 
+cost and improve computing performance. However, when the scale of the small 
table is large, the `BroadCast Join` strategy 
+is not applicable, because the overhead of redundant data transmission will 
exceed the overhead of shuffle.
+
+**Sort Merge Join**
+
+This Join strategy is aimed at the scenario of joining between two large 
tables or the scenario that the data at both 
+sides of the join is already in order. This strategy first shuffles the data 
on both sides of the join to the downstream 
+operator according to the Join Key. Then the downstream operator sorts the 
data before joining, and finally joins the data at both ends. 
+This strategy eliminates the need to load all data on one side into memory, 
thus relieving the pressure on computational memory.
+
+**Nested Loop Join**
+
+In this Join strategy, the probe side is used as an Outer Loop, and the build 
side is used as an Inner Loop, and the data is joined through two-layer loops.
+
+### Syntax
+The join hints syntax in Flink:
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### Examples
+
+**BroadCast Join**
+
+
+*NOTE:* `BroadCast Join` strategy only supports join with equivalence join 
condition. And it not supports `Full Outer Join` join type.

Review Comment:
   "And it not supports `Full Outer Join` join type." -> "And it doesn't 
support `Full Outer Join`"



##########
docs/content.zh/docs/dev/table/sql/queries/hints.md:
##########
@@ -79,4 +79,209 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join Hints 特性允许用户手动的指定表 join 时使用的 join 策略来达到优化执行的目标，该特性只能在批模式(Batch mode)中执行。
+
+### Join Hints 策略
+在批模式下， Flink 现在支持以下的几种 join 策略：
+
+
+**BroadCast Join**
+
+对于这种 Join 策略， 位于 join build 端(通常为小表)的数据会被广播到每一个下游的算子中，同时，位于 join probe 
端(通常为大表) 会使用 Forward 的
+方式发送给下游的算子。然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+**Hash Shuffle Join**
+
+对于这种 Join 策略， 位于 join build 端和 位于 join probe 端的数据都会按照 join key 进行 shuffle， 
拥有相同 join key 的数据会被分配到下游的同一个算子中。
+然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+相较于 `Hash Shuffle Join`，`BroadCast Join` 这种 join 策略不需要同时 shuffle probe 
端，这样会节约很多的时间。所以，如果 build 端的表很小的话，我们通常
+会选择 `BroadCast Join` 策略来避免 shuffle 带来的时间损耗并提高 join 的性能。 相反，当 build 
端的小表的数据量较大时， 我们则不会选用`BroadCast Join` 策略，
+因为大量的数据采用广播的方式带来的成本远大于 shuffle。
+
+**Sort Merge Join**
+
+该 join 策略主要是用于两个大表进行 join 的场景或者 join 的两端的数据已完成排序的场景。该策略首先会根据 join key 来 
shuffle 两端的数据到下游的算子。
+然后，下游的算子在 join 前会提前将数据进行排序。 最后，会完成两张表的 
join。该策略消除了将一侧的所有数据加载到内存中的需要，从而减轻了计算内存的压力。

Review Comment:
   this expression can be improved



##########
docs/content.zh/docs/dev/table/sql/queries/hints.md:
##########
@@ -79,4 +79,209 @@ insert into kafka_table1 /*+ 
OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+Join Hints 特性允许用户手动的指定表 join 时使用的 join 策略来达到优化执行的目标，该特性只能在批模式(Batch mode)中执行。
+
+### Join Hints 策略
+在批模式下， Flink 现在支持以下的几种 join 策略：
+
+
+**BroadCast Join**
+
+对于这种 Join 策略， 位于 join build 端(通常为小表)的数据会被广播到每一个下游的算子中，同时，位于 join probe 
端(通常为大表) 会使用 Forward 的
+方式发送给下游的算子。然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+**Hash Shuffle Join**
+
+对于这种 Join 策略， 位于 join build 端和 位于 join probe 端的数据都会按照 join key 进行 shuffle， 
拥有相同 join key 的数据会被分配到下游的同一个算子中。
+然后，位于 join build 端的数据会被放入一个 hash table 中提供给 probe 端的表去查询。
+
+相较于 `Hash Shuffle Join`，`BroadCast Join` 这种 join 策略不需要同时 shuffle probe 
端，这样会节约很多的时间。所以，如果 build 端的表很小的话，我们通常
+会选择 `BroadCast Join` 策略来避免 shuffle 带来的时间损耗并提高 join 的性能。 相反，当 build 
端的小表的数据量较大时， 我们则不会选用`BroadCast Join` 策略，
+因为大量的数据采用广播的方式带来的成本远大于 shuffle。
+
+**Sort Merge Join**
+
+该 join 策略主要是用于两个大表进行 join 的场景或者 join 的两端的数据已完成排序的场景。该策略首先会根据 join key 来 
shuffle 两端的数据到下游的算子。
+然后，下游的算子在 join 前会提前将数据进行排序。 最后，会完成两张表的 
join。该策略消除了将一侧的所有数据加载到内存中的需要，从而减轻了计算内存的压力。
+
+**Nested Loop Join**
+
+对于这种 Join 策略，join 的 probe 端被用作 outer loop，同时，join 的 build 端被用作 inner loop， 
数据会通过两层的 loops 来 join 数据。
+
+
+### 语法
+Flink 中的 join hints 语法：
+```sql
+# query:
+SELECT /*+ hint_content[, hint_content] */ ...
+
+hint_content:
+    hint_strategy_name(hint_item[, hint_item])
+
+hint_strategy_name:
+    supported_join_hint_name
+
+hint_item:
+    string_literal
+```
+
+### 示例
+
+**BroadCast Join**
+
+*注意：* `BroadCast Join` 策略只支持等值的 join 条件， 同时，也不支持 `Full Outer Join` 的 join 类型。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 指定 broadcast join 策略, 则表 t1 会被当作需 broadcast 的小表。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 broadcast join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作需 broadcast 的小表，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作需 broadcast 的小表。
+SELECT /*+ BROADCAST(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+
+-- `BroadCast Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+
+-- `BroadCast Join` 策略不支持 `Full Outer Join` 的 join 类型
+-- 对于这种情况，由于是 `Full Outer Join`，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;
+```
+
+
+**Hash Shuffle Join**
+
+*注意：* `Hash Shuffle Join` 只支持等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 使用 `Hash Shuffle Join` 策略， 则表 t1 会被当作 join 的 build 端。
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 hash shuffle join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作 join 的 build 端，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作 join 的 build 端。
+SELECT /*+ SHUFFLE_HASH(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- `Hash Shuffle Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+**Sort Merge Join**
+
+*注意：* `Sort Merge Join` 只支持等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 会使用 Sort merge join。
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- Sort merge join 会使用在两次的 join 中。
+SELECT /*+ SHUFFLE_MERGE(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 
ON t1.id = t3.id;
+
+-- `Sort Merge Join` 策略只支持等值的 join 条件，
+-- 对于这种情况，由于使用了非等值的 join 条件，则 join hints 失效，会使用默认的 join 策略。
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 join t2 ON t1.id > t2.id;
+```
+
+
+**Nested Loop Join**
+
+*注意：* `Nest Loop Join` 同时支持等值的和非等值的 join 条件。
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+-- 对表 t1 使用 `Nest Loop Join` 策略， 则表 t1 会被当作 join 的 build 端。
+SELECT /*+ NEST_LOOP(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
+
+-- 对表 t1 和 t3 指定 hash shuffle join 策略。
+-- 对于这种情况，当 t1 和 t2 join 时， 表 t1 会被当作 join 的 build 端，
+-- 然后，当 t1 与 t2 join 后的结果与 t3 join 时，则 t3 会被当作 join 的 build 端。
+SELECT /*+ NEST_LOOP(t1, t3) */ * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON 
t1.id = t3.id;
+```
+
+**Join Hints 使用中的冲突**
+
+对于一些 join hints 语法在使用时会产生冲突，我们在 Flink 中规范了冲突时会产生的行为。
+ - 视图(View)
+ - query 中存在别名(alias)
+ - 同一种 join 策略中可能的冲突
+ - 不同 join 策略间可能的冲突
+```sql
+CREATE TABLE t1 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t2 (id BIGINT, name STRING, age INT) WITH (...);
+CREATE TABLE t3 (id BIGINT, name STRING, age INT) WITH (...);
+
+
+
+-- 视图(View) 案例
+
+CREATE VIEW view1 AS SELECT t1.* FROM t1 JOIN ON t2 ON t1.id = t2.id;
+
+-- 对于下述视图案例，指定的 join 策略不会在 view1 内的 join 中生效，所以 view1 中的 join 会使用默认的 join 策略。
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t3 ON view1.id = t3.id;
+
+-- 对于下述的视图案例，指定的 join 策略不会在 view1 内的 join 中生效，但在 view1 与 t1 join 时会生效， t1 会被作为 
broadcast 的 join 端。
+SELECT /*+ BROADCAST(t1) */ * FROM view1 JOIN t1 ON view1.id = t1.id;
+
+
+
+-- query 中存在别名(alias)
+
+-- 对于存在别名的例子，会抛出 alias_table 不存在的异常。

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] xuyangzhong commented on a diff in pull request #20513: [FLINK-28858][docs] Add document to describe join hints for batch sql

Reply via email to