[GitHub] [flink] swuferhong commented on a diff in pull request #20513: [FLINK-28858][docs] Add document to describe join hints for batch sql

GitBox Tue, 16 Aug 2022 19:21:01 -0700


swuferhong commented on code in PR #20513:
URL: https://github.com/apache/flink/pull/20513#discussion_r947390796



##########
docs/content.zh/docs/dev/table/sql/queries/hints.md:
##########
@@ -79,4 +79,209 @@ insert into kafka_table1 /*+ OPTIONS('sink.partitioner'='round-robin') */ select
 
 ```
 
+## Join Hints
+
+{{< label Batch >}}
+
+The Join Hints feature lets users manually specify which join strategy is used when joining tables, in order to optimize query execution. It is supported only in batch mode.
+
+### Join Hint Strategies
+In batch mode, Flink currently supports the following join strategies:
+
+
+**Broadcast Join**
+
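+With this strategy, the data on the join build side (usually the small table) is broadcast to every downstream operator. The data on the join probe side (usually the large table) is sent to the downstream operators with Forward partitioning, that is, without a shuffle.
+Each downstream operator then loads the build-side data into a hash table, and the probe-side rows are looked up against it.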
+
+**Hash Shuffle Join**
+
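+With this strategy, the data on both the join build side and the join probe side is shuffled by join key, so rows with the same join key are sent to the same downstream operator.
+Each downstream operator then loads its build-side data into a hash table, and the probe-side rows are looked up against it.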
+
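+Unlike `Hash Shuffle Join`, `Broadcast Join` does not shuffle the probe side, which can save a lot of time. So when the build-side table is small, `Broadcast Join` is usually chosen to avoid the cost of a shuffle and improve join performance.
+Conversely, when the build-side table is large, `Broadcast Join` is not chosen, because broadcasting a large amount of data to every operator costs far more than shuffling it.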
+
+**Sort Merge Join**
+
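+This strategy is mainly used to join two large tables, or when the data on both sides of the join is already sorted. It first shuffles the data on both sides to the downstream operators by join key.
+Each downstream operator then sorts its data before the join and finally joins the two inputs. This strategy does not need to load all the data of one side into memory, which reduces memory pressure during computation.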
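A toy simulation may help show how the broadcast and hash shuffle strategies in the diff above route data. It is a minimal, single-process sketch under stated assumptions. Each inner list stands in for one upstream partition. The helper names (`build_hash_table`, `probe`, `broadcast_join`, `shuffle_hash_join`) and the sample rows are hypothetical and are not Flink APIs. Flink's real operators also handle memory management, spilling, and serialization, which are omitted here.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Index build-side rows by join key; a list per key keeps duplicate keys."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table, rows, key):
    """Look up each probe-side row in the hash table and emit (build, probe) pairs."""
    out = []
    for row in rows:
        for match in table.get(row[key], []):
            out.append((match, row))
    return out


def broadcast_join(build_parts, probe_parts, key):
    # Broadcast: every downstream task receives a full copy of the build side.
    full_build = [row for part in build_parts for row in part]
    results = []
    # Forward: each probe partition stays with its own task, no shuffle.
    for probe_part in probe_parts:
        table = build_hash_table(full_build, key)  # one hash table per task
        results.extend(probe(table, probe_part, key))
    return results


def shuffle_hash_join(build_parts, probe_parts, key, parallelism):
    def shuffle(parts):
        # Route every row to the task that owns hash(join key).
        buckets = [[] for _ in range(parallelism)]
        for part in parts:
            for row in part:
                buckets[hash(row[key]) % parallelism].append(row)
        return buckets

    results = []
    for build_bucket, probe_bucket in zip(shuffle(build_parts), shuffle(probe_parts)):
        results.extend(probe(build_hash_table(build_bucket, key), probe_bucket, key))
    return results


# Hypothetical input: each inner list is one upstream partition.
dims = [[{"id": 1, "name": "a"}], [{"id": 2, "name": "b"}]]
facts = [[{"id": 1, "v": 10}, {"id": 2, "v": 20}], [{"id": 1, "v": 30}]]

for result in (broadcast_join(dims, facts, "id"),
               shuffle_hash_join(dims, facts, "id", parallelism=2)):
    print(sorted((b["name"], p["v"]) for b, p in result))
    # prints [('a', 10), ('a', 30), ('b', 20)] for both strategies
```

Both strategies produce the same joined rows. They differ only in which data crosses the network. The broadcast version copies the whole build side to every task and never moves probe rows. The shuffle version moves every row of both inputs exactly once.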
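The trade-off between `Broadcast Join` and `Hash Shuffle Join` described in the diff above comes down to simple arithmetic. This is a deliberately simplified cost model and an assumption for illustration only. It counts the rows sent over the network and ignores row width, serialization, skew, and rows that happen to stay on the same task. It is not the cost model Flink's optimizer uses, and the function names are hypothetical.

```python
def broadcast_rows_sent(build_rows, parallelism):
    # Each of the `parallelism` downstream tasks receives the whole build side;
    # the probe side is forwarded locally and is not counted here.
    return build_rows * parallelism


def shuffle_rows_sent(build_rows, probe_rows):
    # Every row of both inputs is sent once to the task owning its join key.
    # (Simplification: rows that stay on the same task are still counted.)
    return build_rows + probe_rows


def cheaper_strategy(build_rows, probe_rows, parallelism):
    broadcast = broadcast_rows_sent(build_rows, parallelism)
    shuffle = shuffle_rows_sent(build_rows, probe_rows)
    return "broadcast" if broadcast < shuffle else "shuffle_hash"


print(cheaper_strategy(1_000, 10_000_000, 100))      # small build side → broadcast
print(cheaper_strategy(5_000_000, 10_000_000, 100))  # large build side → shuffle_hash
```

With 1,000 build rows and parallelism 100, broadcasting sends 100,000 rows, compared with 10,001,000 rows for a shuffle. With 5,000,000 build rows, broadcasting would send 500,000,000 rows, compared with 15,000,000 for a shuffle. This matches the rule of thumb in the text.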
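The sort-merge step described in the diff above can be sketched as follows. This minimal Python sketch shows what a single downstream task does after it has received its key-partitioned input, so the shuffle itself is omitted. The function name and sample rows are hypothetical. For brevity the sketch sorts in memory with `sorted()`. A real implementation would sort externally (spilling to disk if needed) and would only need to hold the current run of equal keys while merging.

```python
def sort_merge_join(left, right, key):
    # Sort both inputs by the join key (stable, so equal keys keep their order).
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append((l_row, r_row))
            i, j = i_end, j_end
    return out


# Hypothetical partition contents after the shuffle.
left = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right = [{"k": 2, "b": 1}, {"k": 3, "b": 2}]
print([(l_row["a"], r_row["b"]) for l_row, r_row in sort_merge_join(left, right, "k")])
# prints [('x', 1), ('z', 1)]
```

Because both inputs are sorted, each side is scanned once in a single forward pass. That is why the strategy suits large inputs, and inputs that are already sorted can skip the sort.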

Review Comment:
   > this expression can be improved
   
   Already deleted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
