lsyldliu commented on code in PR #21789:
URL: https://github.com/apache/flink/pull/21789#discussion_r1098208433


##########
docs/content.zh/docs/connectors/table/hive/hive_functions.md:
##########
@@ -73,6 +73,34 @@ Some Hive built-in functions in older versions have [thread 
safety issues](https
 We recommend users patch their own Hive to fix them.
 {{< /hint >}}
 
+## Use Native Hive Aggregate Functions
+
+If [HiveModule]({{< ref "docs/dev/table/modules" >}}#hivemodule) is loaded 
with a higher priority than CoreModule, Flink will try to use the Hive built-in 
functions first. For Hive built-in aggregate functions,
+Flink currently uses a sort-based aggregation strategy. Compared to the hash-based 
aggregation strategy, its performance is one to two times worse, so starting with Flink 
1.17, we have implemented some of Hive's aggregate functions natively in 
Flink.
+These functions use the hash-agg strategy to improve performance. 
Currently, only five functions are supported, namely sum/count/avg/min/max; 
more aggregate functions will be supported in the future.
+Users can enable the native aggregate functions by turning on the option 
`table.exec.hive.native-agg-function.enabled`, which can significantly improve 
the performance of the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+        <th class="text-left" style="width: 20%">Key</th>
+        <th class="text-left" style="width: 15%">Default</th>
+        <th class="text-left" style="width: 10%">Type</th>
+        <th class="text-left" style="width: 55%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td><h5>table.exec.hive.native-agg-function.enabled</h5></td>
+        <td style="word-wrap: break-word;">false</td>
+        <td>Boolean</td>
+        <td>Enables the native aggregate functions, which use the hash-agg 
strategy and can improve aggregation performance when HiveModule is loaded. 
This is a job-level option; users can enable it per job.</td>
+    </tr>
+  </tbody>
+</table>
+
+<span class="label label-danger">Attention</span> The capabilities of the native 
aggregate functions do not yet fully align with Hive's built-in aggregate functions; 
for example, some data types are not supported. If performance is not a 
bottleneck, you do not need to turn on this option.

Review Comment:
   Fixed.
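
   For reference, a minimal sketch of how the option documented in this hunk might be enabled from the Flink SQL client. It assumes the Hive connector dependencies are on the classpath; the Hive version `3.1.3` and the table `employees` with its columns are hypothetical placeholders, not taken from the PR.

   ```sql
   -- Assumption: '3.1.3' is a placeholder; use the Hive version of your deployment.
   LOAD MODULE hive WITH ('hive-version' = '3.1.3');

   -- Resolve functions from HiveModule before CoreModule.
   USE MODULES hive, core;

   -- Job-level switch described in the table above (default: false).
   SET 'table.exec.hive.native-agg-function.enabled' = 'true';

   -- Hypothetical query: sum and count are among the five natively supported functions.
   SELECT dept, SUM(salary), COUNT(*) FROM employees GROUP BY dept;
   ```

   Because the option is job-level, it can stay off by default and be set only for jobs where aggregation is the bottleneck and the input data types are supported.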



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
