luoyuxia commented on code in PR #21703:
URL: https://github.com/apache/flink/pull/21703#discussion_r1081412503
########## docs/content/docs/connectors/table/hive/hive_read_write.md: ##########
@@ -558,6 +558,70 @@ use more threads to speed the gathering.
 **NOTE:**
 - Only `BATCH` mode supports to auto gather statistic, `STREAMING` mode doesn't support it yet.
 
+### File Compaction
+
+The Hive sink also supports file compactions, which allows applications to reduce the number of files generated while writing into Hive.
+
+#### Stream Mode
+
+In stream mode, the behavior is same to `FileSystem` sink. Please refer to [File Compaction]({{< ref "docs/connectors/table/filesystem" >}}#file-compaction) for more details.
+
+#### Batch Mode
+
+When it's in batch mode and auto compaction is enabled, after finishing writing files, Flink will calculate the average size of written files for each partition. And if the average size is less than the
+threshold configured, Flink will then try to compact these files to files with a target size. The following is the table's options for file compactions.

Review Comment:
I accept it, except that I still think we should use `a target size` instead of `the target size`.
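For reference, here is a minimal Java sketch of the batch-mode behavior the new paragraph describes. It computes the average file size of a partition, decides to compact only when that average is below a threshold, and then packs the files into groups whose total stays at or below a target size. The class name `CompactionSketch`, the methods `shouldCompact` and `planGroups`, and the greedy in-order grouping are all hypothetical illustrations. They are not Flink's actual compaction code, and the real option names are the ones listed in the options table that follows in the doc.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of per-partition batch compaction planning.
 * Not Flink's implementation; names and grouping strategy are assumptions.
 */
public class CompactionSketch {

    /** Returns true if the average size of a partition's files is below the threshold. */
    static boolean shouldCompact(List<Long> fileSizes, long avgSizeThreshold) {
        if (fileSizes.isEmpty()) {
            return false; // nothing was written to this partition
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        // Use floating-point division so the average is not truncated.
        return (double) total / fileSizes.size() < avgSizeThreshold;
    }

    /**
     * Greedily packs files, in their given order, into groups whose total size
     * stays at or below the target. A single file larger than the target ends
     * up alone in its own group.
     */
    static List<List<Long>> planGroups(List<Long> fileSizes, long targetSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            // Close the current group if adding this file would exceed the target size.
            if (!current.isEmpty() && currentSize + size > targetSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Average size is 25, which is below the threshold of 50, so the files are compacted.
        List<Long> sizes = List.of(10L, 20L, 30L, 40L);
        if (shouldCompact(sizes, 50L)) {
            System.out.println(planGroups(sizes, 50L)); // prints [[10, 20], [30], [40]]
        }
    }
}
```

The greedy grouping is only one possible way to reach "files with a target size"; it keeps files in write order and never splits a file, so each output group approaches, but does not exceed, the target unless a single input file is already larger.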