[GitHub] [flink] luoyuxia commented on a diff in pull request #21703: [FLINK-29880][hive] Introduce auto compaction for Hive sink in batch mode

GitBox Thu, 19 Jan 2023 07:13:30 -0800


luoyuxia commented on code in PR #21703:
URL: https://github.com/apache/flink/pull/21703#discussion_r1081412503



##########
docs/content/docs/connectors/table/hive/hive_read_write.md:
##########
@@ -558,6 +558,70 @@ use more threads to speed the gathering.
 **NOTE:**
 - Only `BATCH` mode supports to auto gather statistic, `STREAMING` mode doesn't support it yet.
 
+### File Compaction
+
+The Hive sink also supports file compactions, which allows applications to reduce the number of files generated while writing into Hive.
+
+#### Stream Mode
+
+In stream mode, the behavior is same to `FileSystem` sink. Please refer to [File Compaction]({{< ref "docs/connectors/table/filesystem" >}}#file-compaction) for more details.
+
+#### Batch Mode
+
+When it's in batch mode and auto compaction is enabled, after finishing writing files, Flink will calculate the average size of written files for each partition. And if the average size is less than the
+threshold configured, Flink will then try to compact these files to files with a target size. The following is the table's options for file compactions.

Review Comment:
   I accept it, except that I still think we should use `a target size` instead of `the target size`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
