[ https://issues.apache.org/jira/browse/FLINK-30951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691112#comment-17691112 ]
Shengkai Fang commented on FLINK-30951:
---------------------------------------

1. I submitted a job to write records with and without auto-compaction. With auto-compaction, the records are merged into a target file.
2. I adjusted the target file size and submitted the job again; the compacted file is as expected.
3. I set the compactor parallelism and the sink parallelism to 8, but the setting only takes effect for the compactor.
4. I set the average file size to a smaller value, and the compaction does not happen.

About case 3, the graph in the Flink WebUI:

!screenshot-1.png!

> Release Testing: Verify FLINK-29635 Hive sink should support merge files in batch mode
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-30951
>                 URL: https://issues.apache.org/jira/browse/FLINK-30951
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Assignee: Shengkai Fang
>            Priority: Blocker
>             Fix For: 1.17.0
>
>         Attachments: screenshot-1.png
>
>
> This issue aims to verify FLINK-29635.
> Please verify in batch mode. The documentation is at
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#file-compaction]:
>
> 1. Enable auto-compaction and write some data to a Hive table so that the
> average size of the files is less than compaction.small-files.avg-size (16MB
> by default); verify that these files are merged.
> 2. Enable auto-compaction, set compaction.small-files.avg-size to a smaller
> value, then write some data to a Hive table so that the average size of the
> files is greater than compaction.small-files.avg-size; verify that these
> files are not merged.
> 3. Set sink.parallelism manually and check that the parallelism of the compact
> operator is equal to sink.parallelism.
> 4. Set compaction.parallelism manually and check that the parallelism of the
> compact operator is equal to compaction.parallelism.
> 5. Set compaction.file-size and check that the size of each merged target file
> is about `compaction.file-size`.
>
> We should verify this when writing to a non-partitioned table, a static
> partition table, and a dynamic partition table.
> Example SQL for creating and writing to a Hive table can be found in the
> codebase:
> [HiveTableCompactSinkITCase|https://github.com/apache/flink/blob/0915c9850d861165e283acc0f60545cd836f0567/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java].
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
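
For anyone repeating verification steps 1, 2 and 5, the expected outcome can be worked out by hand before submitting the job. The following is only a rough sketch of the rule as the issue describes it (merge when the average written file size is below compaction.small-files.avg-size; merged files are about compaction.file-size each). It is not Flink's implementation, and the function names `should_compact` and `estimated_target_files` are hypothetical helpers made up for this illustration.

```python
import math

MB = 1024 * 1024


def should_compact(file_sizes, avg_size_threshold=16 * MB):
    """Return True when the average written file size is below the threshold.

    avg_size_threshold stands in for compaction.small-files.avg-size
    (16MB by default according to the issue description).
    """
    if not file_sizes:
        # Nothing was written, so there is nothing to merge.
        return False
    return sum(file_sizes) / len(file_sizes) < avg_size_threshold


def estimated_target_files(file_sizes, target_file_size):
    """Rough number of merged files if each one is about target_file_size bytes.

    target_file_size stands in for compaction.file-size. This is an estimate
    for sanity-checking test results, not how the connector plans its output.
    """
    return math.ceil(sum(file_sizes) / target_file_size)


if __name__ == "__main__":
    written = [4 * MB] * 8  # eight small files of 4MB each
    print(should_compact(written))                          # default threshold
    print(should_compact(written, avg_size_threshold=MB))   # smaller threshold
    print(estimated_target_files(written, 16 * MB))
```

With eight 4MB files, the default threshold predicts a merge (step 1), a 1MB threshold predicts no merge (step 2), and a 16MB target size suggests roughly two merged files (step 5).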