[jira] [Commented] (FLINK-30951) Release Testing: Verify FLINK-29635 Hive sink should support merge files in batch mode

Shengkai Fang (Jira) Mon, 20 Feb 2023 01:57:08 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-30951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691112#comment-17691112
 ]


Shengkai Fang commented on FLINK-30951:
---------------------------------------

1. I submitted jobs that write records with and without auto-compaction. With 
auto-compaction enabled, the records are merged into a target file. 
2. I adjusted the target file size and submitted the job again; the compacted 
files are as expected. 
3. I set both the compactor parallelism and the sink parallelism to 8, but the 
setting only takes effect for the compactor. 
4. I lowered the average file size setting to a smaller value, and compaction 
does not happen.
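
For readers following cases 1, 2 and 4, here is a minimal Python sketch of the behaviour being checked. It is not Flink's implementation: the function names `should_compact` and `plan_target_files` are hypothetical, and the greedy grouping is only an assumption about how small files might be packed towards the target file size.

```python
from typing import List

MB = 1024 * 1024


def should_compact(file_sizes: List[int], avg_size_threshold: int = 16 * MB) -> bool:
    """Return True when the average written file size is below the threshold.

    Mirrors the rule from the issue description: files are merged only if
    their average size is less than compaction.small-files.avg-size.
    """
    if not file_sizes:
        return False
    return sum(file_sizes) / len(file_sizes) < avg_size_threshold


def plan_target_files(file_sizes: List[int], target_file_size: int) -> List[List[int]]:
    """Greedily group files so that each group is about target_file_size.

    Assumption for illustration only; the real compactor may group differently.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Start a new group once adding this file would exceed the target.
        if current and current_size + size > target_file_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


written = [1 * MB] * 8  # eight 1 MB files produced by the sink
print(should_compact(written))                                 # default 16 MB threshold
print(should_compact(written, avg_size_threshold=512 * 1024))  # lowered threshold, as in case 4
print(len(plan_target_files(written, target_file_size=4 * MB)))
```

With the default threshold the eight 1 MB files qualify for merging; once the threshold is lowered below the average file size they do not, which matches what case 4 observed.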


About case 3, here is the job graph in the Flink WebUI:

 !screenshot-1.png! 


> Release Testing: Verify FLINK-29635 Hive sink should support merge files in 
> batch mode
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-30951
>                 URL: https://issues.apache.org/jira/browse/FLINK-30951
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Assignee: Shengkai Fang
>            Priority: Blocker
>             Fix For: 1.17.0
>
>         Attachments: screenshot-1.png
>
>
> This issue aims to verify FLINK-29635.
> Please verify in batch mode. The documentation is at 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#file-compaction]:
>  
> 1. Enable auto-compaction and write some data to a Hive table so that the 
> average file size is less than compaction.small-files.avg-size (16MB by 
> default); verify that these files are merged.
> 2. Enable auto-compaction, set compaction.small-files.avg-size to a smaller 
> value, then write some data to a Hive table so that the average file size is 
> greater than compaction.small-files.avg-size; verify that these files are 
> not merged.
> 3. Set sink.parallelism manually and check that the parallelism of the compact 
> operator equals sink.parallelism.
> 4. Set compaction.parallelism manually and check that the parallelism of the 
> compact operator equals compaction.parallelism.
> 5. Set compaction.file-size and check that the size of each merged target file 
> is about `compaction.file-size`.
>  
> We should verify it when writing to a non-partitioned table, a static 
> partition table, and a dynamic partition table.
> Example SQL for creating and writing to a Hive table can be found in the 
> codebase: 
> [HiveTableCompactSinkITCase|https://github.com/apache/flink/blob/0915c9850d861165e283acc0f60545cd836f0567/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java].
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
