[ 
https://issues.apache.org/jira/browse/FLINK-27708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556682#comment-17556682
 ] 

Jane Chan commented on FLINK-27708:
-----------------------------------

* Why
 ** Each checkpoint forces the writer to close its current file, so the 
generated files may not reach the target file size.

                  !image-2022-06-21-14-59-59-593.png|width=502,height=149!
 * How
 ** Since the append-only table does not define a key, the compaction should be 
based on the sequence number to preserve record order. 
 ** We could introduce an asynchronous task that collects previously committed 
files smaller than the target file size, sorts them by min/max sequence 
number, and then performs a concatenation rewrite. During the prepare-commit 
phase, the compacted files (if available) can be submitted along with the newly 
written files.
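
The planning step above could be sketched roughly as follows. This is only a minimal illustration, not the table store implementation: the class AppendOnlyCompactionSketch, the FileMeta type and the planCompaction method are hypothetical names. It also assumes that only files adjacent in sequence order are merged, so that a concatenated file never spans the sequence range of a file left untouched; that restriction is an assumption of the sketch, not something stated above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Flink Table Store code base. */
final class AppendOnlyCompactionSketch {

    /** Hypothetical metadata describing one committed data file. */
    public static final class FileMeta {
        public final String name;
        public final long sizeBytes;
        public final long minSeq;
        public final long maxSeq;

        public FileMeta(String name, long sizeBytes, long minSeq, long maxSeq) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.minSeq = minSeq;
            this.maxSeq = maxSeq;
        }
    }

    /**
     * Groups small committed files into candidate concatenation units.
     * Files are ordered by (minSeq, maxSeq); a file that already meets the
     * target size ends the current run, and a run is emitted as soon as its
     * accumulated size reaches the target size.
     */
    public static List<List<FileMeta>> planCompaction(List<FileMeta> committed, long targetSize) {
        List<FileMeta> sorted = new ArrayList<>(committed);
        sorted.sort(
                Comparator.comparingLong((FileMeta f) -> f.minSeq)
                        .thenComparingLong(f -> f.maxSeq));

        List<List<FileMeta>> plan = new ArrayList<>();
        List<FileMeta> run = new ArrayList<>();
        long runSize = 0;
        for (FileMeta f : sorted) {
            if (f.sizeBytes >= targetSize) {
                // A large file is left as-is and breaks the run (assumption of this sketch).
                flush(run, plan);
                run = new ArrayList<>();
                runSize = 0;
                continue;
            }
            run.add(f);
            runSize += f.sizeBytes;
            if (runSize >= targetSize) {
                flush(run, plan);
                run = new ArrayList<>();
                runSize = 0;
            }
        }
        flush(run, plan);
        return plan;
    }

    /** Keeps a run only if it contains at least two files; one file alone needs no rewrite. */
    private static void flush(List<FileMeta> run, List<List<FileMeta>> plan) {
        if (run.size() > 1) {
            plan.add(run);
        }
    }
}
```

Each returned group would then be rewritten into a single file by the background task, and the resulting files handed over during prepare commit together with the newly written ones.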

 

Please assign this ticket to me, cc [~lzljs3620320], thanks!

> Add background compaction task for append-only table when ingesting.
> --------------------------------------------------------------------
>
>                 Key: FLINK-27708
>                 URL: https://issues.apache.org/jira/browse/FLINK-27708
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: table-store-0.2.0
>
>         Attachments: image-2022-06-21-14-59-59-593.png
>
>
> We could still run a compaction task in the background to merge small files 
> for append-only tables.
> This compaction only serves to avoid accumulating many small files.
> Its purpose is similar to that of filesystem compaction: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction



