[ https://issues.apache.org/jira/browse/FLINK-27708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556682#comment-17556682 ]
Jane Chan commented on FLINK-27708:
-----------------------------------

* Why
** Checkpoints interfere with the writer: each checkpoint forces the writer to close its current file, so the generated files may not reach the target file size.
!image-2022-06-21-14-59-59-593.png|width=502,height=149!

* How
** Since the append-only table does not define a key, the compaction should be based on the sequence number to preserve record order.
** We could introduce an asynchronous task that collects previously committed files whose sizes are below the target file size, sorts them by min/max sequence number, and then rewrites them by concatenation. During the prepare-commit phase, the compacted files (if available) can be submitted along with the newly written files.

Please assign this ticket to me, cc [~lzljs3620320], thanks!

> Add background compaction task for append-only table when ingesting.
> --------------------------------------------------------------------
>
>                 Key: FLINK-27708
>                 URL: https://issues.apache.org/jira/browse/FLINK-27708
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: table-store-0.2.0
>
>         Attachments: image-2022-06-21-14-59-59-593.png
>
>
> We could still execute a compaction task to merge small files in the background
> for append-only tables.
> This compaction only aims to avoid accumulating many small files.
> Its purpose is similar to that of filesystem compaction:
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
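The compaction flow proposed in the comment (collect committed files smaller than the target file size, order them by sequence number, concatenate them, and hand the result to the committer together with the newly written files) can be sketched as below. This is a minimal Python sketch, not the Flink Table Store implementation: `DataFile`, `CommitMessage`, `pick_compaction_groups`, `concat_rewrite` and `prepare_commit` are hypothetical names, the rewrite only computes the metadata of the merged file, and compaction runs synchronously here rather than as the asynchronous task the comment proposes. It also assumes that a file already at the target size should split the candidate groups, so that every merged file still covers a contiguous sequence range.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for the metadata of a committed data file."""
    name: str
    size: int     # file size in bytes
    min_seq: int  # smallest record sequence number in the file
    max_seq: int  # largest record sequence number in the file


@dataclass
class CommitMessage:
    """Hypothetical payload handed to the committer in prepare-commit."""
    new_files: List[DataFile]       # files written since the last checkpoint
    compact_before: List[DataFile]  # small files replaced by compaction
    compact_after: List[DataFile]   # concatenated files that replace them


def pick_compaction_groups(files: List[DataFile], target_size: int) -> List[List[DataFile]]:
    """Group small files that are adjacent in sequence-number order.

    A file that already reaches target_size ends the current run, so no group
    spans across it and every merged file covers a contiguous sequence range.
    """
    groups: List[List[DataFile]] = []
    run: List[DataFile] = []
    run_size = 0
    for f in sorted(files, key=lambda x: (x.min_seq, x.max_seq)):
        if f.size >= target_size:
            if len(run) > 1:  # a single small file gains nothing from a rewrite
                groups.append(run)
            run, run_size = [], 0
            continue
        run.append(f)
        run_size += f.size
        if run_size >= target_size:
            # Each member is below target_size, so this run holds at least two files.
            groups.append(run)
            run, run_size = [], 0
    if len(run) > 1:
        groups.append(run)
    return groups


def concat_rewrite(group: List[DataFile], new_name: str) -> DataFile:
    """Model the concatenation rewrite: records are appended in sequence order."""
    return DataFile(
        name=new_name,
        size=sum(f.size for f in group),
        min_seq=min(f.min_seq for f in group),
        max_seq=max(f.max_seq for f in group),
    )


def prepare_commit(new_files: List[DataFile], committed: List[DataFile],
                   target_size: int) -> CommitMessage:
    """Attach compaction results (if any) to the newly written files."""
    before: List[DataFile] = []
    after: List[DataFile] = []
    for i, group in enumerate(pick_compaction_groups(committed, target_size)):
        before.extend(group)
        after.append(concat_rewrite(group, f"compact-{i}"))
    return CommitMessage(new_files, before, after)
```

For example, with a 100-byte target, small files `a` (10 bytes) and `b` (20 bytes) that precede a 200-byte file `c` form one group, while `d`, `e` and `f` after it form another; `c` is left untouched because concatenating across it would break the sequence order.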