[ https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061809#comment-17061809 ]
Piyush Narang commented on FLINK-11499:
---------------------------------------

Adding a couple of questions we had based on an offline thread with Piotr. For the WAL implementation, were you thinking of leveraging Delta Lake (and thus taking a dependency on Spark), or potentially implementing this from scratch? An alternative idea could be to merge the Parquet files in a given bucket on every checkpoint.

> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> -----------------------------------------------------------------------
>
>                 Key: FLINK-11499
>                 URL: https://issues.apache.org/jira/browse/FLINK-11499
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Seth Wiesman
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only be
> combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress
> part file on every checkpoint.
> However, many bulk formats, such as Parquet, are most efficient when written as
> large files; this is not possible when frequent checkpointing is enabled.
> Currently the only work-around is to use long checkpoint intervals, which is
> not ideal.
>
> The StreamingFileSink should be enhanced to support arbitrary rolling policies
> so users may write large bulk files while retaining frequent checkpoints.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
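For context, a minimal sketch of the restriction the issue describes, assuming a Flink 1.x classpath with the Parquet format module; the output path and Avro schema here are placeholders, not anything from the issue:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class BulkSinkSketch {

    public static StreamingFileSink<GenericRecord> buildSink(Schema schema) {
        // For bulk formats such as Parquet, the builder is effectively tied to
        // OnCheckpointRollingPolicy: the in-progress part file is rolled on
        // every checkpoint, regardless of its size or age. With frequent
        // checkpoints this yields many small Parquet files, which is exactly
        // the inefficiency this issue asks to fix.
        return StreamingFileSink
                .forBulkFormat(
                        new Path("s3://my-bucket/output"), // hypothetical path
                        ParquetAvroWriters.forGenericRecord(schema))
                .build();
    }
}
```

By contrast, row-encoded sinks built via `forRowFormat` accept arbitrary `RollingPolicy` implementations (size- or time-based), which is the flexibility the issue proposes extending to bulk formats.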