[ 
https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061809#comment-17061809
 ] 

Piyush Narang commented on FLINK-11499:
---------------------------------------

Adding a couple of questions we had based on an offline thread with Piotr. For 
the WAL implementation, were you thinking of leveraging delta-lake (and thus 
taking on a dependency on Spark) or potentially implementing this from scratch? 
An alternative idea could be to try to merge the Parquet files in a given 
bucket on every checkpoint.

> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> -----------------------------------------------------------------------
>
>                 Key: FLINK-11499
>                 URL: https://issues.apache.org/jira/browse/FLINK-11499
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Seth Wiesman
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only be 
> combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress 
> part file on every checkpoint.
> However, many bulk formats such as Parquet are most efficient when written as 
> large files; this is not possible when frequent checkpointing is enabled. 
> Currently the only work-around is to use long checkpoint intervals, which is 
> not ideal.
>  
> The StreamingFileSink should be enhanced to support arbitrary rolling policies 
> so that users may write large bulk files while retaining frequent checkpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
