[ https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073381#comment-17073381 ]

Sivaprasanna Sethuraman commented on FLINK-11499:
-------------------------------------------------

[~pnowojski] That clarifies my question. Thanks : ) 

I'm +1 for having both approaches in place: formats that can easily be
adapted with custom writer implementations satisfying Flink's paradigm can
have their own way, and general formats/writers can use the WAL-based
approach.
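To make the WAL-based approach more concrete, here is a deliberately simplified model in Java. It is not Flink code: the class `WalBufferedBulkWriter`, its methods, and the assumption that a roll writes all buffered records out as one bulk file are illustrative only. Durability of the log, recovery, and the commit protocol are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Flink's API. Records go to a write-ahead log (WAL)
// first; a bulk-encoded part file is only produced when a roll is decided.
// Durability of the log, recovery and the commit protocol are omitted.
class WalBufferedBulkWriter {
    private final List<String> wal = new ArrayList<>();  // stands in for a durable row-based log
    private final List<List<String>> finishedBulkFiles = new ArrayList<>();
    private int checkpointedWalLength = 0;               // position a restore would replay from

    void write(String record) {
        wal.add(record);
    }

    // On a checkpoint only the WAL position is recorded; nothing is written
    // to a bulk file yet, so checkpoints can stay frequent.
    int snapshotState() {
        checkpointedWalLength = wal.size();
        return checkpointedWalLength;
    }

    // When the (arbitrary) rolling policy fires, the buffered records are
    // written out as one bulk file and the WAL starts over.
    void roll() {
        if (wal.isEmpty()) {
            return;
        }
        finishedBulkFiles.add(new ArrayList<>(wal));
        wal.clear();
        checkpointedWalLength = 0;
    }

    List<List<String>> finishedBulkFiles() {
        return finishedBulkFiles;
    }

    int pendingRecords() {
        return wal.size();
    }

    public static void main(String[] args) {
        WalBufferedBulkWriter writer = new WalBufferedBulkWriter();
        for (int cp = 0; cp < 3; cp++) {
            writer.write("a" + cp);
            writer.write("b" + cp);
            writer.snapshotState();  // frequent checkpoints, no roll
        }
        writer.roll();  // e.g. the policy decides the file is large enough
        // prints "1 6": one bulk file holding all six records
        System.out.println(writer.finishedBulkFiles().size() + " "
                + writer.finishedBulkFiles().get(0).size());
    }
}
```

The point of the split is that checkpoint frequency and bulk file size become independent: checkpoints only touch the log, while the rolling policy alone decides when a large bulk file is produced.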

And regarding "rolling over both of the streams will need to happen on
checkpoint", I don't think we can always guarantee that. Some writers with
buffer size/capacity limits may fill up even before the checkpoint is
triggered. We have to consider this scenario as well, right? Maybe in that
case we can also roll over both streams? But I'm not sure how checkpointing
will behave in this situation. Correct me if my understanding here is wrong.
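The capacity scenario can be modelled with a small hypothetical Java class. Following the quoted remark, it assumes both streams roll on every checkpoint, and it adds an early roll whenever the next record would overflow a fixed buffer capacity. `CapacityAwareRoller` and its `Trigger` enum are illustrative names, not part of Flink's `RollingPolicy` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Flink's RollingPolicy interface: a writer with a
// fixed buffer capacity that may be forced to roll between checkpoints.
class CapacityAwareRoller {
    enum Trigger { CAPACITY, CHECKPOINT }

    private final long capacityBytes;
    private long bufferedBytes = 0;
    private final List<Trigger> rolls = new ArrayList<>();

    CapacityAwareRoller(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Called per record: rolls early if the record would overflow the buffer.
    // A single record larger than the capacity is still accepted into an empty buffer.
    void onRecord(long sizeBytes) {
        if (bufferedBytes > 0 && bufferedBytes + sizeBytes > capacityBytes) {
            roll(Trigger.CAPACITY);
        }
        bufferedBytes += sizeBytes;
    }

    // Called when a checkpoint is triggered.
    void onCheckpoint() {
        if (bufferedBytes > 0) {
            roll(Trigger.CHECKPOINT);
        }
    }

    // In the design under discussion, both streams would be rolled together here.
    private void roll(Trigger trigger) {
        rolls.add(trigger);
        bufferedBytes = 0;
    }

    List<Trigger> rolls() {
        return rolls;
    }

    public static void main(String[] args) {
        CapacityAwareRoller roller = new CapacityAwareRoller(100);
        roller.onRecord(40);
        roller.onRecord(40);
        roller.onRecord(40);  // 120 > 100: the buffer fills up before the checkpoint
        roller.onCheckpoint();
        System.out.println(roller.rolls());  // [CAPACITY, CHECKPOINT]
    }
}
```

The open question in the comment is what the CAPACITY roll means for the next checkpoint: the part file closed early must still be tracked so that it is committed (or recovered) consistently with the checkpoint that follows it.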

> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> -----------------------------------------------------------------------
>
>                 Key: FLINK-11499
>                 URL: https://issues.apache.org/jira/browse/FLINK-11499
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Seth Wiesman
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only
> be combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress
> part file on every checkpoint.
> However, many bulk formats such as Parquet are most efficient when written
> as large files; this is not possible when frequent checkpointing is enabled.
> Currently the only workaround is to use long checkpoint intervals, which is
> not ideal.
>  
> The StreamingFileSink should be enhanced to support arbitrary rolling
> policies so users may write large bulk files while retaining frequent
> checkpoints.
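A back-of-the-envelope comparison of the two behaviours described in the issue, written as a hypothetical Java sketch rather than Flink code. It assumes a constant amount of data per checkpoint interval and compares rolling on every checkpoint with a policy that, at each checkpoint, rolls only once the in-progress file has reached a target size. `RollPolicyComparison.simulate` and its parameters are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical arithmetic model, not Flink code. Sizes are in MB.
class RollPolicyComparison {
    /**
     * Returns the sizes of the part files produced.
     * rollOnEveryCheckpoint = true mimics rolling on every checkpoint;
     * false rolls at a checkpoint only once the file has reached targetMb.
     */
    static List<Long> simulate(int checkpoints, long mbPerInterval, long targetMb,
                               boolean rollOnEveryCheckpoint) {
        List<Long> fileSizes = new ArrayList<>();
        long inProgressMb = 0;
        for (int i = 0; i < checkpoints; i++) {
            inProgressMb += mbPerInterval;  // data written until the next checkpoint
            if (rollOnEveryCheckpoint || inProgressMb >= targetMb) {
                fileSizes.add(inProgressMb);
                inProgressMb = 0;
            }
        }
        if (inProgressMb > 0) {
            fileSizes.add(inProgressMb);  // final file closed when the input ends
        }
        return fileSizes;
    }

    public static void main(String[] args) {
        // 60 checkpoints (e.g. one per minute for an hour), 10 MB per interval, 128 MB target.
        System.out.println(simulate(60, 10, 128, true).size());  // 60
        System.out.println(simulate(60, 10, 128, false));        // [130, 130, 130, 130, 80]
    }
}
```

Under these assumptions, rolling on every checkpoint yields 60 files of 10 MB each, while the size-based variant yields five files (four of 130 MB and a final one of 80 MB), with the checkpoint interval unchanged.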



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
