[
https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073381#comment-17073381
]
Sivaprasanna Sethuraman commented on FLINK-11499:
-------------------------------------------------
[~pnowojski] That clarifies my question. Thanks : )
I'm +1 for having both approaches in place: formats that can easily be
adapted with custom writer implementations satisfying Flink's paradigm can go
their own way, and general formats/writers can use the WAL-based approach.
Regarding "rolling over both of the streams will need to happen on
checkpoint", I don't think we can always guarantee that. Some writers that
come with buffer size/capacity criteria may fill up even before the checkpoint
is triggered. We have to think about this scenario as well, right? Maybe in
that case we can also roll over both streams, but I'm not sure how the
checkpoint will behave in this situation. Correct me if I've misunderstood
anything here.
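To make the buffer-full scenario concrete, here is a toy sketch in plain Java.
It is not Flink code: the class, enum, and method names are made up for
illustration, and it models a single in-progress file rather than the two
streams discussed above. It only shows that a writer with a capacity limit can
force a roll between two checkpoints.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are Flink APIs.
class RollSketch {
    enum Cause { CHECKPOINT, BUFFER_FULL }

    /**
     * Replays record sizes against a writer with a fixed buffer capacity.
     * A checkpoint fires after every recordsPerCheckpoint records.
     * Returns the cause of each roll, in the order the rolls happen.
     */
    static List<Cause> simulate(long[] recordSizes,
                                long bufferCapacity,
                                int recordsPerCheckpoint) {
        List<Cause> rolls = new ArrayList<>();
        long buffered = 0;
        for (int i = 0; i < recordSizes.length; i++) {
            buffered += recordSizes[i];

            // The writer's own limit is hit: it has to roll right away,
            // whether or not a checkpoint is in sight.
            if (buffered >= bufferCapacity) {
                rolls.add(Cause.BUFFER_FULL);
                buffered = 0;
            }

            // Checkpoint boundary: roll whatever is still in progress.
            boolean checkpointNow = (i + 1) % recordsPerCheckpoint == 0;
            if (checkpointNow && buffered > 0) {
                rolls.add(Cause.CHECKPOINT);
                buffered = 0;
            }
        }
        return rolls;
    }

    public static void main(String[] args) {
        long[] sizes = {40, 40, 40, 40, 40, 40};
        // Capacity of 100 is reached on the third record, before the
        // checkpoint that fires after the fourth record.
        System.out.println(simulate(sizes, 100, 4));
    }
}
```

With a capacity large enough to hold everything between two checkpoints, the
same input rolls only on the checkpoint, which is the case the "roll on
checkpoint" statement covers. The open question is what the checkpoint should
see when the BUFFER_FULL roll has already happened.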
> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> -----------------------------------------------------------------------
>
> Key: FLINK-11499
> URL: https://issues.apache.org/jira/browse/FLINK-11499
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Reporter: Seth Wiesman
> Priority: Major
> Labels: usability
> Fix For: 1.11.0
>
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only be
> combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress
> part file on every checkpoint.
> However, many bulk formats, such as Parquet, are most efficient when written
> as large files; this is not possible when frequent checkpointing is enabled.
> Currently, the only workaround is to use long checkpoint intervals, which is
> not ideal.
>
> The StreamingFileSink should be enhanced to support arbitrary roll policies
> so users may write large bulk files while retaining frequent checkpoints.