Hi, sorry for late reply. 1: If the exisiting OnCheckpointRollingPolicy can't meet your requirement, you can customize a RollingPolicy extending CheckpointRollingPolicy.
2: > Why DefaultRollingPolicy don't always roll inProgressParts on checkpoint? I'm also not the creator of `DefaultRollingPolicy`. But I would like to share my thoughts. In whatever filesystem, we hate small files. If default rolling policy will always roll inProgressParts on checkpoint, it'll be like to produce many small files which is harmful. And if don't think it's a problem and still want to roll file on checkpoint, you can still customize your rolling policy. Btw, more exactly, for row-encoded sink output, it'll will use DefaultRollingPolicy by default, for bulk-encoded sink output, it'll use OnCheckpointRollingPolicy. Best regards, Yuxia ----- 原始邮件 ----- 发件人: "Lee, Keith" <lee...@amazon.co.uk.INVALID> 收件人: "dev" <dev@flink.apache.org> 发送时间: 星期二, 2023年 3 月 07日 下午 10:39:44 主题: StreamingFileSink's DefaultRollingPolicy Hi, StreamingFileSink’s DefaultRollingPolicy does not always roll in progress parts to pending parts on checkpoints. Is this by design? It seems counter intuitive to me that part files are not finished on checkpoint by default. https://github.com/ueshin/apache-flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.java#L70-L72 1. This has impact on S3 implementation in that inProgressPart files are MultiPartUploads which can expire<https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html>. When the files do expire, jobs can no longer be started from savepoints as they will run into missing FileNotFoundException on the in progress part file. This leaves users no options but to restart job without savepoint. 2. Having reference to inProgressPart files within savepoints also prevents users from restarting job from earlier savepoints, should the user deem it appropriate to replay the stream and rewrite to output. The exception should be clearer if the intention is to prevent user from starting from earlier savepoint to avoid them accidentally replaying stream (therefore violating end-to-end exactly once). May I propose that we change DefaultRollingPolicy to always roll inProgressParts to pending on checkpoint? Thank you Keith