[ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11116:
-----------------------------------
    Labels: pull-request-available stale-major  (was: pull-request-available)

> Overwrite outdated in-progress files in StreamingFileSink.
> ----------------------------------------------------------
>
>                 Key: FLINK-11116
>                 URL: https://issues.apache.org/jira/browse/FLINK-11116
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.7.0
>            Reporter: Kostas Kloudas
>            Priority: Major
>              Labels: pull-request-available, stale-major
>             Fix For: 1.7.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> To guarantee exactly-once semantics, the streaming file sink implements a
> two-phase commit protocol when writing files to the filesystem.
> Data is initially written to in-progress files. These files are moved to the
> "pending" state when they are completed (based on the rolling policy), and
> they are finally committed when the checkpoint that put them in the "pending"
> state is acknowledged as complete.
> This means that, given the following sequence:
> 1) checkpoints A, B, and C are started,
> 2) checkpoint A is acknowledged, and
> 3) a failure occurs,
> we may end up with files that do not belong to any checkpoint (because B and C
> were not considered successful). These files are currently not cleaned up.
> To reduce the number of such leftover files, we removed the random suffix
> from in-progress temporary files, so that the next in-progress file opened
> for the same part overwrites the outdated one.
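
The effect of dropping the random suffix can be illustrated with a small standalone Java sketch. It does not use Flink's classes or reproduce its actual file-naming code: the class name, the helper methods, and the ".part-<subtask>-<counter>.inprogress" pattern are all assumptions made for illustration. It only shows that reopening a deterministically named file after a failure overwrites the outdated copy, whereas random suffixes leave one orphan per attempt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical illustration only; not Flink's StreamingFileSink implementation.
final class InProgressNamingSketch {

    // Naming with a random suffix: every attempt at a part gets a fresh file.
    public static Path randomSuffixName(Path bucket, int subtask, long partCounter) {
        return bucket.resolve(
            ".part-" + subtask + "-" + partCounter + ".inprogress." + UUID.randomUUID());
    }

    // Naming without the suffix: the name depends only on subtask and part counter.
    public static Path deterministicName(Path bucket, int subtask, long partCounter) {
        return bucket.resolve(".part-" + subtask + "-" + partCounter + ".inprogress");
    }

    // Files.write with no explicit options uses CREATE, TRUNCATE_EXISTING and WRITE,
    // so an existing file with the same name is overwritten.
    public static void openAndWrite(Path file, String data) throws IOException {
        Files.write(file, data.getBytes(StandardCharsets.UTF_8));
    }

    // Counts the entries directly inside the bucket directory.
    public static long countFiles(Path bucket) throws IOException {
        try (Stream<Path> entries = Files.list(bucket)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Attempt 1 writes part 3 and then "fails"; attempt 2 reopens part 3.
        Path deterministic = Files.createTempDirectory("bucket-deterministic");
        openAndWrite(deterministicName(deterministic, 0, 3), "attempt-1");
        openAndWrite(deterministicName(deterministic, 0, 3), "attempt-2");
        System.out.println("deterministic names: " + countFiles(deterministic) + " file(s)");

        // Same scenario with random suffixes: the first attempt's file is orphaned.
        Path random = Files.createTempDirectory("bucket-random");
        openAndWrite(randomSuffixName(random, 0, 3), "attempt-1");
        openAndWrite(randomSuffixName(random, 0, 3), "attempt-2");
        System.out.println("random suffixes: " + countFiles(random) + " file(s)");
    }
}
```

In this sketch the deterministic bucket ends up holding a single file containing the second attempt's data, while the random-suffix bucket holds two files. Note that this only reduces leftovers for parts that are reopened; it is not a cleanup mechanism for every orphaned file.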



--
This message was sent by Atlassian Jira
(v8.3.4#803005)