Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1084#issuecomment-137125224

If it fails in the middle of writing, or before sync/flush is called on the writer, then the data can be in an inconsistent state. I see three ways of dealing with this, one of which is more long-term.

The long-term solution is to make the sink exactly-once aware, either by using truncate() support in Hadoop 2.7 or by using a custom thread that merges part files and throws away data that was erroneously written.

The two short-term options are:
- Keep it as it is; consumers need to be able to deal with corrupt records and ignore them. This would give you at-least-once semantics.
- Write to a temporary file. When rolling, close the current bucket and rename the file to the final filename. This would ensure that the output doesn't contain corrupt records, but you would have neither at-least-once nor exactly-once semantics: some written records would be lost if a checkpoint restore rolls back to a state taken after writing of the current bucket file had already started.
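
The second short-term option (temp file plus rename on roll) can be sketched roughly as below. This is a minimal illustration, not the PR's actual code; the class name `RenamingBucketWriter` and the ".in-progress" suffix are hypothetical, and a real sink would do this per bucket on a Hadoop FileSystem rather than the local filesystem:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: records go into an ".in-progress" file, and rolling
// renames it to the final name, so readers never see a partially written
// bucket file (at the cost described above when restoring a checkpoint).
public class RenamingBucketWriter {

    private final Path finalPath;
    private final Path inProgressPath;
    private final Writer writer;

    public RenamingBucketWriter(Path finalPath) throws IOException {
        this.finalPath = finalPath;
        this.inProgressPath =
            finalPath.resolveSibling(finalPath.getFileName() + ".in-progress");
        this.writer = Files.newBufferedWriter(inProgressPath, StandardCharsets.UTF_8);
    }

    public void write(String record) throws IOException {
        writer.write(record);
        writer.write('\n');
    }

    // Called when the bucket rolls: flush, close, then rename atomically,
    // so the final file only ever contains fully written records.
    public void roll() throws IOException {
        writer.flush();
        writer.close();
        Files.move(inProgressPath, finalPath, StandardCopyOption.ATOMic_MOVE.equals(null)
            ? StandardCopyOption.REPLACE_EXISTING : StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("bucket").resolve("part-0");
        RenamingBucketWriter w = new RenamingBucketWriter(out);
        w.write("record-1");
        w.write("record-2");
        w.roll();
        System.out.println(Files.exists(out));          // final file now visible
        System.out.println(Files.readAllLines(out));    // both records intact
    }
}
```

Note that an atomic rename only hides partially written files from readers; it does nothing for records buffered in the writer at failure time, which is exactly why this option alone gives neither at-least-once nor exactly-once.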