[ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428452#comment-16428452 ]
ASF GitHub Bot commented on FLINK-9113:
---------------------------------------

GitHub user twalthr closed the pull request at:

    https://github.com/apache/flink/pull/5811

> Data loss in BucketingSink when writing to local filesystem
> -----------------------------------------------------------
>
>                 Key: FLINK-9113
>                 URL: https://issues.apache.org/jira/browse/FLINK-9113
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Major
>
> For local filesystems, it is not guaranteed that the data is flushed to disk
> during checkpointing. This leads to data loss when a TaskManager fails while
> writing to a local filesystem ({{org.apache.hadoop.fs.LocalFileSystem}}).
> The {{flush()}} method returns a written length, but the data is not written
> into the file (thus the valid length might be greater than the actual file
> size). {{hsync}} and {{hflush}} have no effect either.
> It seems that this behavior won't be fixed in the near future:
> https://issues.apache.org/jira/browse/HADOOP-7844
> One solution would be to call {{close()}} on a checkpoint for local
> filesystems, even though this would lead to a performance decrease. If we
> don't fix this issue, we should at least add proper documentation for it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
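
The mismatch described in the issue, where the recorded valid length can exceed the actual file size until the stream is closed, can be illustrated with a small sketch. This is only an analogy built on plain `java.io` buffering; it is not the BucketingSink code and does not use Hadoop's `LocalFileSystem`. The class name `ValidLengthSketch`, the method `checkpoint`, and the 8192-byte buffer size are assumptions made for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical illustration only; not part of Flink or Hadoop. */
public class ValidLengthSketch {

    /**
     * Writes numBytes through a buffered stream and simulates a checkpoint.
     * Returns {validLength, bytesOnDisk}: the length the writer believes it
     * has written, and the size of the file as seen on disk at that moment.
     */
    static long[] checkpoint(File file, int numBytes, boolean closeOnCheckpoint)
            throws IOException {
        // Assumed buffer size; writes smaller than this stay in memory.
        OutputStream out = new BufferedOutputStream(new FileOutputStream(file), 8192);
        boolean closed = false;
        try {
            out.write(new byte[numBytes]);
            long validLength = numBytes; // position tracked by the writer

            if (closeOnCheckpoint) {
                // The workaround proposed in the issue: close the stream so
                // that buffered bytes are pushed into the file.
                out.close();
                closed = true;
            }

            long bytesOnDisk = file.length(); // what a recovery would find
            return new long[] {validLength, bytesOnDisk};
        } finally {
            if (!closed) {
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("valid-length", ".bin");
        file.deleteOnExit();

        long[] open = checkpoint(file, 100, false);
        System.out.println("stream left open: valid=" + open[0] + " onDisk=" + open[1]);

        long[] closed = checkpoint(file, 100, true);
        System.out.println("closed on checkpoint: valid=" + closed[0] + " onDisk=" + closed[1]);
    }
}
```

In the first call the tracked length is ahead of the file size at checkpoint time, which is the situation that loses data if the process dies before the buffer is written out. Closing at the checkpoint removes the gap, at the cost of reopening a file afterwards, which is the performance trade-off the issue mentions.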