[ https://issues.apache.org/jira/browse/FLINK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334103#comment-17334103 ]
Flink Jira Bot commented on FLINK-10203:
----------------------------------------

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.

> Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-10203
>                 URL: https://issues.apache.org/jira/browse/FLINK-10203
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Connectors / FileSystem
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0
>            Reporter: Artsem Semianenka
>            Assignee: Artsem Semianenka
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>         Attachments: legacy truncate logic.pdf
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new StreamingFileSink (introduced in Flink 1.6) uses the
> HadoopRecoverableFsDataOutputStream wrapper to write data to HDFS.
> HadoopRecoverableFsDataOutputStream wraps FSDataOutputStream so that, after a
> failure, writing can be restored from a certain point in the file and then
> continued. To achieve this recovery functionality,
> HadoopRecoverableFsDataOutputStream uses the "truncate" method, which was
> introduced only in Hadoop 2.7.
> FLINK-14170 has enabled the usage of StreamingFileSink with
> OnCheckpointRollingPolicy, but it is still not possible to use
> StreamingFileSink with DefaultRollingPolicy, which makes writing data to HDFS
> impractical at scale for Hadoop versions below 2.7.
> Unfortunately, there are a few official Hadoop distributions whose latest
> version still uses Hadoop 2.6 (namely Cloudera and Pivotal HD). As a result,
> Flink's Hadoop connector can't work with these distributions.
> Flink declares that it supports Hadoop from version 2.4.0 upwards
> ([https://ci.apache.org/projects/flink/flink-docs-release-1.6/start/building.html#hadoop-versions]).
> I guess we should emulate the functionality of the "truncate" method for older
> Hadoop versions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
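
For readers unfamiliar with what "emulating truncate" could mean, here is a minimal sketch of one common approach: copy the first `length` bytes into a temporary file, then replace the original with it. This is not the logic from the attached PDF or from any Flink pull request. The class name `LegacyTruncateSketch` and the method `truncateByCopy` are hypothetical, and the sketch works on local files through `java.nio.file` so that it runs without Hadoop on the classpath. A Hadoop variant would presumably go through `FileSystem.open`, `FileSystem.create` and `FileSystem.rename` instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Flink or Hadoop.
final class LegacyTruncateSketch {

    /** Keeps only the first {@code length} bytes of {@code file} via copy-and-replace. */
    static void truncateByCopy(Path file, long length) throws IOException {
        if (length < 0 || length > Files.size(file)) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        // Assumption: a sibling temp file name that does not clash with real data.
        Path tmp = file.resolveSibling(file.getFileName() + ".truncated.tmp");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("unexpected end of file");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        // Swap the shortened copy into place of the original file.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-file", ".bin");
        Files.write(part, "committed+uncommitted".getBytes(StandardCharsets.UTF_8));
        truncateByCopy(part, 9); // 9 = offset recorded at the last checkpoint
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        Files.delete(part);
    }
}
```

Unlike a native truncate, this costs a full copy of the retained bytes, and a crash between the copy and the move leaves a stray temporary file, so a real implementation would need to clean up or resume from that state during recovery.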