[jira] [Commented] (FLINK-10203) Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Flink Jira Bot (Jira) Tue, 27 Apr 2021 16:23:43 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334103#comment-17334103
 ]


Flink Jira Bot commented on FLINK-10203:
----------------------------------------

This issue was marked "stale-assigned" and has not received an update in 7 
days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> Support truncate method for old Hadoop versions in 
> HadoopRecoverableFsDataOutputStream
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-10203
>                 URL: https://issues.apache.org/jira/browse/FLINK-10203
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Connectors / FileSystem
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0
>            Reporter: Artsem Semianenka
>            Assignee: Artsem Semianenka
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>         Attachments: legacy truncate logic.pdf
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new StreamingFileSink (introduced in Flink 1.6) uses the 
> HadoopRecoverableFsDataOutputStream wrapper to write data to HDFS.
> HadoopRecoverableFsDataOutputStream wraps FSDataOutputStream so that, after 
> a failure, writing can be restored from a certain point of the file and 
> continued. To achieve this recovery functionality, 
> HadoopRecoverableFsDataOutputStream uses the "truncate" method, which was 
> introduced only in Hadoop 2.7. 
> FLINK-14170 has enabled the usage of StreamingFileSink with 
> OnCheckpointRollingPolicy, but it is still not possible to use 
> StreamingFileSink with DefaultRollingPolicy, which makes writing data 
> to HDFS impractical at scale for HDFS < 2.7.
> Unfortunately, the latest versions of a few official Hadoop distributions 
> (Cloudera, Pivotal HD) still use Hadoop 2.6. As a result, Flink's Hadoop 
> connector can't work with these distributions.
> Flink declares that it supports Hadoop from version 2.4.0 upwards 
> ([https://ci.apache.org/projects/flink/flink-docs-release-1.6/start/building.html#hadoop-versions])
> I guess we should emulate the functionality of the "truncate" method for 
> older Hadoop versions.
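
One way such an emulation might look is sketched below. This is only an illustration of the general "copy the valid prefix, then swap the file" idea, not the approach from the attached PDF or from any Flink pull request. To stay self-contained it uses java.nio on a local file as a stand-in for HDFS; the class name TruncateEmulation and the ".truncated.tmp" suffix are hypothetical. On Hadoop, the corresponding steps would presumably go through FileSystem.open, FileSystem.create and FileSystem.rename.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Flink or Hadoop. */
final class TruncateEmulation {

    /**
     * Shortens {@code file} to {@code length} bytes without a native truncate:
     * copy the first {@code length} bytes to a temporary sibling file, then
     * move that file over the original.
     */
    static void truncateByCopy(Path file, long length) throws IOException {
        long size = Files.size(file);
        if (length < 0 || length > size) {
            throw new IllegalArgumentException(
                    "length " + length + " is outside [0, " + size + "]");
        }

        // Assumed naming scheme for the temporary file.
        Path tmp = file.resolveSibling(file.getFileName() + ".truncated.tmp");

        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("Unexpected end of file: " + file);
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }

        // Replace the original with the shortened copy.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Note that the sketch ignores what happens if the process fails between the copy and the move. A recoverable stream would also need to detect and clean up (or complete) a leftover temporary file on restore, and the copy cost grows with the size of the retained prefix.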



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
