I have an idea: create a new version of the HadoopRecoverableFsDataOutputStream class (for example, named LegacyHadoopRecoverableFsDataOutputStream :) ) which will work with valid-length files without invoking truncate, and modify the check in HadoopRecoverableWriter to use LegacyHadoopRecoverableFsDataOutputStream when the Hadoop version is lower than 2.7. I will try to provide a PR soon if there are no objections. I hope I am on the right track.
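To make the idea concrete, here is a minimal sketch of the kind of version check that could decide between the truncate-based stream and the legacy valid-length-file stream. This is only an illustration: the helper class and method names below are my own assumptions, not existing Flink API, and the real HadoopRecoverableWriter would obtain the version from Hadoop itself (e.g. VersionInfo) rather than from a string argument.

```java
// Hypothetical sketch: decide whether the running Hadoop version supports
// FileSystem.truncate() (available since Hadoop 2.7). The class and method
// names here are illustrative assumptions, not actual Flink internals.
public class TruncateSupportCheck {

    // Returns true when the given Hadoop version is 2.7 or newer.
    static boolean supportsTruncate(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 2 || (major == 2 && minor >= 7);
    }

    public static void main(String[] args) {
        // Hadoop 2.6.0 (as shipped in CDH 5.15): would fall back to the
        // legacy valid-length-file stream.
        System.out.println(supportsTruncate("2.6.0"));
        // Hadoop 2.7.x and newer: truncate-based recovery is available.
        System.out.println(supportsTruncate("2.7.3"));
    }
}
```

Based on such a check, the writer could instantiate either the truncate-based stream or the proposed legacy stream, mirroring what the BucketingSink PR did with valid-length files.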
On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka <artfulonl...@gmail.com> wrote:
> Hi guys!
> I have a question regarding the new StreamingFileSink (introduced in
> version 1.6). We use this sink to write data in Parquet format, but I ran
> into an issue when trying to run the job on a YARN cluster and save the
> result to HDFS. In our case we use the latest Cloudera distribution
> (CDH 5.15), which ships HDFS 2.6.0. This version does not support the
> truncate method. I would like to create a pull request, but I want to ask
> your advice on how best to design this fix and which ideas are behind this
> decision. I saw a similar PR for BucketingSink:
> https://github.com/apache/flink/pull/6108 . Maybe I could also add support
> for valid-length files for older Hadoop versions?
>
> P.S. Unfortunately, CDH 5.15 (with Hadoop 2.6) is the latest version of
> the Cloudera distribution, and we can't upgrade Hadoop to 2.7.
>
> Best regards,
> Artsem
--
Best regards,
Artsem Semianenka