Arnaud Linz created FLINK-2580:
----------------------------------
Summary: HadoopDataOutputStream does not expose enough methods of
org.apache.hadoop.fs.FSDataOutputStream
Key: FLINK-2580
URL: https://issues.apache.org/jira/browse/FLINK-2580
Project: Flink
Issue Type: Improvement
Components: Hadoop Compatibility
Reporter: Arnaud Linz
Priority: Minor
I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write
to an HDFS file, calling
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a
HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream
(under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
However, FSDataOutputStream exposes many methods, such as flush() and
getPos(), while HadoopDataOutputStream only wraps write and close.
For instance, calling flush() on the wrapper runs the default, empty
implementation from OutputStream instead of the Hadoop one, which is
confusing. Moreover, because the OutputStream interface is so restrictive,
hsync() and hflush() are not exposed to Flink at all.
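To illustrate the flush() problem, here is a minimal, self-contained sketch.
It does not use the real Flink or Hadoop classes: RecordingStream and
WriteCloseOnlyWrapper are hypothetical stand-ins that only mimic the
situation described above (a wrapper that forwards write and close and
nothing else).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class FlushDemo {

    /** Hypothetical stand-in for the Hadoop stream: counts flush() calls. */
    static class RecordingStream extends OutputStream {
        int flushCalls = 0;

        @Override
        public void write(int b) {
            // Data is discarded; only flush() calls matter for this demo.
        }

        @Override
        public void flush() {
            flushCalls++;
        }
    }

    /** Hypothetical wrapper that forwards only write and close. */
    static class WriteCloseOnlyWrapper extends OutputStream {
        private final OutputStream inner;

        WriteCloseOnlyWrapper(OutputStream inner) {
            this.inner = inner;
        }

        @Override
        public void write(int b) throws IOException {
            inner.write(b);
        }

        @Override
        public void close() throws IOException {
            inner.close();
        }
        // flush() is not overridden, so the empty OutputStream.flush() runs.
    }

    /** Returns how many flush() calls actually reached the inner stream. */
    static int flushesSeenByInner() {
        try {
            RecordingStream inner = new RecordingStream();
            OutputStream wrapper = new WriteCloseOnlyWrapper(inner);
            wrapper.write(1);
            wrapper.flush();
            return inner.flushCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the inner stream never sees the flush() call, because
java.io.OutputStream.flush() does nothing unless a subclass overrides it.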
I see two options:
- complete the class so that it wraps all methods of OutputStream, and add a
getWrappedStream() method to access Hadoop-specific calls such as hsync().
- get rid of the Hadoop wrapping and directly use Hadoop file system objects.
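A rough sketch of what the first option could look like follows. This is not
the actual Flink class: DelegatingOutputStream and FakeHadoopStream are
hypothetical names, and FakeHadoopStream is only a stand-in that mimics the
flush(), hsync() and getPos() method names of FSDataOutputStream so the
example runs without Hadoop on the classpath. Only getWrappedStream() is
taken from the proposal above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class DelegatingStreamSketch {

    /** Hypothetical stand-in for FSDataOutputStream; records calls in a log. */
    static class FakeHadoopStream extends OutputStream {
        final StringBuilder log = new StringBuilder();
        long pos = 0;

        @Override
        public void write(int b) {
            pos++;
        }

        @Override
        public void flush() {
            log.append("flush;");
        }

        @Override
        public void close() {
            log.append("close;");
        }

        public void hsync() {
            log.append("hsync;");
        }

        public long getPos() {
            return pos;
        }
    }

    /** Wrapper that forwards every OutputStream method to the wrapped stream. */
    static class DelegatingOutputStream extends OutputStream {
        private final FakeHadoopStream wrapped;

        DelegatingOutputStream(FakeHadoopStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            wrapped.write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            wrapped.flush();
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }

        /** Gives callers access to methods outside OutputStream, e.g. hsync(). */
        public FakeHadoopStream getWrappedStream() {
            return wrapped;
        }
    }

    /** Writes three bytes, then flushes, syncs and closes; returns the call log. */
    static String demo() {
        try {
            FakeHadoopStream inner = new FakeHadoopStream();
            DelegatingOutputStream out = new DelegatingOutputStream(inner);
            out.write(new byte[] {1, 2, 3});
            out.flush();
            out.getWrappedStream().hsync();
            out.close();
            return inner.log + "pos=" + inner.getPos();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, flush() reaches the underlying stream, and anything not
covered by OutputStream stays reachable through getWrappedStream().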