Andrey Zagrebin created FLINK-13963:
---------------------------------------
Summary: Consolidate Hadoop file systems usage and Hadoop
integration docs
Key: FLINK-13963
URL: https://issues.apache.org/jira/browse/FLINK-13963
Project: Flink
Issue Type: Improvement
Components: Connectors / FileSystem, Connectors / Hadoop
Compatibility, Documentation, FileSystems
Reporter: Andrey Zagrebin
Assignee: Andrey Zagrebin
Fix For: 1.10.0
We currently have Hadoop-related docs in several places:
* *dev/batch/connectors.md* (Hadoop FS implementations and setup)
* *dev/batch/hadoop_compatibility.md* (no longer accurate: it claims Flink
always has Hadoop types available out of the box, but we no longer build and
ship Flink with Hadoop by default)
* *ops/filesystems/index.md* (plugins, Hadoop FS implementations and setup
revisited)
* *ops/deployment/hadoop.md* (Hadoop classpath)
* *ops/config.md* (deprecated way to provide Hadoop configuration in Flink
conf)
We could consolidate all these pieces into a consistent structure that guides
users to well-defined spots in the docs depending on which feature they are
trying to use.
The information about Hadoop should live in the following places:
* *dev/batch/hadoop_compatibility.md* (only DataSet API-specific aspects of
the Hadoop integration)
* *ops/filesystems/index.md* (Flink FS plugins and Hadoop FS implementations)
* *ops/deployment/hadoop.md* (Hadoop configuration and classpath)
How to set up Hadoop itself should be documented only in
*ops/deployment/hadoop.md*. All other places dealing with Hadoop/HDFS should
cover only their own concerns and reference that page for how to configure
Hadoop. Likewise, all chapters about writing to file systems (batch connectors
and streaming file sinks) should just reference the file systems docs.
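For context, the Hadoop setup that *ops/deployment/hadoop.md* would remain the
single home for typically amounts to a couple of environment variables. A
minimal sketch, assuming a Hadoop installation with `hadoop` on the PATH and a
configuration directory at */etc/hadoop/conf* (both deployment-specific
assumptions):

```shell
# Let Flink pick up Hadoop dependencies from an existing installation
# (assumes the `hadoop` CLI is on the PATH).
export HADOOP_CLASSPATH=`hadoop classpath`

# Point Flink at the Hadoop configuration files
# (core-site.xml, hdfs-site.xml); path is an example.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

All other Hadoop/HDFS-related chapters would link to this setup instead of
repeating it.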
--
This message was sent by Atlassian Jira
(v8.3.2#803003)