Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4961#discussion_r149636089

    --- Diff: flink-filesystems/flink-s3-fs-presto/README.md ---
    @@ -0,0 +1,28 @@
    +This project is a wrapper around the S3 file system from the Presto project which shades all dependencies.
    +Initial simple tests seem to indicate that it responds slightly faster
    +and in a bit more lightweight manner to write/read/list requests, compared
    +to the Hadoop s3a FS, but it has some semantic differences.
    +
    +We also relocate the shaded Hadoop version to allow running in a different
    +setup. For this to work, however, we needed to adapt Hadoop's `Configuration`
    +class to load a (shaded) `core-default-shaded.xml` configuration with the
    +relocated class names of classes loaded via reflection
    +(in the fute, we may need to extend this to `mapred-default.xml` and `hdfs-defaults.xml` and their respective configuration classes).
    +
    +# Changing the Hadoop Version
    +
    +If you want to change the Hadoop version this project depends on, the following
    +steps are required to keep the shading correct:
    +
    +1. copy `org/apache/hadoop/conf/Configuration.java` from the respective Hadoop jar file (from `com.facebook.presto.hadoop/hadoop-apache2`) to this project
    +  - adapt the `Configuration` class by replacing `core-default.xml` with `core-default-shaded.xml`.
    +2. copy `core-default.xml` from the respective Hadoop jar (from `com.facebook.presto.hadoop/hadoop-apache2`) file to this project as
    +  - `src/main/resources/core-default-shaded.xml` (replacing every occurence of `org.apache.hadoop` with `org.apache.flink.fs.s3presto.shaded.org.apache.hadoop`)
    +  - `src/test/resources/core-site.xml` (as is)
    +3. verify the shaded jar:
    +  - does not contain any unshaded classes except for `org.apache.flink.fs.s3presto.S3FileSystemFactory`
    +  - every other classes should be under `org.apache.flink.fs.s3presto.shaded`
    --- End diff --

    nit: "classes"
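
Steps 2 and 3 of the quoted README (rewriting the package names in `core-default.xml` and checking the resulting jar) could be scripted. The following is only a minimal sketch and is not part of the pull request: the function names `shade_config`, `unshaded_classes` and `check_jar` are hypothetical, and it assumes that only the exact `S3FileSystemFactory` class file may remain unshaded (nested classes of the factory, if any, would need to be added to the allow-list).

```python
import zipfile

# Package names taken from the README under review.
ORIGINAL_PACKAGE = "org.apache.hadoop"
SHADED_PACKAGE = "org.apache.flink.fs.s3presto.shaded.org.apache.hadoop"

# Jar entries use '/' as the package separator.
SHADED_PREFIX = "org/apache/flink/fs/s3presto/shaded/"
# Assumption: only this exact class file is allowed outside the shaded prefix.
ALLOWED_UNSHADED = {"org/apache/flink/fs/s3presto/S3FileSystemFactory.class"}


def shade_config(xml_text):
    """Step 2: replace every occurrence of the Hadoop package name.

    Run this on the unmodified core-default.xml only; applying it to an
    already shaded file would prepend the shaded prefix a second time.
    """
    return xml_text.replace(ORIGINAL_PACKAGE, SHADED_PACKAGE)


def unshaded_classes(entry_names):
    """Step 3: return class entries that are neither shaded nor allow-listed."""
    return sorted(
        name
        for name in entry_names
        if name.endswith(".class")
        and not name.startswith(SHADED_PREFIX)
        and name not in ALLOWED_UNSHADED
    )


def check_jar(jar_path):
    """List the offending class entries of the jar at jar_path (a jar is a zip)."""
    with zipfile.ZipFile(jar_path) as jar:
        return unshaded_classes(jar.namelist())
```

An empty list from `check_jar` would mean that the jar satisfies both bullet points of step 3; any returned entry is a class that escaped relocation.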
---