[ https://issues.apache.org/jira/browse/FLINK-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251486#comment-16251486 ]
ASF GitHub Bot commented on FLINK-7973:
---------------------------------------

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/5013

[FLINK-7973] disable JNI bridge for relocated hadoop classes in s3-fs-*

## What is the purpose of the change

If a Hadoop JNI library is on the classpath, it will also be loaded by our shaded, relocated Hadoop classes in the `flink-s3-fs-*` file systems. `NativeCodeLoader#isNativeCodeLoaded` then returns `true`, and the relocated classes try to use native code even though there is no JNI mapping for our relocated namespaces. This leads to errors such as `java.lang.UnsatisfiedLinkError: org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V`.

## Brief change log

- Disable native code loading via copies of the respective `NativeCodeLoader` class (there are more users of native code than the `JniBasedUnixGroupsMapping` shown above).

## Verifying this change

This change added tests and can be verified as follows:

- Manually verified the change by running a 3-node cluster with 1 JobManager and 2 TaskManagers on EMR, executing the `WordCount` example with an S3 input source:

```
cp ./opt/flink-s3-fs-hadoop-1.4-SNAPSHOT.jar ./lib/
./bin/flink run -m yarn-cluster -yn 2 -ys 1 -yjm 768 -ytm 1024 ./examples/batch/WordCount.jar --input s3://<bucket>/<path-to-input-file>
```

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **yes**. The shaded and relocated Hadoop classes can no longer use the (potentially faster) JNI implementations of certain functions. Depending on how these functions are used, this may affect per-record paths, but since it only applies to S3 file system access, any performance penalty should be masked by S3 access times.
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **yes**

## Documentation

- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **docs**

You can merge this pull request into a Git repository by running:

```
$ git pull https://github.com/NicoK/flink flink-7973-2
```

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5013.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

```
This closes #5013
```

----
commit 95f533d004e7373e9de03245a7984b6355209c22
Author: Nico Kruber <n...@data-artisans.com>
Date: 2017-11-14T13:36:22Z

[FLINK-7973] disable JNI bridge for relocated hadoop classes in s3-fs-*

----

> Fix service shading relocation for S3 file systems
> --------------------------------------------------
>
> Key: FLINK-7973
> URL: https://issues.apache.org/jira/browse/FLINK-7973
> Project: Flink
> Issue Type: Bug
> Reporter: Stephan Ewen
> Assignee: Nico Kruber
> Priority: Blocker
> Fix For: 1.4.0
>
> The shade plugin relocates services incorrectly currently, applying
> relocation patterns multiple times.
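Why the relocated classes in the pull request above fail with `UnsatisfiedLinkError`: under JNI's default linking, the JVM resolves a native method by a symbol name derived from the fully qualified class name, so relocating the class changes the symbol it searches for. The Java sketch below computes these JNI "short names" using the specification's mangling rules. It assumes that the native Hadoop library exports standard `Java_...` symbols rather than registering methods through `RegisterNatives`. The class name `JniSymbolNames` is hypothetical.

```java
/**
 * Hypothetical helper: computes the JNI "short name" the JVM looks up for a
 * native method, following the JNI specification's name-mangling rules.
 */
public final class JniSymbolNames {

    private JniSymbolNames() {}

    /** Escapes a class or method name per the JNI mangling rules. */
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');                 // package separator
            } else if (c == '_') {
                sb.append("_1");                // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                sb.append(c);                   // plain ASCII letters and digits
            } else {
                sb.append(String.format("_0%04x", (int) c)); // any other character
            }
        }
        return sb.toString();
    }

    /** Returns "Java_" + mangled class name + "_" + mangled method name. */
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        String original = "org.apache.hadoop.security.JniBasedUnixGroupsMapping";
        String relocated = "org.apache.flink.fs.s3hadoop.shaded." + original;
        // Symbol name derived from the original class (assumed to be what the native library exports).
        System.out.println(shortName(original, "anchorNative"));
        // Symbol name the JVM searches for once the class has been relocated.
        System.out.println(shortName(relocated, "anchorNative"));
    }
}
```

Running `main` prints two different symbol names. Only the first can exist in a library built against the original Hadoop package, so the lookup for the relocated class finds nothing.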
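The change log's approach (a copy of `NativeCodeLoader` that reports native code as unavailable) works because Hadoop callers check `isNativeCodeLoaded()` before using JNI and otherwise fall back to a pure-Java path. Hadoop's `JniBasedUnixGroupsMappingWithFallback`, for example, chooses between the JNI and shell-based group mappings this way. Below is a minimal self-contained sketch of that pattern. Every name in it (`NativeCodeLoaderStub`, `GroupsMapping`, `JniMapping`, `FallbackMapping`, `selectMapping`) is a hypothetical stand-in, not the actual Flink or Hadoop code.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: a stand-in for the copied NativeCodeLoader that always
 * reports native code as unavailable, plus a caller that falls back to a
 * pure-Java implementation when native code is not loaded.
 */
public final class NativeFallbackSketch {

    /** Stand-in for the relocated NativeCodeLoader copy (hypothetical name). */
    static final class NativeCodeLoaderStub {
        private NativeCodeLoaderStub() {}

        /** Always false, so relocated callers never call into JNI. */
        static boolean isNativeCodeLoaded() {
            return false;
        }
    }

    /** Simplified group-mapping interface (hypothetical). */
    interface GroupsMapping {
        List<String> getGroups(String user);
    }

    /** Would call native code; never chosen while the stub reports false. */
    static final class JniMapping implements GroupsMapping {
        @Override
        public List<String> getGroups(String user) {
            throw new UnsatisfiedLinkError("no JNI symbol for relocated class");
        }
    }

    /** Pure-Java fallback; here it just returns the user name as a placeholder group. */
    static final class FallbackMapping implements GroupsMapping {
        @Override
        public List<String> getGroups(String user) {
            return Collections.singletonList(user);
        }
    }

    /** Picks the implementation the way a "with fallback" wrapper would. */
    static GroupsMapping selectMapping() {
        return NativeCodeLoaderStub.isNativeCodeLoaded()
                ? new JniMapping()
                : new FallbackMapping();
    }

    public static void main(String[] args) {
        GroupsMapping mapping = selectMapping();
        System.out.println(mapping.getClass().getSimpleName()); // FallbackMapping
        System.out.println(mapping.getGroups("flink"));          // [flink]
    }
}
```

The design trade-off matches the performance note in the pull request: callers lose the native fast path, but they never reach a JNI call whose symbol cannot be resolved.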