[GitHub] [flink] zentol commented on a change in pull request #11983: [FLINK-11086] Replace flink-shaded-hadoop dependencies; add Hadoop 3 test profile
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r422996560

## File path: flink-end-to-end-tests/pom.xml

@@ -255,6 +298,21 @@ under the License.
+ org.apache.maven.plugins
+ maven-enforcer-plugin

Review comment:
Convergence isn't just a problem for users, though; we also want convergence so the tests are stable, so I don't want to throw it out in general just because Hadoop is causing yet another issue. I'd also like to understand why the exclusions aren't working; we used them in other areas to solve convergence issues, IIRC.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
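As a sketch of the pattern the reviewer refers to — resolving a convergence violation flagged by the maven-enforcer-plugin's `dependencyConvergence` rule via a per-dependency `<exclusions>` block — the following illustrates the shape; the specific conflicting artifact (`commons-cli` pulled in via `hadoop-common`) and the `hadoop.version` property are illustrative assumptions, not taken from the PR:

```xml
<!-- Sketch only: exclude a transitive dependency that conflicts with a
     version declared elsewhere, so dependencyConvergence passes.
     Artifact names below are illustrative, not from the PR. -->
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>${hadoop.version}</version>
	<exclusions>
		<exclusion>
			<groupId>commons-cli</groupId>
			<artifactId>commons-cli</artifactId>
		</exclusion>
	</exclusions>
</dependency>
```

With the exclusion in place, the version of the excluded artifact is determined solely by the remaining declarations, which is why exclusions are a common fix when the enforcer reports diverging versions.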
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r422992723

## File path: flink-end-to-end-tests/pom.xml

@@ -153,6 +170,32 @@ under the License.
+ maven-resources-plugin

Review comment:
Yes, we should remove it. It's a bit odd that we don't have a managed version, but that's a separate issue.
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r422899707

## File path: flink-end-to-end-tests/pom.xml

@@ -153,6 +170,32 @@ under the License.
+ maven-resources-plugin

Review comment:
?

## File path: flink-dist/pom.xml

@@ -460,50 +460,6 @@ under the License.
- include-hadoop

Review comment:
Still referenced in the Azure files and `run-nightly-tests.sh`.

## File path: flink-end-to-end-tests/pom.xml

@@ -153,6 +170,32 @@ under the License.
+ maven-resources-plugin
+ copy-resources

Review comment:
```suggestion
```

## File path: flink-end-to-end-tests/pom.xml

@@ -255,6 +298,21 @@ under the License.
+ org.apache.maven.plugins
+ maven-enforcer-plugin

Review comment:
Do we still need this despite excluding all dependencies?

## File path: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionCapacitySchedulerITCase.java

@@ -155,9 +155,7 @@ public static void tearDown() throws Exception {
 public void testStartYarnSessionClusterInQaTeamQueue() throws Exception {
 runTest(() -> runWithArgs(new String[]{
 "-j", flinkUberjar.getAbsolutePath(),
- "-t", flinkLibFolder.getAbsolutePath(),
- "-t", flinkShadedHadoopDir.getAbsolutePath(),
- "-jm", "768m",
+ "-t", flinkLibFolder.getAbsolutePath(), "-jm", "768m",

Review comment:
```suggestion
				"-t", flinkLibFolder.getAbsolutePath(), "-jm", "768m",
```

## File path: docs/ops/deployment/hadoop.md

@@ -120,4 +88,44 @@ This way it should work both in local and cluster run where the provided depende
 To run or debug an application in IntelliJ Idea the provided dependencies can be included to the class path in the "Run|Edit Configurations" window.
+
+2) Putting the required jar files into /lib directory of the Flink distribution
+Option 1) requires very little work, integrates nicely with existing Hadoop setups and should be the
+preferred approach.
+However, Hadoop has a large dependency footprint that increases the risk for dependency conflicts to occur.
+If this happens, please refer to option 2).
+
+The following subsections explains these approaches in detail.
+
+## Using `flink-shaded-hadoop-2-uber` jar for resolving dependency conflicts (legacy)
+
+ Warning: Starting from Flink 1.11, using `flink-shaded-hadoop-2-uber` releases is not officially supported
+ by the Flink project anymore. Users are advised to provide Hadoop dependencies through `HADOOP_CLASSPATH` (see above).
+
+The Flink project used to release Hadoop distributions for specific versions, that relocate or exclude several dependencies

Review comment:
Add a specific release as a date, instead of "used to".

## File path: docs/ops/deployment/hadoop.md

@@ -120,4 +88,44 @@ This way it should work both in local and cluster run where the provided depende
+2) Putting the required jar files into /lib directory of the Flink distribution

Review comment:
This paragraph isn't properly integrated with the current documentation. I guess it should be removed, since the section below subsumes it?

## File path: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/util/HBaseTestingClusterAutoStarter.java

@@ -142,6 +144,11 @@ private static Configuration initialize(Configuration conf) {
 @BeforeClass
 public static void setUp() throws Exception {
+ // HBase 1.4 does not work with Hadoop 3
+ // because it uses Guava 12.0.1, Hadoop 3 uses Guava 27.0-jre.
+ // There is not Guava version in between that works with both.

Review comment:
```suggestion
	// There is no Guava version in between that works with both.
```

## File path: flink-end-to-
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r420605259

## File path: flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/java/org/apache/flink/tests/util/kafka/SQLClientKafkaITCase.java

@@ -106,11 +112,16 @@ public SQLClientKafkaITCase(String kafkaVersion, String kafkaSQLVersion, String
 @Before
- public void before() {
+ public void before() throws Exception {
+ downloadCache.before();
 Path tmpPath = tmp.getRoot().toPath();
 LOG.info("The current temporary path: {}", tmpPath);
 this.sqlClientSessionConf = tmpPath.resolve("sql-client-session.conf");
 this.result = tmpPath.resolve("result");
+
+ apacheAvroJars.add(downloadCache.getOrDownload("https://repo1.maven.org/maven2/org/apache/avro/avro/1.8.2/avro-1.8.2.jar", tmpPath));

Review comment:
> Because flink-shaded-hadoop does not exist anymore in this change

Well duh, but Hadoop is still on the classpath, no? Otherwise you're not fulfilling the contract of the `Hadoop` category; if the test is still passing, then the category should be removed.
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r420172348

## File path: flink-connectors/flink-connector-hive/pom.xml

@@ -126,12 +211,18 @@ under the License.
 thus override the default hadoop version from 2.4.1 to 2.7.5 -->
- org.apache.flink
- flink-shaded-hadoop-2-uber
- ${hivemetastore.hadoop.version}-${flink.shaded.version}
+ org.apache.hadoop
+ hadoop-common
+ ${hivemetastore.hadoop.version}

Review comment:
Redundant due to dependency management.

## File path: flink-connectors/flink-connector-hive/pom.xml

@@ -126,12 +211,18 @@ under the License.
 provided
+ org.apache.hadoop
+ hadoop-mapreduce-client-core
+ ${hivemetastore.hadoop.version}

Review comment:
Redundant due to dependency management.

## File path: flink-formats/flink-orc-nohive/pom.xml

@@ -98,6 +103,22 @@ under the License.
+ hadoop3-tests
+ org.apache.hadoop
+ hadoop-hdfs-client

Review comment:
How come this isn't required at runtime?

## File path: flink-filesystems/flink-s3-fs-hadoop/pom.xml

@@ -32,6 +32,17 @@ under the License. jar +
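The "redundant due to dependency management" remarks refer to the standard Maven pattern where a parent pom pins versions once in `<dependencyManagement>`, so child modules can omit the `<version>` tag entirely. A minimal sketch of that pattern, assuming a `hivemetastore.hadoop.version` property as in the quoted diff (the surrounding pom structure is illustrative):

```xml
<!-- Sketch: parent pom pins the version once... -->
<dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>${hivemetastore.hadoop.version}</version>
		</dependency>
	</dependencies>
</dependencyManagement>

<!-- ...so a child module inherits it, and repeating <version> there is redundant. -->
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<scope>provided</scope>
</dependency>
```

Keeping versions only in `<dependencyManagement>` means a version bump happens in one place, which is presumably why the reviewer flags the repeated `<version>` tags.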
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r419698801

## File path: flink-dist/pom.xml

@@ -137,8 +137,8 @@ under the License.
 ${project.version}
- org.apache.flink
- flink-shaded-hadoop-2
+ org.apache.hadoop
+ *

Review comment:
Hmm, I should've read MNG-3832 more carefully; it _does_ work with all Maven 3 versions, and just prints a warning on everything below 3.2.1.
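The diff under discussion replaces a concrete `flink-shaded-hadoop-2` exclusion with a wildcard (`*`) artifactId, excluding every transitive artifact under the `org.apache.hadoop` groupId. A sketch of that shape — the enclosing dependency's coordinates are illustrative assumptions, since the archive flattened the original XML:

```xml
<!-- Sketch: wildcard exclusion of all org.apache.hadoop transitive artifacts.
     Per MNG-3832 this works on all Maven 3 versions, though Maven < 3.2.1
     prints a warning. The enclosing dependency below is illustrative. -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-runtime_${scala.binary.version}</artifactId>
	<version>${project.version}</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>*</artifactId>
		</exclusion>
	</exclusions>
</dependency>
```

The wildcard avoids enumerating every Hadoop artifact by hand, at the cost of the version-compatibility caveat the reviewer investigates in MNG-3832.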
zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r419630074

## File path: flink-dist/pom.xml

@@ -137,8 +137,8 @@ under the License.
 ${project.version}
- org.apache.flink
- flink-shaded-hadoop-2
+ org.apache.hadoop
+ *

Review comment:
This breaks compatibility with Maven 3.1.