Re: Recommended way of using hadoop-minicluster for unit testing?
Hi,

thanks for the fast reply. The PR is here [1]. It works if I exclude the client-api and client-runtime jars from being scanned in surefire, which is a hacky workaround for the actual issue.

The hadoop-common jar is a transitive dependency of the minicluster, which is used for testing. Debugging the situation shows that HttpServer2 exists in the same package in hadoop-common as well as in the client-api, but with differences in the methods/classes used, so depending on the classpath order the wrong class is loaded. Stack traces are in the first GH Action run here: [1].

A reproducer would be to check out Storm, go to storm-hdfs, remove the exclusion in [2] and run the tests in that module, which will fail due to a missing Jetty server class (as the HttpServer2 class is loaded from client-api instead of minicluster).

Regards & thanks
Richard

[1] https://github.com/apache/storm/pull/3637
[2] https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120

On 2024/04/11 21:29:13 Ayush Saxena wrote:
> Hi Richard,
> I am not able to decode the issue properly here. It would have been
> better if you shared the PR or the failure trace as well.
> QQ: Why are you having hadoop-common as an explicit dependency? That
> hadoop-common stuff should be in hadoop-client-api.
> I quickly checked on the 3.4.0 release and I think it does have them.
>
> ```
> ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
> org/apache/hadoop/fs/FileSystem.class
> ```
>
> You didn't mention which shaded classes are being reported as
> missing... I think Spark uses these client jars, you can use that as
> an example, can grab pointers from here: [1] & [2]
>
> -Ayush
>
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> [2] https://issues.apache.org/jira/browse/SPARK-33212
>
> On Thu, 11 Apr 2024 at 17:09, Richard Zowalla wrote:
> >
> > Hi all,
> >
> > we are using "hadoop-minicluster" in Apache Storm to test our HDFS
> > integration.
> >
> > Recently, we were cleaning up our dependencies and I noticed that if I add
> >
> >   <dependency>
> >     <groupId>org.apache.hadoop</groupId>
> >     <artifactId>hadoop-client-api</artifactId>
> >     <version>${hadoop.version}</version>
> >   </dependency>
> >   <dependency>
> >     <groupId>org.apache.hadoop</groupId>
> >     <artifactId>hadoop-client-runtime</artifactId>
> >     <version>${hadoop.version}</version>
> >   </dependency>
> >
> > and have
> >
> >   <dependency>
> >     <groupId>org.apache.hadoop</groupId>
> >     <artifactId>hadoop-minicluster</artifactId>
> >     <version>${hadoop.version}</version>
> >     <scope>test</scope>
> >   </dependency>
> >
> > as a test dependency to set up a mini-cluster to test our storm-hdfs
> > integration.
> >
> > This fails weirdly because of missing (shaded) classes as well as a
> > class ambiguity with HttpServer2.
> >
> > It is present as a class inside "hadoop-client-api" and within
> > "hadoop-common".
> >
> > Is this setup wrong, or should we try something different here?
> >
> > Regards
> > Richard

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
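For context, the "exclude from being scanned in surefire" workaround Richard mentions is typically expressed with maven-surefire-plugin's classpathDependencyExcludes. The following is a hedged sketch of what such a configuration might look like, not the exact change from the PR:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Keep the shaded client jars off the test classpath so the
         minicluster's unshaded hadoop-common provides HttpServer2
         (with its Jetty dependencies) during tests. -->
    <classpathDependencyExcludes>
      <classpathDependencyExclude>org.apache.hadoop:hadoop-client-api</classpathDependencyExclude>
      <classpathDependencyExclude>org.apache.hadoop:hadoop-client-runtime</classpathDependencyExclude>
    </classpathDependencyExcludes>
  </configuration>
</plugin>
```

Unlike `<exclusions>` on a dependency, this only affects the surefire test classpath, which is why it works here but is still a workaround rather than a fix for the underlying duplicate-class problem.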
Re: [ANNOUNCE] Apache Hadoop 3.4.0 release
Xiaoqiao He and Shilun Fan,

Awesome! Thanks for leading the effort to release Hadoop 3.4.0!

Bests,
Sammi

On Tue, 19 Mar 2024 at 21:12, slfan1989 wrote:
> On behalf of the Apache Hadoop Project Management Committee, we are
> pleased to announce the release of Apache Hadoop 3.4.0.
>
> This is the first release of the Apache Hadoop 3.4 line.
>
> Key changes include:
>
> * S3A: upgrade AWS SDK to V2
> * HDFS DataNode: split the single FsDatasetImpl lock into volume-grained locks
> * YARN Federation improvements
> * YARN Capacity Scheduler improvements
> * HDFS RBF: code enhancements, new features, and bug fixes
> * HDFS EC: code enhancements and bug fixes
> * Transitive CVE fixes
>
> This release contains 2888 bug fixes, improvements and enhancements
> since 3.3.
>
> Users are encouraged to read the [overview of major changes][1].
> For details, please check the [release notes][2] and [changelog][3].
>
> [1]: http://hadoop.apache.org/docs/r3.4.0/index.html
> [2]: http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/RELEASENOTES.3.4.0.html
> [3]: http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/CHANGELOG.3.4.0.html
>
> Many thanks to everyone who helped in this release by supplying patches,
> reviewing them, helping get this release building and testing, and
> reviewing the final artifacts.
>
> Best Regards,
> Xiaoqiao He and Shilun Fan
Re: Recommended way of using hadoop-minicluster for unit testing?
Hi Richard,

I am not able to decode the issue properly here. It would have been better if you shared the PR or the failure trace as well.

QQ: Why are you having hadoop-common as an explicit dependency? That hadoop-common stuff should be in hadoop-client-api. I quickly checked on the 3.4.0 release and I think it does have them.

```
ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
org/apache/hadoop/fs/FileSystem.class
```

You didn't mention which shaded classes are being reported as missing... I think Spark uses these client jars, you can use that as an example, can grab pointers from here: [1] & [2]

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1361
[2] https://issues.apache.org/jira/browse/SPARK-33212

On Thu, 11 Apr 2024 at 17:09, Richard Zowalla wrote:
>
> Hi all,
>
> we are using "hadoop-minicluster" in Apache Storm to test our HDFS
> integration.
>
> Recently, we were cleaning up our dependencies and I noticed that if I add
>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-api</artifactId>
>     <version>${hadoop.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-runtime</artifactId>
>     <version>${hadoop.version}</version>
>   </dependency>
>
> and have
>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-minicluster</artifactId>
>     <version>${hadoop.version}</version>
>     <scope>test</scope>
>   </dependency>
>
> as a test dependency to set up a mini-cluster to test our storm-hdfs
> integration.
>
> This fails weirdly because of missing (shaded) classes as well as a
> class ambiguity with HttpServer2.
>
> It is present as a class inside "hadoop-client-api" and within
> "hadoop-common".
>
> Is this setup wrong, or should we try something different here?
>
> Regards
> Richard
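Ayush's `jar tf | grep` check inspects one jar at a time. To find every class that two or more jars on a classpath both provide (the HttpServer2 situation), the jar entry lists can be compared programmatically. This is a self-contained sketch using only the JDK; the jar names in the demo are placeholders standing in for the real hadoop jars, and the demo writes entry names only, with no real bytecode:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class DuplicateClassFinder {

    // Maps each .class entry to the jars that contain it; entries present
    // in more than one jar are the ambiguous ones whose winner depends on
    // classpath order.
    static Map<String, Set<String>> duplicates(List<Path> jars) throws IOException {
        Map<String, Set<String>> owners = new HashMap<>();
        for (Path jar : jars) {
            try (ZipFile zf = new ZipFile(jar.toFile())) {
                Enumeration<? extends ZipEntry> entries = zf.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new HashSet<>())
                              .add(jar.getFileName().toString());
                    }
                }
            }
        }
        owners.values().removeIf(js -> js.size() < 2);
        return owners;
    }

    // Helper: build a tiny zip containing the given (empty) entries.
    static Path writeJar(Path path, String... entries) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(path))) {
            for (String e : entries) {
                out.putNextEntry(new ZipEntry(e));
                out.closeEntry();
            }
        }
        return path;
    }

    // Demo: two hypothetical jars that both ship HttpServer2.
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jars");
        Path a = writeJar(dir.resolve("hadoop-client-api.jar"),
                "org/apache/hadoop/http/HttpServer2.class",
                "org/apache/hadoop/fs/FileSystem.class");
        Path b = writeJar(dir.resolve("hadoop-common.jar"),
                "org/apache/hadoop/http/HttpServer2.class",
                "org/apache/hadoop/conf/Configuration.class");
        duplicates(List.of(a, b))
                .forEach((cls, owners) -> System.out.println(cls + " -> " + owners));
    }
}
```

Pointed at the actual surefire test classpath, a report like this makes it obvious which artifacts duplicate `org/apache/hadoop/http/HttpServer2.class` and therefore which exclusion is needed.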
Recommended way of using hadoop-minicluster for unit testing?
Hi all,

we are using "hadoop-minicluster" in Apache Storm to test our HDFS integration.

Recently, we were cleaning up our dependencies and I noticed that if I add

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-runtime</artifactId>
    <version>${hadoop.version}</version>
  </dependency>

and have

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
  </dependency>

as a test dependency to set up a mini-cluster to test our storm-hdfs integration, this fails weirdly because of missing (shaded) classes as well as a class ambiguity with HttpServer2.

It is present as a class inside "hadoop-client-api" and within "hadoop-common".

Is this setup wrong, or should we try something different here?

Regards
Richard
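In general, Maven resolves this kind of duplicate-class conflict by keeping only one provider of the class on a given classpath. For transitive dependencies that is done with an `<exclusions>` block; the sketch below is purely illustrative of the mechanism (it is not the fix the thread converges on, and which side to exclude depends on which copy of HttpServer2 must win; as later replies show, the minicluster actually needs its unshaded hadoop-common copy):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
  <exclusions>
    <!-- Hypothetical: drop a transitive jar that duplicates classes
         already provided by another dependency on the classpath. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Note that `<exclusions>` only removes transitive dependencies; a direct dependency such as hadoop-client-api cannot be excluded this way, which is why surefire-level classpath filtering comes up later in the thread.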