Re: Recommended way of using hadoop-minicluster for unit testing?
Hi Ayush,

thanks for your time investigating! I followed your recommendation and it seems to work (also for some of our consumer projects), so thanks a lot for your time!

Regards
Richard
Re: Recommended way of using hadoop-minicluster for unit testing?
Hi Richard,

Thanx for sharing the steps to reproduce the issue. I cloned the Apache Storm repo and was able to repro the issue. The build was indeed failing due to missing classes.

Spent some time debugging the issue; I might not be entirely right (no experience with Storm), but there are two ways to get this going.

*First Approach: If we want to use the shaded classes*

1. I think the artifact to be used for the minicluster should be `hadoop-client-minicluster`; even Spark uses the same [1]. The one you are using is `hadoop-minicluster`, which on its own is empty:

```
ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just declares the artifacts which are used by `hadoop-client-minicluster`, and that jar has the shading and so on. Using `hadoop-minicluster` is like adding the Hadoop dependencies into the pom transitively, without any shading, which tends to conflict with the `hadoop-client-api` and `hadoop-client-runtime` jars, which use the shaded classes.

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`, the tests still won't pass. The reason is the `storm-autocreds` dependency, which pulls in the Hadoop jars via `hbase-client` & `hive-exec`, so we need to exclude them as well.

3. I reverted your classpath hack, changed the jar, & excluded the dependencies from storm-autocreds, & ran the storm-hdfs tests. All the tests passed which were failing initially, without any code change:

```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```

4. Putting the code diff here might make this mail unreadable, so I am sharing the link to the commit which fixed Storm for me at [2]; let me know if it has any access issues, and I will put the diff in the mail itself in text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & `hadoop-client-runtime` jars use shading, which tends to conflict with your non-shaded `hadoop-minicluster`. Rather than using these jars, use the `hadoop-client` jar.

2. I removed your hack & replaced those two jars with the `hadoop-client` jar, & the storm-hdfs tests pass.

3. I am sharing the link to the commit in my fork at [3]. One advantage is that you don't have to change your existing jar, nor would you need to add those exclusions to the `storm-autocreds` dependency.

++ Adding common-dev, in case any fellow developers with more experience around using the hadoop-client jars can help, if things still don't work or Storm needs something more. The downstream projects which I have experience with don't use these jars (which they ideally should) :-)

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2] https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3] https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8
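In pom terms, the first approach boils down to something roughly like the following sketch. The artifact names are the ones discussed above, but the wildcard exclusion is only illustrative; the actual diff is in the commit at [2].

```xml
<!-- Sketch only: swap hadoop-minicluster for the shaded
     hadoop-client-minicluster test artifact ... -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>

<!-- ... and keep the unshaded Hadoop jars that storm-autocreds drags in
     (via hbase-client & hive-exec) off the classpath. A group-level
     wildcard exclusion is shown here for brevity. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-autocreds</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```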
Re: Recommended way of using hadoop-minicluster for unit testing?
Hi,

thanks for the fast reply. The PR is here [1].

It works if I exclude the client-api and client-runtime from being scanned in surefire, which is a hacky workaround for the actual issue.

The hadoop-common jar is a transitive dependency of the minicluster, which is used for testing.

Debugging the situation shows that HttpServer2 is in the same package in hadoop-common as well as in the client-api, but with differences in the methods / classes used, so depending on the classpath order the wrong class is loaded.

Stacktraces are in the first GH Action run, here: [1].

A reproducer would be to check out Storm, go to storm-hdfs, remove the exclusion in [2], and run the tests in that module. They will fail due to a missing Jetty server class (as the HttpServer2 class is loaded from client-api instead of the minicluster).

Regards & thanks
Richard

[1] https://github.com/apache/storm/pull/3637
[2] https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
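When two jars on the test classpath both contain the same class, as with HttpServer2 here, one way to confirm which copy actually wins is to ask the JVM where it loaded the class from. A minimal sketch (the Hadoop class name is from this thread; by default the snippet inspects itself, so it runs without any Hadoop jars):

```java
// WhichJar.java - print the location a class was loaded from, to debug
// "same class in two jars" classpath-ordering problems.
public class WhichJar {

    static String locate(Class<?> c) {
        java.security.CodeSource cs = c.getProtectionDomain().getCodeSource();
        // Bootstrap/JDK classes have no code source.
        return cs == null ? "<bootstrap/JDK>" : cs.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // On a real test classpath, pass e.g. org.apache.hadoop.http.HttpServer2;
        // with no argument the snippet locates itself.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        Class<?> c = Class.forName(name);
        System.out.println(c.getName() + " -> " + locate(c));
    }
}
```

Running it with the ambiguous class name under the same surefire classpath as the failing tests shows which jar actually supplied the class.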
Re: Recommended way of using hadoop-minicluster for unit testing?
Hi Richard,

I am not able to decode the issue properly here; it would have been better if you had shared the PR or the failure trace as well.

QQ: Why do you have hadoop-common as an explicit dependency? That hadoop-common stuff should be in hadoop-client-api. I quickly checked on the 3.4.0 release and I think it does have them:

```
ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
org/apache/hadoop/fs/FileSystem.class
```

You didn't mention which shaded classes are being reported as missing... I think Spark uses these client jars; you can use that as an example, and grab pointers from here: [1] & [2].

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1361
[2] https://issues.apache.org/jira/browse/SPARK-33212
Recommended way of using hadoop-minicluster for unit testing?
Hi all,

we are using "hadoop-minicluster" in Apache Storm to test our HDFS integration.

Recently, we were cleaning up our dependencies and I noticed that if I add

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```

and have

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>
```

as a test dependency to set up a mini-cluster for testing our storm-hdfs integration, it fails weirdly because of missing (shaded) classes as well as a class ambiguity with HttpServer2.

That class is present inside "hadoop-client-api" as well as within "hadoop-common".

Is this setup wrong, or should we try something different here?

Regards
Richard