Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Richard Zowalla
Hi,

thanks for the fast reply. The PR is here [1].

It works if I exclude the client-api and client-runtime from being scanned by
surefire, which is a hacky workaround for the actual issue.
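
For illustration, such an exclusion can be expressed via surefire's
classpathDependencyExcludes; the snippet below is only a sketch of the idea,
not necessarily the exact configuration used in the PR:

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Sketch: keep the shaded client jars off the test classpath so that
         the minicluster resolves HttpServer2 from hadoop-common. -->
    <classpathDependencyExcludes>
      <classpathDependencyExclude>org.apache.hadoop:hadoop-client-api</classpathDependencyExclude>
      <classpathDependencyExclude>org.apache.hadoop:hadoop-client-runtime</classpathDependencyExclude>
    </classpathDependencyExcludes>
  </configuration>
</plugin>
```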

The hadoop-common jar is a transitive dependency of the minicluster, which is
used for testing.

Debugging the situation shows that HttpServer2 exists in the same package in
hadoop-common as well as in the client-api, but with differences in the
methods / classes used, so depending on the classpath order the wrong class is
loaded.
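
As an aside, duplicate classes like this can also be surfaced at build time
with the banDuplicateClasses rule from the MojoHaus extra-enforcer-rules. The
sketch below is only an illustration (plugin and rule versions are
assumptions), not something the PR adds:

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.4.1</version>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>extra-enforcer-rules</artifactId>
      <version>1.8.0</version>
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>ban-duplicate-classes</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Reports (and fails the build) when two jars on the classpath
               ship the same fully qualified class name, e.g. HttpServer2. -->
          <banDuplicateClasses>
            <findAllDuplicates>true</findAllDuplicates>
          </banDuplicateClasses>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```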

Stacktraces are in the first GH Actions run here: [1].

A reproducer would be to check out Storm, go to storm-hdfs, remove the
exclusion in [2], and run the tests in that module; they will fail due to a
missing Jetty server class (as the HttpServer2 class is loaded from client-api
instead of from the minicluster).

Regards & thanks
Richard 

[1] https://github.com/apache/storm/pull/3637
[2] 
https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120

On 2024/04/11 21:29:13 Ayush Saxena wrote:
> Hi Richard,
> I am not able to decode the issue properly here; it would have been
> better if you had shared the PR or the failure trace as well.
> QQ: Why are you having hadoop-common as an explicit dependency? That
> hadoop-common stuff should be there in hadoop-client-api.
> I quickly checked the 3.4.0 release and I think it does have them.
> 
> ```
> ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
> grep org/apache/hadoop/fs/FileSystem.class
> org/apache/hadoop/fs/FileSystem.class
> ```
> 
> You didn't mention which shaded classes are being reported as
> missing... I think Spark uses these client jars, so you can use that as
> an example; you can grab pointers from here: [1] & [2]
> 
> -Ayush
> 
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> [2] https://issues.apache.org/jira/browse/SPARK-33212
> 
> On Thu, 11 Apr 2024 at 17:09, Richard Zowalla  wrote:
> >
> > Hi all,
> >
> > we are using "hadoop-minicluster" in Apache Storm to test our hdfs
> > integration.
> >
> > Recently, we were cleaning up our dependencies and I noticed that if I
> > add
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client-api</artifactId>
> >   <version>${hadoop.version}</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client-runtime</artifactId>
> >   <version>${hadoop.version}</version>
> > </dependency>
> >
> > and have
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-minicluster</artifactId>
> >   <version>${hadoop.version}</version>
> >   <scope>test</scope>
> > </dependency>
> >
> > as a test dependency to set up a mini-cluster to test our storm-hdfs
> > integration.
> >
> > This fails weirdly because of missing (shaded) classes as well as a
> > class ambiguity with HttpServer2.
> >
> > The class is present both inside "hadoop-client-api" and within
> > "hadoop-common".
> >
> > Is this setup wrong or should we try something different here?
> >
> > Regards
> > Richard
> 




Re: [ANNOUNCE] Apache Hadoop 3.4.0 release

2024-04-11 Thread Sammi Chen
Xiaoqiao He and Shilun Fan,

Awesome! Thanks for leading the effort to release Hadoop 3.4.0!

Bests,
Sammi

On Tue, 19 Mar 2024 at 21:12, slfan1989  wrote:

> On behalf of the Apache Hadoop Project Management Committee, We are
> pleased to announce the release of Apache Hadoop 3.4.0.
>
> This is a release of the Apache Hadoop 3.4 line.
>
> Key changes include:
>
> * S3A: Upgrade AWS SDK to V2
> * HDFS DataNode: split one FsDatasetImpl lock into volume-grain locks
> * YARN Federation improvements
> * YARN Capacity Scheduler improvements
> * HDFS RBF: Code Enhancements, New Features, and Bug Fixes
> * HDFS EC: Code Enhancements and Bug Fixes
> * Transitive CVE fixes
>
> This is the first release of the Apache Hadoop 3.4 line. It contains 2888
> bug fixes, improvements and enhancements since 3.3.
>
> Users are encouraged to read the [overview of major changes][1].
> For details, please check the [release notes][2] and [changelog][3].
>
> [1]: http://hadoop.apache.org/docs/r3.4.0/index.html
> [2]:
>
> http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/RELEASENOTES.3.4.0.html
> [3]:
>
> http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/CHANGELOG.3.4.0.html
>
> Many thanks to everyone who helped in this release by supplying patches,
> reviewing them, helping get this release built and tested, and reviewing
> the final artifacts.
>
> Best Regards,
> Xiaoqiao He And Shilun Fan.
>


Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Ayush Saxena
Hi Richard,
I am not able to decode the issue properly here; it would have been
better if you had shared the PR or the failure trace as well.
QQ: Why are you having hadoop-common as an explicit dependency? That
hadoop-common stuff should be there in hadoop-client-api.
I quickly checked the 3.4.0 release and I think it does have them.

```
ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
grep org/apache/hadoop/fs/FileSystem.class
org/apache/hadoop/fs/FileSystem.class
```

You didn't mention which shaded classes are being reported as
missing... I think Spark uses these client jars, so you can use that as
an example; you can grab pointers from here: [1] & [2]
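
The usual pattern there is to compile against the shaded hadoop-client-api
and pull hadoop-client-runtime in at runtime scope only. Roughly (as an
illustration of the pattern, not copied from Spark's pom):

```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
```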

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1361
[2] https://issues.apache.org/jira/browse/SPARK-33212

On Thu, 11 Apr 2024 at 17:09, Richard Zowalla  wrote:
>
> Hi all,
>
> we are using "hadoop-minicluster" in Apache Storm to test our hdfs
> integration.
>
> Recently, we were cleaning up our dependencies and I noticed that if I
> add
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-api</artifactId>
>   <version>${hadoop.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-runtime</artifactId>
>   <version>${hadoop.version}</version>
> </dependency>
>
> and have
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minicluster</artifactId>
>   <version>${hadoop.version}</version>
>   <scope>test</scope>
> </dependency>
>
> as a test dependency to set up a mini-cluster to test our storm-hdfs
> integration.
>
> This fails weirdly because of missing (shaded) classes as well as a
> class ambiguity with HttpServer2.
>
> The class is present both inside "hadoop-client-api" and within
> "hadoop-common".
>
> Is this setup wrong or should we try something different here?
>
> Regards
> Richard




Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Richard Zowalla
Hi all,

we are using "hadoop-minicluster" in Apache Storm to test our hdfs
integration.

Recently, we were cleaning up our dependencies and I noticed that if I
add

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
</dependency>

and have

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>

as a test dependency to set up a mini-cluster to test our storm-hdfs
integration.

This fails weirdly because of missing (shaded) classes as well as a
class ambiguity with HttpServer2.

The class is present both inside "hadoop-client-api" and within
"hadoop-common".

Is this setup wrong or should we try something different here?

Regards
Richard

