Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-14 Thread Richard Zowalla
Hi Ayush,

thanks for taking the time to investigate!

I followed your recommendation and it seems to work (also for some of
our consumer projects), so thanks a lot for your time!

Regards
Richard



Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-12 Thread Ayush Saxena
Hi Richard,
Thanx for sharing the steps to reproduce the issue. I cloned the Apache
Storm repo and was able to repro it; the build was indeed failing due to
missing classes.

I spent some time debugging the issue. My take might not be entirely right
(I have no experience with Storm), but there are two ways to get this going.

*First Approach: If we want to use the shaded classes*

1. I think the artifact to use for the minicluster should be
`hadoop-client-minicluster`; even Spark uses it [1]. The one you are
using, `hadoop-minicluster`, is itself empty:
```
ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just declares the artifacts that `hadoop-client-minicluster` builds
upon; the latter jar is the one that actually contains the shaded
classes. Using `hadoop-minicluster` is like adding the Hadoop
dependencies to the pom transitively, without any shading, which tends
to conflict with the `hadoop-client-api` and `hadoop-client-runtime`
jars, which use the shaded classes.
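
A minimal sketch of the swap (assuming the usual `hadoop.version`
property, as in the poms further down this thread):
```
<!-- instead of the empty aggregator artifact hadoop-minicluster -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-minicluster</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
</dependency>
```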

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`,
the tests still won't pass. The reason is the `storm-autocreds`
dependency, which pulls in the Hadoop jars via `hbase-client` &
`hive-exec`, so we need to exclude those as well, along the lines
sketched below.
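
A rough sketch of such an exclusion (the coordinates
`org.apache.storm:storm-autocreds` and the wildcard exclusion are my
assumptions here; the commit in [2] has the actual diff):
```
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-autocreds</artifactId>
    <version>${project.version}</version>
    <exclusions>
        <!-- hbase-client & hive-exec drag in the unshaded hadoop jars -->
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```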

3. I reverted your classpath hack, changed the jar, and excluded the
dependencies from storm-autocreds, then ran the storm-hdfs tests: all
the tests that were initially failing now passed, without any code change
```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```

4. Putting the code diff here might make this mail unreadable, so I am
sharing the link to the commit which fixed Storm for me [2]. Let me know
if it has any access issues, and I will put the diff into the mail
itself in text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & `hadoop-client-runtime` jars use shading,
which tends to conflict with your unshaded `hadoop-minicluster`. Rather
than using those two jars, use the `hadoop-client` jar, roughly as below.
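
Roughly, assuming the same `hadoop.version` property:
```
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
```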

2. I removed your hack and replaced those two jars with the
`hadoop-client` jar, and the storm-hdfs tests pass.

3. I am sharing the link to the commit for this approach here [3]. One
advantage is that you neither have to change your existing jar nor add
those exclusions to the `storm-autocreds` dependency.

++ Adding common-dev, in case fellow developers with more experience
using the hadoop-client jars can help if things still don't work or
Storm needs something more. The downstream projects I have experience
with don't use these jars (though they ideally should) :-)

-Ayush


[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2]
https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3]
https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8



Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Richard Zowalla
Hi,

thanks for the fast reply. The PR is here [1].

It works if I exclude client-api and client-runtime from being scanned
in surefire, which is a hacky workaround for the actual issue; a sketch of
such a configuration is below.
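
For reference, surefire supports hiding dependencies from the test
classpath via `classpathDependencyExcludes`; a minimal sketch of such a
workaround (not necessarily the exact configuration in the PR) would be:
```
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <!-- keep the shaded client jars off the test classpath -->
        <classpathDependencyExcludes>
            <classpathDependencyExclude>org.apache.hadoop:hadoop-client-api</classpathDependencyExclude>
            <classpathDependencyExclude>org.apache.hadoop:hadoop-client-runtime</classpathDependencyExclude>
        </classpathDependencyExcludes>
    </configuration>
</plugin>
```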

The hadoop-common jar is a transitive dependency of the minicluster, which
is used for testing.

Debugging the situation shows that HttpServer2 lives in the same package in
hadoop-common as well as in client-api, but with differences in the methods
and classes used, so depending on the classpath order the wrong class is
loaded.

Stacktraces are in the first GH Actions run here: [1].

A reproducer: check out Storm, go to storm-hdfs, remove the exclusion in
[2], and run the tests in that module. They will fail due to a missing
Jetty server class (as the HttpServer2 class is loaded from client-api
instead of the minicluster).

Regards & thanks
Richard 

[1] https://github.com/apache/storm/pull/3637
[2] 
https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120




Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Ayush Saxena
Hi Richard,
I am not able to properly decode the issue here; it would have been
better if you had shared the PR or the failure trace as well.
QQ: Why do you have hadoop-common as an explicit dependency? The
hadoop-common stuff should already be in hadoop-client-api.
I quickly checked the 3.4.0 release and I think it does have it:

```
ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
grep org/apache/hadoop/fs/FileSystem.class
org/apache/hadoop/fs/FileSystem.class
```

You didn't mention which shaded classes are being reported as
missing... I think Spark uses these client jars; you can use it as an
example and grab pointers from [1] & [2].

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1361
[2] https://issues.apache.org/jira/browse/SPARK-33212




Recommended way of using hadoop-minicluster for unit testing?

2024-04-11 Thread Richard Zowalla
Hi all,

we are using "hadoop-minicluster" in Apache Storm to test our HDFS
integration.

Recently, we were cleaning up our dependencies and I noticed that if I
add

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-runtime</artifactId>
    <version>${hadoop.version}</version>
</dependency>

and have

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
</dependency>

as a test dependency to set up a mini-cluster to test our storm-hdfs
integration.

This fails weirdly because of missing (shaded) classes as well as a
class ambiguity with HttpServer2.

The class is present both inside "hadoop-client-api" and within
"hadoop-common".

Is this setup wrong or should we try something different here?

Regards
Richard

