Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-29 Thread slfan1989
+1

I agree with Xiaoqiao He's idea.

Best Regards,
Shilun Fan.

On Fri, Mar 1, 2024 at 12:55 PM Xiaoqiao He  wrote:

> Thanks, Shilun, for your great work! I am fine with releasing 3.4.0 first,
> which depends on hadoop-thirdparty-1.2.0, and then fixing the issues
> mentioned above in the next release.
> I don't think we can solve all historical issues in one release. If it is
> possible, we could mark this release (release-3.4.0) as an
> unstable version.
> Any thoughts? Thanks again.
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, Mar 1, 2024 at 12:19 PM slfan1989  wrote:
>
>> I expect to initiate a vote for hadoop-3.4.0-RC3 in preparation for the
>> hadoop-3.4.0 release. We have been working on this for 2 months and have
>> already released hadoop-thirdparty-1.2.0.
>>
>> Regarding the issue described in HADOOP-19090, I believe we can address
>> it in the hadoop-3.4.1 release because not all improvements can be expected
>> to be completed in hadoop-3.4.0.
>>
>> I commented on HADOOP-19090:
>>
>> I am not opposed to releasing hadoop-thirdparty-1.2.1, but I don't think
>> now is a good time to do so. If we were to release hadoop-thirdparty-1.2.1,
>> our process is too lengthy:
>>
>> 1. We need to announce this in a public mailing list.
>> 2. Then initiate a vote, and after the vote passes, release
>> hadoop-thirdparty-1.2.1.
>> 3. Introduce version 1.2.1 in the Hadoop trunk branch.
>> 4. Backport the change to hadoop-3.4.0.
>>
>> Even if we upgrade to protobuf-3.23.4, there might still be other issues.
>> If there really are other issues, would we need to release
>> hadoop-thirdparty-1.2.2?
>>
>> I think a better approach would be:
>>
>> To notify about this in the release email for hadoop-3.4.0, and then
>> release hadoop-thirdparty-1.2.1 before the release of hadoop-3.4.1,
>> followed by thorough validation.
>>
>> I would like to hear the thoughts of other members.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Mar 1, 2024 at 6:05 AM slfan1989  wrote:
>>
>>> Thank you for the feedback on this issue!
>>>
>>> We have already released hadoop-thirdparty-1.2.0. I think we should not
>>> release hadoop-thirdparty-1.2.1 before the launch of hadoop-3.4.0, as we
>>> are already short on time.
>>>
>>> Can we consider addressing this matter with the release of hadoop-3.4.1
>>> instead?
>>>
>>> From my personal point of view, I hope to solve this problem in
>>> hadoop-3.4.1.
>>>
>>> Best Regards,
>>> Shilun Fan.
>>>
>>> On Fri, Mar 1, 2024 at 5:37 AM PJ Fanning  wrote:
>>>
>>>> There is an issue with the protobuf lib - described here [1]
>>>>
>>>> The idea would be to do a new hadoop-thirdparty release and uptake that.
>>>>
>>>> Related to the hadoop-thirdparty uptake, I would like to get the Avro
>>>> uptake merged [2]. I think if we don't merge this for Hadoop 3.4.0, we
>>>> will have to wait until v3.5.0 instead because changing the Avro
>>>> compilation is probably not something that you would want in a patch
>>>> release.
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/HADOOP-19090
>>>> [2] https://github.com/apache/hadoop/pull/4854#issuecomment-1967549235



Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-29 Thread Xiaoqiao He
Thanks, Shilun, for your great work! I am fine with releasing 3.4.0 first,
which depends on hadoop-thirdparty-1.2.0, and then fixing the issues
mentioned above in the next release.
I don't think we can solve all historical issues in one release. If it is
possible, we could mark this release (release-3.4.0) as an
unstable version.
Any thoughts? Thanks again.

Best Regards,
- He Xiaoqiao

On Fri, Mar 1, 2024 at 12:19 PM slfan1989  wrote:

> I expect to initiate a vote for hadoop-3.4.0-RC3 in preparation for the
> hadoop-3.4.0 release. We have been working on this for 2 months and have
> already released hadoop-thirdparty-1.2.0.
>
> Regarding the issue described in HADOOP-19090, I believe we can address it
> in the hadoop-3.4.1 release because not all improvements can be expected to
> be completed in hadoop-3.4.0.
>
> I commented on HADOOP-19090:
>
> I am not opposed to releasing hadoop-thirdparty-1.2.1, but I don't think
> now is a good time to do so. If we were to release hadoop-thirdparty-1.2.1,
> our process is too lengthy:
>
> 1. We need to announce this in a public mailing list.
> 2. Then initiate a vote, and after the vote passes, release
> hadoop-thirdparty-1.2.1.
> 3. Introduce version 1.2.1 in the Hadoop trunk branch.
> 4. Backport the change to hadoop-3.4.0.
>
> Even if we upgrade to protobuf-3.23.4, there might still be other issues.
> If there really are other issues, would we need to release
> hadoop-thirdparty-1.2.2?
>
> I think a better approach would be:
>
> To notify about this in the release email for hadoop-3.4.0, and then
> release hadoop-thirdparty-1.2.1 before the release of hadoop-3.4.1,
> followed by thorough validation.
>
> I would like to hear the thoughts of other members.
>
> Best Regards,
> Shilun Fan.
>
> On Fri, Mar 1, 2024 at 6:05 AM slfan1989  wrote:
>
>> Thank you for the feedback on this issue!
>>
>> We have already released hadoop-thirdparty-1.2.0. I think we should not
>> release hadoop-thirdparty-1.2.1 before the launch of hadoop-3.4.0, as we
>> are already short on time.
>>
>> Can we consider addressing this matter with the release of hadoop-3.4.1
>> instead?
>>
>> From my personal point of view, I hope to solve this problem in
>> hadoop-3.4.1.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Mar 1, 2024 at 5:37 AM PJ Fanning  wrote:
>>
>>> There is an issue with the protobuf lib - described here [1]
>>>
>>> The idea would be to do a new hadoop-thirdparty release and uptake that.
>>>
>>> Related to the hadoop-thirdparty uptake, I would like to get the Avro
>>> uptake merged [2]. I think if we don't merge this for Hadoop 3.4.0, we
>>> will have to wait until v3.5.0 instead because changing the Avro
>>> compilation is probably not something that you would want in a patch
>>> release.
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/HADOOP-19090
>>> [2] https://github.com/apache/hadoop/pull/4854#issuecomment-1967549235
>>>
>>>
>>> On Thu, 29 Feb 2024 at 22:24, slfan1989  wrote:
>>> >
>>> > I am preparing hadoop-3.4.0-RC3 as we have already released 3 RC
>>> versions
>>> > before, and I hope hadoop-3.4.0-RC3 will receive the approval of the
>>> > members.
>>> >
>>> > Compared to hadoop-3.4.0-RC2, my plan is to backport 2 PRs from
>>> branch-3.4
>>> > to branch-3.4.0:
>>> >
>>> > HADOOP-18088: Replacing log4j 1.x with reload4j.
>>> > HADOOP-19084: Pruning hadoop-common transitive dependencies.
>>> >
>>> > I will use hadoop-release-support to package the arm version.
>>> >
>>> > I plan to release hadoop-3.4.0-RC3 next Monday.
>>> >
>>> > Best Regards,
>>> > Shilun Fan.
>>> >
>>> > On Sat, Feb 24, 2024 at 11:28 AM slfan1989 
>>> wrote:
>>> >
>>> > > Thank you very much for Steve's detailed test report and issue
>>> description!
>>> > >
>>> > >  I appreciate your time spent helping with validation. I am currently
>>> > > trying to use hadoop-release-support to prepare hadoop-3.4.0-RC3.
>>> > >
>>> > > After completing the hadoop-3.4.0 version, I will document some of
>>> the
>>> > > issues encountered in the "how to release" document, so that future
>>> members
>>> > > can refer to it during the release process.
>>> > >
>>> > > Once again, thank you to all members involved in the hadoop-3.4.0
>>> release.
>>> > >
>>> > > Let's hope for a smooth release process.
>>> > >
>>> > > Best Regards,
>>> > > Shilun Fan.
>>> > >
>>> > > On Sat, Feb 24, 2024 at 2:29 AM Steve Loughran
>>> 
>>> > > wrote:
>>> > >
>>> > >> I have been testing this all week, and a -1 until some very minor
>>> changes
>>> > >> go in.
>>> > >>
>>> > >>
>>> > >>1. build the arm64 binaries with the same jar artifacts as the
>>> x86 one
>>> > >>2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with
>>> reload4j.
>>> > >>3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common
>>> transitive
>>> > >>dependencies
>>> > >>
>>> > >>
>>> > >> For #1 we have automation there in my client-validator module,
>>> which I
>>> > >> have
>>> > >> moved to be a hadoop-managed project

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-29 Thread slfan1989
I expect to initiate a vote for hadoop-3.4.0-RC3 in preparation for the
hadoop-3.4.0 release. We have been working on this for 2 months and have
already released hadoop-thirdparty-1.2.0.

Regarding the issue described in HADOOP-19090, I believe we can address it
in the hadoop-3.4.1 release because not all improvements can be expected to
be completed in hadoop-3.4.0.

I commented on HADOOP-19090:

I am not opposed to releasing hadoop-thirdparty-1.2.1, but I don't think
now is a good time to do so. If we were to release hadoop-thirdparty-1.2.1,
the process would be too lengthy:

1. Announce the release on a public mailing list.
2. Initiate a vote and, after the vote passes, release
hadoop-thirdparty-1.2.1.
3. Introduce version 1.2.1 in the Hadoop trunk branch.
4. Backport the change to hadoop-3.4.0.

Even if we upgrade to protobuf-3.23.4, there might still be other issues.
If there really are other issues, would we need to release
hadoop-thirdparty-1.2.2?

I think a better approach would be to note this issue in the release email
for hadoop-3.4.0, then release hadoop-thirdparty-1.2.1 before the release of
hadoop-3.4.1 and validate it thoroughly.

I would like to hear the thoughts of other members.

Best Regards,
Shilun Fan.

On Fri, Mar 1, 2024 at 6:05 AM slfan1989  wrote:

> Thank you for the feedback on this issue!
>
> We have already released hadoop-thirdparty-1.2.0. I think we should not
> release hadoop-thirdparty-1.2.1 before the launch of hadoop-3.4.0, as we
> are already short on time.
>
> Can we consider addressing this matter with the release of hadoop-3.4.1
> instead?
>
> From my personal point of view, I hope to solve this problem in
> hadoop-3.4.1.
>
> Best Regards,
> Shilun Fan.
>
> On Fri, Mar 1, 2024 at 5:37 AM PJ Fanning  wrote:
>
>> There is an issue with the protobuf lib - described here [1]
>>
>> The idea would be to do a new hadoop-thirdparty release and uptake that.
>>
>> Related to the hadoop-thirdparty uptake, I would like to get the Avro
>> uptake merged [2]. I think if we don't merge this for Hadoop 3.4.0, we
>> will have to wait until v3.5.0 instead because changing the Avro
>> compilation is probably not something that you would want in a patch
>> release.
>>
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-19090
>> [2] https://github.com/apache/hadoop/pull/4854#issuecomment-1967549235
>>
>>
>> On Thu, 29 Feb 2024 at 22:24, slfan1989  wrote:
>> >
>> > I am preparing hadoop-3.4.0-RC3 as we have already released 3 RC
>> versions
>> > before, and I hope hadoop-3.4.0-RC3 will receive the approval of the
>> > members.
>> >
>> > Compared to hadoop-3.4.0-RC2, my plan is to backport 2 PRs from
>> branch-3.4
>> > to branch-3.4.0:
>> >
>> > HADOOP-18088: Replacing log4j 1.x with reload4j.
>> > HADOOP-19084: Pruning hadoop-common transitive dependencies.
>> >
>> > I will use hadoop-release-support to package the arm version.
>> >
>> > I plan to release hadoop-3.4.0-RC3 next Monday.
>> >
>> > Best Regards,
>> > Shilun Fan.
>> >
>> > On Sat, Feb 24, 2024 at 11:28 AM slfan1989 
>> wrote:
>> >
>> > > Thank you very much for Steve's detailed test report and issue
>> description!
>> > >
>> > >  I appreciate your time spent helping with validation. I am currently
>> > > trying to use hadoop-release-support to prepare hadoop-3.4.0-RC3.
>> > >
>> > > After completing the hadoop-3.4.0 version, I will document some of the
>> > > issues encountered in the "how to release" document, so that future
>> members
>> > > can refer to it during the release process.
>> > >
>> > > Once again, thank you to all members involved in the hadoop-3.4.0
>> release.
>> > >
>> > > Let's hope for a smooth release process.
>> > >
>> > > Best Regards,
>> > > Shilun Fan.
>> > >
>> > > On Sat, Feb 24, 2024 at 2:29 AM Steve Loughran
>> 
>> > > wrote:
>> > >
>> > >> I have been testing this all week, and a -1 until some very minor
>> changes
>> > >> go in.
>> > >>
>> > >>
>> > >>1. build the arm64 binaries with the same jar artifacts as the
>> x86 one
>> > >>2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with
>> reload4j.
>> > >>3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common
>> transitive
>> > >>dependencies
>> > >>
>> > >>
>> > >> For #1 we have automation there in my client-validator module, which
>> I
>> > >> have
>> > >> moved to be a hadoop-managed project and tried to make more
>> > >> manageable
>> > >> https://github.com/apache/hadoop-release-support
>> > >>
>> > >> This contains an ant project to perform a lot of the documented build
>> > >> stages, including using SCP to copy down an x86 release tarball and
>> make a
>> > >> signed copy of this containing (locally built) arm artifacts.
>> > >>
>> > >> Although that only works with my development environment (macbook m1
>> > >> laptop
>> > >> and remote ec2 server), it should be straightforward to make it more
>> > >> flexible.
>> > >>
>> > >> It also includes and tests a maven project which imports many of the
>> > >> hadoop-* pom fi

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-29 Thread slfan1989
I am preparing hadoop-3.4.0-RC3. We have already released 3 RC versions,
and I hope hadoop-3.4.0-RC3 will receive the approval of the members.

Compared to hadoop-3.4.0-RC2, my plan is to backport 2 PRs from branch-3.4
to branch-3.4.0:

HADOOP-18088: Replacing log4j 1.x with reload4j.
HADOOP-19084: Pruning hadoop-common transitive dependencies.

I will use hadoop-release-support to package the arm version.

I plan to release hadoop-3.4.0-RC3 next Monday.

Best Regards,
Shilun Fan.

On Sat, Feb 24, 2024 at 11:28 AM slfan1989  wrote:

> Thank you very much for Steve's detailed test report and issue description!
>
>  I appreciate your time spent helping with validation. I am currently
> trying to use hadoop-release-support to prepare hadoop-3.4.0-RC3.
>
> After completing the hadoop-3.4.0 version, I will document some of the
> issues encountered in the "how to release" document, so that future members
> can refer to it during the release process.
>
> Once again, thank you to all members involved in the hadoop-3.4.0 release.
>
> Let's hope for a smooth release process.
>
> Best Regards,
> Shilun Fan.
>
> On Sat, Feb 24, 2024 at 2:29 AM Steve Loughran 
> wrote:
>
>> I have been testing this all week, and a -1 until some very minor changes
>> go in.
>>
>>
>>    1. build the arm64 binaries with the same jar artifacts as the x86 one
>>    2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with reload4j.
>>    3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common transitive
>>    dependencies
>>
>>
>> For #1 we have automation there in my client-validator module, which I
>> have
>> moved to be a hadoop-managed project and tried to make more
>> manageable
>> https://github.com/apache/hadoop-release-support
>>
>> This contains an ant project to perform a lot of the documented build
>> stages, including using SCP to copy down an x86 release tarball and make a
>> signed copy of this containing (locally built) arm artifacts.
>>
>> Although that only works with my development environment (macbook m1
>> laptop
>> and remote ec2 server), it should be straightforward to make it more
>> flexible.
>>
>> It also includes and tests a maven project which imports many of the
>> hadoop-* pom files and run some test with it; this caught some problems
>> with exported slf4j and log4j2 artifacts getting into the classpath. That
>> is: hadoop-common pulling in log4j 1.2 and 2.x bindings.
>>
>> HADOOP-19084 fixes this; the build file now includes a target to scan the
>> dependencies and fail if "forbidden" artifacts are found. I have not been
>> able to stop logback ending on the transitive dependency list, but at
>> least
>> there is only one slf4j there.
>>
>> HADOOP-18088. Replace log4j 1.x with reload4j switches over to reload4j
>> while the move to v2 is still something we have to consider a WiP.
>>
>> I have tried doing some other changes to the packaging this week
>> - creating a lean distro without the AWS SDK
>> - trying to get protobuf-2.5 out of yarn-api
>> However, I think it is too late to try applying patches this risky.
>>
>> I Believe we should get the 3.4.0 release out for people to start playing
>> with while we rapidly iterate 3.4.1 release out with
>> - updated dependencies (where possible)
>> - separate "lean" and "full" installations, where "full" includes all the
>> cloud connectors and their dependencies; the default is lean and doesn't.
>> That will cut the default download size in half.
>> - critical issues which people who use the 3.4.0 release raise with us.
>>
>> That is: a packaging and bugs release, with a minimal number of new
>> features.
>>
>> I've created HADOOP-19087
>>  to cover this,
>> I'm willing to get my hands dirty here -Shilun Fan and Xiaoqiao He have
>> put
>> a lot of work on 3.4.0 and probably need other people to take up the work
>> for next release. Who else is willing to participate? (Yes Mukund, I have
>> you in mind too)
>>
>> One thing I would like to visit is: what hadoop-tools modules can we cut?
>> Are rumen and hadoop-streaming being actively used? Or can we consider
>> them
>> implicitly EOL and strip. Just think of the maintenance effort we would
>> save.
>>
>> ---
>>
>> Incidentally, I have tested the arm stuff on my raspberry pi5 which is now
>> running 64 bit linux. I believe it is the first time we have qualified a
>> Hadoop release with the media player under someone's television.
>>
>> On Thu, 15 Feb 2024 at 20:41, Mukund Madhav Thakur 
>> wrote:
>>
>> > Thanks, Shilun for putting this together.
>> >
>> > Tried the below things and everything worked for me.
>> >
>> > validated checksum and gpg signature.
>> > compiled from source.
>> > Ran AWS integration tests.
>> > untar the binaries and able to access objects in S3 via hadoop fs
>> commands.
>> > compiled gcs-connector successfully using the 3.4.0 version.
>> >
>> > qq: what is the difference between RC1 and RC2? apart from some extra
>> > patches.
>

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-23 Thread slfan1989
Thank you very much, Steve, for the detailed test report and issue
description!

I appreciate the time you spent helping with validation. I am currently
trying to use hadoop-release-support to prepare hadoop-3.4.0-RC3.

After completing the hadoop-3.4.0 version, I will document some of the
issues encountered in the "how to release" document, so that future members
can refer to it during the release process.

Once again, thank you to all members involved in the hadoop-3.4.0 release.

Let's hope for a smooth release process.

Best Regards,
Shilun Fan.

On Sat, Feb 24, 2024 at 2:29 AM Steve Loughran 
wrote:

> I have been testing this all week, and a -1 until some very minor changes
> go in.
>
>
>    1. build the arm64 binaries with the same jar artifacts as the x86 one
>    2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with reload4j.
>    3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common transitive
>    dependencies
>
>
> For #1 we have automation there in my client-validator module, which I have
> moved to be a hadoop-managed project and tried to make more
> manageable
> https://github.com/apache/hadoop-release-support
>
> This contains an ant project to perform a lot of the documented build
> stages, including using SCP to copy down an x86 release tarball and make a
> signed copy of this containing (locally built) arm artifacts.
>
> Although that only works with my development environment (macbook m1 laptop
> and remote ec2 server), it should be straightforward to make it more
> flexible.
>
> It also includes and tests a maven project which imports many of the
> hadoop-* pom files and run some test with it; this caught some problems
> with exported slf4j and log4j2 artifacts getting into the classpath. That
> is: hadoop-common pulling in log4j 1.2 and 2.x bindings.
>
> HADOOP-19084 fixes this; the build file now includes a target to scan the
> dependencies and fail if "forbidden" artifacts are found. I have not been
> able to stop logback ending on the transitive dependency list, but at least
> there is only one slf4j there.
>
> HADOOP-18088. Replace log4j 1.x with reload4j switches over to reload4j
> while the move to v2 is still something we have to consider a WiP.
>
> I have tried doing some other changes to the packaging this week
> - creating a lean distro without the AWS SDK
> - trying to get protobuf-2.5 out of yarn-api
> However, I think it is too late to try applying patches this risky.
>
> I Believe we should get the 3.4.0 release out for people to start playing
> with while we rapidly iterate 3.4.1 release out with
> - updated dependencies (where possible)
> - separate "lean" and "full" installations, where "full" includes all the
> cloud connectors and their dependencies; the default is lean and doesn't.
> That will cut the default download size in half.
> - critical issues which people who use the 3.4.0 release raise with us.
>
> That is: a packaging and bugs release, with a minimal number of new
> features.
>
> I've created HADOOP-19087
>  to cover this,
> I'm willing to get my hands dirty here -Shilun Fan and Xiaoqiao He have put
> a lot of work on 3.4.0 and probably need other people to take up the work
> for next release. Who else is willing to participate? (Yes Mukund, I have
> you in mind too)
>
> One thing I would like to visit is: what hadoop-tools modules can we cut?
> Are rumen and hadoop-streaming being actively used? Or can we consider them
> implicitly EOL and strip. Just think of the maintenance effort we would
> save.
>
> ---
>
> Incidentally, I have tested the arm stuff on my raspberry pi5 which is now
> running 64 bit linux. I believe it is the first time we have qualified a
> Hadoop release with the media player under someone's television.
>
> On Thu, 15 Feb 2024 at 20:41, Mukund Madhav Thakur 
> wrote:
>
> > Thanks, Shilun for putting this together.
> >
> > Tried the below things and everything worked for me.
> >
> > validated checksum and gpg signature.
> > compiled from source.
> > Ran AWS integration tests.
> > untar the binaries and able to access objects in S3 via hadoop fs
> commands.
> > compiled gcs-connector successfully using the 3.4.0 version.
> >
> > qq: what is the difference between RC1 and RC2? apart from some extra
> > patches.
> >
> >
> >
> > On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:
> >
> >> Thank you for explaining this part!
> >>
> >> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
> >> generate
> >> the ARM tar package, which should meet expectations.
> >>
> >> We also look forward to other members helping to verify.
> >>
> >> Best Regards,
> >> Shilun Fan.
> >>
> >> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
> >> wrote:
> >>
> >> >
> >> >
> >> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
> >> >
> >> >>
> >> >>
> >> >> Note, because the arm64 binaries are built separately on a different
> >> >> platform and JVM, their jar files may not match those of th

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-23 Thread Steve Loughran
I have been testing this all week, and I am -1 until some very minor changes
go in.


   1. build the arm64 binaries with the same jar artifacts as the x86 one
   2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with reload4j.
   3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common transitive
   dependencies


For #1 we have automation in my client-validator module, which I have moved
to be a hadoop-managed project and tried to make more manageable:
https://github.com/apache/hadoop-release-support

This contains an ant project to perform a lot of the documented build
stages, including using SCP to copy down an x86 release tarball and make a
signed copy of this containing (locally built) arm artifacts.

Although that only works with my development environment (MacBook M1 laptop
and remote EC2 server), it should be straightforward to make it more
flexible.
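
As a rough illustration of the repackaging step described above (this is not the ant target in hadoop-release-support), a minimal Python sketch might look like the following. The function name, the `lib/native` location of the native libraries, and the assumption that the tarball has a single top-level directory are all made up for the example; signing the result with gpg is deliberately left out.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def repackage_with_native_libs(x86_tarball: Path, arm_native_dir: Path,
                               native_subpath: str, out_tarball: Path) -> None:
    """Copy an x86 release tarball, replacing only its native-library directory.

    Hypothetical helper: every jar and script stays byte-for-byte the x86
    version; only the directory at native_subpath is swapped for arm_native_dir.
    """
    with tempfile.TemporaryDirectory() as work:
        work_dir = Path(work)
        with tarfile.open(x86_tarball, "r:gz") as tar:
            tar.extractall(work_dir)
        # Assumption: exactly one top-level directory (e.g. hadoop-3.4.0/).
        (root,) = [p for p in work_dir.iterdir() if p.is_dir()]
        target = root / native_subpath
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(arm_native_dir, target)
        with tarfile.open(out_tarball, "w:gz") as tar:
            tar.add(root, arcname=root.name)
```

A call such as `repackage_with_native_libs(Path("x86.tar.gz"), Path("arm-native"), "lib/native", Path("arm.tar.gz"))` would produce the arm tarball, which would then still need to be checksummed and signed before a vote.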

It also includes and tests a maven project which imports many of the
hadoop-* pom files and runs some tests with them; this caught some problems
with exported slf4j and log4j2 artifacts getting into the classpath. That
is: hadoop-common pulling in log4j 1.2 and 2.x bindings.

HADOOP-19084 fixes this; the build file now includes a target to scan the
dependencies and fail if "forbidden" artifacts are found. I have not been
able to stop logback ending up on the transitive dependency list, but at
least there is only one slf4j there.
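
The kind of "forbidden artifact" scan mentioned above can be sketched in a few lines of Python. This is only an illustration, not the target in the build file: it assumes the input is text in the `groupId:artifactId:type:version:scope` form printed by `mvn dependency:list`, and the forbidden set passed in is a made-up example rather than the real list.

```python
import re
from typing import Iterable, List, Set

# Matches the leading groupId:artifactId:type:version part of a dependency line.
COORD = re.compile(r"([\w.\-]+):([\w.\-]+):[\w.\-]+:[\w.\-]+")


def find_forbidden(lines: Iterable[str], forbidden: Set[str]) -> List[str]:
    """Return the coordinates whose groupId:artifactId is in the forbidden set."""
    hits = []
    for line in lines:
        match = COORD.search(line)
        if match and f"{match.group(1)}:{match.group(2)}" in forbidden:
            hits.append(match.group(0))
    return hits
```

A build step could call this on the captured dependency listing and exit with a non-zero status whenever the returned list is not empty.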

HADOOP-18088 ("Replace log4j 1.x with reload4j") switches over to reload4j,
while the move to log4j 2.x is still something we have to consider a WiP.

I have tried making some other changes to the packaging this week:
- creating a lean distro without the AWS SDK
- trying to get protobuf-2.5 out of yarn-api
However, I think it is too late to try applying patches this risky.

I believe we should get the 3.4.0 release out for people to start playing
with while we rapidly iterate on a 3.4.1 release with:
- updated dependencies (where possible)
- separate "lean" and "full" installations, where "full" includes all the
cloud connectors and their dependencies; the default is lean and doesn't.
That will cut the default download size in half.
- critical issues which people who use the 3.4.0 release raise with us.

That is: a packaging and bugs release, with a minimal number of new
features.

I've created HADOOP-19087 to cover this. I'm willing to get my hands dirty
here; Shilun Fan and Xiaoqiao He have put a lot of work into 3.4.0 and
probably need other people to take up the work for the next release. Who
else is willing to participate? (Yes Mukund, I have you in mind too)

One thing I would like to visit is: what hadoop-tools modules can we cut?
Are rumen and hadoop-streaming being actively used? Or can we consider them
implicitly EOL and strip them? Just think of the maintenance effort we would
save.

---

Incidentally, I have tested the arm stuff on my Raspberry Pi 5, which is now
running 64-bit Linux. I believe it is the first time we have qualified a
Hadoop release with the media player under someone's television.

On Thu, 15 Feb 2024 at 20:41, Mukund Madhav Thakur 
wrote:

> Thanks, Shilun for putting this together.
>
> Tried the below things and everything worked for me.
>
> validated checksum and gpg signature.
> compiled from source.
> Ran AWS integration tests.
> untarred the binaries and was able to access objects in S3 via hadoop fs
> commands.
> compiled gcs-connector successfully using the 3.4.0 version.
>
> qq: what is the difference between RC1 and RC2? apart from some extra
> patches.
>
>
>
> On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:
>
>> Thank you for explaining this part!
>>
>> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
>> generate
>> the ARM tar package, which should meet expectations.
>>
>> We also look forward to other members helping to verify.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
>> wrote:
>>
>> >
>> >
>> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
>> >
>> >>
>> >>
>> >> Note, because the arm64 binaries are built separately on a different
>> >> platform and JVM, their jar files may not match those of the x86
>> >> release -and therefore the maven artifacts. I don't think this is
>> >> an issue (the ASF actually releases source tarballs, the binaries are
>> >> there for help only, though with the maven repo that's a bit blurred).
>> >>
>> >> The only way to be consistent would actually untar the x86.tar.gz,
>> >> overwrite its binaries with the arm stuff, retar, sign and push out
>> >> for the vote.
>> >
>> >
>> >
>> > that's exactly what the "arm.release" target in my client-validator
>> does.
>> > builds an arm tar with the x86 binaries but the arm native libs, signs
>> it.
>> >
>> >
>> >
>> >> Even automating that would be risky.
>> >>
>> >>
>> > automating is the *only* way to do it; apache ant has everything needed
>> > for this including the ability to run gpg.
>> >
>> > we did this on the relevan

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-18 Thread slfan1989
@Ayush Saxena @Takanobu Asanuma @Xiaoqiao He @inigo...@apache.org
@Masatake Iwasaki @Akira Ajisaka

Could you please assist with the Hadoop-3.4.0-RC2 vote?

Best Regards,
Shilun Fan.

On Fri, Feb 16, 2024 at 7:27 AM slfan1989  wrote:

> Thank you for helping review hadoop-3.4.0-RC2.
>
> Compared to RC1, we have made two improvements:
> 1. Merged some patches from branch-3.4 into branch-3.4.0.
> 2. Upgraded hadoop-thirdparty to version 1.2.0.
>
> Best Regards,
> Shilun Fan.
>
> On Fri, Feb 16, 2024 at 4:41 AM Mukund Madhav Thakur 
> wrote:
>
>> Thanks, Shilun for putting this together.
>>
>> Tried the below things and everything worked for me.
>>
>> validated checksum and gpg signature.
>> compiled from source.
>> Ran AWS integration tests.
>> untar the binaries and able to access objects in S3 via hadoop fs
>> commands.
>> compiled gcs-connector successfully using the 3.4.0 version.
>>
>> qq: what is the difference between RC1 and RC2? apart from some extra
>> patches.
>>
>>
>>
>> On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:
>>
>>> Thank you for explaining this part!
>>>
>>> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
>>> generate
>>> the ARM tar package, which should meet expectations.
>>>
>>> We also look forward to other members helping to verify.
>>>
>>> Best Regards,
>>> Shilun Fan.
>>>
>>> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
>>> wrote:
>>>
>>> >
>>> >
>>> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
>>> >
>>> >>
>>> >>
>>> >> Note, because the arm64 binaries are built separately on a different
>>> >> platform and JVM, their jar files may not match those of the x86
>>> >> release -and therefore the maven artifacts. I don't think this is
>>> >> an issue (the ASF actually releases source tarballs, the binaries are
>>> >> there for help only, though with the maven repo that's a bit blurred).
>>> >>
>>> >> The only way to be consistent would actually untar the x86.tar.gz,
>>> >> overwrite its binaries with the arm stuff, retar, sign and push out
>>> >> for the vote.
>>> >
>>> >
>>> >
>>> > that's exactly what the "arm.release" target in my client-validator
>>> does.
>>> > builds an arm tar with the x86 binaries but the arm native libs, signs
>>> it.
>>> >
>>> >
>>> >
>>> >> Even automating that would be risky.
>>> >>
>>> >>
>>> > automating is the *only* way to do it; apache ant has everything needed
>>> > for this including the ability to run gpg.
>>> >
>>> > we did this on the relevant 3.3.x releases and nobody has yet
>>> complained...
>>> >
>>>
>>


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-15 Thread slfan1989
Thank you for helping review hadoop-3.4.0-RC2.

Compared to RC1, we have made two improvements:
1. Merged some patches from branch-3.4 into branch-3.4.0.
2. Upgraded hadoop-thirdparty to version 1.2.0.

Best Regards,
Shilun Fan.

On Fri, Feb 16, 2024 at 4:41 AM Mukund Madhav Thakur 
wrote:

> Thanks, Shilun for putting this together.
>
> Tried the below things and everything worked for me.
>
> validated checksum and gpg signature.
> compiled from source.
> Ran AWS integration tests.
> untar the binaries and able to access objects in S3 via hadoop fs commands.
> compiled gcs-connector successfully using the 3.4.0 version.
>
> qq: what is the difference between RC1 and RC2? apart from some extra
> patches.
>
>
>
> On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:
>
>> Thank you for explaining this part!
>>
>> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
>> generate
>> the ARM tar package, which should meet expectations.
>>
>> We also look forward to other members helping to verify.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
>> wrote:
>>
>> >
>> >
>> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
>> >
>> >>
>> >>
>> >> Note, because the arm64 binaries are built separately on a different
>> >> platform and JVM, their jar files may not match those of the x86
>> >> release -and therefore the maven artifacts. I don't think this is
>> >> an issue (the ASF actually releases source tarballs, the binaries are
>> >> there for help only, though with the maven repo that's a bit blurred).
>> >>
>> >> The only way to be consistent would actually untar the x86.tar.gz,
>> >> overwrite its binaries with the arm stuff, retar, sign and push out
>> >> for the vote.
>> >
>> >
>> >
>> > that's exactly what the "arm.release" target in my client-validator
>> does.
>> > builds an arm tar with the x86 binaries but the arm native libs, signs
>> it.
>> >
>> >
>> >
>> >> Even automating that would be risky.
>> >>
>> >>
>> > automating is the *only* way to do it; apache ant has everything needed
>> > for this including the ability to run gpg.
>> >
>> > we did this on the relevant 3.3.x releases and nobody has yet
>> complained...
>> >
>>
>


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-15 Thread Mukund Madhav Thakur
Thanks, Shilun, for putting this together.

I tried the following and everything worked for me.

Validated the checksum and GPG signature.
Compiled from source.
Ran the AWS integration tests.
Untarred the binaries and was able to access objects in S3 via hadoop fs commands.
Compiled gcs-connector successfully against the 3.4.0 version.

Quick question: what is the difference between RC1 and RC2, apart from some
extra patches?
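
A minimal sketch of the checksum part of the validation above, assuming the published checksum file contains the SHA-512 digest as one unbroken 128-character hex string. The helper names are hypothetical and not part of any Hadoop tooling. Verifying the GPG signature against the KEYS file is a separate step and is not covered here.

```python
import hashlib
import re
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Stream the file so a large tarball never has to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_text):
    """Pull the first 128-hex-digit token out of the checksum file's text.

    Assumption: the digest is written as one unbroken hex string.
    """
    match = re.search(r"\b[0-9a-fA-F]{128}\b", checksum_text)
    if match is None:
        raise ValueError("no SHA-512 digest found in checksum text")
    return match.group(0).lower()


def checksum_matches(tarball, checksum_file):
    """True when the tarball's digest equals the one in the checksum file."""
    return sha512_of(tarball) == expected_digest(Path(checksum_file).read_text())
```

Calling `checksum_matches` with the downloaded tarball and its checksum file returns `True` only when the two digests agree.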



On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:

> Thank you for explaining this part!
>
> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to generate
> the ARM tar package, which should meet expectations.
>
> We also look forward to other members helping to verify.
>
> Best Regards,
> Shilun Fan.
>
> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
> wrote:
>
> >
> >
> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
> >
> >>
> >>
> >> Note, because the arm64 binaries are built separately on a different
> >> platform and JVM, their jar files may not match those of the x86
> >> release -and therefore the maven artifacts. I don't think this is
> >> an issue (the ASF actually releases source tarballs, the binaries are
> >> there for help only, though with the maven repo that's a bit blurred).
> >>
> >> The only way to be consistent would actually untar the x86.tar.gz,
> >> overwrite its binaries with the arm stuff, retar, sign and push out
> >> for the vote.
> >
> >
> >
> > that's exactly what the "arm.release" target in my client-validator does.
> > builds an arm tar with the x86 binaries but the arm native libs, signs
> it.
> >
> >
> >
> >> Even automating that would be risky.
> >>
> >>
> > automating is the *only* way to do it; apache ant has everything needed
> > for this including the ability to run gpg.
> >
> > we did this on the relevant 3.3.x releases and nobody has yet
> complained...
> >
>


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-15 Thread slfan1989
Thank you for explaining this part!

hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to generate
the ARM tar package, which should meet expectations.

We also look forward to other members helping to verify.

Best Regards,
Shilun Fan.

On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran  wrote:

>
>
> On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
>
>>
>>
>> Note, because the arm64 binaries are built separately on a different
>> platform and JVM, their jar files may not match those of the x86
>> release -and therefore the maven artifacts. I don't think this is
>> an issue (the ASF actually releases source tarballs, the binaries are
>> there for help only, though with the maven repo that's a bit blurred).
>>
>> The only way to be consistent would actually untar the x86.tar.gz,
>> overwrite its binaries with the arm stuff, retar, sign and push out
>> for the vote.
>
>
>
> that's exactly what the "arm.release" target in my client-validator does.
> builds an arm tar with the x86 binaries but the arm native libs, signs it.
>
>
>
>> Even automating that would be risky.
>>
>>
> automating is the *only* way to do it; apache ant has everything needed
> for this including the ability to run gpg.
>
> we did this on the relevant 3.3.x releases and nobody has yet complained...
>


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-15 Thread Steve Loughran
On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:

>
>
> Note, because the arm64 binaries are built separately on a different
> platform and JVM, their jar files may not match those of the x86
> release -and therefore the maven artifacts. I don't think this is
> an issue (the ASF actually releases source tarballs, the binaries are
> there for help only, though with the maven repo that's a bit blurred).
>
> The only way to be consistent would actually untar the x86.tar.gz,
> overwrite its binaries with the arm stuff, retar, sign and push out
> for the vote.



That's exactly what the "arm.release" target in my client-validator does: it
builds an arm tar with the x86 binaries but the arm native libs, then signs it.



> Even automating that would be risky.
>
>
Automating is the *only* way to do it; Apache Ant has everything needed for
this, including the ability to run gpg.

We did this on the relevant 3.3.x releases and nobody has complained yet...


[VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-12 Thread slfan1989
Hi folks,

Xiaoqiao He and I have put together a release candidate (RC2) for Hadoop
3.4.0.

What we would like is for anyone who can to verify the tarballs, especially
anyone who can try the arm64 binaries as we want to include them too.

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC2

The git tag is release-3.4.0-RC2, commit 88fbe62f27e

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1402

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC2/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC2/RELEASENOTES.md

This is off branch-3.4.0 and is the first big release since 3.3.6.

Key changes include

* S3A: Upgrade AWS SDK to V2
* HDFS DataNode: Split the single FsDatasetImpl lock into volume-grained locks
* YARN Federation improvements
* YARN Capacity Scheduler improvements
* HDFS RBF: Code Enhancements, New Features, and Bug Fixes
* HDFS EC: Code Enhancements and Bug Fixes
* Transitive CVE fixes

Differences from RC0 & RC1

* Improved the Hadoop 3.4.0 highlights covering big features and improvements.
* Confirmed the JIRA status of Hadoop, HDFS, YARN, and MAPREDUCE modules.
* Use validate-hadoop-client-artifacts[1] for packaging and verification.
* Use hadoop-thirdparty-1.2.0

Note: because the arm64 binaries are built separately on a different
platform and JVM, their jar files may not match those of the x86
release, and therefore the Maven artifacts. I don't think this is
an issue (the ASF actually releases source tarballs; the binaries are
there for help only, though with the Maven repo that's a bit blurred).
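
For anyone who wants to see how far the two builds diverge, here is a rough sketch that lists the jar files whose bytes differ between two binary tarballs. It assumes both are gzip-compressed tar files with a single top-level directory; the function names are hypothetical and not part of the release tooling.

```python
import hashlib
import tarfile


def jar_digests(tar_path):
    """Map each .jar member (top-level directory stripped) to its SHA-256."""
    digests = {}
    with tarfile.open(tar_path, "r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".jar"):
                # Drop the leading "<top-level-dir>/" so both tarballs compare.
                relative = member.name.split("/", 1)[-1]
                data = archive.extractfile(member).read()
                digests[relative] = hashlib.sha256(data).hexdigest()
    return digests


def differing_jars(x86_tar, arm_tar):
    """Return sorted jar paths whose bytes differ or that exist on one side only."""
    x86, arm = jar_digests(x86_tar), jar_digests(arm_tar)
    return sorted(
        name for name in x86.keys() | arm.keys() if x86.get(name) != arm.get(name)
    )
```

An empty list from `differing_jars` would mean the jars in the two tarballs are byte-identical.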

The only way to be consistent would be to actually untar the x86.tar.gz,
overwrite its binaries with the arm stuff, retar, sign and push out
for the vote. Even automating that would be risky.
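
The untar, overwrite and retar steps could be sketched roughly as below. This is an illustration only, not the implementation of any existing tool. It assumes both tarballs are gzip-compressed with a single top-level directory and that the native libraries sit under `lib/native` inside it. The function names are hypothetical, and signing is left out.

```python
import tarfile


def is_native(name, native_dir="lib/native"):
    """True when a member sits at or under <top-level-dir>/lib/native (assumed layout)."""
    parts = name.split("/", 1)
    if len(parts) != 2:
        return False
    return parts[1] == native_dir or parts[1].startswith(native_dir + "/")


def repackage(x86_tar, arm_tar, out_tar):
    """Write out_tar with the x86 contents, except native libs, which come from arm_tar."""
    with tarfile.open(out_tar, "w:gz") as out:
        # First pass copies everything but native libs from x86;
        # second pass copies only the native libs from arm.
        for source, want_native in ((x86_tar, False), (arm_tar, True)):
            with tarfile.open(source, "r:gz") as src:
                for member in src:
                    if is_native(member.name) != want_native:
                        continue
                    data = src.extractfile(member) if member.isfile() else None
                    out.addfile(member, data)
```

Copying members directly between archives avoids unpacking to disk, so file modes and timestamps from the source tarballs carry over unchanged.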

[1] validate-hadoop-client-artifacts:
https://github.com/steveloughran/validate-hadoop-client-artifacts
Thanks to Steve for providing validate-hadoop-client-artifacts.

Best Regards,
Shilun Fan.