Re: [VOTE] Release Apache Hadoop Thirdparty 1.0.0 - RC1

2020-03-12 Thread Akira Ajisaka
+1

- Verified signatures and checksums
- Built jars and docs from source
- Built hadoop trunk with hadoop-thirdparty 1.0.0
- Checked rat files and documents
- Checked LICENSE and NOTICE files
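
For reference, a minimal sketch of how these checks can be run, assuming
the usual Apache artifact layout (the exact file names in the RC directory
may differ):

    # Verify signatures and checksums
    $ gpg --import KEYS
    $ gpg --verify hadoop-thirdparty-1.0.0-src.tar.gz.asc hadoop-thirdparty-1.0.0-src.tar.gz
    $ sha512sum -c hadoop-thirdparty-1.0.0-src.tar.gz.sha512

    # Build jars and docs from source
    $ tar xzf hadoop-thirdparty-1.0.0-src.tar.gz && cd hadoop-thirdparty-1.0.0-src
    $ mvn clean install
    $ mvn site

    # Run the Apache RAT license check
    $ mvn apache-rat:check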

Thanks,
Akira

On Thu, Mar 12, 2020 at 5:26 AM Vinayakumar B 
wrote:

> Hi folks,
>
> Thanks to everyone's help on this release.
>
> I have re-created a release candidate (RC1) for Apache Hadoop Thirdparty
> 1.0.0.
>
> RC Release artifacts are available at :
>
> http://home.apache.org/~vinayakumarb/release/hadoop-thirdparty-1.0.0-RC1/
>
> Maven artifacts are available in staging repo:
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1261/
>
> The RC tag in git is here:
> https://github.com/apache/hadoop-thirdparty/tree/release-1.0.0-RC1
>
> And my public key is at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> *This vote will run for 5 days, ending on March 18th 2020 at 11:59 pm IST.*
>
> For testing, I have verified Hadoop trunk compilation with
>"-DdistMgmtSnapshotsUrl=
> https://repository.apache.org/content/repositories/orgapachehadoop-1261/
>  -Dhadoop-thirdparty-protobuf.version=1.0.0"
>
> My +1 to start.
>
> -Vinay
>


Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-12 Thread Wei-Chiu Chuang
That is unfortunately true.

Now that I recognize the impact of the guava update in Hadoop 3.1/3.2, how
can we make this easier for downstream projects to consume? As I proposed,
I think a middle ground is to shade guava in hadoop-thirdparty and include
the hadoop-thirdparty jar in the next Hadoop 3.1/3.2 releases.
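
To make that concrete, a sketch of how a shaded guava artifact could be
pulled and checked; the hadoop-shaded-guava coordinates and version here
are hypothetical (the relocation prefix mirrors what hadoop-thirdparty
1.0.0 already does for protobuf):

    # Hypothetical artifact coordinates and version
    $ mvn dependency:get \
        -Dartifact=org.apache.hadoop.thirdparty:hadoop-shaded-guava:1.1.0

    # Relocated classes cannot clash with an application's own guava
    $ unzip -l ~/.m2/repository/org/apache/hadoop/thirdparty/hadoop-shaded-guava/1.1.0/hadoop-shaded-guava-1.1.0.jar \
        | grep -c 'org/apache/hadoop/thirdparty/com/google/common'

Downstream code would then import guava classes under the relocated
package rather than the original com.google.common coordinates.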



On Thu, Mar 12, 2020 at 12:03 AM Igor Dvorzhak 
wrote:

> How do you manage and version such dependency upgrades in subminor
> Hadoop/Spark/Hive versions in Cloudera then? I would imagine that some
> upgrades will be breaking for customers and cannot be shipped in a
> subminor CDH release? Or is this in preparation for the next major/minor
> release of CDH?
>
> On Wed, Mar 11, 2020 at 5:45 PM Wei-Chiu Chuang
>  wrote:
>
>> FWIW we are updating guava in Spark and Hive at Cloudera. I don't know which
>> Apache version they are going to land in, but we'll upstream them for sure.
>>
>> The guava change is debatable. It's not as critical as others. There are
>> critical vulnerabilities in other dependencies that we have no choice but
>> to update to a new major/minor version because we are so far behind. And
>> given the critical nature, I think it is worth the risk, and a backport
>> to lower maintenance releases is warranted. Moreover, we make at best one
>> minor release per year. That is too slow to respond to a critical
>> vulnerability.
>>
>> On Wed, Mar 11, 2020 at 5:02 PM Igor Dvorzhak 
>> wrote:
>>
>> > Generally I'm for updating dependencies, but I think that Hadoop should
>> > stick with semantic versioning and not make major or minor dependency
>> > updates in subminor releases.
>> >
>> > For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this
>> > Spark 3.0 is stuck with Hadoop 3.2.0 - they use Hive 2.3.6, which doesn't
>> > support Guava 27.0-jre.
>> >
>> > It would be better to make dependency upgrades when releasing new
>> > major/minor versions; for example, the Guava 27.0-jre upgrade was more
>> > appropriate for the Hadoop 3.3.0 release than for 3.2.1.
>> >
>> > On Tue, Mar 10, 2020 at 3:03 PM Wei-Chiu Chuang
>> >  wrote:
>> >
>> >> I'm not hearing any feedback so far, but I want to suggest:
>> >>
>> >> use the hadoop-thirdparty repository to host any dependencies that are
>> >> known to break compatibility.
>> >>
>> >> Candidate #1 guava
>> >> Candidate #2 Netty
>> >> Candidate #3 Jetty
>> >>
>> >> In fact, HBase shades these dependencies for the exact same reason.
>> >>
>> >> As an example of the cost of compatibility breakage: we spent the last 6
>> >> months backporting the guava update (guava 11 --> 27) throughout
>> >> Cloudera's stack, and after 6 months we are not done yet, because we
>> >> have to update guava in Hadoop, Hive, Spark ..., and Hadoop, Hive and
>> >> Spark's guava is in the classpath of every application.
>> >>
>> >> Thoughts?
>> >>
>> >> On Sat, Mar 7, 2020 at 9:31 AM Wei-Chiu Chuang 
>> >> wrote:
>> >>
>> >> > Hi Hadoop devs,
>> >> >
>> >> > In the past, Hadoop has tended to be pretty far behind the latest
>> >> > versions of its dependencies. Part of that is due to the fear of the
>> >> > breaking changes brought in by dependency updates.
>> >> >
>> >> > However, things have changed dramatically over the past few years. With
>> >> > more focus on security vulnerabilities, more vulnerabilities are
>> >> > discovered in our dependencies, and users put more pressure on patching
>> >> > Hadoop (and its ecosystem) to use the latest dependency versions.
>> >> >
>> >> > As an example, Jackson-databind had 20 CVEs published in the last year
>> >> > alone.
>> >> > https://www.cvedetails.com/product/42991/Fasterxml-Jackson-databind.html?vendor_id=15866
>> >> >
>> >> > Jetty: 4 CVEs in 2019:
>> >> > https://www.cvedetails.com/product/34824/Eclipse-Jetty.html?vendor_id=10410
>> >> >
>> >> > We can no longer let Hadoop stay behind. The longer we stay behind, the
>> >> > harder it is to update. A good example is the Jersey 1 -> 2 migration,
>> >> > HADOOP-15984 <https://issues.apache.org/jira/browse/HADOOP-15984>,
>> >> > contributed by Akira. Jersey 1 is no longer supported, but the Jersey 2
>> >> > migration is hard. If any critical vulnerability is found in Jersey 1,
>> >> > it will leave us in a bad situation, since we can't simply update the
>> >> > Jersey version and be done.
>> >> >
>> >> > Hadoop 3 adds new public artifacts that shade these dependencies. We
>> >> > should advocate that downstream applications use the public artifacts
>> >> > to avoid breakage.
>> >> >
>> >> > I'd like to hear your thoughts: are you okay with Hadoop keeping up with
>> >> > the latest dependency updates, or would you rather stay behind to ensure
>> >> > compatibility?
>> >> >
>> >> > Coupled with that, I'd like to call for more frequent Hadoop releases
>> >> > for the same purpose. IMHO that'll require better infrastructure to
>> >> > assist the release work and some rethinking of our current Hadoop code
>> >> > structure, like separating each subproject into its own repository and
>> >> > release cadence. This can be controversial but I think it'll be good
>> >> > for the project in the long run.
>> >> >
>> >> > Thanks,
>> >> > Wei-Chiu

Re: [VOTE] Release Apache Hadoop Thirdparty 1.0.0 - RC1

2020-03-12 Thread Ayush Saxena
Thanks Vinay for driving the release.
+1 (non-binding)
Built trunk with -Dhadoop-thirdparty-protobuf.version=1.0.0
Built from source on Ubuntu 19.10
Verified the source checksum.
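
For reference, the trunk build above boils down to something like the
following, reusing the staging-repository flags from the vote mail (the
exact mvn goals are my assumption):

    $ mvn clean install -DskipTests \
        -DdistMgmtSnapshotsUrl=https://repository.apache.org/content/repositories/orgapachehadoop-1261/ \
        -Dhadoop-thirdparty-protobuf.version=1.0.0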

Good Luck!!!

-Ayush

On Thu, 12 Mar 2020 at 01:56, Vinayakumar B  wrote:

> Hi folks,
>
> Thanks to everyone's help on this release.
>
> I have re-created a release candidate (RC1) for Apache Hadoop Thirdparty
> 1.0.0.
>
> RC Release artifacts are available at :
>
> http://home.apache.org/~vinayakumarb/release/hadoop-thirdparty-1.0.0-RC1/
>
> Maven artifacts are available in staging repo:
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1261/
>
> The RC tag in git is here:
> https://github.com/apache/hadoop-thirdparty/tree/release-1.0.0-RC1
>
> And my public key is at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> *This vote will run for 5 days, ending on March 18th 2020 at 11:59 pm IST.*
>
> For testing, I have verified Hadoop trunk compilation with
>"-DdistMgmtSnapshotsUrl=
> https://repository.apache.org/content/repositories/orgapachehadoop-1261/
>  -Dhadoop-thirdparty-protobuf.version=1.0.0"
>
> My +1 to start.
>
> -Vinay
>


Re: [DISCUSS] Hadoop 3.3.0 Release include ARM binary

2020-03-12 Thread Akira Ajisaka
If you can provide an ARM release for future releases, I'm fine with that.

Thanks,
Akira

On Thu, Mar 12, 2020 at 9:41 PM Brahma Reddy Battula 
wrote:

> Thanks Akira.
>
> Currently the only problem is a dedicated ARM machine for future RMs. I
> want to sort this out as below; if you have other ideas, please let me
> know.
>
> i) A single machine, with credentials shared with future RMs (we can
> delete the keys once the release is over).
> ii) Creating a Jenkins project (maybe we need to discuss this with the
> board).
> iii) I can provide ARM releases for future releases.
>
>
>
>
>
>
>
> On Thu, Mar 12, 2020 at 5:14 PM Akira Ajisaka  wrote:
>
> > Hi Brahma,
> >
> > I think we cannot do any of your proposed actions.
> >
> >
> > http://www.apache.org/legal/release-policy.html#owned-controlled-hardware
> > > Strictly speaking, releases must be verified on hardware owned and
> > > controlled by the committer. That means hardware the committer has
> > > physical possession and control of and exclusively full
> > > administrative/superuser access to. That's because only such hardware is
> > > qualified to hold a PGP private key, and the release should be verified
> > > on the machine the private key lives on or on a machine as trusted as
> > > that.
> >
> > https://www.apache.org/dev/release-distribution.html#sigs-and-sums
> > > Private keys MUST NOT be stored on any ASF machine. Likewise, signatures
> > > for releases MUST NOT be created on ASF machines.
> >
> > We need to have dedicated physical ARM machines for each release manager,
> > and that is not feasible right now.
> > If you provide an unofficial ARM binary release in some repository,
> that's
> > okay.
> >
> > -Akira
> >
> > On Thu, Mar 12, 2020 at 7:57 PM Brahma Reddy Battula 
> > wrote:
> >
> >> Hello folks,
> >>
> >> As trunk now supports ARM-based compilation and the qbt job (1) has
> >> been running stably for several months, I am planning to propose an ARM
> >> binary this time.
> >>
> >> (Note: as we all know, voting is based on the source, so this will not
> >> be an issue.)
> >>
> >> *Proposed Change:*
> >> Currently we keep only an x86 binary in the downloads (2). Can we keep
> >> an ARM binary as well?
> >>
> >> *Actions:*
> >> a) *Dedicated* *Machine*:
> >>    i) A dedicated ARM machine will be donated, which I have confirmed.
> >>    ii) Or we can use the Jenkins ARM machine that is currently used for
> >> the ARM qbt job.
> >> b) *Automate Release:* How about having one release project in Jenkins,
> >> so that future RMs just trigger the Jenkins job?
> >>
> >> Please let me know your thoughts on this.
> >>
> >>
> >> 1.
> >>
> >>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-qbt-linux-ARM-trunk/
> >> 2.https://hadoop.apache.org/releases.html
> >>
> >>
> >>
> >>
> >>
> >>
> >> --Brahma Reddy Battula
> >>
> >
>
> --
>
>
>
> --Brahma Reddy Battula
>


Re: [DISCUSS] Hadoop 3.3.0 Release include ARM binary

2020-03-12 Thread Brahma Reddy Battula
Thanks Akira.

Currently the only problem is a dedicated ARM machine for future RMs. I
want to sort this out as below; if you have other ideas, please let me
know.

i) A single machine, with credentials shared with future RMs (we can
delete the keys once the release is over; see the sketch below).
ii) Creating a Jenkins project (maybe we need to discuss this with the
board).
iii) I can provide ARM releases for future releases.
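
For option i), a minimal sketch of the key handling on a shared machine,
assuming a standard GnuPG setup; KEYID and the key file name are
placeholders:

    # Import the RM's signing key only for the duration of the release
    $ gpg --import rm-signing-key.asc

    # ... sign the release artifacts ...

    # Delete the key material once the release is over
    $ gpg --delete-secret-keys KEYID
    $ gpg --delete-keys KEYID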







On Thu, Mar 12, 2020 at 5:14 PM Akira Ajisaka  wrote:

> Hi Brahma,
>
> I think we cannot do any of your proposed actions.
>
> http://www.apache.org/legal/release-policy.html#owned-controlled-hardware
> > Strictly speaking, releases must be verified on hardware owned and
> > controlled by the committer. That means hardware the committer has physical
> > possession and control of and exclusively full administrative/superuser
> > access to. That's because only such hardware is qualified to hold a PGP
> > private key, and the release should be verified on the machine the private
> > key lives on or on a machine as trusted as that.
>
> https://www.apache.org/dev/release-distribution.html#sigs-and-sums
> > Private keys MUST NOT be stored on any ASF machine. Likewise, signatures
> > for releases MUST NOT be created on ASF machines.
>
> We need to have dedicated physical ARM machines for each release manager,
> and that is not feasible right now.
> If you provide an unofficial ARM binary release in some repository, that's
> okay.
>
> -Akira
>
> On Thu, Mar 12, 2020 at 7:57 PM Brahma Reddy Battula 
> wrote:
>
>> Hello folks,
>>
>> As trunk now supports ARM-based compilation and the qbt job (1) has been
>> running stably for several months, I am planning to propose an ARM binary
>> this time.
>>
>> (Note: as we all know, voting is based on the source, so this will not be
>> an issue.)
>>
>> *Proposed Change:*
>> Currently we keep only an x86 binary in the downloads (2). Can we keep an
>> ARM binary as well?
>>
>> *Actions:*
>> a) *Dedicated* *Machine*:
>>    i) A dedicated ARM machine will be donated, which I have confirmed.
>>    ii) Or we can use the Jenkins ARM machine that is currently used for
>> the ARM qbt job.
>> b) *Automate Release:* How about having one release project in Jenkins,
>> so that future RMs just trigger the Jenkins job?
>>
>> Please let me know your thoughts on this.
>>
>>
>> 1.
>>
>> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-qbt-linux-ARM-trunk/
>> 2.https://hadoop.apache.org/releases.html
>>
>>
>>
>>
>>
>>
>> --Brahma Reddy Battula
>>
>

-- 



--Brahma Reddy Battula


Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-03-12 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in
   org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
   byte[], byte[], KeyConverter, ValueConverter, boolean)
   At ColumnRWHelper.java:[line 335]

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.TestMultipleNNPortQOP 
   hadoop.hdfs.tools.TestDFSZKFailoverController 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [236K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/622/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-cli

Re: [DISCUSS] Hadoop 3.3.0 Release include ARM binary

2020-03-12 Thread Akira Ajisaka
Hi Brahma,

I think we cannot do any of your proposed actions.

http://www.apache.org/legal/release-policy.html#owned-controlled-hardware
> Strictly speaking, releases must be verified on hardware owned and
> controlled by the committer. That means hardware the committer has physical
> possession and control of and exclusively full administrative/superuser
> access to. That's because only such hardware is qualified to hold a PGP
> private key, and the release should be verified on the machine the private
> key lives on or on a machine as trusted as that.

https://www.apache.org/dev/release-distribution.html#sigs-and-sums
> Private keys MUST NOT be stored on any ASF machine. Likewise, signatures
> for releases MUST NOT be created on ASF machines.

We need to have dedicated physical ARM machines for each release manager,
and that is not feasible right now.
If you provide an unofficial ARM binary release in some repository, that's
okay.

-Akira

On Thu, Mar 12, 2020 at 7:57 PM Brahma Reddy Battula 
wrote:

> Hello folks,
>
> As trunk now supports ARM-based compilation and the qbt job (1) has been
> running stably for several months, I am planning to propose an ARM binary
> this time.
>
> (Note: as we all know, voting is based on the source, so this will not be
> an issue.)
>
> *Proposed Change:*
> Currently we keep only an x86 binary in the downloads (2). Can we keep an
> ARM binary as well?
>
> *Actions:*
> a) *Dedicated* *Machine*:
>    i) A dedicated ARM machine will be donated, which I have confirmed.
>    ii) Or we can use the Jenkins ARM machine that is currently used for
> the ARM qbt job.
> b) *Automate Release:* How about having one release project in Jenkins,
> so that future RMs just trigger the Jenkins job?
>
> Please let me know your thoughts on this.
>
>
> 1.
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-qbt-linux-ARM-trunk/
> 2.https://hadoop.apache.org/releases.html
>
>
>
>
>
>
> --Brahma Reddy Battula
>


[DISCUSS] Hadoop 3.3.0 Release include ARM binary

2020-03-12 Thread Brahma Reddy Battula
Hello folks,

As trunk now supports ARM-based compilation and the qbt job (1) has been
running stably for several months, I am planning to propose an ARM binary
this time.

(Note: as we all know, voting is based on the source, so this will not be
an issue.)

*Proposed Change:*
Currently we keep only an x86 binary in the downloads (2). Can we keep an
ARM binary as well?

*Actions:*
a) *Dedicated* *Machine*:
   i) A dedicated ARM machine will be donated, which I have confirmed.
   ii) Or we can use the Jenkins ARM machine that is currently used for
the ARM qbt job.
b) *Automate Release:* How about having one release project in Jenkins, so
that future RMs just trigger the Jenkins job? (A build sketch follows the
links below.)

Please let me know your thoughts on this.


1. https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-qbt-linux-ARM-trunk/
2. https://hadoop.apache.org/releases.html
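
For b), a minimal sketch of the build such a Jenkins job might run on an
aarch64 host; -Pdist,native -Dtar is the standard Hadoop binary
distribution build, though the exact flags a release job needs are an
assumption:

    $ uname -m    # expect: aarch64
    $ mvn clean install -Pdist,native -DskipTests -Dtar
    # the ARM binary tarball then lands under hadoop-dist/target/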






--Brahma Reddy Battula


[jira] [Created] (HADOOP-16922) ABFS: Change in User-Agent header

2020-03-12 Thread Bilahari T H (Jira)
Bilahari T H created HADOOP-16922:
-

 Summary: ABFS: Change in User-Agent header
 Key: HADOOP-16922
 URL: https://issues.apache.org/jira/browse/HADOOP-16922
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Bilahari T H


Move the configured prefix from the end of the User-Agent value to right after 
the driver version.
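
Illustratively (the exact header layout is my assumption, not taken from
the issue):

    # Hypothetical User-Agent values, before and after the change:
    # before: APN/1.0 Azure Blob FS/3.3.0 (<platform info>) <configured prefix>
    # after:  APN/1.0 Azure Blob FS/3.3.0 <configured prefix> (<platform info>)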



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-12 Thread Igor Dvorzhak
How do you manage and version such dependency upgrades in subminor
Hadoop/Spark/Hive versions in Cloudera then? I would imagine that some
upgrades will be breaking for customers and cannot be shipped in a subminor
CDH release? Or is this in preparation for the next major/minor release of
CDH?

On Wed, Mar 11, 2020 at 5:45 PM Wei-Chiu Chuang
 wrote:

> FWIW we are updating guava in Spark and Hive at Cloudera. I don't know which
> Apache version they are going to land in, but we'll upstream them for sure.
>
> The guava change is debatable. It's not as critical as others. There are
> critical vulnerabilities in other dependencies that we have no choice but to
> update to a new major/minor version because we are so far behind. And given
> the critical nature, I think it is worth the risk, and a backport to lower
> maintenance releases is warranted. Moreover, we make at best one minor
> release per year. That is too slow to respond to a critical vulnerability.
>
> On Wed, Mar 11, 2020 at 5:02 PM Igor Dvorzhak 
> wrote:
>
> > Generally I'm for updating dependencies, but I think that Hadoop should
> > stick with semantic versioning and not make major or minor dependency
> > updates in subminor releases.
> >
> > For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this
> > Spark 3.0 is stuck with Hadoop 3.2.0 - they use Hive 2.3.6, which doesn't
> > support Guava 27.0-jre.
> >
> > It would be better to make dependency upgrades when releasing new
> > major/minor versions; for example, the Guava 27.0-jre upgrade was more
> > appropriate for the Hadoop 3.3.0 release than for 3.2.1.
> >
> > On Tue, Mar 10, 2020 at 3:03 PM Wei-Chiu Chuang
> >  wrote:
> >
> >> I'm not hearing any feedback so far, but I want to suggest:
> >>
> >> use the hadoop-thirdparty repository to host any dependencies that are
> >> known to break compatibility.
> >>
> >> Candidate #1 guava
> >> Candidate #2 Netty
> >> Candidate #3 Jetty
> >>
> >> In fact, HBase shades these dependencies for the exact same reason.
> >>
> >> As an example of the cost of compatibility breakage: we spent the last 6
> >> months backporting the guava update (guava 11 --> 27) throughout
> >> Cloudera's stack, and after 6 months we are not done yet, because we
> >> have to update guava in Hadoop, Hive, Spark ..., and Hadoop, Hive and
> >> Spark's guava is in the classpath of every application.
> >>
> >> Thoughts?
> >>
> >> On Sat, Mar 7, 2020 at 9:31 AM Wei-Chiu Chuang 
> >> wrote:
> >>
> >> > Hi Hadoop devs,
> >> >
> >> > In the past, Hadoop has tended to be pretty far behind the latest
> >> > versions of its dependencies. Part of that is due to the fear of the
> >> > breaking changes brought in by dependency updates.
> >> >
> >> > However, things have changed dramatically over the past few years. With
> >> > more focus on security vulnerabilities, more vulnerabilities are
> >> > discovered in our dependencies, and users put more pressure on patching
> >> > Hadoop (and its ecosystem) to use the latest dependency versions.
> >> >
> >> > As an example, Jackson-databind had 20 CVEs published in the last year
> >> > alone.
> >> > https://www.cvedetails.com/product/42991/Fasterxml-Jackson-databind.html?vendor_id=15866
> >> >
> >> > Jetty: 4 CVEs in 2019:
> >> > https://www.cvedetails.com/product/34824/Eclipse-Jetty.html?vendor_id=10410
> >> >
> >> > We can no longer let Hadoop stay behind. The longer we stay behind, the
> >> > harder it is to update. A good example is the Jersey 1 -> 2 migration,
> >> > HADOOP-15984 <https://issues.apache.org/jira/browse/HADOOP-15984>,
> >> > contributed by Akira. Jersey 1 is no longer supported, but the Jersey 2
> >> > migration is hard. If any critical vulnerability is found in Jersey 1,
> >> > it will leave us in a bad situation, since we can't simply update the
> >> > Jersey version and be done.
> >> >
> >> > Hadoop 3 adds new public artifacts that shade these dependencies. We
> >> > should advocate that downstream applications use the public artifacts
> >> > to avoid breakage.
> >> >
> >> > I'd like to hear your thoughts: are you okay with Hadoop keeping up with
> >> > the latest dependency updates, or would you rather stay behind to ensure
> >> > compatibility?
> >> >
> >> > Coupled with that, I'd like to call for more frequent Hadoop releases
> >> > for the same purpose. IMHO that'll require better infrastructure to
> >> > assist the release work and some rethinking of our current Hadoop code
> >> > structure, like separating each subproject into its own repository and
> >> > release cadence. This can be controversial but I think it'll be good
> >> > for the project in the long run.
> >> >
> >> > Thanks,
> >> > Wei-Chiu
> >> >
> >>
> >
>

