Hi Hadoop devs, I spent a good part of the past 7 months working with a dozen colleagues to update the Guava version in Cloudera's software (that includes Hadoop, HBase, Spark, Hive, Cloudera Manager, and more than 20 other projects).
After those 7 months, I finally came to a conclusion: updating to Hadoop 3.3 / 3.2.1 / 3.1.3, even if you are just coming from Hadoop 3.0 / 3.1.0, is going to be really hard because of Guava. Because of Guava, the amount of work to certify a minor release update is almost equivalent to that of a major release update. That is because:

(1) Going from Guava 11 to Guava 27 is a big jump. There are incompatible API changes in many places. Too bad the Google developers are not sympathetic to their users.
(2) Guava is used in all Hadoop jars: not just the Hadoop servers but also the client jars and the Hadoop common libs.
(3) The Hadoop library is used in practically all software at Cloudera.

Here is my proposal:

(1) Shade Guava into hadoop-thirdparty, relocating its packages to org.hadoop.thirdparty.com.google.common.*.
(2) Make a hadoop-thirdparty 1.1.0 release.
(3) Update existing Guava references to the relocated path. There are more than 2k imports that need an update.
(4) Release Hadoop 3.3.1 / 3.2.2 containing this change.

In this way we will be able to update Guava in Hadoop in the future without disrupting Hadoop applications. Note: HBase already did this, and this Guava update project would have been much more difficult if HBase hadn't.

Thoughts?

Other options include:

(1) Force downstream applications to migrate to the Hadoop client artifacts as listed here: https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html. But that's nearly impossible.
(2) Migrate from Guava to plain Java APIs. I suppose this is a big project and I can't estimate how much work it would be.

Weichiu
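P.S. For concreteness, the relocation in step (1) could be done with the maven-shade-plugin, roughly like the sketch below. The coordinates and the relocated package name are illustrative, not the actual hadoop-thirdparty pom:

```xml
<!-- Sketch only: a maven-shade-plugin relocation that rewrites Guava's
     package in the shaded artifact. Artifact ids are hypothetical. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrites class references and bytecode so the shaded
                 Guava cannot collide with an application's own Guava. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.hadoop.thirdparty.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Downstream of that, step (3) is then a mechanical import change in Hadoop code, e.g. from com.google.common.base.Preconditions to org.hadoop.thirdparty.com.google.common.base.Preconditions.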