I understand what you are saying. However, I am not sure how to implement this when I create a Docker image using Spark 3.2.1 with Hadoop 3.2, which already ships a Guava jar as part of the distribution.
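One possible approach (an untested sketch, not from the thread) is to remove the guava-14.0.1.jar bundled in the distribution's jars directory and pull in a newer Guava that provides the `checkArgument` overload hadoop-aws 3.3.0 expects. The 27.0-jre version below is an assumption — check it against the Hadoop release you actually build against:

```dockerfile
# Hypothetical Dockerfile fragment: replace the Guava bundled with the
# Spark 3.2.1 distribution by a newer release. The exact version
# (27.0-jre here) is an assumption -- verify it against your Hadoop
# version before relying on this.
RUN rm ${SPARK_HOME}/jars/guava-14.0.1.jar \
 && wget -P ${SPARK_HOME}/jars/ \
      https://repo1.maven.org/maven2/com/google/guava/guava/27.0-jre/guava-27.0-jre.jar
```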
On Tue, Feb 15, 2022, 01:17 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Raj,
>
> I found the old email. That is what I did, but it is 2018 stuff.
>
> The email says:
>
> I sorted out this problem. I rewrote the assembly with shade rules to
> avoid old jar files, as follows:
>
> lazy val root = (project in file(".")).
>   settings(
>     name := "${APPLICATION}",
>     version := "1.0",
>     scalaVersion := "2.11.8",
>     mainClass in Compile := Some("myPackage.${APPLICATION}")
>   )
>
> assemblyShadeRules in assembly := Seq(
>   ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
> )
>
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.4.0"
> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided" exclude("org.apache.hadoop", "hadoop-client")
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.7.8"
> libraryDependencies += "commons-io" % "commons-io" % "2.4"
> libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1" % "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
> libraryDependencies += "com.google.cloud.bigdataoss" % "bigquery-connector" % "0.13.4-hadoop3"
> libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "1.9.4-hadoop3"
> libraryDependencies += "com.google.code.gson" % "gson" % "2.8.5"
> libraryDependencies += "org.apache.httpcomponents" % "httpcore" % "4.4.8"
> libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"
> libraryDependencies += "com.github.samelamin" %% "spark-bigquery" % "0.2.5"
>
> // META-INF discarding
> assemblyMergeStrategy in assembly := {
>   case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
>   case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>   case x => MergeStrategy.first
> }
>
> HTH
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Mon, 14 Feb 2022 at 19:40, Raj ks <rajabhupati....@gmail.com> wrote:
>
>> Should we remove the existing jar and upgrade it to some recent version?
>>
>> On Tue, Feb 15, 2022, 01:08 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> I recall I had similar issues running Spark on Google Dataproc.
>>>
>>> It sounds like it gets Hadoop's jars on the classpath, which include an
>>> older version of Guava. The solution is to shade/relocate Guava in your
>>> distribution.
>>>
>>> HTH
>>>
>>> On Mon, 14 Feb 2022 at 19:10, Raj ks <rajabhupati....@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We are trying to build a Docker image using CentOS and trying to
>>>> connect through S3.
>>>> The same works with Hadoop 3.2.0 and Spark 3.1.2.
>>>>
>>>> #Installing spark binaries
>>>> ENV SPARK_HOME /opt/spark
>>>> ENV SPARK_VERSION 3.2.1
>>>> ENV HADOOP_VERSION 3.2.0
>>>> ARG HADOOP_VERSION_SHORT=3.2
>>>> ARG HADOOP_AWS_VERSION=3.3.0
>>>> ARG AWS_SDK_VERSION=1.11.563
>>>>
>>>> RUN set -xe \
>>>>   && cd /tmp \
>>>>   && wget http://mirrors.gigenet.com/apache/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION_SHORT}.tgz \
>>>>   && tar -zxvf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION_SHORT}.tgz \
>>>>   && rm *.tgz \
>>>>   && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION_SHORT} ${SPARK_HOME} \
>>>>   && cp ${SPARK_HOME}/kubernetes/dockerfiles/spark/entrypoint.sh ${SPARK_HOME} \
>>>>   && wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_AWS_VERSION}/hadoop-aws-${HADOOP_AWS_VERSION}.jar \
>>>>   && wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK_VERSION}/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar \
>>>>   && wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/${AWS_SDK_VERSION}/aws-java-sdk-${AWS_SDK_VERSION}.jar \
>>>>   && mv *.jar /opt/spark/jars/
>>>>
>>>> Error (any help on this is appreciated):
>>>>
>>>> java.lang.NoSuchMethodError:
>>>> com/google/common/base/Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
>>>> (loaded from file:/opt/spark/jars/guava-14.0.1.jar by
>>>> jdk.internal.loader.ClassLoaders$AppClassLoader@1e4553e) called from
>>>> class org.apache.hadoop.fs.s3a.S3AUtils (loaded from
>>>> file:/opt/spark/jars/hadoop-aws-3.3.0.jar by
>>>> jdk.internal.loader.ClassLoaders$AppClassLoader@1e4553e).
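The stack trace in the thread shows the core problem: hadoop-aws-3.3.0.jar calls a `Preconditions.checkArgument` overload that the bundled guava-14.0.1.jar does not have, and both are loaded by the same application class loader. A quick way to confirm which jar a class actually resolves from is to ask the class loader for its `.class` resource. This is a minimal diagnostic sketch, not from the thread; `WhichJar` is a hypothetical helper name:

```java
import java.net.URL;

// Hypothetical helper (not from the thread): print which jar or module a
// class is loaded from, to diagnose NoSuchMethodError caused by an old jar
// (e.g. guava-14.0.1) shadowing a newer one on the classpath.
public class WhichJar {
    // Returns the URL of the .class resource, or null if the class is not
    // visible to this JVM's system class loader.
    static URL locate(String resourcePath) {
        return ClassLoader.getSystemResource(resourcePath);
    }

    public static void main(String[] args) {
        // java.lang.String always resolves (a jrt:/ URL on JDK 9+);
        // Guava's Preconditions resolves only if a Guava jar is present,
        // and the URL tells you which one wins.
        System.out.println(locate("java/lang/String.class"));
        System.out.println(locate("com/google/common/base/Preconditions.class"));
    }
}
```

Run inside the container (e.g. with `java -cp "/opt/spark/jars/*:."`) to see which Guava jar the driver will actually load.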