Hi Deepak,
Following your suggestion, I added a Guava exclusion to the topmost POM (directly
under the Spark home directory), as follows:
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.2.1</version>
        <exclusions>
          <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
    </dependencies>
  </dependencyManagement>
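One way to see which Guava actually wins at runtime after an exclusion like this is to ask the JVM directly. The sketch below (the class name `GuavaCheck` is hypothetical) uses standard JDK reflection to report which jar `com.google.common.base.Preconditions` was loaded from, and whether that class declares the `checkArgument(boolean, String, Object)` overload that Hadoop's `Configuration.set` is trying to call in the stack trace. It assumes you run it with the same jars spark-shell uses, for example `java -cp ".;<spark jars dir>\*" GuavaCheck` on Windows; the jars directory is an assumption about your build layout.

```java
import java.security.CodeSource;

public class GuavaCheck {
    // True if className can be loaded and has a public method with exactly these parameter types.
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // The jar (or directory) className was loaded from, or a short note if that is unknown.
    static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, e.g. bootstrap class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String preconditions = "com.google.common.base.Preconditions";
        System.out.println("Preconditions loaded from: " + locationOf(preconditions));
        // The overload named in the NoSuchMethodError: checkArgument(boolean, String, Object).
        System.out.println("checkArgument(boolean, String, Object) present: "
                + hasMethod(preconditions, "checkArgument", boolean.class, String.class, Object.class));
    }
}
```

If an older Guava is the one being picked up, the second line should print `false`, and the first line shows which jar to look at.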
I also set the properties spark.executor.userClassPathFirst=true and
spark.driver.userClassPathFirst=true on the Maven command line:
D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
and rebuilt Spark.
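For comparison, here is a minimal sketch of supplying the two userClassPathFirst settings when spark-shell is launched rather than when Spark is built. The Spark configuration docs describe them as experimental runtime properties, and the driver variant as used in cluster mode only. As far as I know, passing them to mvn with -D only sets properties for the Maven build and does not carry them into the built distribution. The class `LaunchSparkShell` and its helper are hypothetical. spark-shell's `--conf` option and Java's `ProcessBuilder` are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LaunchSparkShell {
    // Builds a Windows command line for spark-shell; each Spark property becomes
    // a "--conf key=value" pair, which spark-shell hands on to spark-submit.
    static List<String> buildCommand(String sparkHome, List<String> sparkProps) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "cmd.exe", "/c", sparkHome + "\\bin\\spark-shell.cmd"));
        for (String prop : sparkProps) {
            cmd.add("--conf");
            cmd.add(prop);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("D:\\apache\\spark", Arrays.asList(
                "spark.driver.userClassPathFirst=true",
                "spark.executor.userClassPathFirst=true"));
        System.out.println(String.join(" ", cmd));
        // Only start spark-shell when explicitly asked to; inheritIO() attaches it to this console.
        if (args.length > 0 && args[0].equals("--launch")) {
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.println("spark-shell exited with " + exit);
        }
    }
}
```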
But I got the same error when running spark-shell.
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.531 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
[INFO] Spark Project Core ................................. SUCCESS [07:59 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
[INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
[INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
[INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
[INFO] Spark Avro ......................................... SUCCESS [01:43 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:08 h
[INFO] Finished at: 2019-12-06T11:43:08-08:00
[INFO] ------------------------------------------------------------------------
D:\apache\spark>spark-shell
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
D:\apache\spark>cd bin
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
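The `NoSuchMethodError` names the missing method by its JVM descriptor: `Z` is `boolean`, `Ljava/lang/String;` is `String`, `Ljava/lang/Object;` is `Object`, and the trailing `V` is a `void` return. So Hadoop's `Configuration.set` is calling `Preconditions.checkArgument(boolean, String, Object)`. Guava's Javadoc marks that non-varargs overload `@since 20.0`, which means it is absent from any older Guava on the classpath. Below is a small decoder sketch for reading such descriptors, following the descriptor grammar in the JVM specification. The class name `DescriptorDecoder` is hypothetical, and the code does no validation beyond what the parsing needs.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorDecoder {
    // Java name for a one-letter JVM base type code.
    private static String baseType(char code) {
        switch (code) {
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case 'V': return "void";
            default: throw new IllegalArgumentException("Unknown type code: " + code);
        }
    }

    // Parses one type starting at pos[0] and advances pos[0] past it.
    private static String parseType(String desc, int[] pos) {
        int dims = 0;
        while (desc.charAt(pos[0]) == '[') {   // each '[' adds one array dimension
            dims++;
            pos[0]++;
        }
        StringBuilder name = new StringBuilder();
        if (desc.charAt(pos[0]) == 'L') {      // object type: L<binary/name>;
            int end = desc.indexOf(';', pos[0]);
            name.append(desc.substring(pos[0] + 1, end).replace('/', '.'));
            pos[0] = end + 1;
        } else {
            name.append(baseType(desc.charAt(pos[0])));
            pos[0]++;
        }
        for (int i = 0; i < dims; i++) {
            name.append("[]");
        }
        return name.toString();
    }

    // Decodes e.g. "(ZLjava/lang/String;)V" into "void (boolean, java.lang.String)".
    static String decode(String desc) {
        if (desc.isEmpty() || desc.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method descriptor: " + desc);
        }
        int[] pos = {1};
        List<String> params = new ArrayList<>();
        while (desc.charAt(pos[0]) != ')') {
            params.add(parseType(desc, pos));
        }
        pos[0]++;                               // skip ')'
        String returnType = parseType(desc, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        // The descriptor from the NoSuchMethodError above.
        System.out.println(decode("(ZLjava/lang/String;Ljava/lang/Object;)V"));
    }
}
```

Running `main` prints `void (boolean, java.lang.String, java.lang.Object)`.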
Before building Spark, I went to my local Maven repo and removed Guava entirely.
But after the build, I found that the same Guava versions had been downloaded
again:
D:\mavenrepo\com\google\guava\guava>ls
14.0.1 16.0.1 18.0 19.0
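The versions listed above (14.0.1 through 19.0) all predate Guava 20.0, the release in which, according to Guava's Javadoc, the non-varargs `checkArgument(boolean, String, Object)` overload appeared. What spark-shell actually loads, though, are the jars copied into the build's jars directory rather than the Maven repo itself. The sketch below (class name hypothetical) lists the Guava jars in a directory and flags those with a major version below 20. The default path `assembly\target\scala-2.12\jars` is an assumption about where a Spark 3.0 source build keeps its runtime jars; pass a different directory as the first argument if yours differs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuavaJarScan {
    // Matches names like guava-14.0.1.jar or guava-27.0-jre.jar; group 1 is the major version.
    private static final Pattern GUAVA_JAR = Pattern.compile("guava-(\\d+)([.-][\\w.-]*)?\\.jar");

    // Major version parsed from a Guava jar file name, or -1 if the name is not a Guava jar.
    static int majorVersion(String fileName) {
        Matcher m = GUAVA_JAR.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    // True for Guava jars older than 20.0, which lack checkArgument(boolean, String, Object).
    static boolean isTooOld(String fileName) {
        int major = majorVersion(fileName);
        return major >= 0 && major < 20;
    }

    // Names of the too-old Guava jars found directly inside dir.
    static List<String> tooOldGuavaJars(Path dir) throws IOException {
        List<String> old = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "guava-*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                if (isTooOld(name)) {
                    old.add(name);
                }
            }
        }
        return old;
    }

    public static void main(String[] args) throws IOException {
        // Assumed jars directory of a source build; override with the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "D:\\apache\\spark\\assembly\\target\\scala-2.12\\jars");
        System.out.println("Guava jars older than 20.0 in " + dir + ": " + tooOldGuavaJars(dir));
    }
}
```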
On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra wrote:
> Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an
> alternative to using an uber JAR, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu wrote:
>
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work for me, so I went to
> the Spark home directory and ran mvn from there. Below are my build and run
> results. The source code was updated just yesterday. I guess the POM should
> somehow specify a newer Guava library.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline               load-spark-env.cmd  run-example       spark-shell       spark-sql2.cmd     sparkR.cmd
> beeline.cmd           load-spark-env.sh   run-example.cmd   spark-shell.cmd   spark-submit       sparkR2.cmd
> docker-image-tool.sh  pyspark             spark-class       spark-shell2.cmd  spark-submit.cmd
> find-spark-home       pyspark.cmd         spark-class.cmd   spark-sql         spark-submit2.cmd
> find-spark-home.cmd   pyspark2.cmd        spark-class2.cmd  spark-sql.cmd     sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>     at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>     at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>     at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>     at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>     at scala.Option.getOrElse(Option.scala:189)
>     at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote:
>
> What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run tests on Windows, and they appear to pass.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux, but I have a Windows
> > machine with only a slow VirtualBox VM for Linux, so I would like to be able
> > to build and run Spark code in a Windows environment.
> >
> > Unfortunately, neither of the following commands works for me, although both
> > are listed on
> > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
> >
> > (I stayed directly under the Spark root directory and ran
> > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package".)
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package".
> >
> > Now the build works, but when I run spark-shell, I get the following error:
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >     at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >     at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >     at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >     at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >     at scala.Option.getOrElse(Option.scala:189)
> >     at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone successfully built and run Spark from source on Windows? Could you
> > please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
>