Re: Is it feasible to build and run Spark on Windows?
I am new and plan to be an individual contributor for bug fixes. I assume I need to build the project if I'll be working on source code from the master branch, which the released binaries are behind. Do you think this makes sense? Please let me know if, in this case, I can still use the binaries instead of building the project.

On Tue, Dec 10, 2019 at 7:00 AM Deepak Vohra wrote:
> The initial question was to build from source. Any reason to build when binaries are available at https://spark.apache.org/downloads.html
>
> On Tuesday, December 10, 2019, 03:05:44 AM UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Super. Thanks Deepak!
>
> On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra wrote:
>
> Please install Apache Spark on Windows as discussed in "Apache Spark on Windows - DZone Open Source" <https://dzone.com/articles/working-on-apache-spark-on-windows>. This article explains and provides solutions for some of the most common errors developers come across when inst...
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of its free period. Is there a shared EC2 instance for Spark that we can use for free?
>
> Ping
>
> On Monday, December 9, 2019, Deepak Vohra wrote:
>
> Haven't tested, but the general procedure is to exclude all guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on guava according to Maven Repository: org.apache.hadoop » hadoop-common.
>
> Apache Spark 2.4 has a dependency on guava 14.
> If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows. For Docker on Windows on EC2, refer to "Getting Started with Docker for Windows - Developer.com" ("Docker for Windows makes it feasible to run a Docker daemon on Windows Server 2016. Learn to harness its power.")
>
> Conflicting versions are not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required software dependencies. Spark applications may require specific versions of these dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes with conflicting versions." -- Running Spark in Docker Containers on YARN
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't being downloaded for some reason. I'll try something else. Thank you very much for your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra wrote:
>
> As multiple guava versions are found, exclude guava from all the dependencies it could have been downloaded with, and explicitly add a recent guava version:
>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-common</artifactId>
>     <version>3.2.1</version>
>     <exclusions>
>       <exclusion>
>         <groupId>com.google.guava</groupId>
>         <artifactId>guava</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>
>
>   <dependency>
>     <groupId>com.google.guava</groupId>
>     <artifactId>guava</artifactId>
>     <version>28.1-jre</version>
>   </dependency>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put the exclusion of guava in the topmost POM (directly under the Spark home) as follows:
>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-common</artifactId>
>     <version>3.2.1</version>
>     <exclusions>
>       <exclusion>
>         <groupId>com.google.guava</groupId>
>         <artifactId>guava</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>
>
> I also set the properties spark.executor.userClassPathFirst=true and spark.driver.userClassPathFirst=true:
>
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
>
> and rebuilt Spark. But I got the same error when running spark-shell.
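A note on those two properties: spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are runtime Spark configurations, so passing them as -D flags to mvn only sets system properties for the Maven JVM; they don't carry over into the built distribution. If the intent is to test them against the error above, a sketch of how they would normally be supplied to the launcher (assuming the stock spark-shell from this build):

  D:\apache\spark\bin>spark-shell --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true

They can also be placed in conf\spark-defaults.conf instead of being passed on the command line.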
Re: Is it feasible to build and run Spark on Windows?
The initial question was to build from source. Any reason to build when binaries are available at https://spark.apache.org/downloads.html On Tuesday, December 10, 2019, 03:05:44 AM UTC, Ping Liu wrote: Super. Thanks Deepak! On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra wrote: Please install Apache Spark on Windows as discussed in Apache Spark on Windows - DZone Open Source | | | | | | | | | | | Apache Spark on Windows - DZone Open Source This article explains and provides solutions for some of the most common errors developers come across when inst... | | | On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu wrote: Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of free period. Is there a shared EC2 for Spark that we can use for free? Ping On Monday, December 9, 2019, Deepak Vohra wrote: > Haven't tested but the general procedure is to exclude all guava dependencies > that are not needed. The hadoop-common depedency does not have a dependency > on guava according to Maven Repository: org.apache.hadoop » hadoop-common > > Maven Repository: org.apache.hadoop » hadoop-common > > Apache Spark 2.4 has dependency on guava 14. > If a Docker image for Cloudera Hadoop is used Spark is may be installed on > Docker for Windows. > For Docker on Windows on EC2 refer Getting Started with Docker for Windows - > Developer.com > > Getting Started with Docker for Windows - Developer.com > > Docker for Windows makes it feasible to run a Docker daemon on Windows Server > 2016. Learn to harness its power. > > > Conflicting versions is not an issue if Docker is used. > "Apache Spark applications usually have a complex set of required software > dependencies. Spark applications may require specific versions of these > dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes > with conflicting versions." > Running Spark in Docker Containers on YARN > > Running Spark in Docker Containers on YARN > > > > > > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu > wrote: > > Hi Deepak, > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded > for somehow. I'll try something else. Thank you very much for your help! > Ping > > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra wrote: > > As multiple guava versions are found exclude guava from all the dependecies > it could have been downloaded with. And explicitly add a recent guava version. > > org.apache.hadoop > hadoop-common > 3.2.1 > > > com.google.guava > guava > > > > > com.google.guava > guava > 28.1-jre > > > > > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu > wrote: > > Hi Deepak, > Following your suggestion, I put exclusion of guava in topmost POM (under > Spark home directly) as follows. > 2227- > 2228- > 2229- org.apache.hadoop > 2230: hadoop-common > 2231- 3.2.1 > 2232- > 2233- > 2234- com.google.guava > 2235- guava > 2236- > 2237- > 2238- > 2239- > 2240- > I also set properties for spark.executor.userClassPathFirst=true and > spark.driver.userClassPathFirst=true > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 > -Dspark.executor.userClassPathFirst=true > -Dspark.driver.userClassPathFirst=true -DskipTests clean package > and rebuilt spark. > But I got the same error when running spark-shell. > > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 25.092 > s] > [INFO] Spark Project Tags . SUCCESS [ 22.093 > s] > [INFO] Spark Project Sketch ... 
SUCCESS [ 19.546 > s] > [INFO] Spark Project Local DB . SUCCESS [ 10.468 > s] > [INFO] Spark Project Networking ... SUCCESS [ 17.733 > s] > [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 6.531 > s] > [INFO] Spark Project Unsafe ... SUCCESS [ 25.327 > s] > [INFO] Spark Project Launcher . SUCCESS [ 27.264 > s] > [INFO] Spark Project Core . SUCCESS [07:59 > min] > [INFO] Spark Project ML Local Library . SUCCESS [01:39 > min] > [INFO] Spark Project GraphX ... SUCCESS [02:08 > min] > [INFO] S
Re: Is it feasible to build and run Spark on Windows?
Super. Thanks Deepak! On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra wrote: > Please install Apache Spark on Windows as discussed in Apache Spark on > Windows - DZone Open Source > <https://dzone.com/articles/working-on-apache-spark-on-windows> > > Apache Spark on Windows - DZone Open Source > > This article explains and provides solutions for some of the most common > errors developers come across when inst... > <https://dzone.com/articles/working-on-apache-spark-on-windows> > > > > On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran > out of free period. Is there a shared EC2 for Spark that we can use for > free? > > Ping > > > On Monday, December 9, 2019, Deepak Vohra wrote: > > Haven't tested but the general procedure is to exclude all guava > dependencies that are not needed. The hadoop-common depedency does not have > a dependency on guava according to Maven Repository: org.apache.hadoop » > hadoop-common > > > > Maven Repository: org.apache.hadoop » hadoop-common > > > > Apache Spark 2.4 has dependency on guava 14. > > If a Docker image for Cloudera Hadoop is used Spark is may be installed > on Docker for Windows. > > For Docker on Windows on EC2 refer Getting Started with Docker for > Windows - Developer.com > > > > Getting Started with Docker for Windows - Developer.com > > > > Docker for Windows makes it feasible to run a Docker daemon on Windows > Server 2016. Learn to harness its power. > > > > > > Conflicting versions is not an issue if Docker is used. > > "Apache Spark applications usually have a complex set of required > software dependencies. Spark applications may require specific versions of > these dependencies (such as Pyspark and R) on the Spark executor hosts, > sometimes with conflicting versions." > > Running Spark in Docker Containers on YARN > > > > Running Spark in Docker Containers on YARN > > > > > > > > > > > > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > > Hi Deepak, > > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't > downloaded for somehow. I'll try something else. Thank you very much for > your help! > > Ping > > > > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra wrote: > > > > As multiple guava versions are found exclude guava from all the > dependecies it could have been downloaded with. And explicitly add a recent > guava version. > > > > org.apache.hadoop > > hadoop-common > > 3.2.1 > > > > > > com.google.guava > > guava > > > > > > > > > > com.google.guava > > guava > > 28.1-jre > > > > > > > > > > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > > Hi Deepak, > > Following your suggestion, I put exclusion of guava in topmost POM > (under Spark home directly) as follows. > > 2227- > > 2228- > > 2229-org.apache.hadoop > > 2230:hadoop-common > > 2231-3.2.1 > > 2232- > > 2233- > > 2234-com.google.guava > > 2235-guava > > 2236- > > 2237- > > 2238- > > 2239- > > 2240- > > I also set properties for spark.executor.userClassPathFirst=true and > spark.driver.userClassPathFirst=true > > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 > -Dspark.executor.userClassPathFirst=true > -Dspark.driver.userClassPathFirst=true -DskipTests clean package > > and rebuilt spark. > > But I got the same error when running spark-shell. 
> > > > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > > [INFO] > > [INFO] Spark Project Parent POM ... SUCCESS [ > 25.092 s] > > [INFO] Spark Project Tags . SUCCESS [ > 22.093 s] > > [INFO] Spark Project Sketch ... SUCCESS [ > 19.546 s] > > [INFO] Spark Project Local DB . SUCCESS [ > 10.468 s] > > [INFO] Spark Project Networking ... SUCCESS [ > 17.733 s] > > [INFO] Spark Project Shuffle Streaming Service SUCCESS [ > 6.531 s] > > [INFO
Re: Is it feasible to build and run Spark on Windows?
Please install Apache Spark on Windows as discussed in Apache Spark on Windows - DZone Open Source | | | | | | | | | | | Apache Spark on Windows - DZone Open Source This article explains and provides solutions for some of the most common errors developers come across when inst... | | | On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu wrote: Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of free period. Is there a shared EC2 for Spark that we can use for free? Ping On Monday, December 9, 2019, Deepak Vohra wrote: > Haven't tested but the general procedure is to exclude all guava dependencies > that are not needed. The hadoop-common depedency does not have a dependency > on guava according to Maven Repository: org.apache.hadoop » hadoop-common > > Maven Repository: org.apache.hadoop » hadoop-common > > Apache Spark 2.4 has dependency on guava 14. > If a Docker image for Cloudera Hadoop is used Spark is may be installed on > Docker for Windows. > For Docker on Windows on EC2 refer Getting Started with Docker for Windows - > Developer.com > > Getting Started with Docker for Windows - Developer.com > > Docker for Windows makes it feasible to run a Docker daemon on Windows Server > 2016. Learn to harness its power. > > > Conflicting versions is not an issue if Docker is used. > "Apache Spark applications usually have a complex set of required software > dependencies. Spark applications may require specific versions of these > dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes > with conflicting versions." > Running Spark in Docker Containers on YARN > > Running Spark in Docker Containers on YARN > > > > > > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu > wrote: > > Hi Deepak, > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded > for somehow. I'll try something else. Thank you very much for your help! > Ping > > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra wrote: > > As multiple guava versions are found exclude guava from all the dependecies > it could have been downloaded with. And explicitly add a recent guava version. > > org.apache.hadoop > hadoop-common > 3.2.1 > > > com.google.guava > guava > > > > > com.google.guava > guava > 28.1-jre > > > > > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu > wrote: > > Hi Deepak, > Following your suggestion, I put exclusion of guava in topmost POM (under > Spark home directly) as follows. > 2227- > 2228- > 2229- org.apache.hadoop > 2230: hadoop-common > 2231- 3.2.1 > 2232- > 2233- > 2234- com.google.guava > 2235- guava > 2236- > 2237- > 2238- > 2239- > 2240- > I also set properties for spark.executor.userClassPathFirst=true and > spark.driver.userClassPathFirst=true > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 > -Dspark.executor.userClassPathFirst=true > -Dspark.driver.userClassPathFirst=true -DskipTests clean package > and rebuilt spark. > But I got the same error when running spark-shell. > > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 25.092 > s] > [INFO] Spark Project Tags . SUCCESS [ 22.093 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 19.546 > s] > [INFO] Spark Project Local DB . SUCCESS [ 10.468 > s] > [INFO] Spark Project Networking ... SUCCESS [ 17.733 > s] > [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 6.531 > s] > [INFO] Spark Project Unsafe ... SUCCESS [ 25.327 > s] > [INFO] Spark Project Launcher . 
SUCCESS [ 27.264 > s] > [INFO] Spark Project Core . SUCCESS [07:59 > min] > [INFO] Spark Project ML Local Library . SUCCESS [01:39 > min] > [INFO] Spark Project GraphX ... SUCCESS [02:08 > min] > [INFO] Spark Project Streaming SUCCESS [02:56 > min] > [INFO] Spark Project Catalyst . SUCCESS [08:55 > min] > [INFO] Spark Project SQL .. SUCCESS [12:33 > min] > [INFO] Spark Project ML Library
Re: Is it feasible to build and run Spark on Windows?
SUCCESS [03:16 min] > [INFO] Spark Project Catalyst . SUCCESS [08:45 min] > [INFO] Spark Project SQL .. SUCCESS [12:12 min] > [INFO] Spark Project ML Library ... SUCCESS [ 16:28 h] > [INFO] Spark Project Tools SUCCESS [ 23.602 s] > [INFO] Spark Project Hive . SUCCESS [07:50 min] > [INFO] Spark Project Graph API SUCCESS [ 8.734 s] > [INFO] Spark Project Cypher ... SUCCESS [ 12.420 s] > [INFO] Spark Project Graph SUCCESS [ 10.186 s] > [INFO] Spark Project REPL . SUCCESS [01:03 min] > [INFO] Spark Project YARN Shuffle Service . SUCCESS [01:19 min] > [INFO] Spark Project YARN . SUCCESS [02:19 min] > [INFO] Spark Project Assembly . SUCCESS [ 18.912 s] > [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 57.925 s] > [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:20 min] > [INFO] Kafka 0.10+ Source for Structured Streaming SUCCESS [02:26 min] > [INFO] Spark Project Examples . SUCCESS [02:00 min] > [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [ 28.354 s] > [INFO] Spark Avro . SUCCESS [01:44 min] > [INFO] > [INFO] BUILD SUCCESS > [INFO] > [INFO] Total time: 17:30 h > [INFO] Finished at: 2019-12-05T12:20:01-08:00 > [INFO] > > D:\apache\spark>cd bin > > D:\apache\spark\bin>ls > beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd > beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd > docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd > find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd > find-spark-home.cmd pyspark2.cmdspark-class2.cmd spark-sql.cmd sparkR > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at org.apache.spark.deploy.SparkSubmit.org $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > D:\apache\spark\bin> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: >> >> Hello, >> >> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. 
So I wish I am able to build and run Spark code on Windows environment. >> >> Unfortunately, >> >> # Apache Hadoop 2.6.X >> ./build/mvn -Pyarn -DskipTests clean package >> >> # Apache Hadoop 2.7.X and later >> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package >> >> >> Both are listed on http://sp
Re: Is it feasible to build and run Spark on Windows?
ssembly . SUCCESS [ 18.912 s] [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 57.925 s] [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:20 min] [INFO] Kafka 0.10+ Source for Structured Streaming SUCCESS [02:26 min] [INFO] Spark Project Examples . SUCCESS [02:00 min] [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [ 28.354 s] [INFO] Spark Avro . SUCCESS [01:44 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 17:30 h [INFO] Finished at: 2019-12-05T12:20:01-08:00 [INFO] D:\apache\spark>cd bin D:\apache\spark\bin>ls beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR D:\apache\spark\bin>spark-shell Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) D:\apache\spark\bin> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. 
> > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.de
Re: Is it feasible to build and run Spark on Windows?
gt; 18.912 s] > [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ > 57.925 s] > [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:20 > min] > [INFO] Kafka 0.10+ Source for Structured Streaming SUCCESS [02:26 > min] > [INFO] Spark Project Examples . SUCCESS [02:00 > min] > [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [ > 28.354 s] > [INFO] Spark Avro . SUCCESS [01:44 > min] > [INFO] > > [INFO] BUILD SUCCESS > [INFO] > > [INFO] Total time: 17:30 h > [INFO] Finished at: 2019-12-05T12:20:01-08:00 > [INFO] > > > D:\apache\spark>cd bin > > D:\apache\spark\bin>ls > beeline load-spark-env.cmd run-example spark-shell > spark-sql2.cmd sparkR.cmd > beeline.cmd load-spark-env.sh run-example.cmd > spark-shell.cmd spark-submit sparkR2.cmd > docker-image-tool.sh pyspark spark-class > spark-shell2.cmd spark-submit.cmd > find-spark-home pyspark.cmd spark-class.cmd spark-sql > spark-submit2.cmd > find-spark-home.cmd pyspark2.cmdspark-class2.cmd spark-sql.cmd > sparkR > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > D:\apache\spark\bin> > > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. 
> > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when I run spark-shell. I got the following error. > > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1
Re: Is it feasible to build and run Spark on Windows?
conditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) D:\apache\spark\bin> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. 
> > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping >
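One way to see exactly which Guava versions the hadoop-3.2 profile actually pulls in, before editing any POMs, is the Maven dependency plugin; a sketch using the same profiles as the build command above (output and versions will vary):

  D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 dependency:tree -Dincludes=com.google.guava

The NoSuchMethodError above suggests an older Guava is winning on the runtime classpath than the one hadoop-common 3.2.1 expects, so the interesting part of the output is which module's Guava version is mediated in.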
Re: Is it feasible to build and run Spark on Windows?
eption in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > D:\apache\spark\bin> > > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. > > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when I run spark-shell. I got the following error. 
> > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > > at scala.Option.getOrElse(Option.scala:189) > > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > > at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > > > > Has anyone experienced building and running Spark source code > successfully on Windows? Could you please share your experience? > > > > Thanks a lot! > > > > Ping > > > >
Re: Is it feasible to build and run Spark on Windows?
runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) D:\apache\spark\bin> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping >
Re: Is it feasible to build and run Spark on Windows?
Is Hadoop 3.x not set as a dependency? If so, exclude the guava provided by Hadoop:

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.2.1</version>
    <exclusions>
      <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

On Friday, December 6, 2019, 12:20:49 AM UTC, Ping Liu wrote:

Thanks Deepak! I'll try it.

On Thu, Dec 5, 2019 at 4:13 PM Deepak Vohra wrote:

The Guava issue could be fixed in one of two ways:
- Use Hadoop v3
- Create an Uber jar, refer https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2 and "Managing Java dependencies for Apache Spark applications on Cloud Dataproc" | Google Cloud Blog (Learn how to set up Java imported packages for Apache Spark on Cloud Dataproc to avoid conflicts.)

On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu wrote:

Hi Deepak,

For Spark, I am using the master branch and just had the code updated yesterday. For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1. But even in 14.0.1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires a boolean as the first parameter:

  static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)

In the newer Guava versions, the checkArgument() overloads likewise all require a boolean as the first parameter.

For Docker, using EC2 is a good idea. Is there a document or guidance for it?

Thanks.
Ping

On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra wrote:

This type of exception could occur if a dependency (most likely Guava) version is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.

Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.

On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu wrote:

Hi Deepak,

Yes, I did use Maven. I even have the build pass successfully when setting the Hadoop version to 3.2. Please see my response to Sean's email.

Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker for major work if possible.

Thanks!
Ping

On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote:

Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use a Docker image for CDH on Windows (Docker Hub)

On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen wrote:

What was the build error? You didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc.

On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
> > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.S
Re: Is it feasible to build and run Spark on Windows?
Sorry, didn't notice, Hadoop v3.x is already being used. On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu wrote: Hi Deepak, For Spark, I am using master branch and just have code updated yesterday. For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1. But even in 14.0,1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires boolean as first parameter. | static void | checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs) | The newer Guava version, checkArgument() all require boolean as first parameter. For Docker, using EC2 is a good idea. Is there a document or guidance for it? Thanks. Ping On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra wrote: Such type exception could occur if a dependency (most likely Guava) version is not supported by the Spark version. What is the Spark and Guava versions? Use a more recent Guava version dependency in Maven pom.xml. Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support. On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu wrote: Hi Deepak, Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email. Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible. Thanks! Ping On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote: Several alternatives are available: - Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven - Use Docker image for CDH on WindowsDocker Hub | | | | Docker Hub | | | On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. 
> > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
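A quick way to confirm which Guava jar the launcher actually sees is to look at the jars the source build produced; a sketch, assuming the default Scala 2.12 layout of a master build (the path may differ):

  D:\apache\spark>dir assembly\target\scala-2.12\jars\guava*.jar

If that directory holds guava-14.0.1.jar while hadoop-common is 3.2.1, the NoSuchMethodError above is expected: the specific checkArgument overload named in the stack trace only exists in newer Guava releases, even though Guava 14 already has checkArgument methods taking a boolean first.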
Re: Is it feasible to build and run Spark on Windows?
Thanks Deepak! I'll try it. On Thu, Dec 5, 2019 at 4:13 PM Deepak Vohra wrote: > The Guava issue could be fixed in one of two ways: > > - Use Hadoop v3 > - Create an Uber jar, refer > > https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2 > Managing Java dependencies for Apache Spark applications on Cloud > Dataproc | Google Cloud Blog > <https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc> > > Managing Java dependencies for Apache Spark applications on Cloud Datapr... > > Learn how to set up Java imported packages for Apache Spark on Cloud > Dataproc to avoid conflicts. > > <https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc> > > > > On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > Hi Deepak, > > For Spark, I am using master branch and just have code updated yesterday. > > For Guava, I actually deleted my old versions from the local Maven repo. > The build process of Spark automatically downloaded a few versions. The > oldest version is 14.0.1. > > But even in 14.0,1 ( > https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) > Preconditions already requires boolean as first parameter. > > static void *checkArgument > <https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html#checkArgument(boolean,%20java.lang.String,%20java.lang.Object...)>*(boolean > expression, > String > <http://download.oracle.com/javase/6/docs/api/java/lang/String.html?is-external=true> > errorMessageTemplate, > Object > <http://download.oracle.com/javase/6/docs/api/java/lang/Object.html?is-external=true> > ... errorMessageArgs) > > The newer Guava version, checkArgument() all require boolean as first > parameter. > > For Docker, using EC2 is a good idea. Is there a document or guidance for > it? > > Thanks. > > Ping > > > > On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra wrote: > > Such type exception could occur if a dependency (most likely Guava) > version is not supported by the Spark version. What is the Spark and Guava > versions? Use a more recent Guava version dependency in Maven pom.xml. > > Regarding Docker, a cloud platform instance such as EC2 could be used with > Hyper-V support. > > On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > Hi Deepak, > > Yes, I did use Maven. I even have the build pass successfully when setting > Hadoop version to 3.2. Please see my response to Sean's email. > > Unfortunately, I only have Docker Toolbox as my Windows doesn't have > Microsoft Hyper-V. So I want to avoid using Docker to do major work if > possible. > > Thanks! > > Ping > > > On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote: > > Several alternatives are available: > > - Use Maven to build Spark on Windows. > http://spark.apache.org/docs/latest/building-spark.html#apache-maven > > - Use Docker image for CDH on Windows > Docker Hub <https://hub.docker.com/u/cloudera> > > Docker Hub > > <https://hub.docker.com/u/cloudera> > > > > > On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen < > sro...@gmail.com> wrote: > > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. 
> > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. > > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when
Re: Is it feasible to build and run Spark on Windows?
The Guava issue could be fixed in one of two ways:
- Use Hadoop v3
- Create an Uber jar, refer https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2 and "Managing Java dependencies for Apache Spark applications on Cloud Dataproc" | Google Cloud Blog (Learn how to set up Java imported packages for Apache Spark on Cloud Dataproc to avoid conflicts.)

On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu wrote:

Hi Deepak,

For Spark, I am using the master branch and just had the code updated yesterday. For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1. But even in 14.0.1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires a boolean as the first parameter:

  static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)

In the newer Guava versions, the checkArgument() overloads likewise all require a boolean as the first parameter.

For Docker, using EC2 is a good idea. Is there a document or guidance for it?

Thanks.
Ping

On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra wrote:

This type of exception could occur if a dependency (most likely Guava) version is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.

Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.

On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu wrote:

Hi Deepak,

Yes, I did use Maven. I even have the build pass successfully when setting the Hadoop version to 3.2. Please see my response to Sean's email.

Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker for major work if possible.

Thanks!
Ping

On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote:

Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use a Docker image for CDH on Windows (Docker Hub)

On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen wrote:

What was the build error? You didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc.

On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under the spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package").
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now the build works. But when I run spark-shell, I got the following error.
> > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(Spark
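For the "uber jar" option mentioned above, a minimal sketch of a Maven Shade plugin section that relocates Guava inside an application jar, so the Guava that Hadoop and Spark see no longer clashes with the application's copy. The plugin version and the relocated package prefix are illustrative assumptions, not values taken from this thread:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- rewrite com.google.common references in the uber jar to a private package -->
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>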
Re: Is it feasible to build and run Spark on Windows?
Hi Deepak, For Spark, I am using master branch and just have code updated yesterday. For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1. But even in 14.0,1 ( https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires boolean as first parameter. static void *checkArgument <https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html#checkArgument(boolean, java.lang.String, java.lang.Object...)>*(boolean expression, String <http://download.oracle.com/javase/6/docs/api/java/lang/String.html?is-external=true> errorMessageTemplate, Object <http://download.oracle.com/javase/6/docs/api/java/lang/Object.html?is-external=true> ... errorMessageArgs) The newer Guava version, checkArgument() all require boolean as first parameter. For Docker, using EC2 is a good idea. Is there a document or guidance for it? Thanks. Ping On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra wrote: > Such type exception could occur if a dependency (most likely Guava) > version is not supported by the Spark version. What is the Spark and Guava > versions? Use a more recent Guava version dependency in Maven pom.xml. > > Regarding Docker, a cloud platform instance such as EC2 could be used with > Hyper-V support. > > On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu < > pingpinga...@gmail.com> wrote: > > > Hi Deepak, > > Yes, I did use Maven. I even have the build pass successfully when setting > Hadoop version to 3.2. Please see my response to Sean's email. > > Unfortunately, I only have Docker Toolbox as my Windows doesn't have > Microsoft Hyper-V. So I want to avoid using Docker to do major work if > possible. > > Thanks! > > Ping > > > On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote: > > Several alternatives are available: > > - Use Maven to build Spark on Windows. > http://spark.apache.org/docs/latest/building-spark.html#apache-maven > > - Use Docker image for CDH on Windows > Docker Hub <https://hub.docker.com/u/cloudera> > > Docker Hub > > <https://hub.docker.com/u/cloudera> > > > > > On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen < > sro...@gmail.com> wrote: > > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. > > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when I run spark-shell. I got the following error. 
> > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > >at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > >at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > >at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > >at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > >at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > >at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > >at scala.Option.getOrElse(Option.scala:189) > >at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > >at org.apache.spark.deploy
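One quick way to confirm which Guava actually wins on the runtime classpath is to check where the Preconditions class is loaded from. A minimal sketch (the class name is made up for illustration; run it with the same classpath the failing spark-shell uses):

    import com.google.common.base.Preconditions;

    public class GuavaCheck {
        public static void main(String[] args) {
            // Prints the jar that Preconditions was loaded from, i.e. the Guava
            // artifact that is actually resolved at runtime.
            System.out.println(
                Preconditions.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }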
Re: Is it feasible to build and run Spark on Windows?
ark Project Streaming SUCCESS [03:16 > min] > [INFO] Spark Project Catalyst . SUCCESS [08:45 > min] > [INFO] Spark Project SQL .. SUCCESS [12:12 > min] > [INFO] Spark Project ML Library ... SUCCESS [ 16:28 > h] > [INFO] Spark Project Tools SUCCESS [ 23.602 > s] > [INFO] Spark Project Hive . SUCCESS [07:50 > min] > [INFO] Spark Project Graph API SUCCESS [ 8.734 > s] > [INFO] Spark Project Cypher ... SUCCESS [ 12.420 > s] > [INFO] Spark Project Graph SUCCESS [ 10.186 > s] > [INFO] Spark Project REPL . SUCCESS [01:03 > min] > [INFO] Spark Project YARN Shuffle Service . SUCCESS [01:19 > min] > [INFO] Spark Project YARN . SUCCESS [02:19 > min] > [INFO] Spark Project Assembly . SUCCESS [ 18.912 > s] > [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 57.925 > s] > [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:20 > min] > [INFO] Kafka 0.10+ Source for Structured Streaming SUCCESS [02:26 > min] > [INFO] Spark Project Examples . SUCCESS [02:00 > min] > [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [ 28.354 > s] > [INFO] Spark Avro . SUCCESS [01:44 > min] > [INFO] > > [INFO] BUILD SUCCESS > [INFO] > > [INFO] Total time: 17:30 h > [INFO] Finished at: 2019-12-05T12:20:01-08:00 > [INFO] > > > D:\apache\spark>cd bin > > D:\apache\spark\bin>ls > beeline load-spark-env.cmd run-example spark-shell > spark-sql2.cmd sparkR.cmd > beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd > spark-submit sparkR2.cmd > docker-image-tool.sh pyspark spark-class spark-shell2.cmd > spark-submit.cmd > find-spark-home pyspark.cmd spark-class.cmd spark-sql > spark-submit2.cmd > find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd > sparkR > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > D:\apache\spark\bin> > > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: >> >> What was the build error? you didn't say. Are you sure it succeeded? >> Try running from the Spark home dir, not bin. >> I know we do run Windows tests and it appears to pass tests, etc. >> >> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: >> > >> > Hello, >> > >> > I understand Spark is preferably built on Linux. 
But I have a Windows >> > machine with a slow Virtual Box for Linux. So I wish I am able to build >> > and run Spark code on Windows
Re: Is it feasible to build and run Spark on Windows?
Project ML Local Library . SUCCESS > [01:51 min] > > [INFO] Spark Project GraphX ... SUCCESS > [02:20 min] > > [INFO] Spark Project Streaming SUCCESS > [03:16 min] > > [INFO] Spark Project Catalyst . SUCCESS > [08:45 min] > > [INFO] Spark Project SQL .. SUCCESS > [12:12 min] > > [INFO] Spark Project ML Library ... SUCCESS [ > 16:28 h] > > [INFO] Spark Project Tools SUCCESS [ > 23.602 s] > > [INFO] Spark Project Hive . SUCCESS > [07:50 min] > > [INFO] Spark Project Graph API SUCCESS [ > 8.734 s] > > [INFO] Spark Project Cypher ... SUCCESS [ > 12.420 s] > > [INFO] Spark Project Graph SUCCESS [ > 10.186 s] > > [INFO] Spark Project REPL . SUCCESS > [01:03 min] > > [INFO] Spark Project YARN Shuffle Service . SUCCESS > [01:19 min] > > [INFO] Spark Project YARN . SUCCESS > [02:19 min] > > [INFO] Spark Project Assembly . SUCCESS [ > 18.912 s] > > [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ > 57.925 s] > > [INFO] Spark Integration for Kafka 0.10 ... SUCCESS > [01:20 min] > > [INFO] Kafka 0.10+ Source for Structured Streaming SUCCESS > [02:26 min] > > [INFO] Spark Project Examples . SUCCESS > [02:00 min] > > [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [ > 28.354 s] > > [INFO] Spark Avro . SUCCESS > [01:44 min] > > [INFO] > > > [INFO] BUILD SUCCESS > > [INFO] > > > [INFO] Total time: 17:30 h > > [INFO] Finished at: 2019-12-05T12:20:01-08:00 > > [INFO] > > > > > D:\apache\spark>cd bin > > > > D:\apache\spark\bin>ls > > beeline load-spark-env.cmd run-example spark-shell > spark-sql2.cmd sparkR.cmd > > beeline.cmd load-spark-env.sh run-example.cmd > spark-shell.cmd spark-submit sparkR2.cmd > > docker-image-tool.sh pyspark spark-class > spark-shell2.cmd spark-submit.cmd > > find-spark-home pyspark.cmd spark-class.cmd spark-sql > spark-submit2.cmd > > find-spark-home.cmd pyspark2.cmdspark-class2.cmd > spark-sql.cmd sparkR > > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > > at scala.Option.getOrElse(Option.scala:189) > > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > > at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > > D:\apache\spark\bin> > > > > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: > >> > >> What was the build error? you didn't say. Are y
Re: Is it feasible to build and run Spark on Windows?
Such type exception could occur if a dependency (most likely Guava) version is not supported by the Spark version. What is the Spark and Guava versions? Use a more recent Guava version dependency in Maven pom.xml. Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support. On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu wrote: Hi Deepak, Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email. Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible. Thanks! Ping On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote: Several alternatives are available: - Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven - Use Docker image for CDH on WindowsDocker Hub | | | | Docker Hub | | | On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. 
> > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
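Before changing Guava versions in pom.xml, Maven's dependency tree can be filtered to show which modules pull in which Guava. A sketch, run from the Spark source root with the same profiles used for the build (the -Dincludes filter is standard maven-dependency-plugin syntax):

    mvn -Pyarn -Phadoop-3.2 dependency:tree -Dincludes=com.google.guava:guava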
Re: Is it feasible to build and run Spark on Windows?
Hi Deepak, Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email. Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible. Thanks! Ping On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra wrote: > Several alternatives are available: > > - Use Maven to build Spark on Windows. > http://spark.apache.org/docs/latest/building-spark.html#apache-maven > > - Use Docker image for CDH on Windows > Docker Hub <https://hub.docker.com/u/cloudera> > > Docker Hub > > <https://hub.docker.com/u/cloudera> > > > > > On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen < > sro...@gmail.com> wrote: > > > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. > > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when I run spark-shell. I got the following error. > > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > >at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > >at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > >at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > >at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > >at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > >at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > >at scala.Option.getOrElse(Option.scala:189) > >at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > >at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > >at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > >at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > >at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > >at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > >at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > >at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > > > > Has anyone experienced building and running Spark source code > successfully on Windows? 
Could you please share your experience? > > > > Thanks a lot! > > > > Ping > > > > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > >
Re: Is it feasible to build and run Spark on Windows?
at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > D:\apache\spark\bin> > > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: >> >> What was the build error? you didn't say. Are you sure it succeeded? >> Try running from the Spark home dir, not bin. >> I know we do run Windows tests and it appears to pass tests, etc. >> >> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: >> > >> > Hello, >> > >> > I understand Spark is preferably built on Linux. But I have a Windows >> > machine with a slow Virtual Box for Linux. So I wish I am able to build >> > and run Spark code on Windows environment. >> > >> > Unfortunately, >> > >> > # Apache Hadoop 2.6.X >> > ./build/mvn -Pyarn -DskipTests clean package >> > >> > # Apache Hadoop 2.7.X and later >> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean >> > package >> > >> > >> > Both are listed on >> > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn >> > >> > But neither works for me (I stay directly under spark root directory and >> > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean >> > package" >> > >> > and >> > >> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests >> > clean package" >> > >> > Now build works. But when I run spark-shell. I got the following error. 
>> > >> > D:\apache\spark\bin>spark-shell >> > Exception in thread "main" java.lang.NoSuchMethodError: >> > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V >> > at >> > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) >> > at >> > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) >> > at >> > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) >> > at >> > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) >> > at >> > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) >> > at >> > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown >> > Source) >> > at scala.Option.getOrElse(Option.scala:189) >> > at >> > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) >> > at >> > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) >> > at >> > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) >> > at >> > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) >> > at >> > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) >> > at >> > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) >> > at >> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) >> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >> > >> > >> > Has anyone experienced building and running Spark source code successfully >> > on Windows? Could you please share your experience? >> > >> > Thanks a lot! >> > >> > Ping >> > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
ache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) D:\apache\spark\bin> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen wrote: > What was the build error? you didn't say. Are you sure it succeeded? > Try running from the Spark home dir, not bin. > I know we do run Windows tests and it appears to pass tests, etc. > > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > > > Hello, > > > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build > and run Spark code on Windows environment. > > > > Unfortunately, > > > > # Apache Hadoop 2.6.X > > ./build/mvn -Pyarn -DskipTests clean package > > > > # Apache Hadoop 2.7.X and later > > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > > > But neither works for me (I stay directly under spark root directory and > run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package" > > > > and > > > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > > > Now build works. But when I run spark-shell. I got the following error. > > > > D:\apache\spark\bin>spark-shell > > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > > at > org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > > at scala.Option.getOrElse(Option.scala:189) > > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > > at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > > > > Has anyone experienced building and running Spark source code > successfully on Windows? Could you please share your experience? > > > > Thanks a lot! > > > > Ping > > >
Re: Is it feasible to build and run Spark on Windows?
Several alternatives are available: - Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven - Use Docker image for CDH on WindowsDocker Hub | | | | Docker Hub | | | On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen wrote: What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at >org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at >org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at >org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) > at scala.Option.getOrElse(Option.scala:189) > at >org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at >org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at >org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at >org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
What was the build error? you didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But I have a Windows > machine with a slow Virtual Box for Linux. So I wish I am able to build and > run Spark code on Windows environment. > > Unfortunately, > > # Apache Hadoop 2.6.X > ./build/mvn -Pyarn -DskipTests clean package > > # Apache Hadoop 2.7.X and later > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean > package > > > Both are listed on > http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn > > But neither works for me (I stay directly under spark root directory and run > "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" > > and > > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests > clean package" > > Now build works. But when I run spark-shell. I got the following error. > > D:\apache\spark\bin>spark-shell > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) > at > org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown > Source) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > Has anyone experienced building and running Spark source code successfully on > Windows? Could you please share your experience? > > Thanks a lot! > > Ping > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
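For reference, "running from the Spark home dir, not bin" on Windows just means invoking the script through its bin\ path from the root of the checkout, for example (using the D:\apache\spark path that appears elsewhere in this thread):

    cd /d D:\apache\spark
    bin\spark-shell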
Is it feasible to build and run Spark on Windows?
Hello, I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment. Unfortunately, # Apache Hadoop 2.6.X ./build/mvn -Pyarn -DskipTests clean package # Apache Hadoop 2.7.X and later ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package" and Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package" Now build works. But when I run spark-shell. I got the following error. D:\apache\spark\bin>spark-shell Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience? Thanks a lot! Ping
Re: Not able to sort out environment settings to start spark from windows
Thank you. But there is no special char or space, I actually copied it from Program Files to the root to ensure no space in the path. ** *Sincerely yours,* *Raymond* On Sat, Jun 16, 2018 at 3:42 PM, vaquar khan wrote: > Plz check ur Java Home path . > May be spacial char or space on ur path. > > Regards, > Vaquar khan > > On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote: > >> I am trying to run spark-shell in Windows but receive error of: >> >> \Java\jre1.8.0_151\bin\java was unexpected at this time. >> >> Environment: >> >> System variables: >> >> SPARK_HOME: >> >> c:\spark >> >> Path: >> >> C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ >> ProgramData\Anaconda2;C:\ProgramData\Anaconda2\Library\ >> mingw-w64\bin;C:\ProgramData\Anaconda2\Library\usr\bin;C:\ >> ProgramData\Anaconda2\Library\bin;C:\ProgramData\Anaconda2\ >> Scripts;C:\ProgramData\Oracle\Java\javapath;C:\Windows\ >> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\ >> WindowsPowerShell\v1.0\;I:\Anaconda2;I:\Anaconda2\ >> Scripts;I:\Anaconda2\Library\bin;C:\Program Files >> (x86)\sbt\\bin;C:\Program Files (x86)\Microsoft SQL >> Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL >> Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL >> Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL >> Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files >> (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program >> Files (x86)\Microsoft SQL Server\100\DTS\Binn\;%DDPATH%; >> %USERPROFILE%\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program >> Files\Microsoft SQL >> Server\130\Tools\Binn\;C:\jre1.8.0_151\bin\server;C:\Program >> Files (x86)\OpenSSH\bin;C:\Program Files (x86)\Calibre2\;C:\Program >> Files\nodejs\;C:\Program Files (x86)\Skype\Phone\; >> %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;C:\Program Files >> (x86)\scala\bin;C:\hadoop\bin;C:\Program Files\Git\cmd;I:\Program >> Files\EmEditor; C:\RXIE\Learning\Spark\bin;C:\spark\bin >> >> JAVA_HOME: >> >> C:\jdk1.8.0_151\bin >> >> JDK_HOME: >> >> C:\jdk1.8.0_151 >> >> I also copied all C:\jdk1.8.0_151 to C:\Java\jdk1.8.0_151, and >> received the same error. >> >> Any help is greatly appreciated. >> >> Thanks. >> >> >> >> >> ** >> *Sincerely yours,* >> >> >> *Raymond* >> >
Re: Not able to sort out environment settings to start spark from windows
Plz check ur Java Home path . May be spacial char or space on ur path. Regards, Vaquar khan On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote: > I am trying to run spark-shell in Windows but receive error of: > > \Java\jre1.8.0_151\bin\java was unexpected at this time. > > Environment: > > System variables: > > SPARK_HOME: > > c:\spark > > Path: > > C:\Program Files (x86)\Common > Files\Oracle\Java\javapath;C:\ProgramData\Anaconda2;C:\ProgramData\Anaconda2\Library\mingw-w64\bin;C:\ProgramData\Anaconda2\Library\usr\bin;C:\ProgramData\Anaconda2\Library\bin;C:\ProgramData\Anaconda2\Scripts;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;I:\Anaconda2;I:\Anaconda2\Scripts;I:\Anaconda2\Library\bin;C:\Program > Files (x86)\sbt\\bin;C:\Program Files (x86)\Microsoft SQL > Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL > Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL > Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL > Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft > Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program Files > (x86)\Microsoft SQL > Server\100\DTS\Binn\;%DDPATH%;%USERPROFILE%\.dnx\bin;C:\Program > Files\Microsoft DNX\Dnvm\;C:\Program Files\Microsoft SQL > Server\130\Tools\Binn\;C:\jre1.8.0_151\bin\server;C:\Program Files > (x86)\OpenSSH\bin;C:\Program Files (x86)\Calibre2\;C:\Program > Files\nodejs\;C:\Program Files (x86)\Skype\Phone\; > %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;C:\Program Files > (x86)\scala\bin;C:\hadoop\bin;C:\Program Files\Git\cmd;I:\Program > Files\EmEditor; C:\RXIE\Learning\Spark\bin;C:\spark\bin > > JAVA_HOME: > > C:\jdk1.8.0_151\bin > > JDK_HOME: > > C:\jdk1.8.0_151 > > I also copied all C:\jdk1.8.0_151 to C:\Java\jdk1.8.0_151, and received > the same error. > > Any help is greatly appreciated. > > Thanks. > > > > > ** > *Sincerely yours,* > > > *Raymond* >
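Besides special characters, one thing worth checking in the settings quoted here: JAVA_HOME points at the JDK's bin directory, while Spark's .cmd scripts typically expect the JDK root and append \bin\java themselves; quoting the assignments also guards against the spaces and parentheses elsewhere in PATH. A sketch, reusing the path from the original message:

    set "JAVA_HOME=C:\jdk1.8.0_151"
    set "PATH=%JAVA_HOME%\bin;%PATH%"
    where java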
Not able to sort out environment settings to start spark from windows
I am trying to run spark-shell in Windows but receive error of: \Java\jre1.8.0_151\bin\java was unexpected at this time. Environment: System variables: SPARK_HOME: c:\spark Path: C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ProgramData\Anaconda2;C:\ProgramData\Anaconda2\Library\mingw-w64\bin;C:\ProgramData\Anaconda2\Library\usr\bin;C:\ProgramData\Anaconda2\Library\bin;C:\ProgramData\Anaconda2\Scripts;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;I:\Anaconda2;I:\Anaconda2\Scripts;I:\Anaconda2\Library\bin;C:\Program Files (x86)\sbt\\bin;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;%DDPATH%;%USERPROFILE%\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\jre1.8.0_151\bin\server;C:\Program Files (x86)\OpenSSH\bin;C:\Program Files (x86)\Calibre2\;C:\Program Files\nodejs\;C:\Program Files (x86)\Skype\Phone\; %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;C:\Program Files (x86)\scala\bin;C:\hadoop\bin;C:\Program Files\Git\cmd;I:\Program Files\EmEditor; C:\RXIE\Learning\Spark\bin;C:\spark\bin JAVA_HOME: C:\jdk1.8.0_151\bin JDK_HOME: C:\jdk1.8.0_151 I also copied all C:\jdk1.8.0_151 to C:\Java\jdk1.8.0_151, and received the same error. Any help is greatly appreciated. Thanks. ** *Sincerely yours,* *Raymond*
Apache spark on windows without shortnames enabled
Hi, We use Apache Spark 2.2.0 in our stack. Our software, like most software, gets installed under "C:\Program Files\" by default. We have a restriction that we cannot ask our customers to enable short names on their machines. From our experience, Spark does not handle absolute paths containing whitespace well, neither when calling spark-class2.cmd from the command line nor in the paths inside spark-env.cmd. We have tried the following: 1. Using double quotes around the path. 2. Escaping the whitespace. 3. Using relative paths. None of them has been successful in bringing up Spark. How do you recommend handling this? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
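One workaround sketch that avoids both short names and quoting (not something suggested in this thread; the paths are illustrative): expose the installation under a space-free directory junction and point SPARK_HOME at it.

    rem create a junction so a space-free path refers to the real install location
    mklink /J C:\spark "C:\Program Files\OurProduct\spark"
    set "SPARK_HOME=C:\spark"
    "%SPARK_HOME%\bin\spark-shell.cmd"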
Re: Add snappy support for spark in Windows
I have put winutils and hadoop.dll within HADOOP_HOME, and spark works well with it, but snappy decompress function throw the above exception. Regard, Junfeng Chen On Mon, Dec 4, 2017 at 7:07 PM, Qiao, Richard <richard.q...@capitalone.com> wrote: > Junjeng, it worth a try to start your spark local with > hadoop.dll/winutils.exe etc hadoop windows support package in HADOOP_HOME, > if you didn’t do that yet. > > > > Best Regards > > Richard > > > > > > *From: *Junfeng Chen <darou...@gmail.com> > *Date: *Monday, December 4, 2017 at 3:53 AM > *To: *"Qiao, Richard" <richard.q...@capitalone.com> > *Cc: *"user@spark.apache.org" <user@spark.apache.org> > *Subject: *Re: Add snappy support for spark in Windows > > > > But I am working on my local development machine, so it should have no > relative to workers/executers. > > > > I find some documents about enable snappy on hadoop. If I want to use > snappy with spark, do I need to config spark as hadoop or have some easy > way to access it? > > > > > Regard, > Junfeng Chen > > > > On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard <richard.q...@capitalone.com> > wrote: > > It seems a common mistake that the path is not accessible by > workers/executors. > > > > Best regards > > Richard > > Sent from my iPhone > > > On Dec 3, 2017, at 22:32, Junfeng Chen <darou...@gmail.com> wrote: > > I am working on importing snappy compressed json file into spark rdd or > dataset. However I meet this error: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > > I have set the following configuration: > > SparkConf conf = new SparkConf() > > .setAppName("normal spark") > > .setMaster("local") > > .set("spark.io.compression.codec", > "org.apache.spark.io.SnappyCompressionCodec") > > > .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > > .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > > .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > > .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > ; > > Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, > and I can find the snappy jar file snappy-0.2.jar and > snappy-java-1.1.2.6.jar in > > D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\ > > However nothing works and even the error message not change. > > How can I fix it? > > > > ref of stackoverflow: https://stackoverflow.com/questions/ > 47626012/config-snappy-support-for-spark-in-windows > <https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows> > > > > > > > Regard, > Junfeng Chen > > > -- > > The information contained in this e-mail is confidential and/or > proprietary to Capital One and/or its affiliates and may only be used > solely in performance of work or services for Capital One. The information > transmitted herewith is intended only for use by the individual or entity > to which it is addressed. If the reader of this message is not the intended > recipient, you are hereby notified that any review, retransmission, > dissemination, distribution, copying or other use of, or taking of any > action in reliance upon this information is strictly prohibited. If you > have received this communication in error, please contact the sender and > delete the material from your computer. 
Re: Add snappy support for spark in Windows
Junjeng, it worth a try to start your spark local with hadoop.dll/winutils.exe etc hadoop windows support package in HADOOP_HOME, if you didn’t do that yet. Best Regards Richard From: Junfeng Chen <darou...@gmail.com> Date: Monday, December 4, 2017 at 3:53 AM To: "Qiao, Richard" <richard.q...@capitalone.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Add snappy support for spark in Windows But I am working on my local development machine, so it should have no relative to workers/executers. I find some documents about enable snappy on hadoop. If I want to use snappy with spark, do I need to config spark as hadoop or have some easy way to access it? Regard, Junfeng Chen On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard <richard.q...@capitalone.com<mailto:richard.q...@capitalone.com>> wrote: It seems a common mistake that the path is not accessible by workers/executors. Best regards Richard Sent from my iPhone On Dec 3, 2017, at 22:32, Junfeng Chen <darou...@gmail.com<mailto:darou...@gmail.com>> wrote: I am working on importing snappy compressed json file into spark rdd or dataset. However I meet this error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z I have set the following configuration: SparkConf conf = new SparkConf() .setAppName("normal spark") .setMaster("local") .set("spark.io.compression.codec", "org.apache.spark.io<http://org.apache.spark.io>.SnappyCompressionCodec") .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") ; Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, and I can find the snappy jar file snappy-0.2.jar and snappy-java-1.1.2.6.jar in D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\ However nothing works and even the error message not change. How can I fix it? ref of stackoverflow: https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows <https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows> Regard, Junfeng Chen The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer. The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. 
If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
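A sketch of the setup described above, with an illustrative directory: winutils.exe and hadoop.dll go into %HADOOP_HOME%\bin, and that directory also needs to be on PATH (or passed via -Djava.library.path) so the JVM can actually load hadoop.dll; the UnsatisfiedLinkError on NativeCodeLoader.buildSupportsSnappy usually means no compatible hadoop.dll was loaded at all.

    set "HADOOP_HOME=D:\hadoop"
    set "PATH=%HADOOP_HOME%\bin;%PATH%"
    rem hadoop.dll and winutils.exe live in %HADOOP_HOME%\bin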
Re: Add snappy support for spark in Windows
But I am working on my local development machine, so it should have no relative to workers/executers. I find some documents about enable snappy on hadoop. If I want to use snappy with spark, do I need to config spark as hadoop or have some easy way to access it? Regard, Junfeng Chen On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard <richard.q...@capitalone.com> wrote: > It seems a common mistake that the path is not accessible by > workers/executors. > > Best regards > Richard > > Sent from my iPhone > > On Dec 3, 2017, at 22:32, Junfeng Chen <darou...@gmail.com> wrote: > > I am working on importing snappy compressed json file into spark rdd or > dataset. However I meet this error: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > > I have set the following configuration: > > SparkConf conf = new SparkConf() > .setAppName("normal spark") > .setMaster("local") > .set("spark.io.compression.codec", > "org.apache.spark.io.SnappyCompressionCodec") > > .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > > .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") > ; > > Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, > and I can find the snappy jar file snappy-0.2.jar and > snappy-java-1.1.2.6.jar in > > D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\ > > However nothing works and even the error message not change. > > How can I fix it? > > > ref of stackoverflow: https://stackoverflow.com/questions/47626012/ > config-snappy-support-for-spark-in-windows > <https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows> > > > > Regard, > Junfeng Chen > > > -- > > The information contained in this e-mail is confidential and/or > proprietary to Capital One and/or its affiliates and may only be used > solely in performance of work or services for Capital One. The information > transmitted herewith is intended only for use by the individual or entity > to which it is addressed. If the reader of this message is not the intended > recipient, you are hereby notified that any review, retransmission, > dissemination, distribution, copying or other use of, or taking of any > action in reliance upon this information is strictly prohibited. If you > have received this communication in error, please contact the sender and > delete the material from your computer. >
Re: Add snappy support for spark in Windows
It seems a common mistake that the path is not accessible by workers/executors. Best regards Richard Sent from my iPhone On Dec 3, 2017, at 22:32, Junfeng Chen <darou...@gmail.com<mailto:darou...@gmail.com>> wrote: I am working on importing snappy compressed json file into spark rdd or dataset. However I meet this error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z I have set the following configuration: SparkConf conf = new SparkConf() .setAppName("normal spark") .setMaster("local") .set("spark.io.compression.codec", "org.apache.spark.io<http://org.apache.spark.io>.SnappyCompressionCodec") .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") ; Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, and I can find the snappy jar file snappy-0.2.jar and snappy-java-1.1.2.6.jar in D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\ However nothing works and even the error message not change. How can I fix it? ref of stackoverflow: https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows <https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows> Regard, Junfeng Chen The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
Add snappy support for spark in Windows
I am working on importing snappy compressed json file into spark rdd or dataset. However I meet this error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z I have set the following configuration: SparkConf conf = new SparkConf() .setAppName("normal spark") .setMaster("local") .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec") .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars") ; Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, and I can find the snappy jar file snappy-0.2.jar and snappy-java-1.1.2.6.jar in D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\ However nothing works and even the error message not change. How can I fix it? ref of stackoverflow: https://stackoverflow.com/questions/ 47626012/config-snappy-support-for-spark-in-windows <https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows> Regard, Junfeng Chen
Re: Spark (on Windows) not picking up HADOOP_CONF_DIR
Hi, How did you set it? How do you run the app? Use sys.env to know whether it was set or not. Jacek On 17 Jul 2016 11:33 a.m., "Daniel Haviv" wrote: > Hi, > I'm running Spark using IntelliJ on Windows and even though I set > HADOOP_CONF_DIR it does not affect the contents of sc.hadoopConfiguration. > > Anybody encountered it? > > Thanks, > Daniel >
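A quick way to act on that suggestion from the Scala REPL or an IDE test: check whether the variable ever reached the JVM, and if it did not, add the site files to the existing context's configuration by hand. The C:\hadoop\conf path below is an assumption for illustration, and sc is an already-created SparkContext.

// Check whether IntelliJ actually exported the variable to the run configuration:
println(sys.env.get("HADOOP_CONF_DIR"))   // None means the JVM never saw it

// Manual fallback: load the site files directly (paths are placeholders).
import org.apache.hadoop.fs.Path
sc.hadoopConfiguration.addResource(new Path("C:\\hadoop\\conf\\core-site.xml"))
sc.hadoopConfiguration.addResource(new Path("C:\\hadoop\\conf\\hdfs-site.xml"))
println(sc.hadoopConfiguration.get("fs.defaultFS"))   // should now reflect the site files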
Spark (on Windows) not picking up HADOOP_CONF_DIR
Hi, I'm running Spark using IntelliJ on Windows and even though I set HADOOP_CONF_DIR it does not affect the contents of sc.hadoopConfiguration. Anybody encountered it ? Thanks, Daniel
Re: Spark on Windows platform
If all you want is Spark standalone then it's as simple as installing the binaries and calling spark-submit with your main class. I would advise against running Spark on Hadoop on Windows, it's a bit of trouble. But yes, you can do it if you want to. Regards Sab On 29-Feb-2016 6:58 pm, "gaurav pathak" <gauravpathak...@gmail.com> wrote: > Can someone guide me the steps and information regarding, installation of > SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will be > great to read your experiences in using SPARK on Windows platform. > > > Thanks & Regards, > Gaurav Pathak >
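For concreteness, the standalone-binaries route Sab describes amounts to something like the lines below from a Windows command prompt; the Spark version, jar path, and main class are placeholders rather than values from the thread.

rem A hedged sketch; adjust the unpacked Spark directory, jar and main class to your own.
cd C:\spark-1.6.0-bin-hadoop2.6
bin\spark-submit.cmd --class com.example.MyApp --master local[*] C:\jobs\myapp.jar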
Re: Spark on Windows platform
On 29 Feb 2016, at 13:40, gaurav pathak <gauravpathak...@gmail.com<mailto:gauravpathak...@gmail.com>> wrote: Thanks Jorn. Any guidance on how to get started with getting SPARK on Windows, is highly appreciated. Thanks & Regards Gaurav Pathak you are at risk of seeing stack traces when you try to talk to the local filesystem, on account of (a) hadoop being part of the process and (b) it needing some native windows binaries details: https://wiki.apache.org/hadoop/WindowsProblems those binaries: https://github.com/steveloughran/winutils (I need to add some 2.7.2 binaries in there, by the look of things)
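In practice, following Steve's pointers usually comes down to dropping winutils.exe (and hadoop.dll, if your job touches compression or file permissions) into a local directory and telling Hadoop where it is before starting Spark. C:\hadoop is an assumed location, not one from the thread.

rem A minimal sketch, assuming the binaries from the winutils repository above were copied to C:\hadoop\bin:
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
bin\spark-shell.cmd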
Re: Spark on Windows platform
> Hi > I am running Spark on Windows, but a standalone one. > > Use this code > > SparkConf conf = new SparkConf().setMaster("local[1]").setAppName("spark").setSparkHome("c:/spark/bin/spark-submit.cmd"); > > Where the Spark home is the path where you extracted your Spark binaries, down to bin/*.cmd > > You will get a Spark context or streaming context > > Thanks > > On Feb 29, 2016 7:10 PM, "gaurav pathak" <gauravpathak...@gmail.com> wrote: >> >> Thanks Jorn. >> >> Any guidance on how to get started with getting SPARK on Windows, is highly appreciated. >> >> Thanks & Regards >> >> Gaurav Pathak >> >> ~ sent from handheld device >> >> On Feb 29, 2016 5:34 AM, "Jörn Franke" <jornfra...@gmail.com> wrote: >>> >>> I think Hortonworks has a Windows Spark distribution. Maybe Bigtop as well? >>> >>> > On 29 Feb 2016, at 14:27, gaurav pathak <gauravpathak...@gmail.com> wrote: >>> > >>> > Can someone guide me the steps and information regarding, installation of SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will be great to read your experiences in using SPARK on Windows platform. >>> > >>> > >>> > Thanks & Regards, >>> > Gaurav Pathak
Re: Spark on Windows platform
Thanks Jorn. Any guidance on how to get started with getting SPARK on Windows, is highly appreciated. Thanks & Regards Gaurav Pathak ~ sent from handheld device On Feb 29, 2016 5:34 AM, "Jörn Franke" <jornfra...@gmail.com> wrote: > I think Hortonworks has a Windows Spark distribution. Maybe Bigtop as well? > > > On 29 Feb 2016, at 14:27, gaurav pathak <gauravpathak...@gmail.com> > wrote: > > > > Can someone guide me the steps and information regarding, installation > of SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will > be great to read your experiences in using SPARK on Windows platform. > > > > > > Thanks & Regards, > > Gaurav Pathak >
Re: Spark on Windows platform
I think Hortonworks has a Windows Spark distribution. Maybe Bigtop as well? > On 29 Feb 2016, at 14:27, gaurav pathak <gauravpathak...@gmail.com> wrote: > > Can someone guide me the steps and information regarding, installation of > SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will be > great to read your experiences in using SPARK on Windows platform. > > > Thanks & Regards, > Gaurav Pathak
Spark on Windows platform
Can someone guide me the steps and information regarding, installation of SPARK on Windows 7/8.1/10 , as well as on Windows Server. Also, it will be great to read your experiences in using SPARK on Windows platform. Thanks & Regards, Gaurav Pathak
Re: Spark on Windows
You can check "spark.master" property in conf/spark-defaults.conf and try to give IP of the VM in place of "localhost". On Tue, Feb 16, 2016 at 7:48 AM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I am new to spark and starting working on it by writing small programs. I > am able to run those in cloudera quickstart VM but not able to run in the > eclipse when giving master URL > > *Steps I perfromed:* > > Started Master and can access it through http://localhost:8080 > > Started worker and access it. > > Ran the wordcount by giving master as spark://localhost:7077 but no output > and I cant see the application Id also in master web UI. > > I tried with master as local and was able to run successfully. I want to > run on the master so that I can view logs in master and worker. any > suggestions for this? > > Thanks, > Asmath > > >
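A hedged illustration of that change follows; 192.168.56.101 stands in for whatever address the VM actually has. The same value can also be passed to spark-submit as --master, or set in code with setMaster, instead of editing the defaults file.

# conf/spark-defaults.conf  (the IP below is a placeholder for the VM's address)
spark.master    spark://192.168.56.101:7077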
Spark on Windows
Hi, I am new to Spark and started working on it by writing small programs. I am able to run those in the Cloudera QuickStart VM but not able to run them from Eclipse when giving a master URL. *Steps I performed:* Started the master and can access it through http://localhost:8080 Started the worker and can access it. Ran the word count giving the master as spark://localhost:7077 but there is no output and I can't see the application ID in the master web UI either. I tried with the master as local and was able to run successfully. I want to run on the master so that I can view logs in the master and worker. Any suggestions for this? Thanks, Asmath
Spark 1.5.2 error on quitting spark in windows 7
If I start spark-shell then just quit, I get an error.

scala> :q
Stopping spark context.
15/12/09 23:43:32 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-68d3a813-9c55-4649-aa7a-5fc269e669e7
java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-68d3a813-9c55-4649-aa7a-5fc269e669e7
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)

*So, if you use winutils to examine the directory:*

C:\Users\Stefan\AppData\Local\Temp>winutils ls spark-cb325426-4a3c-48ec-becc-baaa077bea1f
drwx-- 1 BloomBear-SSD\Stefan BloomBear-SSD\None 0 Dec 10 2015 spark-cb325426-4a3c-48ec-becc-baaa077bea1f

*I interpret this to mean that the OWNER has read/write/execute privileges on this folder. So why does Scala have a problem deleting it? Just for fun I also installed a set of Windows executables that are ports of common UNIX utilities - http://sourceforge.net/projects/unxutils/?source=typ_redirect So now I can run a command like ls and get*

C:\Users\Stefan\AppData\Local\Temp>ls -al
total 61
drwxrwxrwx 1 user group 0 Dec 9 23:44 .
drwxrwxrwx 1 user group 0 Dec 9 22:27 ..
drwxrwxrwx 1 user group 0 Dec 9 23:43 61135062-623a-4624-b406-fbd0ae9308ae_resources
drwxrwxrwx 1 user group 0 Dec 9 23:43 9cc17e8c-2941-4768-9f55-e740e54dab0b_resources
-rw-rw-rw- 1 user group 0 Sep 4 2013 FXSAPIDebugLogFile.txt
drwxrwxrwx 1 user group 0 Dec 9 23:43 Stefan
-rw-rw-rw- 1 user group 16400 Dec 9 21:07 etilqs_3SQb9MejUX0BHwy
-rw-rw-rw- 1 user group 2052 Dec 9 21:41 etilqs_8YWZWJEClIYRrKf
drwxrwxrwx 1 user group 0 Dec 9 23:43 hsperfdata_Stefan
-rw-rw-rw- 1 user group 19968 Dec 9 23:09 jansi-64-1-8475478299913367674.11
-rw-rw-rw- 1 user group 18944 Dec 9 23:43 jansi-64-1.5.2.dll
-rw-rw-rw- 1 user group 2031 Dec 9 23:15 sbt3359615202868869571.log
drwxrwxrwx 1 user group 0 Dec 9 23:43 spark-68d3a813-9c55-4649-aa7a-5fc269e669e7

*Now the Spark directory is being seen by Windows as fully readable by EVERYONE. In any event, can someone enlighten me about their environment to avoid this irritating error? Here is my environment:*

windows 7 64 bit
Spark 1.5.2
Scala 2.10.6
Python 2.7.10 (from Anaconda)
PATH includes:
C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
C:\ProgramData\Oracle\Java\javapath
C:\Users\Stefan\scala
C:\Users\Stefan\hadoop-2.6.0\bin
C:\ProgramData\Oracle\Java\javapath
SYSTEM variables set are:
SPARK_HOME=C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6
JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0 (where the bin\winutils resides)
winutils.exe chmod 777 /tmp/hive
\tmp\hive directory at the root of the C: drive with full permissions, e.g.
>winutils ls \tmp\hive
drwxrwxrwx 1 BloomBear-SSD\Stefan BloomBear-SSD\None 0 Dec 8 2015 \tmp\hive
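One commonly suggested workaround (not a fix for the delete-on-exit failure itself, which appears to be a Windows-specific file-locking issue at shutdown) is to point Spark's scratch space at a short, dedicated directory so the leftover spark-* folders are easy to clear by hand; C:\tmp\spark below is an assumption, not a value from the thread.

rem A hedged workaround sketch, not a fix:
mkdir C:\tmp\spark
set SPARK_LOCAL_DIRS=C:\tmp\spark
bin\spark-shell.cmd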
RE: Error building Spark on Windows with sbt
I have not had any success building using sbt/sbt on Windows. However, I have been able to build the binary by using the Maven command directly. From: Richard Eggert [mailto:richard.egg...@gmail.com] Sent: Sunday, October 25, 2015 12:51 PM To: Ted Yu <yuzhih...@gmail.com> Cc: User <user@spark.apache.org> Subject: Re: Error building Spark on Windows with sbt Yes, I know, but it would be nice to be able to test things myself before I push commits. On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote: If you have a pull request, Jenkins can test your change for you. FYI On Oct 25, 2015, at 12:43 PM, Richard Eggert <richard.egg...@gmail.com> wrote: Also, if I run the Maven build on Windows or Linux without setting -DskipTests=true, it hangs indefinitely when it gets to org.apache.spark.JavaAPISuite. It's hard to test patches when the build doesn't work. :-/ On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert <richard.egg...@gmail.com> wrote: By "it works", I mean, "It gets past that particular error". It still fails several minutes later with a different error: java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert <richard.egg...@gmail.com> wrote: When I try to start up sbt for the Spark build, or if I try to import it in IntelliJ IDEA as an sbt project, it fails with a "No such file or directory" error when it attempts to "git clone" sbt-pom-reader into .sbt/0.13/staging/some-sha1-hash. If I manually create the expected directory before running sbt or importing into IntelliJ, then it works. Why is it necessary to do this, and what can be done to make it not necessary? Rich -- Rich -- Rich -- Rich
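For reference, the "Maven directly" route from a Windows prompt looks roughly like this; the memory settings are commonly recommended values for a full Spark build rather than anything stated in the thread, and the profile flags are whatever your build needs.

rem A hedged sketch of a direct Maven build on Windows:
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m
mvn -DskipTests clean package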
Re: Error building Spark on Windows with sbt
If you have a pull request, Jenkins can test your change for you. FYI > On Oct 25, 2015, at 12:43 PM, Richard Eggert wrote: > > Also, if I run the Maven build on Windows or Linux without setting > -DskipTests=true, it hangs indefinitely when it gets to > org.apache.spark.JavaAPISuite. > > It's hard to test patches when the build doesn't work. :-/ > >> On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert >> wrote: >> By "it works", I mean, "It gets past that particular error". It still fails >> several minutes later with a different error: >> >> java.lang.IllegalStateException: impossible to get artifacts when data has >> not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 >> >> >>> On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert >>> wrote: >>> When I try to start up sbt for the Spark build, or if I try to import it >>> in IntelliJ IDEA as an sbt project, it fails with a "No such file or >>> directory" error when it attempts to "git clone" sbt-pom-reader into >>> .sbt/0.13/staging/some-sha1-hash. >>> >>> If I manually create the expected directory before running sbt or importing >>> into IntelliJ, then it works. Why is it necessary to do this, and what can >>> be done to make it not necessary? >>> >>> Rich >>> >> >> >> >> -- >> Rich > > > > -- > Rich
Re: Error building Spark on Windows with sbt
Yes, I know, but it would be nice to be able to test things myself before I push commits. On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu wrote: > If you have a pull request, Jenkins can test your change for you. > > FYI > > On Oct 25, 2015, at 12:43 PM, Richard Eggert > wrote: > > Also, if I run the Maven build on Windows or Linux without setting > -DskipTests=true, it hangs indefinitely when it gets to > org.apache.spark.JavaAPISuite. > > It's hard to test patches when the build doesn't work. :-/ > > On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert > wrote: > >> By "it works", I mean, "It gets past that particular error". It still >> fails several minutes later with a different error: >> >> java.lang.IllegalStateException: impossible to get artifacts when data >> has not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 >> >> >> On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert > > wrote: >> >>> When I try to start up sbt for the Spark build, or if I try to import >>> it in IntelliJ IDEA as an sbt project, it fails with a "No such file or >>> directory" error when it attempts to "git clone" sbt-pom-reader into >>> .sbt/0.13/staging/some-sha1-hash. >>> >>> If I manually create the expected directory before running sbt or >>> importing into IntelliJ, then it works. Why is it necessary to do this, >>> and what can be done to make it not necessary? >>> >>> Rich >>> >> >> >> >> -- >> Rich >> > > > > -- > Rich > > -- Rich
Error building Spark on Windows with sbt
When I try to start up sbt for the Spark build, or if I try to import it in IntelliJ IDEA as an sbt project, it fails with a "No such file or directory" error when it attempts to "git clone" sbt-pom-reader into .sbt/0.13/staging/some-sha1-hash. If I manually create the expected directory before running sbt or importing into IntelliJ, then it works. Why is it necessary to do this, and what can be done to make it not necessary? Rich
Re: Error building Spark on Windows with sbt
Also, if I run the Maven build on Windows or Linux without setting -DskipTests=true, it hangs indefinitely when it gets to org.apache.spark.JavaAPISuite. It's hard to test patches when the build doesn't work. :-/ On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert wrote: > By "it works", I mean, "It gets past that particular error". It still > fails several minutes later with a different error: > > java.lang.IllegalStateException: impossible to get artifacts when data has > not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 > > > On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert > wrote: > >> When I try to start up sbt for the Spark build, or if I try to import it >> in IntelliJ IDEA as an sbt project, it fails with a "No such file or >> directory" error when it attempts to "git clone" sbt-pom-reader into >> .sbt/0.13/staging/some-sha1-hash. >> >> If I manually create the expected directory before running sbt or >> importing into IntelliJ, then it works. Why is it necessary to do this, >> and what can be done to make it not necessary? >> >> Rich >> > > > > -- > Rich > -- Rich
Re: Error building Spark on Windows with sbt
By "it works", I mean, "It gets past that particular error". It still fails several minutes later with a different error: java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert wrote: > When I try to start up sbt for the Spark build, or if I try to import it > in IntelliJ IDEA as an sbt project, it fails with a "No such file or > directory" error when it attempts to "git clone" sbt-pom-reader into > .sbt/0.13/staging/some-sha1-hash. > > If I manually create the expected directory before running sbt or > importing into IntelliJ, then it works. Why is it necessary to do this, > and what can be done to make it not necessary? > > Rich > -- Rich
Re: Download Apache Spark on Windows 7 for a Proof of Concept installation
Use a Hadoop distribution that supports Windows and has Spark included. Generally - if you want to use Windows - you should use the server version. On Sat, Jul 25, 2015 at 20:11, Peter Leventis pleven...@telkomsa.net wrote: I just wanted an easy step by step guide as to exactly what version of what ever to download for a Proof of Concept installation of Apache Spark on Windows 7. I have spent quite some time following a number of different recipes to no avail. I have tried about 10 different permutations to date. I prefer the easiest approach, e.g. download Pre-build Version of ... etc
Re: Download Apache Spark on Windows 7 for a Proof of Concept installation
Thank you for the answers. I followed numerous recipes, including videos, and encountered many obstacles such as 7-Zip being unable to unzip the *.gx file and the need to use SBT. My situation is fixed: I use a Windows 7 PC (not Linux). I would be very grateful for an approach that simply works. This is the first time in 15 years that I have struggled so much to download and install open source software from Apache. I managed to download and install Apache Drill in minutes. Apache Spark is just so awkward! Please help. Any version would do for the required proof of concept.
Download Apache Spark on Windows 7 for a Proof of Concept installation
I just wanted an easy step by step guide as to exactly what version of what ever to download for a Proof of Concept installation of Apache Spark on Windows 7. I have spent quite some time following a number of different recipes to no avail. I have tried about 10 different permutations to date. I prefer the easiest approach, e.g. download Pre-build Version of ... etc
RE: spark on Windows 2008 failed to save RDD to windows shared folder
It is Hadoop-2.4.0 with spark-1.3.0. I found that the problem only happens if there are multiple nodes. If the cluster has only one node, it works fine. For example if the cluster has a spark-master on machine A and a spark-worker on machine B, this problem happens. If both spark-master and spark-worker are on machine A, then there is no problem. I do not use HDFS. I am just saving the RDD to a Windows shared folder rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj") With the T: drive mapped to \\10.196.119.230\myshare Ningjun From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Friday, May 22, 2015 5:02 PM To: Wang, Ningjun (LNG-NPV) Cc: user@spark.apache.org Subject: Re: spark on Windows 2008 failed to save RDD to windows shared folder The stack trace is related to hdfs. Can you tell us which hadoop release you are using ? Is this a secure cluster ? Thanks On Fri, May 22, 2015 at 1:55 PM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote: I used a Spark standalone cluster on Windows 2008. I kept on getting the following error when trying to save an RDD to a Windows shared folder rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj") 15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12) java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_00_12 at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1071) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527) at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63) at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) The T: drive is mapped to a Windows shared folder, e.g. T: -> \\10.196.119.230\myshare The id running Spark does have write permission to this folder. It works most of the time but fails sometimes. Can anybody tell me what is the problem here? Please advise. Thanks.
spark on Windows 2008 failed to save RDD to windows shared folder
I used spark standalone cluster on Windows 2008. I kept on getting the following error when trying to save an RDD to a windows shared folder rdd.saveAsObjectFile(file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj) 15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12) java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_00_12 at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1071) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527) at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63) at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) The T: drive is mapped to a windows shared folder, e.g. T: - \\10.196.119.230\myshare The id running spark does have write permission to this folder. It works most of the time but failed sometime. Can anybody tell me what is the problem here? Please advise. Thanks.
Re: spark on Windows 2008 failed to save RDD to windows shared folder
The stack trace is related to hdfs. Can you tell us which hadoop release you are using ? Is this a secure cluster ? Thanks On Fri, May 22, 2015 at 1:55 PM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote: I used spark standalone cluster on Windows 2008. I kept on getting the following error when trying to save an RDD to a windows shared folder rdd.saveAsObjectFile(“file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj”) 15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12) java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_00_12 at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1071) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527) at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63) at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) The T: drive is mapped to a windows shared folder, e.g. T: - \\10.196.119.230\myshare The id running spark does have write permission to this folder. It works most of the time but failed sometime. Can anybody tell me what is the problem here? Please advise. Thanks.
Fwd: Change ivy cache for spark on Windows
+user -- Forwarded message -- From: Burak Yavuz brk...@gmail.com Date: Mon, Apr 27, 2015 at 1:59 PM Subject: Re: Change ivy cache for spark on Windows To: mj jone...@gmail.com Hi, In your conf file (SPARK_HOME\conf\spark-defaults.conf) you can set: `spark.jars.ivy \your\path` Best, Burak On Mon, Apr 27, 2015 at 1:49 PM, mj jone...@gmail.com wrote: Hi, I'm having trouble using the --packages option for spark-shell.cmd - I have to use Windows at work and have been issued a username with a space in it that means when I use the --packages option it fails with this message: Exception in thread "main" java.net.URISyntaxException: Illegal character in path at index 13: C:/Users/My Name/.ivy2/jars/spark-csv_2.10.jar The command I'm trying to run is: .\spark-shell.cmd --packages com.databricks:spark-csv_2.10:1.0.3 I've tried creating an ivysettings.xml file with the content below in my .ivy2 directory, but spark doesn't seem to pick it up. Does anyone have any ideas of how to get around this issue? <ivysettings> <caches defaultCacheDir="c:\ivy_cache"/> </ivysettings>
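Spelled out, Burak's suggestion looks like the line below in SPARK_HOME\conf\spark-defaults.conf; C:\ivy_cache simply mirrors the directory the original poster was already trying to use. The same property can also be passed on the command line, e.g. --conf spark.jars.ivy=C:\ivy_cache, to sidestep the space in the default C:\Users\My Name\.ivy2 path.

# conf\spark-defaults.conf  (the cache directory is the one from the thread, any space-free path works)
spark.jars.ivy    C:\ivy_cache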
Change ivy cache for spark on Windows
Hi, I'm having trouble using the --packages option for spark-shell.cmd - I have to use Windows at work and have been issued a username with a space in it that means when I use the --packages option it fails with this message: Exception in thread "main" java.net.URISyntaxException: Illegal character in path at index 13: C:/Users/My Name/.ivy2/jars/spark-csv_2.10.jar The command I'm trying to run is: .\spark-shell.cmd --packages com.databricks:spark-csv_2.10:1.0.3 I've tried creating an ivysettings.xml file with the content below in my .ivy2 directory, but spark doesn't seem to pick it up. Does anyone have any ideas of how to get around this issue? <ivysettings> <caches defaultCacheDir="c:\ivy_cache"/> </ivysettings>
Re: Spark on Windows
spark 'master' branch (i.e. v1.4.0) builds successfully on windows 8.1 intel i7 64-bit with oracle jdk8_45.with maven opts without the flag -XX:ReservedCodeCacheSize=1g. takes about 33 minutes. Thanking you. With Regards Sree On Thursday, April 16, 2015 9:07 PM, Arun Lists lists.a...@gmail.com wrote: Here is what I got from the engineer who worked on building Spark and using it on Windows: 1) Hadoop winutils.exe is needed on Windows, even for local files – and you have to set the Hadoop.home.dir in the spark-class2.cmd (for the two lines with $RUNNER near the end, by adding “-Dhadoop.home.dir=dir” file after downloading Hadoop binaries + winutils. 2) Java/Spark cannot delete the spark temporary files and it throws an exception (program still works though). Manual clean-up works just fine, and it is not a permissions issue as it has rights to create the file (I have also tried using my own directory rather than the default, same error).3) tried building Spark again, and have attached the log – I don’t get any errors, just warnings. However when I try to use that JAR I just get the error message “Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit”. On Thu, Apr 16, 2015 at 12:19 PM, Arun Lists lists.a...@gmail.com wrote: Thanks, Matei! We'll try that and let you know if it works. You are correct in inferring that some of the problems we had were with dependencies. We also had problems with the spark-submit scripts. I will get the details from the engineer who worked on the Windows builds and provide them to you. arun On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark on Windows
Thanks, Sree! Are you able to run your applications using spark-submit? Even after we were able to build successfully, we ran into problems with running the spark-submit script. If everything worked correctly for you, we can hope that things will be smoother when 1.4.0 is made generally available. arun On Thu, Apr 16, 2015 at 10:18 PM, Sree V sree_at_ch...@yahoo.com wrote: spark 'master' branch (i.e. v1.4.0) builds successfully on windows 8.1 intel i7 64-bit with oracle jdk8_45. with maven opts without the flag -XX:ReservedCodeCacheSize=1g. takes about 33 minutes. Thanking you. With Regards Sree On Thursday, April 16, 2015 9:07 PM, Arun Lists lists.a...@gmail.com wrote: Here is what I got from the engineer who worked on building Spark and using it on Windows: 1) Hadoop winutils.exe is needed on Windows, even for local files – and you have to set the Hadoop.home.dir in the spark-class2.cmd (for the two lines with $RUNNER near the end, by adding “-Dhadoop.home.dir=dir” file after downloading Hadoop binaries + winutils. 2) Java/Spark cannot delete the spark temporary files and it throws an exception (program still works though). Manual clean-up works just fine, and it is not a permissions issue as it has rights to create the file (I have also tried using my own directory rather than the default, same error). 3) tried building Spark again, and have attached the log – I don’t get any errors, just warnings. However when I try to use that JAR I just get the error message “Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit”. On Thu, Apr 16, 2015 at 12:19 PM, Arun Lists lists.a...@gmail.com wrote: Thanks, Matei! We'll try that and let you know if it works. You are correct in inferring that some of the problems we had were with dependencies. We also had problems with the spark-submit scripts. I will get the details from the engineer who worked on the Windows builds and provide them to you. arun On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark on Windows
We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun
Re: Spark on Windows
You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
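For what Matei describes, the 1.3-era build documentation had a two-step route on Mac/Linux; the script and profile names below are from memory of that era's source tree and may differ on other branches, so treat this as a sketch rather than a recipe.

# Hedged sketch for a Spark 1.3-era checkout on Mac/Linux:
./dev/change-version-to-2.11.sh
mvn -Dscala-2.11 -DskipTests clean package

The resulting build can then be copied over to the Windows machine, as suggested above.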
Re: Spark on Windows
Thanks, Matei! We'll try that and let you know if it works. You are correct in inferring that some of the problems we had were with dependencies. We also had problems with the spark-submit scripts. I will get the details from the engineer who worked on the Windows builds and provide them to you. arun On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun
Re: Spark on Windows
The hadoop support from HortonWorks only *actually *works with Windows Server - well at least as of Spark Summit last year : and AFAIK that has not changed since 2015-04-16 15:18 GMT-07:00 Dean Wampler deanwamp...@gmail.com: If you're running Hadoop, too, now that Hortonworks supports Spark, you might be able to use their distribution. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Thu, Apr 16, 2015 at 2:19 PM, Arun Lists lists.a...@gmail.com wrote: Thanks, Matei! We'll try that and let you know if it works. You are correct in inferring that some of the problems we had were with dependencies. We also had problems with the spark-submit scripts. I will get the details from the engineer who worked on the Windows builds and provide them to you. arun On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun
Re: Spark on Windows
If you're running Hadoop, too, now that Hortonworks supports Spark, you might be able to use their distribution. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Thu, Apr 16, 2015 at 2:19 PM, Arun Lists lists.a...@gmail.com wrote: Thanks, Matei! We'll try that and let you know if it works. You are correct in inferring that some of the problems we had were with dependencies. We also had problems with the spark-submit scripts. I will get the details from the engineer who worked on the Windows builds and provide them to you. arun On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com wrote: You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to Windows. AFAIK it should build on Windows too, the only problem is that Maven might take a long time to download dependencies. What errors are you seeing? Matei On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote: We run Spark on Mac and Linux but also need to run it on Windows 8.1 and Windows Server. We ran into problems with the Scala 2.10 binary bundle for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on Scala 2.11.6 (we built Spark from the sources). On Windows, however despite our best efforts we cannot get Spark 1.3.0 as built from sources working for Scala 2.11.6. Spark has too many moving parts and dependencies! When can we expect to see a binary bundle for Spark 1.3.0 that is built for Scala 2.11.6? I read somewhere that the only reason that Spark 1.3.0 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For those of us who don't use Kafka, can we have a Scala 2.10 bundle. If there isn't an official bundle arriving any time soon, can someone who has built it for Windows 8.1 successfully please share with the group? Thanks, arun
Error when running Spark on Windows 8.1
Hi, We are trying to run a Spark application using spark-submit on Windows 8.1. The application runs successfully to completion on MacOS 10.10 and on Ubuntu Linux. On Windows, we get the following error messages (see below). It appears that Spark is trying to delete some temporary directory that it creates. How do we solve this problem? Thanks, arun 5/04/07 10:55:14 ERROR Utils: Exception while deleting Spark temp dir: C:\Users\JOSHMC~1\AppData\Local\Temp\spark-339bf2d9-8b89-46e9-b5c1-404caf9d3cd7\userFiles-62976ef7-ab56-41c0-a35b-793c7dca31c7 java.io.IOException: Failed to delete: C:\Users\JOSHMC~1\AppData\Local\Temp\spark-339bf2d9-8b89-46e9-b5c1-404caf9d3cd7\userFiles-62976ef7-ab56-41c0-a35b-793c7dca31c7 at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:932) at org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:181) at org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:179) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:179) at org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:177) at org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:177) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617) at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:177)
Building Spark on Windows WAS: Any IRC channel on Spark?
Have you tried with -X switch ? Thanks On Mar 17, 2015, at 1:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote: Dears, Is there any instructions to build spark 1.3.0 on windows 7. I tried mvn -Phive -Phive-thriftserver -DskipTests clean package but i got below errors [INFO] Spark Project Parent POM ... SUCCESS [ 7.845 s] [INFO] Spark Project Networking ... SUCCESS [ 26.209 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 9.701 s] [INFO] Spark Project Core . SUCCESS [04:29 min] [INFO] Spark Project Bagel SUCCESS [ 22.215 s] [INFO] Spark Project GraphX ... SUCCESS [ 59.676 s] [INFO] Spark Project Streaming SUCCESS [01:46 min] [INFO] Spark Project Catalyst . SUCCESS [01:40 min] [INFO] Spark Project SQL .. SUCCESS [03:05 min] [INFO] Spark Project ML Library ... FAILURE [03:49 min] [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Hive Thrift Server ... SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Twitter . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External MQTT SKIPPED [INFO] Spark Project External ZeroMQ .. SKIPPED [INFO] Spark Project External Kafka ... SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 16:58 min [INFO] Finished at: 2015-03-17T11:04:40+03:00 [INFO] Final Memory: 77M/1840M [INFO] [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle exe p 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :spark-mllib_2.10 On Tue, Mar 17, 2015 at 10:06 AM, Akhil Das ak...@sigmoidanalytics.com wrote: There's one on Freenode, You can join #Apache-Spark There's like 60 people idling. :) Thanks Best Regards On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote: Hi, everyone, I'm wondering whether there is a possibility to setup an official IRC channel on freenode. I noticed that a lot of apache projects would have a such channel to let people talk directly. Best Michael
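Note that the mllib failure above is a Scalastyle check rather than a compile error: the code built, but one style rule was violated. If the plugin honors its usual skip property (an assumption, not something verified in this thread), the build can be pushed past the check while the offending line is tracked down.

rem Hedged sketch only; -Dscalastyle.skip=true is assumed to be honored by the plugin version in use.
mvn -Phive -Phive-thriftserver -DskipTests -Dscalastyle.skip=true clean package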
Re: Building Spark on Windows WAS: Any IRC channel on Spark?
Scalastyle violation(s). at org.scalastyle.maven.plugin.ScalastyleViolationCheckMojo.performCheck(ScalastyleViolationCheckMojo.java:230) ... 22 more [ERROR] [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :spark-mllib_2.10 C:\Nawwar\Hadoop\spark\spark-1.3.0mvn -X -Phive -Phive-thriftserver -DskipTests clean package On Tue, Mar 17, 2015 at 12:14 PM, Ted Yu yuzhih...@gmail.com wrote: Have you tried with -X switch ? Thanks On Mar 17, 2015, at 1:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote: Dears, Is there any instructions to build spark 1.3.0 on windows 7. I tried mvn -Phive -Phive-thriftserver -DskipTests clean package but i got below errors [INFO] Spark Project Parent POM ... SUCCESS [ 7.845 s] [INFO] Spark Project Networking ... SUCCESS [ 26.209 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 9.701 s] [INFO] Spark Project Core . SUCCESS [04:29 min] [INFO] Spark Project Bagel SUCCESS [ 22.215 s] [INFO] Spark Project GraphX ... SUCCESS [ 59.676 s] [INFO] Spark Project Streaming SUCCESS [01:46 min] [INFO] Spark Project Catalyst . SUCCESS [01:40 min] [INFO] Spark Project SQL .. SUCCESS [03:05 min] [INFO] Spark Project ML Library ... FAILURE [03:49 min] [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Hive Thrift Server ... SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Twitter . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External MQTT SKIPPED [INFO] Spark Project External ZeroMQ .. SKIPPED [INFO] Spark Project External Kafka ... SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 16:58 min [INFO] Finished at: 2015-03-17T11:04:40+03:00 [INFO] Final Memory: 77M/1840M [INFO] [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle exe p 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :spark-mllib_2.10 On Tue, Mar 17, 2015 at 10:06 AM, Akhil Das ak...@sigmoidanalytics.com wrote: There's one on Freenode, You can join #Apache-Spark There's like 60 people idling. :) Thanks Best Regards On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote: Hi, everyone, I'm wondering whether there is a possibility to setup an official IRC channel on freenode. I noticed that a lot of apache projects would have a such channel to let people talk directly. Best Michael
Re: Building Spark on Windows WAS: Any IRC channel on Spark?
) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) ... 19 more Caused by: org.apache.maven.plugin.MojoFailureException: You have 1 Scalastyle violation(s). at org.scalastyle.maven.plugin.ScalastyleViolationCheckMojo.performCheck(ScalastyleViolationCheckMojo.java:230) ... 22 more [ERROR] [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :spark-mllib_2.10 C:\Nawwar\Hadoop\spark\spark-1.3.0mvn -X -Phive -Phive-thriftserver -DskipTests clean package On Tue, Mar 17, 2015 at 12:14 PM, Ted Yu yuzhih...@gmail.com wrote: Have you tried with -X switch ? Thanks On Mar 17, 2015, at 1:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote: Dears, Is there any instructions to build spark 1.3.0 on windows 7. I tried mvn -Phive -Phive-thriftserver -DskipTests clean package but i got below errors [INFO] Spark Project Parent POM ... SUCCESS [ 7.845 s] [INFO] Spark Project Networking ... SUCCESS [ 26.209 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 9.701 s] [INFO] Spark Project Core . SUCCESS [04:29 min] [INFO] Spark Project Bagel SUCCESS [ 22.215 s] [INFO] Spark Project GraphX ... SUCCESS [ 59.676 s] [INFO] Spark Project Streaming SUCCESS [01:46 min] [INFO] Spark Project Catalyst . SUCCESS [01:40 min] [INFO] Spark Project SQL .. SUCCESS [03:05 min] [INFO] Spark Project ML Library ... FAILURE [03:49 min] [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Hive Thrift Server ... SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Twitter . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External MQTT SKIPPED [INFO] Spark Project External ZeroMQ .. SKIPPED [INFO] Spark Project External Kafka ... SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 16:58 min [INFO] Finished at: 2015-03-17T11:04:40+03:00 [INFO] Final Memory: 77M/1840M [INFO] [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle exe p 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :spark-mllib_2.10 On Tue, Mar 17, 2015 at 10:06 AM, Akhil Das ak...@sigmoidanalytics.com wrote: There's one on Freenode, You can join #Apache-Spark There's like 60 people idling. :) Thanks Best Regards On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote: Hi, everyone, I'm wondering whether there is a possibility to setup an official IRC channel on freenode. I noticed that a lot of apache projects would have a such channel to let people talk directly. 
Best Michael
Re: Building Spark on Windows WAS: Any IRC channel on Spark?
Re: can not submit job to spark in windows
-56b32155-2779-4345-9597-2bfa6a87a51d\pi.py
Traceback (most recent call last):
  File "C:/spark-1.2.1-bin-hadoop2.4/bin/pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi")
  File "C:\spark-1.2.1-bin-hadoop2.4\python\pyspark\context.py", line 105, in __init__
    conf, jsc)
  File "C:\spark-1.2.1-bin-hadoop2.4\python\pyspark\context.py", line 153, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\spark-1.2.1-bin-hadoop2.4\python\pyspark\context.py", line 202, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\spark-1.2.1-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 701, in __call__
  File "C:\spark-1.2.1-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:445)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1004)
    at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:288)
    at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:288)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:288)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Unknown Source)

What is wrong on my side? Should I run some scripts before spark-submit.cmd?

Regards,
Sergey.

--
Arush Kharbanda || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
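The NullPointerException at ProcessBuilder.start in the trace above is the classic symptom of Hadoop's Shell helper being asked to run winutils.exe (which FileUtil.chmod uses on Windows) with no usable Hadoop home configured, as the later messages in this archive spell out. A quick way to check the environment from Python before constructing the SparkContext; this check is only a sketch, and HADOOP_HOME is normally unset on a plain Windows download:

    import os

    # An unset HADOOP_HOME is consistent with the winutils-related failures in this thread.
    hadoop_home = os.environ.get("HADOOP_HOME")
    print("HADOOP_HOME = %s" % hadoop_home)

    if hadoop_home:
        winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
        print("winutils.exe present: %s" % os.path.exists(winutils))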
can not submit job to spark in windows
Stand-alone Spark on windows
Hi! I downloaded and unpacked the Spark binaries and could successfully run the pyspark shell and execute some code there, BUT I failed when submitting stand-alone Python scripts or jar files via spark-submit:

    spark-submit pi.py

I always get an exception stack trace with a NullPointerException in java.lang.ProcessBuilder.start(). What could be wrong? Should I run some scripts before spark-submit? I have Windows 7 and Spark 1.2.1.

Sergey.
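A note for this report: the pattern that usually works for stand-alone submission on Windows, and that matches the winutils fix described further down in this archive, is to point HADOOP_HOME at a directory containing bin\winutils.exe before calling the .cmd launcher. The D:\winutil path is only an example location, not something from the original message:

    set HADOOP_HOME=D:\winutil
    cd C:\spark-1.2.1-bin-hadoop2.4
    bin\spark-submit.cmd --master local[2] pi.py

This is a sketch based on the workarounds discussed in the surrounding messages, not a verified recipe for this exact setup.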
RE: Spark on Windows 2008 R2 server does not work
I solved this problem following this article: http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7

1) Download the compiled winutils.exe from http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight
2) Put this file into d:\winutil\bin
3) Add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")

It solved my original problem. But then I got a new error:

java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:411)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:350)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:347)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:347)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Very frustrated. Does anybody successfully get Spark running on Windows?

Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541
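The same idea as the System.setProperty("hadoop.home.dir", ...) step above, but for the PySpark submissions discussed earlier in this archive: make the Hadoop home visible before the SparkContext launches the JVM. A minimal sketch, assuming winutils.exe has already been placed under D:\winutil\bin as in the steps above; whether the environment-variable route is picked up depends on the Hadoop version falling back to HADOOP_HOME when hadoop.home.dir is not set:

    import os
    from pyspark import SparkConf, SparkContext

    # Must be set before the SparkContext starts the JVM, so that the child process
    # inherits it and Hadoop's Shell can locate D:\winutil\bin\winutils.exe.
    os.environ.setdefault("HADOOP_HOME", r"D:\winutil")

    conf = SparkConf().setAppName("winutils-check").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(10)).sum())
    sc.stop()

Setting the variable at the operating-system level before running spark-submit is the more common form of the same workaround.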
RE: Spark on Windows 2008 R2 server does not work
Has anybody successfully installed and run spark-1.2.0 on Windows 2008 R2 or Windows 7? How did you get it to work?

Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541
Re: Spark on Windows 2008 R2 server does not work
https://issues.apache.org/jira/browse/SPARK-2356

Take a look through the comments; there are some workarounds listed there.

--
Marcelo
Spark on Windows 2008 R2 server does not work
I downloaded and installed the pre-built spark-1.2.0-bin-hadoop2.4.tgz on a Windows 2008 R2 server. When I submit a job using spark-submit, I get the following error:

WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform ... using builtin-java classes where applicable
ERROR org.apache.hadoop.util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:77)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)

Please advise. Thanks.

Ningjun