Re: No module named pyspark - latest built
I see. The generally known constraints on building your assembly jar for PySpark on YARN are:

- built with Java 6
- NOT built on RedHat
- built with Maven

Some of these are documented here
<http://spark.apache.org/docs/latest/building-with-maven.html> (bottom).
Maybe we should make them more explicit.

2014-11-13 2:31 GMT-08:00 jamborta:
> it was built with 1.6 (tried 1.7, too)
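If you are unsure which JDK produced an existing assembly jar, the jar's manifest usually records it under `Created-By`. A minimal sketch in Python, using a toy jar built on the fly rather than a real assembly (the `Created-By` value written here is illustrative; a real Spark assembly's manifest may carry different or additional attributes):

```python
import os
import tempfile
import zipfile

# Build a toy jar with a manifest, standing in for a real assembly jar.
tmp = tempfile.mkdtemp()
jar_path = os.path.join(tmp, "toy-assembly.jar")
manifest = "Manifest-Version: 1.0\nCreated-By: 1.6.0_45 (Sun Microsystems Inc.)\n"
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", manifest)

# The check itself: read META-INF/MANIFEST.MF and pull out Created-By.
with zipfile.ZipFile(jar_path) as jar:
    text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
created_by = [line for line in text.splitlines() if line.startswith("Created-By")][0]
print(created_by)  # Created-By: 1.6.0_45 (Sun Microsystems Inc.)
```

Pointing the same two `zipfile` calls at your actual assembly jar path shows which JDK built it.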
Re: No module named pyspark - latest built
it was built with 1.6 (tried 1.7, too)

On Thu, Nov 13, 2014 at 2:52 AM, Andrew Or-2 [via Apache Spark User List] wrote:
> Hey Jamborta,
>
> What java version did you build the jar with?
Re: No module named pyspark - latest built
Hey Jamborta,

What java version did you build the jar with?
Re: No module named pyspark - latest built
You need to use maven to include python files. See
https://github.com/apache/spark/pull/1223 .

-Xiangrui
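One way to confirm whether a given assembly jar actually ships the Python sources is to list its entries, since a jar is just a zip archive. A minimal sketch, where the two toy jars built on the fly merely simulate a Maven-style assembly (with `pyspark/`) and an sbt-style one (without); in practice you would run the same check against your real assembly jar path:

```python
import os
import tempfile
import zipfile

def jar_contains_pyspark(jar_path):
    """Return True if the archive ships the pyspark Python package."""
    with zipfile.ZipFile(jar_path) as jar:
        return any(name.startswith("pyspark/") for name in jar.namelist())

# Two toy jars standing in for the Maven and sbt build outputs.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "maven-style.jar")
bad = os.path.join(tmp, "sbt-style.jar")
with zipfile.ZipFile(good, "w") as jar:
    jar.writestr("pyspark/__init__.py", "")
    jar.writestr("org/apache/spark/SparkContext.class", b"")
with zipfile.ZipFile(bad, "w") as jar:
    jar.writestr("org/apache/spark/SparkContext.class", b"")

print(jar_contains_pyspark(good), jar_contains_pyspark(bad))  # True False
```

If the check returns False for your assembly, the PYTHONPATH entry pointing at the jar cannot resolve `pyspark`, which matches the worker error in this thread.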
Re: No module named pyspark - latest built
Thanks. Will it work with sbt at some point?

On Thu, 13 Nov 2014 01:03 Xiangrui Meng wrote:
> You need to use maven to include python files. See
> https://github.com/apache/spark/pull/1223 . -Xiangrui
Re: No module named pyspark - latest built
I have figured out that building the fat jar with sbt does not seem to include the pyspark scripts when using the following command:

sbt/sbt -Pdeb -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Phive clean publish-local assembly

however the maven command works OK:

mvn -Pdeb -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Phive -DskipTests clean package

am I running the correct sbt command?
Re: No module named pyspark - latest built
Forgot to mention that this setup works in Spark standalone mode; the problem only appears when I run on YARN.
No module named pyspark - latest built
Hi all,

I am trying to run Spark with the latest build (from branch-1.2). As far as I can see, all the paths are set and SparkContext starts up OK; however, I cannot run anything that goes to the nodes. I get the following error:

Error from python worker:
  /usr/bin/python2.7: No module named pyspark
PYTHONPATH was:
  /mnt/yarn/nm/usercache/massive/filecache/15/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0.jar
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:102)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)

any idea where it is picking up this path from?

thanks,
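For context on why PYTHONPATH points at the jar itself: PySpark on YARN adds the assembly jar to PYTHONPATH and relies on Python's zipimport machinery to load the pyspark package from inside the archive, which only works if the jar actually contains the Python files. A toy sketch of that mechanism, using a fabricated jar built on the fly (not a real Spark assembly) that ships a stand-in `pyspark` package:

```python
import os
import sys
import tempfile
import zipfile

# Build a fake "assembly jar" that contains a pyspark package, the way a
# correctly built (Maven) assembly would.
tmp = tempfile.mkdtemp()
fake_jar = os.path.join(tmp, "fake-assembly.jar")
with zipfile.ZipFile(fake_jar, "w") as jar:
    jar.writestr("pyspark/__init__.py", "MARKER = 'loaded from jar'\n")

# Putting the jar on sys.path is the moral equivalent of the PYTHONPATH
# entry in the error above; zipimport then resolves the package from it.
sys.path.insert(0, fake_jar)
import pyspark  # resolved from inside the archive

print(pyspark.MARKER)  # loaded from jar
```

If the archive on PYTHONPATH lacks the `pyspark/` entries (as with the sbt-built jar discussed below), the same import fails with exactly "No module named pyspark".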