Thanks for your continuing help. Here is some additional info.

*OS/architecture*

output of *cat /proc/version*:
Linux version 2.6.18-400.1.1.el5 (mockbu...@x86-012.build.bos.redhat.com)
output of *lsb_release -a*:
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
Codename: Tikanga

*Running a local job*

I have confirmed that I can successfully run python jobs using
bin/spark-submit --master local[*]. Specifically, this is the command I am
using:

*./bin/spark-submit --master local[8] ./examples/src/main/python/wordcount.py file:/home/<username>/spark-1.6.0-bin-hadoop2.4/README.md*

And it works!

*Additional info*

I am also able to successfully run the Java SparkPi example using yarn in
cluster mode using this command:

*./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 lib/spark-examples*.jar 10*

This Java job also runs successfully when I change --deploy-mode to client.
The fact that I can run Java jobs in cluster mode makes me think that
everything is installed correctly--is that a valid assumption?

The problem remains that I cannot submit python jobs. Here is the command
I am using to try to submit python jobs:

*./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*

Does that look like a correct command? I wasn't sure what to put for
--class, so I omitted it. At any rate, the result of the above command is
a syntax error, similar to the one I posted in the original email:

Traceback (most recent call last):
  File "pi.py", line 24, in ?
    from pyspark import SparkContext
  File "/home/<username>/spark-1.6.0-bin-hadoop2.4/python/pyspark/__init__.py", line 61
    indent = ' ' * (min(len(m) for m in indents) if indents else 0)
                                                ^
SyntaxError: invalid syntax

This really looks to me like a problem with the python version. Python 2.4
would throw this syntax error, but Python 2.7 would not. And yet I am using
Python 2.7.8. Is there any chance that Spark or Yarn is somehow using an
older version of Python without my knowledge? (One way to check for and
pin the interpreter explicitly is sketched at the end of this message,
after the quoted thread.)

Finally, when I try to run the same command in client mode...

*./bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*

...I get the error I mentioned in the prior email:

Error from python worker:
  python: module pyspark.daemon not found

Any thoughts?

Best,
Andrew


On Mon, Jan 11, 2016 at 12:25 PM, Bryan Cutler <cutl...@gmail.com> wrote:

> This could be an environment issue; could you give more details about the
> OS/architecture that you are using? If you are sure everything is
> installed correctly on each node, following the guide on "Running Spark
> on Yarn" http://spark.apache.org/docs/latest/running-on-yarn.html, and
> that the spark assembly jar is reachable, then I would check to see if
> you can submit a local job to just run on one node.
>
> On Fri, Jan 8, 2016 at 5:22 PM, Andrew Weiner <
> andrewweiner2...@u.northwestern.edu> wrote:
>
>> Now for simplicity I'm testing with wordcount.py from the provided
>> examples, and using Spark 1.6.0.
>>
>> The first error I get is:
>>
>> 16/01/08 19:14:46 ERROR lzo.GPLNativeCodeLoader: Could not load native
>> gpl library
>> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>>   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
>>   at [....]
>>
>> A bit lower down, I see this error:
>>
>> 16/01/08 19:14:48 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
>> 0.0 (TID 0, mundonovo-priv): org.apache.spark.SparkException:
>> Error from python worker:
>>   python: module pyspark.daemon not found
>> PYTHONPATH was:
>>   /scratch5/hadoop/yarn/local/usercache/<username>/filecache/22/spark-assembly-1.6.0-hadoop2.4.0.jar:/home/jpr123/hg.pacific/python-common:/home/jpr123/python-libs:/home/jpr123/lib/python2.7/site-packages:/home/zsb739/local/lib/python2.7/site-packages:/home/jpr123/mobile-cdn-analysis:/home/<username>/lib/python2.7/site-packages:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/pyspark.zip:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/py4j-0.9-src.zip
>> java.io.EOFException
>>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>   at [....]
>>
>> And then a few more similar pyspark.daemon not found errors...
>>
>> Andrew
>>
>> On Fri, Jan 8, 2016 at 2:31 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> I know that older versions of Spark could not run PySpark on YARN in
>>> cluster mode. I'm not sure if that is fixed in 1.6.0, though. Can you
>>> try setting the deploy-mode option to "client" when calling
>>> spark-submit?
>>>
>>> Bryan
>>>
>>> On Thu, Jan 7, 2016 at 2:39 PM, weineran <
>>> andrewweiner2...@u.northwestern.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> When I try to submit a python job using spark-submit (using --master
>>>> yarn --deploy-mode cluster), I get the following error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>     from pyspark import SparkContext
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/__init__.py", line 41, in ?
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/context.py", line 219
>>>>     with SparkContext._lock:
>>>>                       ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> This is very similar to this post from 2014
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-lock-Error-td18233.html>,
>>>> but unlike that person I am using Python 2.7.8.
>>>>
>>>> Here is what I'm using:
>>>> Spark 1.3.1
>>>> Hadoop 2.4.0.2.1.5.0-695
>>>> Python 2.7.8
>>>>
>>>> Another clue: I also installed Spark 1.6.0 and tried to submit the
>>>> same job. I got a similar error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>     from pyspark import SparkContext
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0119/container_1450370639491_0119_01_000001/pyspark.zip/pyspark/__init__.py", line 61
>>>>     indent = ' ' * (min(len(m) for m in indents) if indents else 0)
>>>>                                                 ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> Any thoughts?
>>>>
>>>> Andrew
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-SyntaxError-invalid-syntax-tp25910.html
>>>> Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
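
---------------------------------------------------------------------

A note on the version question raised above: Red Hat Enterprise Linux 5.11
ships Python 2.4.3 as the system /usr/bin/python, and both syntax errors in
this thread (a conditional expression in pyspark/__init__.py and a "with"
statement in pyspark/context.py) use syntax that was only introduced in
Python 2.5. If PYSPARK_PYTHON is not set, the YARN containers fall back to
whatever "python" resolves to on each node, regardless of which interpreter
launched spark-submit. Below is a minimal sketch of one way to check this
and then pin the interpreter; the path /usr/local/bin/python2.7 and the
hostname worker-node are placeholders, not values taken from this thread:

    # 1) Check what a bare `python` resolves to on a worker node
    #    ("worker-node" is a placeholder hostname).
    ssh worker-node 'which python; python -V'
    # A stock RHEL 5 box typically prints /usr/bin/python and Python 2.4.3.

    # 2) Pin the interpreter for the driver, application master, and
    #    executors. /usr/local/bin/python2.7 is an assumed location;
    #    substitute wherever Python 2.7.8 actually lives on every node.
    export PYSPARK_PYTHON=/usr/local/bin/python2.7

    ./bin/spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
      --conf spark.executorEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
      --driver-memory 4g --executor-memory 2g --executor-cores 1 \
      ./examples/src/main/python/pi.py 10

Setting PYSPARK_PYTHON in conf/spark-env.sh on every node achieves the same
thing persistently. In cluster mode, spark.yarn.appMasterEnv.PYSPARK_PYTHON
is the setting to double-check, because the driver itself runs inside a
YARN container there.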