Now, for simplicity, I'm testing with wordcount.py from the provided examples, and using Spark 1.6.0.
The first error I get is:

16/01/08 19:14:46 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
        at [....]

A bit lower down, I see this error:

16/01/08 19:14:48 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, mundonovo-priv): org.apache.spark.SparkException:
Error from python worker:
  python: module pyspark.daemon not found
PYTHONPATH was:
  /scratch5/hadoop/yarn/local/usercache/awp066/filecache/22/spark-assembly-1.6.0-hadoop2.4.0.jar:/home/jpr123/hg.pacific/python-common:/home/jpr123/python-libs:/home/jpr123/lib/python2.7/site-packages:/home/zsb739/local/lib/python2.7/site-packages:/home/jpr123/mobile-cdn-analysis:/home/awp066/lib/python2.7/site-packages:/scratch4/hadoop/yarn/local/usercache/awp066/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/pyspark.zip:/scratch4/hadoop/yarn/local/usercache/awp066/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/py4j-0.9-src.zip
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at [....]

And then a few more similar "pyspark.daemon not found" errors...

Andrew

On Fri, Jan 8, 2016 at 2:31 PM, Bryan Cutler <cutl...@gmail.com> wrote:

> Hi Andrew,
>
> I know that older versions of Spark could not run PySpark on YARN in
> cluster mode. I'm not sure if that is fixed in 1.6.0 though. Can you try
> setting the deploy-mode option to "client" when calling spark-submit?
>
> Bryan
>
> On Thu, Jan 7, 2016 at 2:39 PM, weineran
> <andrewweiner2...@u.northwestern.edu> wrote:
>
>> Hello,
>>
>> When I try to submit a Python job using spark-submit (using --master yarn
>> --deploy-mode cluster), I get the following error:
>>
>> Traceback (most recent call last):
>>   File "loss_rate_by_probe.py", line 15, in ?
>>     from pyspark import SparkContext
>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/__init__.py", line 41, in ?
>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/context.py", line 219
>>     with SparkContext._lock:
>>        ^
>> SyntaxError: invalid syntax
>>
>> This is very similar to this post from 2014
>> <http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-lock-Error-td18233.html>,
>> but unlike that person I am using Python 2.7.8.
>>
>> Here is what I'm using:
>> Spark 1.3.1
>> Hadoop 2.4.0.2.1.5.0-695
>> Python 2.7.8
>>
>> Another clue: I also installed Spark 1.6.0 and tried to submit the same
>> job. I got a similar error:
>>
>> Traceback (most recent call last):
>>   File "loss_rate_by_probe.py", line 15, in ?
>>     from pyspark import SparkContext
>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0119/container_1450370639491_0119_01_000001/pyspark.zip/pyspark/__init__.py", line 61
>>     indent = ' ' * (min(len(m) for m in indents) if indents else 0)
>>                                                  ^
>> SyntaxError: invalid syntax
>>
>> Any thoughts?
>>
>> Andrew
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-SyntaxError-invalid-syntax-tp25910.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
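[Editor's note] One possible reading of the two SyntaxError tracebacks above — a hypothesis, not something the thread confirms: the frames print "in ?" rather than "in <module>", which is how Python 2.4 and earlier format module-level frames, and both failing lines use syntax added later (the `with` statement and conditional expressions, both introduced in Python 2.5). That would mean the YARN containers are launching an older system Python than the 2.7.8 present on the submitting host. A minimal sketch of that version check (`worker_python_ok` is an illustrative helper, not a Spark API):

```python
import sys

# Syntax used by PySpark's bootstrap code and the first interpreter
# that can parse it without a SyntaxError:
#   - conditional expressions (x if cond else y)  -> Python 2.5
#   - "with" statement                            -> Python 2.5 (via __future__), 2.6 by default
REQUIRED = (2, 6)

def worker_python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version can parse PySpark's bootstrap code."""
    return tuple(version_info[:2]) >= REQUIRED

print(worker_python_ok((2, 4, 0)))  # False -- an old system Python on a YARN node
print(worker_python_ok((2, 7, 8)))  # True  -- the Python on the submitting host
```

If this is the cause, pointing the executors at the right interpreter (for example via the PYSPARK_PYTHON environment variable, or the `spark.yarn.appMasterEnv.PYSPARK_PYTHON` property on YARN) would be the fix.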
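[Editor's note] Bryan's suggestion upthread — running the driver in client mode so the submitting host's own Python executes the PySpark bootstrap — would look roughly like this; `loss_rate_by_probe.py` is the script from the original post, and everything else is a standard spark-submit invocation:

```shell
# Run the driver locally (client mode) instead of inside a YARN container.
spark-submit \
  --master yarn \
  --deploy-mode client \
  loss_rate_by_probe.py
```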