Hi Andrew,

There are a couple of things to check.  First, is Python 2.7 the default
version on all nodes in the cluster, or is it an alternate install?  In
other words, what is the output of "$> python --version" on each node?  If
it is an alternate install, you could set the environment variable
"PYSPARK_PYTHON" to the Python binary executable that PySpark should use
in both the driver and the workers (the default is "python").
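
For example, something along these lines should work (just a sketch -- the
path /usr/local/bin/python2.7 is only a placeholder, substitute wherever
Python 2.7 actually lives on your nodes):

    # point PySpark at the alternate interpreter for the driver and workers
    export PYSPARK_PYTHON=/usr/local/bin/python2.7
    ./bin/spark-submit --master yarn --deploy-mode client \
        ./examples/src/main/python/pi.py 10

In cluster mode the driver itself runs inside a YARN container, so if I
remember correctly you can also pass the same value through the application
master environment with
"--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7"
(that property is listed on the "Running Spark on YARN" page).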

Did you try to submit the Python example in client mode?  Apart from that,
the command looks fine; you don't use the --class option when submitting
Python files:
*./bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g
--executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*
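
Also, regarding your question about whether Spark or YARN could be picking
up an older Python without your knowledge: one quick way to check (just a
debugging sketch, not part of the original example) is to print the
interpreter info right before the pyspark import near the top of pi.py,
since everything above that line still runs even when the import itself
fails:

    import sys
    print(sys.version)      # version of the interpreter the driver container runs
    print(sys.executable)   # path to that interpreter

In cluster mode that output ends up in the YARN container logs, which you
can usually pull with "yarn logs -applicationId <application id>".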

It is a good sign that local jobs and the Java examples work; this is
probably just a small configuration issue :)

Bryan

On Wed, Jan 13, 2016 at 3:51 PM, Andrew Weiner <
andrewweiner2...@u.northwestern.edu> wrote:

> Thanks for your continuing help.  Here is some additional info.
>
> *OS/architecture*
> output of *cat /proc/version*:
> Linux version 2.6.18-400.1.1.el5 (mockbu...@x86-012.build.bos.redhat.com)
>
> output of *lsb_release -a*:
> LSB Version:
>  
> :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
> Distributor ID: RedHatEnterpriseServer
> Description:    Red Hat Enterprise Linux Server release 5.11 (Tikanga)
> Release:        5.11
> Codename:       Tikanga
>
> *Running a local job*
> I have confirmed that I can successfully run python jobs using
> bin/spark-submit --master local[*]
> Specifically, this is the command I am using:
> *./bin/spark-submit --master local[8]
> ./examples/src/main/python/wordcount.py
> file:/home/<username>/spark-1.6.0-bin-hadoop2.4/README.md*
> And it works!
>
> *Additional info*
> I am also able to successfully run the Java SparkPi example using yarn in
> cluster mode using this command:
> *./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn
> --deploy-mode cluster --driver-memory 4g --executor-memory 2g
> --executor-cores 1 lib/spark-examples*.jar 10*
> This Java job also runs successfully when I change --deploy-mode to
> client.  The fact that I can run Java jobs in cluster mode makes me think
> that everything is installed correctly--is that a valid assumption?
>
> The problem remains that I cannot submit Python jobs.  Here is the command
> I am using:
> *./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 4g
> --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*
> Does that look like a correct command?  I wasn't sure what to put for
> --class so I omitted it.  At any rate, the result of the above command is a
> syntax error, similar to the one I posted in the original email:
>
> Traceback (most recent call last):
>   File "pi.py", line 24, in ?
>     from pyspark import SparkContext
>   File "/home/<username>/spark-1.6.0-bin-hadoop2.4/python/pyspark/__init__.py", line 61
>     indent = ' ' * (min(len(m) for m in indents) if indents else 0)
>                                                   ^
> SyntaxError: invalid syntax
>
>
> This really looks to me like a problem with the Python version.  Python
> 2.4 would throw this syntax error, but Python 2.7 would not.  And yet I am
> using Python 2.7.8.  Is there any chance that Spark or YARN is somehow
> using an older version of Python without my knowledge?
>
> Finally, when I try to run the same command in client mode...
> *./bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g
> --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*
> I get the error I mentioned in the prior email:
> Error from python worker:
>   python: module pyspark.daemon not found
>
> Any thoughts?
>
> Best,
> Andrew
>
>
> On Mon, Jan 11, 2016 at 12:25 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>
>> This could be an environment issue, could you give more details about the
>> OS/architecture that you are using?  If you are sure everything is
>> installed correctly on each node following the guide on "Running Spark on
>> Yarn" http://spark.apache.org/docs/latest/running-on-yarn.html and that
>> the spark assembly jar is reachable, then I would check to see if you can
>> submit a local job to just run on one node.
>>
>> On Fri, Jan 8, 2016 at 5:22 PM, Andrew Weiner <
>> andrewweiner2...@u.northwestern.edu> wrote:
>>
>>> Now for simplicity I'm testing with wordcount.py from the provided
>>> examples, and using Spark 1.6.0
>>>
>>> The first error I get is:
>>>
>>> 16/01/08 19:14:46 ERROR lzo.GPLNativeCodeLoader: Could not load native
>>> gpl library
>>> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>>>         at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
>>>         at [....]
>>>
>>> A bit lower down, I see this error:
>>>
>>> 16/01/08 19:14:48 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
>>> 0.0 (TID 0, mundonovo-priv): org.apache.spark.SparkException:
>>> Error from python worker:
>>>   python: module pyspark.daemon not found
>>> PYTHONPATH was:
>>>
>>> /scratch5/hadoop/yarn/local/usercache/<username>/filecache/22/spark-assembly-1.6.0-hadoop2.4.0.jar:/home/jpr123/hg.pacific/python-common:/home/jpr123/python-libs:/home/jpr123/lib/python2.7/site-packages:/home/zsb739/local/lib/python2.7/site-packages:/home/jpr123/mobile-cdn-analysis:/home/<username>/lib/python2.7/site-packages:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/pyspark.zip:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/py4j-0.9-src.zip
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>         at [....]
>>>
>>> And then a few more similar pyspark.daemon not found errors...
>>>
>>> Andrew
>>>
>>>
>>>
>>> On Fri, Jan 8, 2016 at 2:31 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> I know that older versions of Spark could not run PySpark on YARN in
>>>> cluster mode, and I'm not sure whether that has been fixed in 1.6.0.
>>>> Can you try setting the deploy-mode option to "client" when calling
>>>> spark-submit?
>>>>
>>>> Bryan
>>>>
>>>> On Thu, Jan 7, 2016 at 2:39 PM, weineran <
>>>> andrewweiner2...@u.northwestern.edu> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> When I try to submit a Python job with spark-submit (using --master yarn
>>>>> --deploy-mode cluster), I get the following error:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>>     from pyspark import SparkContext
>>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/__init__.py", line 41, in ?
>>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/context.py", line 219
>>>>>     with SparkContext._lock:
>>>>>                     ^
>>>>> SyntaxError: invalid syntax
>>>>>
>>>>> This is very similar to this post from 2014
>>>>> (http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-lock-Error-td18233.html),
>>>>> but unlike that person I am using Python 2.7.8.
>>>>>
>>>>> Here is what I'm using:
>>>>> Spark 1.3.1
>>>>> Hadoop 2.4.0.2.1.5.0-695
>>>>> Python 2.7.8
>>>>>
>>>>> Another clue:  I also installed Spark 1.6.0 and tried to submit the
>>>>> same
>>>>> job.  I got a similar error:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>>     from pyspark import SparkContext
>>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0119/container_1450370639491_0119_01_000001/pyspark.zip/pyspark/__init__.py", line 61
>>>>>     indent = ' ' * (min(len(m) for m in indents) if indents else 0)
>>>>>                                                   ^
>>>>> SyntaxError: invalid syntax
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-SyntaxError-invalid-syntax-tp25910.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
