Thanks for your continuing help. Here is some additional info.

*OS/architecture*

output of *cat /proc/version*:
Linux version 2.6.18-400.1.1.el5 (mockbu...@x86-012.build.bos.redhat.com)
output of *lsb_release -a*:
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
Codename: Tikanga

*Running a local job*

I have confirmed that I can successfully run python jobs using
bin/spark-submit --master local[*]. Specifically, this is the command I am
using:

*./bin/spark-submit --master local[8] ./examples/src/main/python/wordcount.py file:/home/<username>/spark-1.6.0-bin-hadoop2.4/README.md*

And it works!

*Additional info*

I am also able to successfully run the Java SparkPi example using yarn in
cluster mode using this command:

*./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 lib/spark-examples*.jar 10*

This Java job also runs successfully when I change --deploy-mode to client.
The fact that I can run Java jobs in cluster mode makes me think that
everything is installed correctly--is that a valid assumption?

The problem remains that I cannot submit python jobs. Here is the command
I am using to try to submit python jobs:

*./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*

Does that look like a correct command? I wasn't sure what to put for
--class, so I omitted it. At any rate, the result of the above command is
a syntax error, similar to the one I posted in the original email:

Traceback (most recent call last):
  File "pi.py", line 24, in ?
    from pyspark import SparkContext
  File "/home/<username>/spark-1.6.0-bin-hadoop2.4/python/pyspark/__init__.py", line 61
    indent = ' ' * (min(len(m) for m in indents) if indents else 0)
                                                ^
SyntaxError: invalid syntax

This really looks to me like a problem with the python version. Python 2.4
would throw this syntax error, but Python 2.7 would not. And yet I am using
Python 2.7.8. Is there any chance that Spark or Yarn is somehow using an
older version of Python without my knowledge? (One way to check for and
pin the interpreter explicitly is sketched at the end of this message,
after the quoted thread.)

Finally, when I try to run the same command in client mode...

*./bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 1 ./examples/src/main/python/pi.py 10*

...I get the error I mentioned in the prior email:

Error from python worker:
  python: module pyspark.daemon not found

Any thoughts?

Best,
Andrew


On Mon, Jan 11, 2016 at 12:25 PM, Bryan Cutler <cutl...@gmail.com> wrote:

> This could be an environment issue; could you give more details about the
> OS/architecture that you are using? If you are sure everything is
> installed correctly on each node, following the guide on "Running Spark
> on Yarn" http://spark.apache.org/docs/latest/running-on-yarn.html, and
> that the spark assembly jar is reachable, then I would check to see if
> you can submit a local job to just run on one node.
>
> On Fri, Jan 8, 2016 at 5:22 PM, Andrew Weiner <
> andrewweiner2...@u.northwestern.edu> wrote:
>
>> Now for simplicity I'm testing with wordcount.py from the provided
>> examples, and using Spark 1.6.0.
>>
>> The first error I get is:
>>
>> 16/01/08 19:14:46 ERROR lzo.GPLNativeCodeLoader: Could not load native
>> gpl library
>> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>>   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
>>   at [....]
>>
>> A bit lower down, I see this error:
>>
>> 16/01/08 19:14:48 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
>> 0.0 (TID 0, mundonovo-priv): org.apache.spark.SparkException:
>> Error from python worker:
>>   python: module pyspark.daemon not found
>> PYTHONPATH was:
>>   /scratch5/hadoop/yarn/local/usercache/<username>/filecache/22/spark-assembly-1.6.0-hadoop2.4.0.jar:/home/jpr123/hg.pacific/python-common:/home/jpr123/python-libs:/home/jpr123/lib/python2.7/site-packages:/home/zsb739/local/lib/python2.7/site-packages:/home/jpr123/mobile-cdn-analysis:/home/<username>/lib/python2.7/site-packages:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/pyspark.zip:/scratch4/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0136/container_1450370639491_0136_01_000002/py4j-0.9-src.zip
>> java.io.EOFException
>>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>   at [....]
>>
>> And then a few more similar pyspark.daemon not found errors...
>>
>> Andrew
>>
>> On Fri, Jan 8, 2016 at 2:31 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> I know that older versions of Spark could not run PySpark on YARN in
>>> cluster mode. I'm not sure if that is fixed in 1.6.0, though. Can you
>>> try setting the deploy-mode option to "client" when calling
>>> spark-submit?
>>>
>>> Bryan
>>>
>>> On Thu, Jan 7, 2016 at 2:39 PM, weineran <
>>> andrewweiner2...@u.northwestern.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> When I try to submit a python job using spark-submit (using --master
>>>> yarn --deploy-mode cluster), I get the following error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>     from pyspark import SparkContext
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/__init__.py", line 41, in ?
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/context.py", line 219
>>>>     with SparkContext._lock:
>>>>                       ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> This is very similar to this post from 2014
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-lock-Error-td18233.html>,
>>>> but unlike that person I am using Python 2.7.8.
>>>>
>>>> Here is what I'm using:
>>>> Spark 1.3.1
>>>> Hadoop 2.4.0.2.1.5.0-695
>>>> Python 2.7.8
>>>>
>>>> Another clue: I also installed Spark 1.6.0 and tried to submit the
>>>> same job. I got a similar error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "loss_rate_by_probe.py", line 15, in ?
>>>>     from pyspark import SparkContext
>>>>   File "/scratch5/hadoop/yarn/local/usercache/<username>/appcache/application_1450370639491_0119/container_1450370639491_0119_01_000001/pyspark.zip/pyspark/__init__.py", line 61
>>>>     indent = ' ' * (min(len(m) for m in indents) if indents else 0)
>>>>                                                 ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> Any thoughts?
>>>>
>>>> Andrew
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-SyntaxError-invalid-syntax-tp25910.html
>>>> Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
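
---------------------------------------------------------------------

A note on the version question raised above: Red Hat Enterprise Linux 5.11
ships Python 2.4.3 as the system /usr/bin/python, and both syntax errors in
this thread (a conditional expression in pyspark/__init__.py and a "with"
statement in pyspark/context.py) use syntax that was only introduced in
Python 2.5. If PYSPARK_PYTHON is not set, the YARN containers fall back to
whatever "python" resolves to on each node, regardless of which interpreter
launched spark-submit. Below is a minimal sketch of one way to check this
and then pin the interpreter; the path /usr/local/bin/python2.7 and the
hostname worker-node are placeholders, not values taken from this thread:

    # 1) Check what a bare `python` resolves to on a worker node
    #    ("worker-node" is a placeholder hostname).
    ssh worker-node 'which python; python -V'
    # A stock RHEL 5 box typically prints /usr/bin/python and Python 2.4.3.

    # 2) Pin the interpreter for the driver, application master, and
    #    executors. /usr/local/bin/python2.7 is an assumed location;
    #    substitute wherever Python 2.7.8 actually lives on every node.
    export PYSPARK_PYTHON=/usr/local/bin/python2.7

    ./bin/spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
      --conf spark.executorEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
      --driver-memory 4g --executor-memory 2g --executor-cores 1 \
      ./examples/src/main/python/pi.py 10

Setting PYSPARK_PYTHON in conf/spark-env.sh on every node achieves the same
thing persistently. In cluster mode, spark.yarn.appMasterEnv.PYSPARK_PYTHON
is the setting to double-check, because the driver itself runs inside a
YARN container there.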