FYI

http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext

From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Wednesday, April 4, 2018 at 5:36 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(

> I am having a heck of a time setting up my development environment. I used pip
> to install pyspark. I also downloaded Spark from Apache.
> 
> My Eclipse PyDev interpreter is configured to use a Python 3 virtualenv.
> 
> I have a simple unit test that loads a small DataFrame. df.show() generates
> the following error:
> 
> 
> 2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
> org.apache.spark.SparkException:
> Error from python worker:
>   Traceback (most recent call last):
>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
>       import os
>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
>       yield from walk(new_path, topdown, onerror, followlinks)
>                ^
>   SyntaxError: invalid syntax
> 
> 
> My unittest class is derived from the following:
> 
> 
> 
> class PySparkTestCase(unittest.TestCase):
> 
>     @classmethod
>     def setUpClass(cls):
>         conf = SparkConf().setMaster("local[2]") \
>             .setAppName(cls.__name__) #\
> #             .set("spark.authenticate.secret", "111111")
>         cls.sparkContext = SparkContext(conf=conf)
>         sc_values[cls.__name__] = cls.sparkContext
>         cls.sqlContext = SQLContext(cls.sparkContext)
>         print("aedwip:", SparkContext)
> 
>     @classmethod
>     def tearDownClass(cls):
>         print("....calling stop tearDownClas, the content of sc_values=", sc_values)
>         sc_values.clear()
>         cls.sparkContext.stop()
> 
> 
> 
> This looks similar to class PySparkTestCase in
> https://github.com/apache/spark/blob/master/python/pyspark/tests.py
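> 
> A rough sketch of the kind of unit test described above, i.e. one that builds a
> small DataFrame and calls df.show(). The test class, column names, and rows are
> made up; the imports and the module-level sc_values dict are assumed to match
> what the base class in tests.py expects:
> 
> import unittest
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SQLContext
> 
> sc_values = {}  # registry of SparkContexts keyed by test class name, as in tests.py
> 
> class SimpleDataFrameTest(PySparkTestCase):
>     def test_show(self):
>         # build a tiny two-row DataFrame and display it;
>         # df.show() is the call that triggers the worker error shown above
>         df = self.sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
>         df.show()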
> 
> 
> 
> Any suggestions would be greatly appreciated.
> 
> 
> 
> Andy
> 
> 
> 
> My downloaded version is spark-2.3.0-bin-hadoop2.7.
> 
> 
> 
> My virtualenv versions are:
> 
> (spark-2.3.0) $ pip show pySpark
> Name: pyspark
> Version: 2.3.0
> Summary: Apache Spark Python API
> Home-page: https://github.com/apache/spark/tree/master/python
> Author: Spark Developers
> Author-email: d...@spark.apache.org
> License: http://www.apache.org/licenses/LICENSE-2.0
> Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
> Requires: py4j
> (spark-2.3.0) $ 
> 
> (spark-2.3.0) $ python --version
> Python 3.6.1
> (spark-2.3.0) $ 
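> 
> A small sanity-check script (only a sketch; the app name is arbitrary) that
> compares the Python version seen by the driver with the one the workers launch:
> 
> import sys
> from pyspark import SparkConf, SparkContext
> 
> conf = SparkConf().setMaster("local[2]").setAppName("versionCheck")
> sc = SparkContext(conf=conf)
> print("driver python:", sys.version)
> # the lambda runs inside a worker process, so it reports the worker's interpreter
> print("worker python:", sc.parallelize([0], 1).map(lambda _: sys.version).first())
> sc.stop()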
> 
> 

