FYI http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext
From: Andrew Davidson <a...@santacruzintegration.com>
Date: Wednesday, April 4, 2018 at 5:36 PM
To: "user @spark" <user@spark.apache.org>
Subject: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError: yield from walk(

> I am having a heck of a time setting up my development environment. I used
> pip to install pyspark. I also downloaded spark from apache.
>
> My eclipse pyDev interpreter is configured as a python3 virtualenv.
>
> I have a simple unit test that loads a small dataframe. df.show() generates
> the following error:
>
> 2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0
> (TID 0)
> org.apache.spark.SparkException:
> Error from python worker:
>   Traceback (most recent call last):
>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py",
>       line 67, in <module>
>       import os
>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py",
>       line 409
>       yield from walk(new_path, topdown, onerror, followlinks)
>            ^
>   SyntaxError: invalid syntax
>
> My unittest class is derived from:
>
> class PySparkTestCase(unittest.TestCase):
>
>     @classmethod
>     def setUpClass(cls):
>         conf = SparkConf().setMaster("local[2]") \
>                           .setAppName(cls.__name__) #\
>                           # .set("spark.authenticate.secret", "111111")
>         cls.sparkContext = SparkContext(conf=conf)
>         sc_values[cls.__name__] = cls.sparkContext
>         cls.sqlContext = SQLContext(cls.sparkContext)
>         print("aedwip:", SparkContext)
>
>     @classmethod
>     def tearDownClass(cls):
>         print("....calling stop tearDownClass, the content of sc_values=",
>               sc_values)
>         sc_values.clear()
>         cls.sparkContext.stop()
>
> This looks similar to class PySparkTestCase in
> https://github.com/apache/spark/blob/master/python/pyspark/tests.py
>
> Any suggestions would be greatly appreciated.
>
> Andy
>
> My downloaded version is spark-2.3.0-bin-hadoop2.7
>
> My virtualenv version is:
>
> (spark-2.3.0) $ pip show pySpark
> Name: pyspark
> Version: 2.3.0
> Summary: Apache Spark Python API
> Home-page: https://github.com/apache/spark/tree/master/python
> Author: Spark Developers
> Author-email: d...@spark.apache.org
> License: http://www.apache.org/licenses/LICENSE-2.0
> Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
> Requires: py4j
> (spark-2.3.0) $
>
> (spark-2.3.0) $ python --version
> Python 3.6.1
> (spark-2.3.0) $
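For what it's worth, "yield from" is Python-3-only syntax, so a SyntaxError on
that line of os.py usually means the Spark Python worker was launched with a
Python 2 interpreter that is picking up the virtualenv's python3.6 standard
library. A minimal sketch of one common fix, assuming the workers should run
under the same python3 interpreter as the tests (PYSPARK_PYTHON and
PYSPARK_DRIVER_PYTHON are the environment variables Spark consults; using
sys.executable is just one way to point at the current interpreter):

    import os
    import sys

    # Make Spark launch its Python workers with the interpreter running
    # the tests (the python3 virtualenv) rather than whatever "python"
    # resolves to on the PATH, which on macOS is often Python 2.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

These need to be set before the SparkContext is created, e.g. at the top of
the test module or at the start of setUpClass(); they can also be added to
the environment of the eclipse/pyDev run configuration.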
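For reference, a minimal test derived from the PySparkTestCase base class
quoted above might look like the following. This is only a sketch:
SimpleDataFrameTest and its column names are made up for illustration, and
sc_values is assumed to be a plain module-level dict, as the quoted code
implies:

    import unittest

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    # Registry used by PySparkTestCase.setUpClass/tearDownClass above
    # (assumed to map test-class names to their SparkContext).
    sc_values = {}

    class SimpleDataFrameTest(PySparkTestCase):

        def test_show(self):
            # Build a small DataFrame via the SQLContext created in setUpClass.
            df = self.sqlContext.createDataFrame(
                [(1, "a"), (2, "b")], ["id", "value"])
            df.show()
            self.assertEqual(2, df.count())

    if __name__ == "__main__":
        unittest.main()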