I am having a heck of a time setting up my development environment. I used pip to install pyspark. I also downloaded Spark from Apache.
My Eclipse PyDev interpreter is configured as a Python 3 virtualenv. I have a simple unit test that loads a small dataframe. df.show() generates the following error:

    2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
    org.apache.spark.SparkException:
    Error from python worker:
      Traceback (most recent call last):
        File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
          import os
        File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
          yield from walk(new_path, topdown, onerror, followlinks)
                   ^
      SyntaxError: invalid syntax

My unittest class is derived from:

    class PySparkTestCase(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            conf = SparkConf().setMaster("local[2]") \
                              .setAppName(cls.__name__) #\
                              # .set("spark.authenticate.secret", "111111")
            cls.sparkContext = SparkContext(conf=conf)
            sc_values[cls.__name__] = cls.sparkContext
            cls.sqlContext = SQLContext(cls.sparkContext)
            print("aedwip:", SparkContext)

        @classmethod
        def tearDownClass(cls):
            print("....calling stop tearDownClass, the content of sc_values=", sc_values)
            sc_values.clear()
            cls.sparkContext.stop()

This looks similar to the PySparkTestCase class in https://github.com/apache/spark/blob/master/python/pyspark/tests.py

Any suggestions would be greatly appreciated.

Andy

My downloaded version is spark-2.3.0-bin-hadoop2.7.

My virtualenv:

    (spark-2.3.0) $ pip show pySpark
    Name: pyspark
    Version: 2.3.0
    Summary: Apache Spark Python API
    Home-page: https://github.com/apache/spark/tree/master/python
    Author: Spark Developers
    Author-email: d...@spark.apache.org
    License: http://www.apache.org/licenses/LICENSE-2.0
    Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
    Requires: py4j
    (spark-2.3.0) $
    (spark-2.3.0) $ python --version
    Python 3.6.1
    (spark-2.3.0) $
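One observation on the trace above: `yield from` is Python 3 syntax, so a SyntaxError at that line in os.py usually means the Spark worker process was launched with a Python 2 interpreter against the virtualenv's Python 3.6 standard library. A minimal sketch of pinning both driver and workers to the interpreter that runs the tests, set before any SparkContext is created (this is a guess at the cause, not a confirmed fix):

```python
import os
import sys

# PYSPARK_PYTHON controls the interpreter Spark uses for worker
# processes; PYSPARK_DRIVER_PYTHON controls the driver side.
# sys.executable is the python running this test, i.e. the
# virtualenv's python3.6.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Any SparkContext created after this point (e.g. in setUpClass)
# should spawn workers with the same python3 interpreter.
```

The same effect can be had by exporting those two variables in the shell, or in the Eclipse run configuration's environment, before launching the tests.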