Hi, I recently installed a new cluster using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2. The SparkPi sample app works correctly.
I am trying to run an IPython notebook on my cluster master and use an SSH tunnel so that I can work with the notebook in a browser running on my Mac. Below is how I set up the SSH tunnel:

    $ ssh -i $KEY_FILE -N -f -L localhost:8888:localhost:7000 ec2-user@$SPARK_MASTER
    $ ssh -i $KEY_FILE ec2-user@$SPARK_MASTER
    $ cd top level notebook dir
    $ IPYTHON_OPTS="notebook --no-browser --port=7000" /root/spark/bin/pyspark

I am able to access my notebooks in the browser by opening http://localhost:8888. When I run the Python code below, I get "NameError: name 'sc' is not defined". Any idea what the problem might be?

I looked through pyspark and tried various combinations of the following, but I still get the same error:

    $ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000" /root/spark/bin/pyspark --master=local[2]

Kind regards

Andy

    In [1]: import sys
            print(sys.version)
            import os
            print(os.getcwd() + "\n")

    2.6.9 (unknown, Apr 1 2015, 18:16:00)
    [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)]

    /home/ec2-user/dataScience

    In [2]: from pyspark import SparkContext
            textFile = sc.textFile("readme.txt")
            textFile.take(1)

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-2-b67a9be29bd9> in <module>()
          1 from pyspark import SparkContext
    ----> 2 textFile = sc.textFile("readme.txt")
          3 textFile.take(1)

    NameError: name 'sc' is not defined

    In [ ]: