Hi

I recently installed a new cluster using the
spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script. The SparkPi sample app works correctly.

I am trying to run an IPython notebook on my cluster master and use an SSH
tunnel so that I can work with the notebook in a browser running on my Mac.
Below is how I set up the SSH tunnel:

$ ssh -i $KEY_FILE -N -f -L localhost:8888:localhost:7000 ec2-user@$SPARK_MASTER

$ ssh -i $KEY_FILE ec2-user@$SPARK_MASTER
$ cd <top-level notebook dir>
$ IPYTHON_OPTS="notebook --no-browser --port=7000" /root/spark/bin/pyspark

I am able to access my notebooks in the browser by opening
http://localhost:8888

When I run the following Python code I get the error "NameError: name 'sc' is
not defined". Any idea what the problem might be?

I looked through the pyspark launch script and tried various combinations of the
following, but I still get the same error:

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook
--no-browser --port=7000" /root/spark/bin/pyspark --master=local[2]

Kind regards

Andy





In [1]:
import sys
print (sys.version)
 
import os
print(os.getcwd() + "\n")
2.6.9 (unknown, Apr  1 2015, 18:16:00)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)]
/home/ec2-user/dataScience

In [2]:
from pyspark import SparkContext
textFile = sc.textFile("readme.txt")
textFile.take(1)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-b67a9be29bd9> in <module>()
      1 from pyspark import SparkContext
----> 2 textFile = sc.textFile("readme.txt")
      3 textFile.take(1)

NameError: name 'sc' is not defined

