I am having a heck of a time getting IPython notebooks to work on the Spark 1.5.1 AWS cluster I created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2.
I have read the instructions for using IPython notebook at http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell. I want to run the notebook server on my master and use an ssh tunnel to connect from a web browser running on my Mac. I am confident the cluster is set up correctly because the SparkPi example runs. I am able to use IPython notebooks on my local Mac and work with Spark and local files without any problems, and I know the ssh tunnel is working.

On my cluster I am able to use the Python shell in general:

    [ec2-user@ip-172-31-29-60 dataScience]$ /root/spark/bin/pyspark --master local[2]
    >>> from pyspark import SparkContext
    >>> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
    >>> textFile.take(1)

When I run the exact same code in an IPython notebook I get:

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-1-ba11b935529e> in <module>()
         11 from pyspark import SparkContext, SparkConf
         12
    ---> 13 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
         14
         15 textFile.take(1)

    NameError: name 'sc' is not defined

To try and debug this, I wrote a script to launch pyspark, and added set -x to pyspark so I could see what the script was doing.

Any idea how I can debug this? Thanks in advance,

Andy

    $ cat notebook.sh
    set -x
    export PYSPARK_DRIVER_PYTHON=ipython
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000"
    /root/spark/bin/pyspark --master local[2]

    [ec2-user@ip-172-31-29-60 dataScience]$ ./notebook.sh
    ++ export PYSPARK_DRIVER_PYTHON=ipython
    ++ PYSPARK_DRIVER_PYTHON=ipython
    ++ export 'PYSPARK_DRIVER_PYTHON_OPTS=notebook --no-browser --port=7000'
    ++ PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port=7000'
    ++ /root/spark/bin/pyspark --master 'local[2]'
    +++ dirname /root/spark/bin/pyspark
    ++ cd /root/spark/bin/..
    ++ pwd
    + export SPARK_HOME=/root/spark
    + SPARK_HOME=/root/spark
    + source /root/spark/bin/load-spark-env.sh
    ++++ dirname /root/spark/bin/pyspark
    +++ cd /root/spark/bin/..
    +++ pwd
    ++ FWDIR=/root/spark
    ++ '[' -z '' ']'
    ++ export SPARK_ENV_LOADED=1
    ++ SPARK_ENV_LOADED=1
    ++++ dirname /root/spark/bin/pyspark
    +++ cd /root/spark/bin/..
    +++ pwd
    ++ parent_dir=/root/spark
    ++ user_conf_dir=/root/spark/conf
    ++ '[' -f /root/spark/conf/spark-env.sh ']'
    ++ set -a
    ++ . /root/spark/conf/spark-env.sh
    +++ export JAVA_HOME=/usr/java/latest
    +++ JAVA_HOME=/usr/java/latest
    +++ export SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
    +++ SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
    +++ export SPARK_MASTER_OPTS=
    +++ SPARK_MASTER_OPTS=
    +++ '[' -n 1 ']'
    +++ export SPARK_WORKER_INSTANCES=1
    +++ SPARK_WORKER_INSTANCES=1
    +++ export SPARK_WORKER_CORES=2
    +++ SPARK_WORKER_CORES=2
    +++ export HADOOP_HOME=/root/ephemeral-hdfs
    +++ HADOOP_HOME=/root/ephemeral-hdfs
    +++ export SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
    +++ SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
    ++++ cat /root/spark-ec2/cluster-url
    +++ export MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
    +++ MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
    +++ export SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
    +++ SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
    +++ export SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
    +++ SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
    ++++ wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname
    +++ export SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
    +++ SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
    +++ export YARN_CONF_DIR=/root/ephemeral-hdfs/conf
    +++ YARN_CONF_DIR=/root/ephemeral-hdfs/conf
    ++++ id -u
    +++ '[' 222 == 0 ']'
    ++ set +a
    ++ '[' -z '' ']'
    ++ ASSEMBLY_DIR2=/root/spark/assembly/target/scala-2.11
    ++ ASSEMBLY_DIR1=/root/spark/assembly/target/scala-2.10
    ++ [[ -d /root/spark/assembly/target/scala-2.11 ]]
    ++ '[' -d /root/spark/assembly/target/scala-2.11 ']'
    ++ export SPARK_SCALA_VERSION=2.10
    ++ SPARK_SCALA_VERSION=2.10
    + export '_SPARK_CMD_USAGE=Usage: ./bin/pyspark [options]'
    + _SPARK_CMD_USAGE='Usage: ./bin/pyspark [options]'
    + hash python2.7
    + DEFAULT_PYTHON=python2.7
    + [[ -n '' ]]
    + [[ '' == \1 ]]
    + [[ -z ipython ]]
    + [[ -z '' ]]
    + [[ ipython == *ipython* ]]
    + [[ python2.7 != \p\y\t\h\o\n\2\.\7 ]]
    + PYSPARK_PYTHON=python2.7
    + export PYSPARK_PYTHON
    + export PYTHONPATH=/root/spark/python/:
    + PYTHONPATH=/root/spark/python/:
    + export PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
    + PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
    + export OLD_PYTHONSTARTUP=
    + OLD_PYTHONSTARTUP=
    + export PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
    + PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
    + [[ -n '' ]]
    + export PYSPARK_DRIVER_PYTHON
    + export PYSPARK_DRIVER_PYTHON_OPTS
    + exec /root/spark/bin/spark-submit pyspark-shell-main --name PySparkShell --master 'local[2]'
    [NotebookApp] Using existing profile dir: u'/home/ec2-user/.ipython/profile_default'
    [NotebookApp] Serving notebooks from /home/ec2-user/dataScience
    [NotebookApp] The IPython Notebook is running at: http://127.0.0.1:7000/
    [NotebookApp] Use Control-C to stop this server and shut down all kernels.
    [NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/MathJax.js
    [NotebookApp] Kernel started: 2ba6864b-0dc8-4814-8e05-4f532cb40e2b
    [NotebookApp] Connecting to: tcp://127.0.0.1:55099
    [NotebookApp] Connecting to: tcp://127.0.0.1:48994
    [NotebookApp] Connecting to: tcp://127.0.0.1:57214
    [IPKernelApp] To connect another client to this kernel, use:
    [IPKernelApp] --existing kernel-2ba6864b-0dc8-4814-8e05-4f532cb40e2b.json