I am having a heck of a time getting IPython notebooks to work on my Spark
1.5.1 AWS cluster, which I created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2.

I have read the instructions for using the IPython notebook at
http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell

I want to run the notebook server on my master and use an ssh tunnel to
connect from a web browser running on my Mac.

I am confident the cluster is set up correctly because the SparkPi example
runs.

I am able to use IPython notebooks on my local Mac and work with Spark and
local files without any problems.

I know the ssh tunnel is working.
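
For reference, the tunnel is set up along these lines (the key file and
hostname below are placeholders, not my real values):

# forward local port 7000 on my Mac to the notebook server on the master
ssh -i ~/path/to/key.pem -N -L 7000:localhost:7000 ec2-user@<master-public-dns>

Port 7000 matches the --port option I pass to the notebook server in
notebook.sh below.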

On my cluster I am able to use the PySpark shell without problems:

[ec2-user@ip-172-31-29-60 dataScience]$ /root/spark/bin/pyspark --master local[2]

>>> from pyspark import SparkContext
>>> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>>> textFile.take(1)

When I run the exact same code in an IPython notebook I get:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-ba11b935529e> in <module>()
     11 from pyspark import SparkContext, SparkConf
     12 
---> 13 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
     14 
     15 textFile.take(1)

NameError: name 'sc' is not defined
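
As a possible workaround (untested; the SparkConf settings are just what I
would try), I could create the context manually in the first notebook cell
instead of relying on pyspark's startup script:

from pyspark import SparkContext, SparkConf

# the notebook kernel never defines sc, so construct it explicitly
conf = SparkConf().setMaster("local[2]").setAppName("notebook")
sc = SparkContext(conf=conf)

textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
textFile.take(1)

Even if that works, I would still like to understand why sc is not created
automatically.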



To try and debug this, I wrote a script to launch pyspark and added
'set -x' to pyspark so I could see what the script was doing.

Any idea how I can debug this?

Thanks in advance

Andy

$ cat notebook.sh
set -x
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000"
/root/spark/bin/pyspark --master local[2]
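
If the environment-variable approach keeps failing, I may also try the
findspark package, which locates the Spark install from a plain ipython
session (a sketch, assuming findspark is pip-installed; /root/spark is
SPARK_HOME on this cluster, per the trace below):

import findspark
findspark.init("/root/spark")  # tell findspark where Spark lives before importing pyspark

import pyspark
# create the context by hand; same master setting as my pyspark test above
sc = pyspark.SparkContext(master="local[2]", appName="notebook")

Here is the 'set -x' trace from running notebook.sh: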




[ec2-user@ip-172-31-29-60 dataScience]$ ./notebook.sh
++ export PYSPARK_DRIVER_PYTHON=ipython
++ PYSPARK_DRIVER_PYTHON=ipython
++ export 'PYSPARK_DRIVER_PYTHON_OPTS=notebook --no-browser --port=7000'
++ PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port=7000'
++ /root/spark/bin/pyspark --master 'local[2]'
+++ dirname /root/spark/bin/pyspark
++ cd /root/spark/bin/..
++ pwd
+ export SPARK_HOME=/root/spark
+ SPARK_HOME=/root/spark
+ source /root/spark/bin/load-spark-env.sh
++++ dirname /root/spark/bin/pyspark
+++ cd /root/spark/bin/..
+++ pwd
++ FWDIR=/root/spark
++ '[' -z '' ']'
++ export SPARK_ENV_LOADED=1
++ SPARK_ENV_LOADED=1
++++ dirname /root/spark/bin/pyspark
+++ cd /root/spark/bin/..
+++ pwd
++ parent_dir=/root/spark
++ user_conf_dir=/root/spark/conf
++ '[' -f /root/spark/conf/spark-env.sh ']'
++ set -a
++ . /root/spark/conf/spark-env.sh
+++ export JAVA_HOME=/usr/java/latest
+++ JAVA_HOME=/usr/java/latest
+++ export SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
+++ SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
+++ export SPARK_MASTER_OPTS=
+++ SPARK_MASTER_OPTS=
+++ '[' -n 1 ']'
+++ export SPARK_WORKER_INSTANCES=1
+++ SPARK_WORKER_INSTANCES=1
+++ export SPARK_WORKER_CORES=2
+++ SPARK_WORKER_CORES=2
+++ export HADOOP_HOME=/root/ephemeral-hdfs
+++ HADOOP_HOME=/root/ephemeral-hdfs
+++ export SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
+++ SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
++++ cat /root/spark-ec2/cluster-url
+++ export MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
+++ MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
+++ export SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
+++ SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
+++ export SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
+++ SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
++++ wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname
+++ export SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
+++ SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
+++ export YARN_CONF_DIR=/root/ephemeral-hdfs/conf
+++ YARN_CONF_DIR=/root/ephemeral-hdfs/conf
++++ id -u
+++ '[' 222 == 0 ']'
++ set +a
++ '[' -z '' ']'
++ ASSEMBLY_DIR2=/root/spark/assembly/target/scala-2.11
++ ASSEMBLY_DIR1=/root/spark/assembly/target/scala-2.10
++ [[ -d /root/spark/assembly/target/scala-2.11 ]]
++ '[' -d /root/spark/assembly/target/scala-2.11 ']'
++ export SPARK_SCALA_VERSION=2.10
++ SPARK_SCALA_VERSION=2.10
+ export '_SPARK_CMD_USAGE=Usage: ./bin/pyspark [options]'
+ _SPARK_CMD_USAGE='Usage: ./bin/pyspark [options]'
+ hash python2.7
+ DEFAULT_PYTHON=python2.7
+ [[ -n '' ]]
+ [[ '' == \1 ]]
+ [[ -z ipython ]]
+ [[ -z '' ]]
+ [[ ipython == *ipython* ]]
+ [[ python2.7 != \p\y\t\h\o\n\2\.\7 ]]
+ PYSPARK_PYTHON=python2.7
+ export PYSPARK_PYTHON
+ export PYTHONPATH=/root/spark/python/:
+ PYTHONPATH=/root/spark/python/:
+ export PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
+ PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
+ export OLD_PYTHONSTARTUP=
+ OLD_PYTHONSTARTUP=
+ export PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
+ PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
+ [[ -n '' ]]
+ export PYSPARK_DRIVER_PYTHON
+ export PYSPARK_DRIVER_PYTHON_OPTS
+ exec /root/spark/bin/spark-submit pyspark-shell-main --name PySparkShell --master 'local[2]'

[NotebookApp] Using existing profile dir: u'/home/ec2-user/.ipython/profile_default'
[NotebookApp] Serving notebooks from /home/ec2-user/dataScience
[NotebookApp] The IPython Notebook is running at: http://127.0.0.1:7000/
[NotebookApp] Use Control-C to stop this server and shut down all kernels.
[NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/MathJax.js
[NotebookApp] Kernel started: 2ba6864b-0dc8-4814-8e05-4f532cb40e2b
[NotebookApp] Connecting to: tcp://127.0.0.1:55099
[NotebookApp] Connecting to: tcp://127.0.0.1:48994
[NotebookApp] Connecting to: tcp://127.0.0.1:57214
[IPKernelApp] To connect another client to this kernel, use:
[IPKernelApp] --existing kernel-2ba6864b-0dc8-4814-8e05-4f532cb40e2b.json



