Hi, I think I found a bug in the IPython Notebook integration. I am not sure how to report it.
I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster, which I start using the launch script provided by Spark. I start IPython Notebook on my cluster master as follows, and use an SSH tunnel (a sketch of the tunnel command follows the code below) to open the notebook in a browser running on my local computer:

[ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" /root/spark/bin/pyspark

Below is the code my notebook executes.

Bug list:

1. Why do I need to create a SparkContext? If I run pyspark interactively, the context is created automatically for me (see the guard sketched after the code below).
2. The print statement causes the output to be displayed in the terminal where I started pyspark, not in the notebook's output (see the workaround sketched after the code below).

Any comments or suggestions would be greatly appreciated.

Thanks

Andy

import sys
from operator import add

from pyspark import SparkContext

# only stand-alone jobs should need to create a SparkContext
sc = SparkContext(appName="pyStreamingSparkRDDPipe")

data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)

def echo(data):
    print "python received: %s" % (data)

# output winds up in the shell console on my cluster (i.e. the machine I launched pyspark from)
rdd.foreach(echo)

print "we are done"
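For reference, the tunnel I use looks roughly like the following; the key path and public hostname here are placeholders, not my actual values:

laptop$ ssh -i ~/mykey.pem -L 7000:localhost:7000 ec2-user@ec2-XX-XX-XX-XX.compute-1.amazonaws.com

With the tunnel up I browse to http://localhost:7000 on my laptop.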
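Regarding bug 1, a guard like the following sketch would let the same notebook cell run whether or not pyspark has already created sc for me; this is plain Python, not a pyspark API:

# create a SparkContext only if the shell has not already defined one
try:
    sc
except NameError:
    from pyspark import SparkContext
    sc = SparkContext(appName="pyStreamingSparkRDDPipe")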
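Regarding bug 2, my understanding is that rdd.foreach(echo) runs echo inside the worker processes, so its output goes to their stdout rather than to the notebook. One way to see the values in the notebook itself, sketched under the assumption that the RDD is small enough to fit on the driver:

# collect() pulls the RDD contents back to the driver, so this print
# runs in the notebook kernel instead of on the workers
for x in rdd.collect():
    print "python received: %s" % (x)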