PySpark definitely works for me in an IPython notebook. A good way to debug is to call setMaster("local") when you build your Python sc object and see if that works. Then, from there, modify it to point at the real Spark master.
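For example, something along these lines is what I mean (just a rough sketch; the app name is made up, and you would swap in your real master URL afterwards):

    from pyspark import SparkConf, SparkContext

    # start against a local master first, to rule out cluster/networking issues
    conf = SparkConf().setAppName("notebook-debug").setMaster("local")
    sc = SparkContext(conf=conf)

    # quick smoke test; this should print 15 in the notebook output
    print sc.parallelize([1, 2, 3, 4, 5]).sum()

    # once that works, change the master to point at the real cluster, e.g.
    # conf.setMaster("spark://<your-master-host>:7077")

If the local run works in the notebook but the cluster master does not, the problem is more likely the master URL or networking than the notebook integration itself.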
Also, I added a hack where I did a sys.path.insert of the path to pyspark in
my Python notebook to get it working properly. You can try these instructions
out if you want, which I recently put together based on some other stuff
online plus a few minor modifications:
http://jayunit100.blogspot.com/2014/07/ipython-on-spark.html

On Thu, Oct 9, 2014 at 2:50 PM, Andy Davidson <a...@santacruzintegration.com>
wrote:

> I wonder if I am starting iPython notebook incorrectly. The example in my
> original email does not work. It looks like stdout is not configured
> correctly. If I submit it as a python .py file it works correctly.
>
> Any idea what the problem is?
>
> Thanks
>
> Andy
>
> From: Andrew Davidson <a...@santacruzintegration.com>
> Date: Tuesday, October 7, 2014 at 4:23 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: bug with IPython notebook?
>
> Hi
>
> I think I found a bug in the iPython notebook integration. I am not sure
> how to report it.
>
> I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
> cluster using the launch script provided by Spark.
>
> I start iPython notebook on my cluster master as follows and use an ssh
> tunnel to open the notebook in a browser running on my local computer:
>
> [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline
> --no-browser --port=7000" /root/spark/bin/pyspark
>
> Below is the code my notebook executes.
>
> Bug list:
>
> 1. Why do I need to create a SparkContext? If I run pyspark interactively,
>    the context is created automatically for me.
> 2. The print statement causes the output to be displayed in the terminal
>    where I started pyspark, not in the notebook's output.
>
> Any comments or suggestions would be greatly appreciated.
>
> Thanks
>
> Andy
>
> import sys
> from operator import add
>
> from pyspark import SparkContext
>
> # only stand-alone jobs should create a SparkContext
> sc = SparkContext(appName="pyStreamingSparkRDDPipe")
>
> data = [1, 2, 3, 4, 5]
> rdd = sc.parallelize(data)
>
> def echo(data):
>     # output winds up in the shell console on my cluster
>     # (i.e. the machine I launched pyspark from)
>     print "python received: %s" % (data)
>
> rdd.foreach(echo)
> print "we are done"

--
jay vyas