PySpark definitely works for me in an IPython notebook.  A good way to debug is
to set the master to "local" when you build your Python sc object and see if
that works.  Then, from there, modify it to point at the real Spark master.
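
For example, a quick sanity check could look something like this (just a
sketch; the app name and the spark:// URL in the comment are placeholders):

from pyspark import SparkConf, SparkContext

# start with a purely local master to rule out cluster/networking issues
conf = SparkConf().setMaster("local").setAppName("notebook-debug")
sc = SparkContext(conf=conf)

print sc.parallelize([1, 2, 3, 4, 5]).count()   # should print 5

# once that works, point at the real cluster instead, e.g.
# conf = SparkConf().setMaster("spark://<your-master-host>:7077")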

Also, I added a hack where I used sys.path.insert to add the PySpark path in
my Python notebook to get it working properly.
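
Roughly along these lines (a sketch assuming the spark-ec2 layout under
/root/spark; the exact py4j zip name varies by Spark version):

import sys

# make the PySpark libraries importable from the notebook's Python process
sys.path.insert(0, "/root/spark/python")
sys.path.insert(0, "/root/spark/python/lib/py4j-0.8.2.1-src.zip")  # name may differ

from pyspark import SparkContext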

You can try out these instructions if you want; I recently put them together
based on some other material I found online, plus a few minor modifications:

http://jayunit100.blogspot.com/2014/07/ipython-on-spark.html


On Thu, Oct 9, 2014 at 2:50 PM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I wonder if I am starting the IPython notebook incorrectly. The example in
> my original email does not work. It looks like stdout is not configured
> correctly. If I submit the code as a plain Python .py file, it works correctly.
>
> Any idea what the problem is?
>
>
> Thanks
>
> Andy
>
>
> From: Andrew Davidson <a...@santacruzintegration.com>
> Date: Tuesday, October 7, 2014 at 4:23 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: bug with IPython notebook?
>
> Hi
>
> I think I found a bug in the IPython notebook integration. I am not sure
> how to report it.
>
> I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
> cluster using the launch script provided by Spark.
>
> I start the IPython notebook on my cluster master as follows and use an SSH
> tunnel to open the notebook in a browser running on my local computer:
>
> [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline
> --no-browser --port=7000" /root/spark/bin/pyspark
>
> Below is the code my notebook executes:
>
>
> Bug list:
>
>    1. Why do I need to create a SparkContext? If I run pyspark
>    interactively, the context is created automatically for me.
>    2. The print statement causes the output to be displayed in the
>    terminal where I started pyspark, not in the notebook's output.
>
> Any comments or suggestions would be greatly appreciated
>
> Thanks
>
> Andy
>
>
> import sys
> from operator import add
>
> from pyspark import SparkContext
>
> # only standalone jobs should create a SparkContext
> sc = SparkContext(appName="pyStreamingSparkRDDPipe")
>
> data = [1, 2, 3, 4, 5]
> rdd = sc.parallelize(data)
>
> def echo(data):
>     print "python recieved: %s" % (data) # output winds up in the shell
> console in my cluster (ie. The machine I launched pyspark from)
>
> rdd.foreach(echo)
> print "we are done"
>
>
>


-- 
jay vyas
