Hi

I think I found a bug in the IPython notebook integration, but I am not sure
how to report it.

I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
cluster using the launch script provided by Spark.

I start the IPython notebook server on my cluster master as follows, and use
an SSH tunnel to open the notebook in a browser running on my local computer:

[ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" /root/spark/bin/pyspark
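The tunnel itself is just standard SSH port forwarding, something like the
following (the key file and master address are placeholders for my actual
values), after which I browse to http://localhost:7000 locally:

    ssh -i my-key.pem -L 7000:localhost:7000 ec2-user@<master-public-dns>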


Below is the code my notebook executes (at the end of this message).


Bug list:
1. Why do I need to create a SparkContext? If I run pyspark interactively,
the context is created automatically for me. (Is a guard like the one
sketched just after this list the expected workaround?)
2. The print statement causes the output to be displayed in the terminal
where I started pyspark, not in the notebook's output. (See the sketch
after the code at the end of this message.)
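For point 1, here is the guard I have in mind; just a sketch, assuming the
notebook kernel is supposed to inherit the context that the pyspark shell
normally provides:

    try:
        sc  # reuse the context pyspark normally injects into the shell
    except NameError:
        # fall back to creating one ourselves (what I have to do today)
        from pyspark import SparkContext
        sc = SparkContext(appName="pyStreamingSparkRDDPipe")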
Any comments or suggestions would be greatly appreciated.

Thanks

Andy


import sys
from operator import add

from pyspark import SparkContext

# only stand-alone jobs should create a SparkContext
sc = SparkContext(appName="pyStreamingSparkRDDPipe")

data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)

def echo(data):
    # output winds up in the shell console on the machine I launched
    # pyspark from, not in the notebook
    print "python received: %s" % (data)

rdd.foreach(echo)
print "we are done"



