Andrew Davidson created SPARK-11509:
---------------------------------------

             Summary: ipython notebooks do not work on clusters created using 
spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script
                 Key: SPARK-11509
                 URL: https://issues.apache.org/jira/browse/SPARK-11509
             Project: Spark
          Issue Type: Bug
          Components: Documentation, EC2, PySpark
    Affects Versions: 1.5.1
         Environment: AWS cluster
[ec2-user@ip-172-31-29-60 ~]$ uname -a
Linux ip-172-31-29-60.us-west-1.compute.internal 3.4.37-40.44.amzn1.x86_64 #1 
SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

            Reporter: Andrew Davidson


I recently downloaded spark-1.5.1-bin-hadoop2.6 to my local Mac.

I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create an AWS cluster. I am 
able to run the Java SparkPi example on the cluster; however, I am not able to 
run IPython notebooks on the cluster. (I connect using an ssh tunnel.)

According to the 1.5.1 getting started doc 
http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell

The following should work:

 PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook 
--no-browser --port=7000" /root/spark/bin/pyspark

I am able to connect to the notebook server and start a notebook; however:

bug 1) the default SparkContext, sc, does not exist

from pyspark import SparkContext
textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
textFile.take(3)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-127b6a58d5cc> in <module>()
      1 from pyspark import SparkContext
----> 2 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
      3 textFile.take(3)

NameError: name 'sc' is not defined
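
As a work-around for bug 1 I can create the context by hand in the first 
notebook cell. This is only a sketch; the master URL below is a placeholder 
(I believe spark-ec2 writes the real URL to /root/spark-ec2/cluster-url on 
the master node):

from pyspark import SparkConf, SparkContext

# placeholder master URL -- substitute the cluster's actual master here
conf = SparkConf().setMaster("spark://<master-hostname>:7077").setAppName("notebookApp")
sc = SparkContext(conf=conf)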

bug 2) If I create a SparkContext myself, I get the following Python version 
mismatch error:

sc = SparkContext("local", "Simple App")
textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
textFile.take(3)

 File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 2.6, 
PySpark cannot run with different minor versions
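
The only work-around I found for bug 2 from inside a notebook is to pin the 
worker python before creating the context; as far as I can tell, the driver's 
PYSPARK_PYTHON value is what the executors use to launch their python workers. 
A sketch, assuming python2.6 (the version my driver runs) is installed on 
every node:

import os

# assumption: python2.6 exists on every worker node; the value set here
# on the driver is the executable the workers get launched with
os.environ["PYSPARK_PYTHON"] = "python2.6"

from pyspark import SparkContext

sc = SparkContext("local", "Simple App")
textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
print(textFile.take(3))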


I am able to run IPython notebooks on my local Mac as follows. (Without the 
exports below, you would get an error that the driver and workers are using 
different versions of Python.)

$ cat ~/bin/pySparkNotebook.sh
#!/bin/sh

set -x # turn debugging on
#set +x # turn debugging off

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark "$@"


I have spent a lot of time trying to debug the pyspark script; however, I 
cannot figure out what the problem is.
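
In case it helps with reproducing this, here is a small diagnostic sketch 
(standard library only) that compares the driver and worker python versions; 
when the versions differ, the collect() itself fails with the same mismatch 
exception:

import sys
from pyspark import SparkContext

sc = SparkContext("local", "versionCheck")

def workerVersion(_):
    import sys
    return "%d.%d" % sys.version_info[:2]

# if the driver and workers differ, this job raises the mismatch exception
print("driver : %s" % ("%d.%d" % sys.version_info[:2]))
print("workers: %s" % sc.parallelize(range(4), 2).map(workerVersion).distinct().collect())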

Please let me know if there is anything I can do to help.

Andy


