[ https://issues.apache.org/jira/browse/SPARK-11509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Davidson reopened SPARK-11509:
-------------------------------------

My issue is not resolved. I am able to use IPython notebooks on my local Mac, but I still cannot run them on my cluster. I have tried launching the same way I do on my Mac:

export PYSPARK_PYTHON=python2.7
export PYSPARK_DRIVER_PYTHON=python2.7
IPYTHON_OPTS="notebook --no-browser --port=7000" $SPARK_ROOT/bin/pyspark

I also tried setting export PYSPARK_PYTHON=python2.7 in /root/spark/conf/spark-env.sh on all my machines.

The following code example

from pyspark import SparkContext

sc = SparkContext("local", "Simple App")  # strange, we should not have to create sc
textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
textFile.take(3)

generates the following error message:

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-1-e0006b323300> in <module>()
      2 sc = SparkContext("local", "Simple App")
      3 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
----> 4 textFile.take(3)

/root/spark/python/pyspark/rdd.py in take(self, num)
   1297
   1298             p = range(partsScanned, min(partsScanned + numPartsToTry, totalParts))
-> 1299             res = self.context.runJob(self, takeUpToNumLeft, p)
   1300
   1301             items += res

/root/spark/python/pyspark/context.py in runJob(self, rdd, partitionFunc, partitions, allowLocal)
    914         # SparkContext#runJob.
    915         mappedRDD = rdd.mapPartitions(partitionFunc)
--> 916         port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
    917         return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
    918

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539
    540         for temp_arg in temp_args:

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 2.6, PySpark cannot run with different minor versions

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
    at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:393)
    at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 2.6, PySpark cannot run with different minor versions

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more

Any suggestions?

Kind regards

Andy
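One diagnostic that may help narrow this down (a sketch only; it assumes the notebook session already has a SparkContext named sc, such as the one created in the example above) is to print which interpreter the driver is actually running, which PYSPARK_* variables the notebook server inherited, and which interpreter a worker reports:

import os
import sys

# Interpreter the driver process (the notebook kernel) is running.
print("driver executable: %s" % sys.executable)
print("driver version   : %d.%d" % sys.version_info[:2])

# What PySpark was told to use; None means the notebook server never saw the variable.
print("PYSPARK_PYTHON        = %s" % os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON = %s" % os.environ.get("PYSPARK_DRIVER_PYTHON"))

def worker_version(_):
    # Runs on an executor, so this reports the worker-side interpreter.
    import sys
    return "%s %d.%d" % (sys.executable, sys.version_info[0], sys.version_info[1])

# If the versions really do differ, this job fails with the same
# "different minor versions" exception, which still confirms which side is 2.6.
print(sc.parallelize([0], 1).map(worker_version).collect())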
> ipython notebooks do not work on clusters created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11509
>                 URL: https://issues.apache.org/jira/browse/SPARK-11509
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, EC2, PySpark
>    Affects Versions: 1.5.1
>        Environment: AWS cluster
> [ec2-user@ip-172-31-29-60 ~]$ uname -a
> Linux ip-172-31-29-60.us-west-1.compute.internal 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Andrew Davidson
>
> I recently downloaded spark-1.5.1-bin-hadoop2.6 to my local Mac.
> I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create an AWS cluster. I am able to run the Java SparkPi example on the cluster; however, I am not able to run IPython notebooks on the cluster (I connect using an ssh tunnel).
> According to the 1.5.1 getting started doc
> http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
> the following should work:
>
> PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000" /root/spark/bin/pyspark
>
> I am able to connect to the notebook server and start a notebook; however:
>
> bug 1) the default SparkContext does not exist
>
> from pyspark import SparkContext
> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
> textFile.take(3)
>
> ---------------------------------------------------------------------------
> NameError                                 Traceback (most recent call last)
> <ipython-input-1-127b6a58d5cc> in <module>()
>       1 from pyspark import SparkContext
> ----> 2 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
>       3 textFile.take(3)
>
> NameError: name 'sc' is not defined
>
> bug 2) If I create a SparkContext I get the following Python version mismatch error
>
> sc = SparkContext("local", "Simple App")
> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
> textFile.take(3)
>
>   File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
>     ("%d.%d" % sys.version_info[:2], version))
> Exception: Python in worker has different version 2.7 than that in driver 2.6, PySpark cannot run with different minor versions
>
> I am able to run ipython notebooks on my local Mac as follows. (By default you would get an error that the driver and workers are using different versions of Python.)
>
> $ cat ~/bin/pySparkNotebook.sh
> #!/bin/sh
> set -x # turn debugging on
> #set +x # turn debugging off
> export PYSPARK_PYTHON=python3
> export PYSPARK_DRIVER_PYTHON=python3
> IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $*
>
> I have spent a lot of time trying to debug the pyspark script; however, I cannot figure out what the problem is.
> Please let me know if there is something I can do to help.
> Andy
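For bug 2 in the quoted description, a possible workaround sketch (not verified on the spark-ec2 AMI; it assumes Spark 1.5 picks up PYSPARK_PYTHON from the driver's environment at the moment the SparkContext is created, and that a python2.7 executable exists on every node) is to pin the worker interpreter before building the context:

import os

from pyspark import SparkContext

# Assumption: the executors launch whatever interpreter PYSPARK_PYTHON names,
# as seen by the driver when the context starts, so this must have the same
# minor version as the notebook kernel itself.
os.environ["PYSPARK_PYTHON"] = "python2.7"

sc = SparkContext("local", "Simple App")

textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
print(textFile.take(3))

This only removes the mismatch if the notebook kernel itself runs Python 2.7; if the ipython command on the cluster is installed under the system Python 2.6, the driver side stays at 2.6 and the mismatch remains, which would be consistent with the "worker has 2.7, driver has 2.6" message above.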