[ https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614856#comment-14614856 ]
Juliet Hougland commented on SPARK-8646:
----------------------------------------

[~sowen] The pandas error came when I tried to run the pi job, which doesn't import pandas at all. The only imports in $SPARK_1.4_HOME/examples/src/main/python/pi.py are as follows:

import sys
from random import random
from operator import add
from pyspark import SparkContext

PySpark itself doesn't require pandas (if it does, that should be documented), so having the pi job, which doesn't require pandas, fail with a "pandas not found" error is wrong: at no point should the pi job or PySpark itself require pandas. The pandas error is very strange, but it is not obviously related to this ticket. The problem I reported here is that pyspark itself is not shipped, or is otherwise not available, to the worker nodes when I run a PySpark app from Spark 1.4 using YARN.

> PySpark does not run on YARN
> ----------------------------
>
>                 Key: SPARK-8646
>                 URL: https://issues.apache.org/jira/browse/SPARK-8646
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.4.0
>         Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in my own code.
>            Reporter: Juliet Hougland
>         Attachments: pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log, spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, spark1.4-SPARK_HOME-set.log
>
> Running pyspark jobs results in a "no module named pyspark" error when run in yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This does not represent a binary-compatible change to Spark.
> Scripts that worked on previous Spark versions (i.e. commands that use spark-submit) should continue to work without modification between minor versions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
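For context on why the pandas failure is so surprising: the pi.py example referenced above does nothing more than a Monte Carlo estimate of pi, which Spark merely parallelizes over the cluster. A minimal plain-Python sketch of the same computation (no SparkContext; the function name and structure here are illustrative, not the actual example's code) shows that nothing in the calculation itself could touch pandas:

```python
# Sketch of the Monte Carlo pi estimate that examples/src/main/python/pi.py
# parallelizes with Spark. Plain Python here, to show the computation
# involves only random sampling and counting -- no pandas anywhere.
import random


def estimate_pi(num_samples, seed=42):
    """Estimate pi by sampling points uniformly in the unit square
    and counting the fraction that fall inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter-circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100_000))
```

In the real example, the per-sample check runs inside an RDD map on the executors, which is exactly where the "no module named pyspark" error reported in this ticket occurs: the worker-side Python processes cannot import pyspark at all.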