In case anyone runs into this issue in the future, we got it working. The following variable must be set on the edge node:

export PYSPARK_PYTHON=/your/path/to/whatever/python/you/want/to/run/bin/python

I didn't realize that variable gets passed to every worker node. All I found when searching for this issue was documentation for an older version of Spark, which mentions using SPARK_YARN_USER_ENV to set PYSPARK_PYTHON within spark-env.sh; that didn't work for us on Spark 1.3.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/collect-works-take-returns-ImportError-No-module-named-iter-tp24199p24234.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
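For anyone reproducing this, a minimal sketch of the edge-node workflow (the interpreter path and job script name below are placeholders, not values from our setup):

```shell
# Point executors at a specific Python interpreter.
# /opt/anaconda/bin/python is a hypothetical path -- substitute your own.
export PYSPARK_PYTHON=/opt/anaconda/bin/python

# Sanity-check before submitting; spark-submit reads this variable from
# the environment and forwards it to the YARN workers, so every executor
# uses the same interpreter as the driver.
echo "PYSPARK_PYTHON is $PYSPARK_PYTHON"

# spark-submit --master yarn your_job.py   # hypothetical script name
```

Exporting the variable in the shell that launches spark-submit is what matters; setting it only in spark-env.sh via SPARK_YARN_USER_ENV was not sufficient for us on Spark 1.3.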