[jira] [Commented] (SPARK-9235) PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730680#comment-14730680 ]

Aaron Glahe commented on SPARK-9235:

You can set it in spark-env.sh. For example, since we use conda as our "python env":

SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/srv/software/anaconda/bin/python"
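The workaround in the comment above can be sketched as a spark-env.sh fragment. This is a sketch, not the reporter's verbatim file: it assumes the documented (and since-deprecated) SPARK_YARN_USER_ENV variable, which takes comma-separated NAME=VALUE pairs that Spark-on-YARN forwards into the launched containers.

```shell
# conf/spark-env.sh on the submitting machine (sketch of the workaround).
# SPARK_YARN_USER_ENV holds comma-separated NAME=VALUE pairs that are
# forwarded to the YARN containers, so the container hosting the driver
# picks up the Anaconda interpreter instead of the system python.
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/srv/software/anaconda/bin/python"
```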
[jira] [Created] (SPARK-9235) PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver when yarn-cluster mode
Aaron Glahe created SPARK-9235:
--

Summary: PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver when yarn-cluster mode
Key: SPARK-9235
URL: https://issues.apache.org/jira/browse/SPARK-9235
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.4.1, 1.5.0
Environment: CentOS 6.6, python 2.7, Spark 1.4.1 tagged version, YARN Cluster Manager, CDH 5.4.1 (Hadoop 2.6.0++), Java 1.7
Reporter: Aaron Glahe
Priority: Minor

Relates to SPARK-9229

Env: Spark on YARN, Java 1.7, CentOS 6.6, CDH 5.4.1 (Hadoop 2.6.0++), Anaconda Python 2.7.10 "installed" in the /srv/software directory.

On a client/submitting machine, we set the PYSPARK_DRIVER_PYTHON env var in spark-env.sh to point at the Anaconda python executable, which was present on every YARN node:

export PYSPARK_DRIVER_PYTHON='/srv/software/anaconda/bin/python'

Side note: export PYSPARK_PYTHON='/srv/software/anaconda/bin/python' was set in spark-env.sh as well.

Then run the command:

spark-submit test.py --master yarn --deploy-mode cluster

It appears as though the Node Manager running the DRIVER does not use the PYSPARK_DRIVER_PYTHON python, but instead uses the CentOS system default (which in this case is python 2.6).

The workaround appears to be setting the python path in SPARK_YARN_USER_ENV.
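One aside on the reported command: spark-submit treats everything after the primary application file as arguments to the application itself, so flags placed after test.py are not parsed by spark-submit. The conventional invocation (a sketch using the reporter's script name) places the options first:

```shell
# Conventional spark-submit argument order: [options] come before the
# application file; anything after test.py would be handed to the script
# as application arguments rather than interpreted by spark-submit.
spark-submit --master yarn --deploy-mode cluster test.py
```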
[jira] [Updated] (SPARK-9235) PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Glahe updated SPARK-9235:
---

Summary: PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver in yarn-cluster mode
(was: PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting as driver when yarn-cluster mode)
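For the yarn-cluster case described in this issue, driver-side environment variables can also be forwarded per submission via the documented spark.yarn.appMasterEnv.* properties (the Application Master hosts the driver in cluster mode) and spark.executorEnv.* for executors. A sketch, reusing the reporter's paths:

```shell
# Sketch: forward the interpreter choice to the driver container (the YARN
# Application Master in cluster mode) and to the executors via --conf,
# instead of relying on spark-env.sh on the submitting machine.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/srv/software/anaconda/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/srv/software/anaconda/bin/python \
  test.py
```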