Kevin Grealish created SPARK-16110:
--------------------------------------

             Summary: Can't set Python via spark-submit for YARN cluster mode 
when PYSPARK_PYTHON & PYSPARK_DRIVER_PYTHON are set
                 Key: SPARK-16110
                 URL: https://issues.apache.org/jira/browse/SPARK-16110
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 1.6.1
         Environment: Ubuntu 14.04.4 LTS (GNU/Linux 4.2.0-38-generic x86_64), 
Spark 1.6.1, Azure HDInsight 3.4)
            Reporter: Kevin Grealish


When a cluster has PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment 
variables set (needed for using non-system Python e.g. 
/usr/bin/anaconda/bin/python), then you are unable to override this per 
submission in YARN cluster mode.

When using spark-submit (in this case via LIVY) to submit with an override:

spark-submit --master yarn --deploy-mode cluster --conf 
'spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=python3' --conf 
'spark.yarn.appMasterEnv.PYSPARK_PYTHON=python3' probe.py

the environment variable values still override the conf settings. A workaround 
in some cases is to unset the env vars, but that is not always possible (e.g. 
when submitting a batch via LIVY, where you can only pass parameters through 
to spark-submit).

The expectation is that the conf values above take precedence over the environment variables.

The fix is to change the order in which conf settings and environment 
variables are applied in the YARN client, so that conf settings are applied 
last.
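To illustrate the ordering issue, here is a minimal sketch (not the actual Spark source; object and method names are hypothetical) of why applying the inherited environment variables after the spark.yarn.appMasterEnv.* conf entries makes the env vars win, and why reversing the order gives per-submission overrides precedence:

```scala
// Hypothetical sketch of building the application master's environment.
// Whichever map is merged in last wins for duplicate keys.
object AmEnvSketch {
  // Buggy order: conf entries first, inherited env vars last -> env wins.
  def buildEnvBuggy(sysEnv: Map[String, String],
                    confEnv: Map[String, String]): Map[String, String] =
    confEnv ++ sysEnv

  // Fixed order: env vars first, conf entries last -> conf overrides env.
  def buildEnvFixed(sysEnv: Map[String, String],
                    confEnv: Map[String, String]): Map[String, String] =
    sysEnv ++ confEnv
}
```

With sysEnv containing PYSPARK_PYTHON=/usr/bin/anaconda/bin/python and confEnv containing PYSPARK_PYTHON=python3, buildEnvBuggy yields the anaconda path while buildEnvFixed yields python3, which is the behavior this issue asks for.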

Related discussion: https://issues.cloudera.org/browse/LIVY-159

Backporting this to 1.6 would be great and unblocking for me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
