GitHub user pgandhi999 opened a pull request: https://github.com/apache/spark/pull/21468
[SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

Running in YARN cluster mode and trying to set PYTHONPATH via spark.yarn.appMasterEnv.PYTHONPATH doesn't work. The YARN Client code looks at the env variables:

    val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)

But when you set spark.yarn.appMasterEnv, the value goes into the local env, so the Python path set in spark.yarn.appMasterEnv isn't properly applied. You can work around this in cluster mode by setting it on the client, like:

    PYTHONPATH=./addon/python/ spark-submit

## What changes were proposed in this pull request?

In Client.scala, PYTHONPATH was being overridden, so the code was changed to append values to PYTHONPATH instead of overriding them.

## How was this patch tested?

Added log statements to ApplicationMaster.scala to check for the environment variable PYTHONPATH, ran a Spark job in cluster mode before the change and verified the issue. Performed the same test after the change and verified the fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pgandhi999/spark SPARK-22151

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21468.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21468

----

commit 0aee8faad9cb60721b153c9bc2187f87a4036b9e
Author: pgandhi <pgandhi@...>
Date: 2018-05-31T14:36:13Z

    [SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly
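The append-instead-of-override behavior the patch describes can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual Client.scala change; `merge_python_path` is a hypothetical helper name, and the entries passed to it are made-up examples:

```python
import os

def merge_python_path(app_master_env_path, extra_entries):
    """Combine a PYTHONPATH supplied via spark.yarn.appMasterEnv.PYTHONPATH
    with entries Spark itself needs on the path, rather than letting one
    silently override the other. Hypothetical helper for illustration;
    Spark's real logic lives in Client.scala."""
    parts = [app_master_env_path] if app_master_env_path else []
    parts.extend(extra_entries)
    # Join with the platform path separator (':' on Linux, ';' on Windows)
    return os.pathsep.join(parts)

# The bug effectively dropped the appMasterEnv value; appending keeps both:
print(merge_python_path("./addon/python/", ["pyspark.zip", "py4j-src.zip"]))
```

The key point mirrors the Scala snippet above: the user-supplied value and Spark's own entries are concatenated, so neither overwrites the other.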