[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21468

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r202839172

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

@tgravescs Have replaced the if-else code with the ++ operator.
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r202782358

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,20 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      } else {
+        val pythonPathExecutorEnv = sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
+          ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
--- End diff --

Done
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r202713900

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

Also note, as @vanzin said, you can just use the ++=: operator with the ListBuffer to prepend and get rid of the if conditions before converting to a string:

    env.get("PYTHONPATH") ++=: (sys.env.get("PYTHONPATH") ++=: pythonPath)
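The prepend one-liner above can be sketched in isolation. This is a hypothetical stand-in, not the real Client.scala: `sep` replaces `ApplicationConstants.CLASS_PATH_SEPARATOR`, and the method parameters stand in for the `env` map, the launcher's `sys.env`, and the `pythonPath` buffer Spark builds:

```scala
import scala.collection.mutable.ListBuffer

// Illustrative sketch of the suggested ++=: chaining; all names and
// values here are made up for demonstration.
object PrependSketch {
  private val sep = ":" // stands in for ApplicationConstants.CLASS_PATH_SEPARATOR

  def combinedPythonPath(
      appMasterEnvPp: Option[String],  // spark.yarn.appMasterEnv.PYTHONPATH
      launcherPp: Option[String],      // PYTHONPATH exported before spark-submit
      sparkPaths: ListBuffer[String]   // pyspark/py4j entries Spark adds
  ): String = {
    // ++=: prepends to the ListBuffer in place and returns it, so the
    // nested form puts both explicit settings ahead of the Spark-added
    // entries without any if/else.
    appMasterEnvPp ++=: (launcherPp ++=: sparkPaths)
    sparkPaths.mkString(sep)
  }
}
```

Calling `PrependSketch.combinedPythonPath(Some("/opt/app/python"), Some("/home/user/python"), ListBuffer("pyspark.zip", "py4j.zip"))` would yield `/opt/app/python:/home/user/python:pyspark.zip:py4j.zip`.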
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r202171390

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,20 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      } else {
+        val pythonPathExecutorEnv = sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
+          ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
--- End diff --

Move the setExecutorEnv outside of the if and have the if return the value.
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r198936274

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

I would have done it, but I need the pythonPathStr variable to set it in executorEnv, so I have not modified that bit. Let me know if you think otherwise.
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r198935970

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
--- End diff --

That is indeed an issue, fixed it.
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r198935817

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
--- End diff --

Makes sense, have changed it to ++.
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r198864713

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+        sparkConf.setExecutorEnv("PYTHONPATH", newValue)
--- End diff --

When I did some testing of this, it was causing the job to fail; Parth is investigating. I agree with Vanzin on the executor path comment as well.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r197971721

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
--- End diff --

I see that the previous code was overriding this in the executor env; but perhaps the right thing here is to concatenate them, otherwise the executor might be missing the py4j/pyspark stuff this class adds. So, basically, what you want is:

- driver: env.get(pp) ++ sys.env.get(pp) ++ pythonPath
- executor: pythonPath ++ sparkConf.getExecutorEnv(pp)
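The two orderings proposed above can be written out as a minimal sketch. This is not the real Client.scala code; `sep` stands in for `ApplicationConstants.CLASS_PATH_SEPARATOR`, and the parameters stand in for the various PYTHONPATH sources:

```scala
// Sketch of the concatenation order suggested in the review; names and
// values are illustrative placeholders, not Spark internals.
object OrderingSketch {
  private val sep = ":" // stands in for ApplicationConstants.CLASS_PATH_SEPARATOR

  // Driver/AM: spark.yarn.appMasterEnv value first, then the launcher's
  // PYTHONPATH, then the pyspark/py4j entries Spark adds.
  def driverPythonPath(
      amEnv: Option[String], sysEnv: Option[String], sparkPaths: Seq[String]): String =
    (amEnv ++ sysEnv ++ sparkPaths).mkString(sep)

  // Executor: Spark-added entries first, then spark.executorEnv.PYTHONPATH,
  // so executors never lose the py4j/pyspark archives.
  def executorPythonPath(sparkPaths: Seq[String], executorEnv: Option[String]): String =
    (sparkPaths ++ executorEnv).mkString(sep)
}
```

Because `Option` concatenates like a zero-or-one-element collection, unset sources simply drop out of the joined string with no special-casing.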
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r19797

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

This seems unnecessary. What you're trying to do here is concatenate the values of the current `PYTHONPATH` env variable, the `PYTHONPATH` created by Spark, and the `PYTHONPATH` set in `spark.yarn.appMasterEnv`. That's, respectively, `sys.env.get("PYTHONPATH")`, `pythonPath`, and `env.get("PYTHONPATH")`. So just concatenate those.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r197970463

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -811,10 +811,18 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
--- End diff --

This is modifying `pythonPath`, but it's not used again after this line, so why?
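The objection above rests on a real difference between the two operators: `++` builds a new collection, while `++=:` prepends to the `ListBuffer` in place as a side effect. A minimal, self-contained demonstration (the path strings are made up):

```scala
import scala.collection.mutable.ListBuffer

// Demonstrates the side effect flagged in the review: ++ leaves the
// buffer untouched, ++=: mutates it.
object MutationDemo {
  def run(): (List[String], List[String], List[String]) = {
    val paths = ListBuffer("pyspark.zip", "py4j.zip")

    // Non-mutating: produces a new collection, `paths` is unchanged.
    val combined = (Some("/extra") ++ paths).toList
    val beforePrepend = paths.toList

    // Mutating: prepends "/extra" into `paths` itself.
    Some("/extra") ++=: paths
    (combined, beforePrepend, paths.toList)
  }
}
```

In the line under review the mutation is harmless only because `pythonPath` is never read again afterwards, which is exactly why the plain `++` is the clearer choice.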
Github user pgandhi999 commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r194148280

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

Thank you for your comments @vanzin, I have made the necessary changes. As far as precedence is concerned, I was not sure I understood your question at first; however, @tgravescs clarified it for me.
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r193842887

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

Good questions.

- Precedence: right now you can work around this issue by exporting PYTHONPATH before you launch spark-submit. I think this is something that could just be in someone's env on the launcher box and might not be what you want in a YARN container. I would think that explicitly specifying a Python path via spark.yarn.appMasterEnv should take precedence over that, since it was explicitly configured. The second question is where that fails with py-files, and that one isn't as clear to me since, like you said, it's also explicitly specified. Maybe we do py-files, then spark.yarn.appMasterEnv.PYTHONPATH, and last the env PYTHONPATH. That is different from the way it is now, though. Thoughts?
- Agreed that this should not be reflected in the executors, so if it is, we shouldn't do that. We should make sure that spark.executorEnv.PYTHONPATH works.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21468#discussion_r192857531

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
--- End diff --

You could just say `env.get("PYTHONPATH") ++=: pythonPath` before turning the list into a string. But there are also two extra questions here:

- Precedence: should the env come before or after the files added with `py-files`? I kinda think after makes more sense, since files are generally provided in the command line.
- Should `appMasterEnv` be reflected in executors? With your code it is. I'm not so sure it should.
GitHub user pgandhi999 opened a pull request:

https://github.com/apache/spark/pull/21468

[SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

Running in yarn cluster mode and trying to set the Python path via spark.yarn.appMasterEnv.PYTHONPATH doesn't work. The YARN Client code looks at the env variables:

    val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)

But when you set spark.yarn.appMasterEnv, the value is put into the local env, so the Python path set in spark.yarn.appMasterEnv isn't properly applied. You can work around it in cluster mode by setting PYTHONPATH on the client, like:

    PYTHONPATH=./addon/python/ spark-submit

## What changes were proposed in this pull request?

In Client.scala, PYTHONPATH was being overridden, so the code was changed to append values to PYTHONPATH instead of overriding them.

## How was this patch tested?

Added log statements to ApplicationMaster.scala to check for the environment variable PYTHONPATH, ran a Spark job in cluster mode before the change and verified the issue. Performed the same test after the change and verified the fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pgandhi999/spark SPARK-22151

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21468.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21468

commit 0aee8faad9cb60721b153c9bc2187f87a4036b9e
Author: pgandhi
Date: 2018-05-31T14:36:13Z

    [SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

    Running in yarn cluster mode and trying to set the Python path via
    spark.yarn.appMasterEnv.PYTHONPATH doesn't work. The YARN Client code
    looks at the env variables:
    val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
    But when you set spark.yarn.appMasterEnv, the value is put into the
    local env, so the Python path set in spark.yarn.appMasterEnv isn't
    properly applied. You can work around it in cluster mode by setting it
    on the client like: PYTHONPATH=./addon/python/ spark-submit
    In Client.scala, PYTHONPATH was being overridden, so changed code to
    append values to PYTHONPATH