[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21468


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-16 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r202839172
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,14 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

@tgravescs Have replaced the if-else code with ++ operator.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-16 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r202782358
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,20 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
+if (env.contains("PYTHONPATH")) {
+  env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR  + 
pythonPathStr
+} else {
+  pythonPathStr
+}
+  env("PYTHONPATH") = newValue
+  if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  } else {
+val pythonPathExecutorEnv = 
sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
+  ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-16 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r202713900
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,14 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

Also note as @vanzin  said you can just use the ++=: operator with the 
listbuffer to prepend and get rid of the if conditions before converting to 
string.
env.get("PYTHONPATH") ++=: (sys.env.get("PYTHONPATH") ++=: pythonPath)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r202171390
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,20 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
+if (env.contains("PYTHONPATH")) {
+  env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR  + 
pythonPathStr
+} else {
+  pythonPathStr
+}
+  env("PYTHONPATH") = newValue
+  if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  } else {
+val pythonPathExecutorEnv = 
sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
+  ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
--- End diff --

move the setExecutorEnv outside of the if and have the if return the value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-28 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r198936274
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

I would have done it, but I need the pythonPathStr variable to set it in 
executorEnv so have not modified that bit. Let me know if you think otherwise.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-28 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r198935970
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
+if (env.contains("PYTHONPATH")) {
+  env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR  + 
pythonPathStr
+} else {
+  pythonPathStr
+}
+  env("PYTHONPATH") = newValue
+  if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
--- End diff --

That is indeed an issue, fixed it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-28 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r198935817
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
--- End diff --

Makes sense, have changed it to ++


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-28 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r198864713
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
+if (env.contains("PYTHONPATH")) {
+  env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR  + 
pythonPathStr
+} else {
+  pythonPathStr
+}
+  env("PYTHONPATH") = newValue
+  if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
+sparkConf.setExecutorEnv("PYTHONPATH", newValue)
--- End diff --

when I did some testing of this, this was causing the job to fail, Parth is 
investigating.I agree with Vanzin on the executor path comment as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r197971721
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
+if (env.contains("PYTHONPATH")) {
+  env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR  + 
pythonPathStr
+} else {
+  pythonPathStr
+}
+  env("PYTHONPATH") = newValue
+  if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
--- End diff --

I see that the previous code was overriding this in the executor env; but 
perhaps the right thing here is to concatenate them, otherwise the executor 
might be missing the py4j/pyspark stuff this class adds.

So, basically, what you want is:

- driver: env.get(pp) ++ sys.env.get(pp) ++ pythonPath
- executor: pythonPath ++ sparkConf.getExecutorEnv(pp)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r19797
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

This seems unnecessary. What you're trying to do here is concatenate the 
values of the current `PYTHONPATH` env variable, the `PYTHONPATH` created by 
Spark, and the `PYTHONPATH` set in `spark.yarn.appMasterEnv`.

That's, respectively, `sys.env.get("PYTHONPATH")`, `pythonPath`, and 
`env.get("PYTHONPATH")`. So just concatenate those.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r197970463
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -811,10 +811,18 @@ private[spark] class Client(
 
 // Finally, update the Spark config to propagate PYTHONPATH to the AM 
and executors.
 if (pythonPath.nonEmpty) {
-  val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+  val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
--- End diff --

This is modifying `pythonPath`, but it's not used again after this line, so 
why?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-08 Thread pgandhi999
Github user pgandhi999 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r194148280
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,14 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

Thank you for your comments @vanzin , have made the necessary changes. As 
far as precedence is concerned, I am still not sure whether I understood your 
question at first, however @tgravescs clarified it for me.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-07 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r193842887
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,14 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

good questions

- precedence: So right now you can work around this issue by exporting 
PYTHONPATH before you launch spark-submit, I think this is something that could 
just be in someone's env on the launcher box and might not be what you want in 
a yarn container.  I would think that specifying explicit pythonpath via 
spark.yarn.appMasterEnv would take precedence over that since you explicitly 
configured. Now the second question is where that fails with the py-files and 
that one isn't as clear to me since like you said its explicitly specified.
Maybe we do py-files then spark.yarn.appMasterEnv.PYTHONPATH and then last env 
PYTHONPATH.  that is different from the way it is now though. thoughts?

- agree this should not be reflected in the executors so if it is we 
shouldn't do that. We should make sure the spark. executorEnv.PYTHONPATH works


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-06-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21468#discussion_r192857531
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -813,8 +813,14 @@ private[spark] class Client(
 if (pythonPath.nonEmpty) {
   val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
 .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-  env("PYTHONPATH") = pythonPathStr
-  sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+  val newValue =
--- End diff --

You could just say `env.get("PYTHONPATH") ++=: pythonPath` before turning 
the list into a string.

But there's also two extra questions here:
- precedence; should the env come before or after the files added with 
`py-files`? I kinda think after makes more sense, since files are generally 
provided in the command line.
- should `appMasterEnv` be reflected in executors? With your code it is. 
I'm not so sure it should.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-05-31 Thread pgandhi999
GitHub user pgandhi999 opened a pull request:

https://github.com/apache/spark/pull/21468

[SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMaste…

…rEnv properly

Running in yarn cluster mode and trying to set pythonpath via 
spark.yarn.appMasterEnv.PYTHONPATH doesn't work.

the yarn Client code looks at the env variables:
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
But when you set spark.yarn.appMasterEnv it puts it into the local env.

So the python path set in spark.yarn.appMasterEnv isn't properly set.

You can work around if you are running in cluster mode by setting it on the 
client like:

PYTHONPATH=./addon/python/ spark-submit

## What changes were proposed in this pull request?
In Client.scala, PYTHONPATH was being overridden, so changed code to append 
values to PYTHONPATH instead of overriding them.

## How was this patch tested?
Added log statements to ApplicationMaster.scala to check for environment 
variable PYTHONPATH, ran a spark job in cluster mode before the change and 
verified the issue. Performed the same test after the change and verified the 
fix. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pgandhi999/spark SPARK-22151

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21468


commit 0aee8faad9cb60721b153c9bc2187f87a4036b9e
Author: pgandhi 
Date:   2018-05-31T14:36:13Z

[SPARK-22151] : PYTHONPATH not picked up from the spark.yarn.appMasterEnv 
properly

Running in yarn cluster mode and trying to set pythonpath via 
spark.yarn.appMasterEnv.PYTHONPATH doesn't work.

the yarn Client code looks at the env variables:
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
But when you set spark.yarn.appMasterEnv it puts it into the local env.

So the python path set in spark.yarn.appMasterEnv isn't properly set.

You can work around if you are running in cluster mode by setting it on the 
client like:

PYTHONPATH=./addon/python/ spark-submit

In Client.scala, PYTHONPATH was being overridden, so changed code to append 
values to PYTHONPATH




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org