Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-56091918

Hi @tgravescs, I believe every point of the behavior you listed is correct and preserved in this PR, since it only affects `spark.yarn.dist.*`, and relative paths there are resolved to `file://`. It therefore seems fine to keep `spark.yarn.dist.*` in the list of things that `SparkSubmit` resolves.

On a tangential note, a clarification question: isn't setting SPARK_YARN_DIST_* meaningless in cluster mode? The driver is launched on one of the slave nodes, so the resources specified here should already be visible to the executors, which are launched on the same nodes. If so, it would simplify things to always treat the paths specified through these variables as `hdfs://` paths regardless of the deploy mode. I believe we already do this in [ClientArguments](https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala), where we distinguish between client and cluster mode only in a comment.
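For readers unfamiliar with the scheme-defaulting behavior being discussed: the idea is that a path carrying an explicit scheme (`hdfs://`, `file://`) is kept as-is, while a bare path is given a default scheme. The sketch below is only an illustration of that pattern, not Spark's actual implementation; the class and method names are hypothetical.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical sketch: resolve a user-supplied path to a URI,
// keeping any explicit scheme and defaulting scheme-less paths
// to local files (file://), as SparkSubmit does for client-side paths.
public class ResolvePath {
    static URI resolve(String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            // Explicit scheme (e.g. hdfs://, file://) is preserved.
            return uri;
        }
        // No scheme: treat as a local path and resolve to file://.
        return Paths.get(path).toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        System.out.println(resolve("hdfs://nn:8020/jars/app.jar")); // keeps hdfs scheme
        System.out.println(resolve("jars/app.jar"));                // becomes a file:// URI
    }
}
```

The question above amounts to asking whether, for `SPARK_YARN_DIST_*` in cluster mode, the default branch should produce `hdfs://` instead of `file://`.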