xkrogen commented on a change in pull request #32810: URL: https://github.com/apache/spark/pull/32810#discussion_r650292946
########## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ########## @@ -1464,6 +1464,14 @@ private object Client extends Logging { (mainUri ++ secondaryUris).toArray } + /** + * Returns a list of local, absolute URLs representing the user classpath. + * + * @param conf Spark configuration. + */ + def getUserClasspathUrls(conf: SparkConf): Array[URL] = + getUserClasspath(conf).map(entry => new URL("file:" + new File(entry.getPath).getAbsolutePath)) Review comment: Doing `new File(entry.getPath).getAbsolutePath)` will handle the relative paths by converting them to absolute. The values we get back from `getUserClasspath` are relative, like `./foo.jar`, and so the `getAbsolutePath` properly converts them to an absolute path. The snippet you linked uses the following: ``` Client.buildPath(Environment.PWD.$(), uri.getPath()) ``` This is needed because it lives within `ExecutorRunnable`, which executes on the _driver_, but needs to resolve the working directory on the _executor_, so the environment variable is put into the path by the driver and later resolved in the executor process. In this case, we can just use standard Java IO/NIO mechanisms to make the path absolute, since the code is executing in the same process that will consume the path. However this made me realize I had a bug regarding `local` paths which use the `GATEWAY_ROOT_PATH`/`REPLACEMENT_ROOT_PATH` configs. Will explain further in the main thread. ########## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ########## @@ -1464,6 +1464,14 @@ private object Client extends Logging { (mainUri ++ secondaryUris).toArray } + /** + * Returns a list of local, absolute URLs representing the user classpath. + * + * @param conf Spark configuration. + */ + def getUserClasspathUrls(conf: SparkConf): Array[URL] = + getUserClasspath(conf).map(entry => new URL("file:" + new File(entry.getPath).getAbsolutePath)) Review comment: Doing `new File(entry.getPath).getAbsolutePath)` will handle the relative paths by converting them to absolute. The values we get back from `getUserClasspath` are relative, like `./foo.jar`, and so the `getAbsolutePath` properly converts them to an absolute path. The snippet you linked uses the following: ``` Client.buildPath(Environment.PWD.$(), uri.getPath()) ``` This is needed because it lives within `ExecutorRunnable`, which executes on the _driver_, but needs to resolve the working directory on the _executor_, so the environment variable is put into the path by the driver and later resolved in the executor process. In this case, we can just use standard Java IO/NIO mechanisms to make the path absolute, since the code is executing in the same process that will consume the path. However investigating this more deeply made me realize I had a bug regarding `local` paths which use the `GATEWAY_ROOT_PATH`/`REPLACEMENT_ROOT_PATH` configs. Will explain further in the main thread. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org