xkrogen commented on a change in pull request #32810:
URL: https://github.com/apache/spark/pull/32810#discussion_r650292946



##########
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,14 @@ private object Client extends Logging {
     (mainUri ++ secondaryUris).toArray
   }
 
+  /**
+   * Returns a list of local, absolute URLs representing the user classpath.
+   *
+   * @param conf Spark configuration.
+   */
+  def getUserClasspathUrls(conf: SparkConf): Array[URL] =
+    getUserClasspath(conf).map(entry => new URL("file:" + new 
File(entry.getPath).getAbsolutePath))

Review comment:
       Doing `new File(entry.getPath).getAbsolutePath)` will handle the 
relative paths by converting them to absolute. The values we get back from 
`getUserClasspath` are relative, like `./foo.jar`, and so the `getAbsolutePath` 
properly converts them to an absolute path. The snippet you linked uses the 
following:
   ```
   Client.buildPath(Environment.PWD.$(), uri.getPath())
   ```
   This is needed because it lives within `ExecutorRunnable`, which executes on 
the _driver_, but needs to resolve the working directory on the _executor_, so 
the environment variable is put into the path by the driver and later resolved 
in the executor process. In this case, we can just use standard Java IO/NIO 
mechanisms to make the path absolute, since the code is executing in the same 
process that will consume the path.
   
   However this made me realize I had a bug regarding `local` paths which use 
the `GATEWAY_ROOT_PATH`/`REPLACEMENT_ROOT_PATH` configs. Will explain further 
in the main thread.

##########
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,14 @@ private object Client extends Logging {
     (mainUri ++ secondaryUris).toArray
   }
 
+  /**
+   * Returns a list of local, absolute URLs representing the user classpath.
+   *
+   * @param conf Spark configuration.
+   */
+  def getUserClasspathUrls(conf: SparkConf): Array[URL] =
+    getUserClasspath(conf).map(entry => new URL("file:" + new 
File(entry.getPath).getAbsolutePath))

Review comment:
       Doing `new File(entry.getPath).getAbsolutePath)` will handle the 
relative paths by converting them to absolute. The values we get back from 
`getUserClasspath` are relative, like `./foo.jar`, and so the `getAbsolutePath` 
properly converts them to an absolute path. The snippet you linked uses the 
following:
   ```
   Client.buildPath(Environment.PWD.$(), uri.getPath())
   ```
   This is needed because it lives within `ExecutorRunnable`, which executes on 
the _driver_, but needs to resolve the working directory on the _executor_, so 
the environment variable is put into the path by the driver and later resolved 
in the executor process. In this case, we can just use standard Java IO/NIO 
mechanisms to make the path absolute, since the code is executing in the same 
process that will consume the path.
   
   However investigating this more deeply made me realize I had a bug regarding 
`local` paths which use the `GATEWAY_ROOT_PATH`/`REPLACEMENT_ROOT_PATH` 
configs. Will explain further in the main thread.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to