xkrogen commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r529837613
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,75 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
Let's not use default Ivy settings... In my experience with some custom logic we have, it's very valuable to ensure that all of the Ivy resolution obeys the settings. Maybe we can pull this out into a common utility that can be leveraged both here and in `DriverWrapper`? Then there is no need to test it twice. But I don't think it's sufficient to say that the logic is tested elsewhere and then copy-pasted -- if there is drift between the two code paths, we will lose the validation.
##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils {
       coordinates: String,
       ivySettings: IvySettings,
       exclusions: Seq[String] = Nil,
+      transitive: Boolean = true,

Review comment:
Nit: Add Scaladoc for this new parameter.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
##########
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
     }
   }
+  protected def resolveJars(path: String): List[String] = {
+    new Path(path).toUri.getScheme match {

Review comment:
Shouldn't we call `URI.create(path)` a single time, then re-use the URI in this line and the one below? I also recall you mentioning elsewhere that `new Path(path).toUri` can lose information.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)

Review comment:
IIRC this will accept URLs that look like `?=foo`, `?foo=`, or `?bar=&baz=foo`. It would be good to add tests to confirm this, and adjust as necessary. Same comment for `parseExcludeList` below.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars

Review comment:
Should we return a `List[String]` instead of a `String` (here and in `SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` call `mkString` to convert a list to a string, only to convert it back to a list later.
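
A quick way to confirm the `parseURLQueryParameter` concern is to run the same `split`-based check in isolation. The sketch below is standalone and makes no Spark calls; `QuerySplitCheck`, `passesCurrentCheck` and `strictPairs` are hypothetical names used only for illustration, not part of the PR. It relies on the JVM behavior that `String.split` without a limit drops trailing empty strings but keeps a leading one, so the three inputs mentioned in the review may not all behave the same way. The stricter variant is one possible adjustment, not necessarily the one the PR should adopt.

```scala
object QuerySplitCheck {
  // The check as written in the diff: every '&'-separated token must
  // split into exactly two parts on '='.
  def passesCurrentCheck(query: String): Boolean =
    query.split("&").forall(_.split("=").length == 2)

  // Hypothetical stricter variant: a limit of -1 keeps trailing empty
  // strings, so an empty key or an empty value can be detected and rejected.
  def strictPairs(query: String): Option[Seq[(String, String)]] = {
    val pairs = query.split("&", -1).toSeq.map(_.split("=", -1))
    if (pairs.forall(p => p.length == 2 && p.forall(_.nonEmpty))) {
      Some(pairs.map(p => (p(0), p(1))))
    } else {
      None
    }
  }

  def main(args: Array[String]): Unit = {
    // Print how each sample query fares under both checks.
    Seq("=foo", "foo=", "bar=&baz=foo", "transitive=false&exclude=org:mod").foreach { q =>
      println(s"$q current=${passesCurrentCheck(q)} strict=${strictPairs(q)}")
    }
  }
}
```

Under these assumptions, the current check lets `=foo` through (empty key) while rejecting `foo=` and `bar=&baz=foo`, because the trailing empty value is dropped and the token splits into a single part. Test cases for each of these inputs in the PR would pin the behavior down either way.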