[GitHub] [spark] xkrogen commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

GitBox Tue, 24 Nov 2020 11:50:14 -0800


xkrogen commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r529837613




##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,75 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
       Let's not use default Ivy settings... In my experience with some custom 
logic we have, it's very valuable to ensure that all of the Ivy resolution 
obeys the settings.
   
   Maybe we can pull this out into a common utility that can be leveraged here 
and in `DriverWrapper`? Then there is no need to test it twice. But I don't 
think it is sufficient to say that the logic is tested elsewhere and then 
copy-paste it -- if the two code paths drift apart, we will lose the validation.

##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils {
       coordinates: String,
       ivySettings: IvySettings,
       exclusions: Seq[String] = Nil,
+      transitive: Boolean = true,

Review comment:
       Nit: Add Scaladoc for this new parameter

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
##########
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
     }
   }
 
+  protected def resolveJars(path: String): List[String] = {
+    new Path(path).toUri.getScheme match {

Review comment:
       Shouldn't we use `URI.create(path)` a single time, then re-use the URI 
in this line and the one below? I also think I remember you mentioning 
elsewhere that `new Path(path).toUri` can lose information.
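
A minimal sketch of the "parse once, reuse" idea, using only `java.net.URI`. The names `ParseOnceSketch` and `describe` are hypothetical and exist only for illustration; a real `resolveJars` would dispatch to Spark's resolution logic rather than build strings.

```scala
import java.net.URI

object ParseOnceSketch {
  // Hypothetical helper: parse the string a single time, then read the
  // scheme, authority and query from the same URI instance.
  def describe(path: String): String = {
    val uri = URI.create(path)
    // getScheme returns null for scheme-less paths, so wrap it in Option.
    Option(uri.getScheme) match {
      case Some("ivy") =>
        s"ivy coordinates=${uri.getAuthority} query=${uri.getQuery}"
      case _ =>
        s"plain path=$path"
    }
  }
}
```

Because every component is read from one `URI` value, the scheme check and the later lookups cannot disagree about how the string was parsed.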

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)

Review comment:
       IIRC this will accept URLs that look like `?=foo`, `?foo=`, or 
`?bar=&baz=foo`. It would be good to add tests for this to confirm, and adjust 
as necessary.
   
   Same comment for `parseExcludeList` below.
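
One possible stricter check, as a sketch only. On the JVM, `String.split` with the default limit drops trailing empty strings, so `"foo=".split("=")` has length 1 while `"=foo".split("=")` has length 2; a bare `length == 2` check therefore treats the two malformed shapes differently, which is why tests are worth adding. The object and method names below are hypothetical, and returning `Either` instead of asserting is a choice made for the example, not something taken from the PR.

```scala
object StrictQuerySketch {
  // Hypothetical helper: returns Right(pairs) only when every token has the
  // form key=value with a non-empty key and a non-empty value.
  def parse(query: String): Either[String, Seq[(String, String)]] = {
    if (query == null || query.isEmpty) {
      Right(Seq.empty)
    } else {
      // A limit of -1 keeps trailing empty strings, so "foo=" becomes
      // Array("foo", "") and can be rejected explicitly.
      val tokens = query.split("&", -1).toSeq.map(_.split("=", -1))
      val malformed = tokens.exists(t => t.length != 2 || t.exists(_.isEmpty))
      if (malformed) Left(s"Invalid query string: $query")
      else Right(tokens.map(t => (t(0), t(1))))
    }
  }
}
```

With this shape, `=foo`, `foo=`, and `bar=&baz=foo` are all rejected, while `exclude=a:b&transitive=true` parses into two pairs.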

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars

Review comment:
       Should we be returning a `List[String]` instead of `String` (here and in 
`SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` to 
convert a list to a string, then convert it back to a list later.
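
A small sketch of one concrete hazard of the string round trip, using only the standard library. `JarListSketch` and `roundTrip` are hypothetical names for illustration and do not appear in the PR.

```scala
object JarListSketch {
  // Joins a list of jar URIs with commas and splits it again, mimicking
  // the mkString-then-split pattern questioned above.
  def roundTrip(jars: Seq[String]): Seq[String] =
    jars.mkString(",").split(",").toSeq
}
```

A non-empty list survives the round trip, but an empty list comes back as a one-element list containing the empty string, because splitting `""` yields `Array("")`. Returning a collection directly avoids having to special-case that.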




