spark git commit: [SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT

marmbrus Tue, 10 Nov 2015 10:14:43 -0800

Repository: spark
Updated Branches:
  refs/heads/master a81f47ff7 -> 689386b1c



[SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT

This patch modifies Spark's SBT build so that it no longer uses 
`retrieveManaged` / `lib_managed` to store its dependencies. The motivations 
for this change are nicely described on the JIRA ticket 
([SPARK-7841](https://issues.apache.org/jira/browse/SPARK-7841)); my personal 
interest in doing this stems from the fact that `lib_managed` has caused me 
some pain while debugging dependency issues in another PR of mine.

Removing our use of `lib_managed` would be trivial except for one snag: the 
Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included 
in assembly JARs due to problems with merging OSGi `plugin.xml` files. As a 
result, several places in the packaging and deployment pipeline assume that 
these Datanucleus JARs are copied to `lib_managed/jars`. In the interest of 
maintaining compatibility, I have chosen to retain the `lib_managed/jars` 
directory _only_ for these Datanucleus JARs and have added custom code to 
`SparkBuild.scala` to automatically copy those JARs to that folder as part of 
the `assembly` task.
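
For readers who want the copy step in isolation, here is a minimal shell sketch of the same idea: pick out the classpath entries whose path contains `org.datanucleus` and copy each one into the target directory unless a file of that name already exists there. The function name `deploy_datanucleus_jars` and its two arguments are hypothetical and are not part of the patch, which implements this logic as an SBT task in `SparkBuild.scala`.

```shell
# Hypothetical helper, not part of the patch: copy Datanucleus JARs found on a
# colon-separated classpath ($1) into a destination directory ($2), skipping
# any JAR that is already present there.
deploy_datanucleus_jars() {
  classpath="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  old_ifs="$IFS"
  IFS=":"                      # split the classpath on colons
  for jar in $classpath; do
    case "$jar" in
      *org.datanucleus*)       # same substring test the SBT task uses
        target="$dest_dir/$(basename "$jar")"
        [ -e "$target" ] || cp "$jar" "$target"
        ;;
    esac
  done
  IFS="$old_ifs"
}
```

As in the SBT task, an existing destination file is left untouched, so repeated runs do not re-copy the JARs.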

`dev/mima` also depended on `lib_managed` in a hacky way in order to set 
classpaths when generating MiMa excludes; I've updated this to obtain the 
classpaths directly from SBT instead.
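
The `dev/mima` change keeps only the last line of the SBT output, on the assumption (the same one the patch makes) that log lines come first and the classpath is printed last. A small sketch of that idiom follows; `fake_sbt_export` is a made-up stand-in for `build/sbt "export oldDeps/fullClasspath"`, and its output and paths are invented for illustration.

```shell
# Made-up stand-in for `build/sbt "export oldDeps/fullClasspath"`: a few log
# lines followed by the classpath on the final line. The paths are invented.
fake_sbt_export() {
  echo "[info] Loading project definition"
  echo "[info] Set current project to spark-parent"
  echo "/path/a.jar:/path/b.jar"
}

# Keep only the final line, which is assumed to be the classpath.
CLASSPATH_FROM_SBT="$(fake_sbt_export | tail -n1)"
echo "SPARK_CLASSPATH=$CLASSPATH_FROM_SBT"
```

If SBT ever printed something after the classpath, `tail -n1` would pick up the wrong line, so this approach depends on the classpath staying last.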

/cc dragos marmbrus pwendell srowen

Author: Josh Rosen <joshro...@databricks.com>

Closes #9575 from JoshRosen/SPARK-7841.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/689386b1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/689386b1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/689386b1

Branch: refs/heads/master
Commit: 689386b1c60997e4505749915f7005a52c207de2
Parents: a81f47f
Author: Josh Rosen <joshro...@databricks.com>
Authored: Tue Nov 10 10:14:19 2015 -0800
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Tue Nov 10 10:14:19 2015 -0800

----------------------------------------------------------------------
 dev/mima                 |  2 +-
 project/SparkBuild.scala | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/689386b1/dev/mima
----------------------------------------------------------------------
diff --git a/dev/mima b/dev/mima
index 2952fa6..d5baffc 100755
--- a/dev/mima
+++ b/dev/mima
@@ -38,7 +38,7 @@ generate_mima_ignore() {
 # it did not process the new classes (which are in assembly jar).
 generate_mima_ignore
 
-export SPARK_CLASSPATH="`find lib_managed \( -name '*spark*jar' -a -type f \) | tr "\\n" ":"`"
+export SPARK_CLASSPATH="$(build/sbt "export oldDeps/fullClasspath" | tail -n1)"
 echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"
 
 generate_mima_ignore

http://git-wip-us.apache.org/repos/asf/spark/blob/689386b1/project/SparkBuild.scala
----------------------------------------------------------------------
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index b75ed13..a9fb741 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -16,6 +16,7 @@
  */
 
 import java.io._
+import java.nio.file.Files
 
 import scala.util.Properties
 import scala.collection.JavaConverters._
@@ -135,8 +136,6 @@ object SparkBuild extends PomBuild {
      .orElse(sys.props.get("java.home").map { p => new File(p).getParentFile().getAbsolutePath() })
       .map(file),
     incOptions := incOptions.value.withNameHashing(true),
-    retrieveManaged := true,
-    retrievePattern := "[type]s/[artifact](-[revision])(-[classifier]).[ext]",
     publishMavenStyle := true,
     unidocGenjavadocVersion := "0.9-spark0",
 
@@ -326,8 +325,6 @@ object OldDeps {
   def oldDepsSettings() = Defaults.coreDefaultSettings ++ Seq(
     name := "old-deps",
     scalaVersion := "2.10.5",
-    retrieveManaged := true,
-    retrievePattern := "[type]s/[artifact](-[revision])(-[classifier]).[ext]",
     libraryDependencies := Seq("spark-streaming-mqtt", "spark-streaming-zeromq",
       "spark-streaming-flume", "spark-streaming-kafka", "spark-streaming-twitter",
       "spark-streaming", "spark-mllib", "spark-bagel", "spark-graphx",
@@ -404,6 +401,8 @@ object Assembly {
 
   val hadoopVersion = taskKey[String]("The version of hadoop that spark is compiled against.")
 
+  val deployDatanucleusJars = taskKey[Unit]("Deploy datanucleus jars to the spark/lib_managed/jars directory")
+
   lazy val settings = assemblySettings ++ Seq(
     test in assembly := {},
     hadoopVersion := {
@@ -429,7 +428,20 @@ object Assembly {
       case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
       case "reference.conf"                                    => MergeStrategy.concat
       case _                                                   => MergeStrategy.first
-    }
+    },
+    deployDatanucleusJars := {
+      val jars: Seq[File] = (fullClasspath in assembly).value.map(_.data)
+        .filter(_.getPath.contains("org.datanucleus"))
+      var libManagedJars = new File(BuildCommons.sparkHome, "lib_managed/jars")
+      libManagedJars.mkdirs()
+      jars.foreach { jar =>
+        val dest = new File(libManagedJars, jar.getName)
+        if (!dest.exists()) {
+          Files.copy(jar.toPath, dest.toPath)
+        }
+      }
+    },
+    assembly <<= assembly.dependsOn(deployDatanucleusJars)
   )
 }
 

