Well, somehow I managed to send an email as Roman Shaposhnik :) I guess he
got himself a stalker ;)
Cos
On Mon, Jan 06, 2014 at 04:55PM, Roman Shaposhnik wrote:
> Alex,
>
> I don't know if it helps or not, but some time back I made a Maven assembly
> to be able to package Spark in Bigtop. That assembly excludes all Hadoop
> dependencies, so you can simply build it using Maven instead of sbt.
>
> Regards,
> Cos
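
[For reference, the Maven route described above would look roughly like the
command below. The `-Pyarn` profile and `-Dhadoop.version` property are
assumptions based on Spark's Maven build of that era, not details taken from
this thread:]

```shell
# Hypothetical sketch: build Spark via Maven, pinning the Hadoop/YARN version.
# Profile and property names are assumptions; check your checkout's pom.xml.
mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package
```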
>
> On Mon, Jan 06, 2014 at 02:33PM, Alex Cozzi wrote:
> > I am trying to exclude the Hadoop jar dependencies from Spark’s assembly
> > files; the reason is that, in order to work on our cluster, it is
> > necessary to use our own version of those files instead of the published
> > ones. I tried defining the Hadoop dependencies as “provided”, but
> > surprisingly this causes compilation errors in the build. Just to be clear,
> > I modified the sbt build file as follows:
> >
> > def yarnEnabledSettings = Seq(
> >   libraryDependencies ++= Seq(
> >     // Exclude rule required for all ?
> >     "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided"
> >       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >     "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "provided"
> >       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >     "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "provided"
> >       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >     "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion % "provided"
> >       excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib)
> >   )
> > )
> >
> > and compile as
> >
> > SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_IS_NEW_HADOOP=true sbt
> > assembly
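
[One quick way to check whether the Hadoop classes actually made it into the
assembly jar; the jar path below is a guess at a typical output location, not
taken from this thread:]

```shell
# List the assembly contents and look for Hadoop classes.
# The jar name/path is hypothetical; substitute your actual assembly jar.
jar tf assembly/target/scala-2.10/spark-assembly.jar | grep 'org/apache/hadoop' | head
```

If this prints nothing, the Hadoop classes were excluded as intended.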
> >
> >
> > but the assembly still includes the Hadoop libraries, contrary to what the
> > assembly docs say. I managed to exclude them instead by using a
> > non-recommended approach:
> > def extraAssemblySettings() = Seq(
> >   test in assembly := {},
> >   mergeStrategy in assembly := {
> >     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
> >     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
> >     case "log4j.properties" => MergeStrategy.discard
> >     case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
> >     case "reference.conf" => MergeStrategy.concat
> >     case _ => MergeStrategy.first
> >   },
> >   excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
> >     cp filter { _.data.getName.contains("hadoop") }
> >   }
> > )
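
[A possible explanation for the "provided" surprise: sbt-assembly builds the
jar from the compile-time classpath, which still contains "provided"
dependencies. A commonly suggested workaround is to point the assembly task at
the Runtime classpath instead, which drops provided jars without filtering by
name. This is a sketch in the same sbt 0.13-era operator syntax as the
snippets above, and has not been tested against Spark's build:]

```scala
// Sketch, assuming sbt-assembly derives its classpath from the compile
// configuration: re-point it at the Runtime classpath so that dependencies
// scoped "provided" are left out of the fat jar.
fullClasspath in assembly <<= fullClasspath in Runtime
```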
> >
> >
> > But I would like to hear whether there is interest in excluding the Hadoop
> > jars by default in the build.
> > Alex Cozzi
> > [email protected]