Well,

somehow I managed to send an email as Roman Shaposhnik :) I guess he got
himself a stalker ;)

Cos

On Mon, Jan 06, 2014 at 04:55PM, Roman Shaposhnik wrote:
> Alex,
> 
> I don't know if it helps or not, but some time back I made a Maven assembly to
> be able to package Spark in Bigtop. That assembly excludes all Hadoop
> dependencies, so you can simply build it using Maven instead of sbt.
> 
> Regards,
>   Cos
> 
> On Mon, Jan 06, 2014 at 02:33PM, Alex Cozzi wrote:
> > I am trying to exclude the Hadoop jar dependencies from Spark’s assembly 
> > files, the reason being that in order to work on our cluster it is 
> > necessary to use our own version of those files instead of the published 
> > ones. I tried defining the Hadoop dependencies as “provided”, but 
> > surprisingly this causes compilation errors in the build. Just to be clear, 
> > I modified the sbt build file as follows:
> > 
> >   def yarnEnabledSettings = Seq(
> >     libraryDependencies ++= Seq(
> >       // Exclude rule required for all ?
> >       "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib)
> >     )
> >   )
> > 
> > and compile as 
> > 
> >  SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_IS_NEW_HADOOP=true sbt assembly
> > 
> > 
> > but the assembly still includes the hadoop libraries, contrary to what the 
> > assembly docs say. I managed to exclude them instead by using the 
> > non-recommended way:
> > def extraAssemblySettings() = Seq(
> >     test in assembly := {},
> >     mergeStrategy in assembly := {
> >       case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
> >       case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
> >       case "log4j.properties" => MergeStrategy.discard
> >       case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
> >       case "reference.conf" => MergeStrategy.concat
> >       case _ => MergeStrategy.first
> >     },
> >     excludedJars in assembly <<= (fullClasspath in assembly) map { cp => 
> >      cp filter {_.data.getName.contains("hadoop")}
> >     }
> > )
> > 
> > 
> > But I would like to hear whether there is interest in excluding the Hadoop 
> > jars by default in the build.
> > Alex Cozzi
> > [email protected]
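
For reference, the excludedJars workaround quoted above matches any jar whose
file name merely contains "hadoop". A slightly narrower variant is sketched
below. It is only a sketch under stated assumptions: it uses the same
sbt-assembly keys and the same sbt syntax as the snippet in the thread, the
name hadoopFreeAssemblySettings is hypothetical, and it assumes the Hadoop
artifacts resolve to jars named hadoop-*.jar.

```scala
// Fragment for the same sbt build definition as in the thread; it relies on
// the sbt-assembly keys (assembly, excludedJars) already being imported there.
// `hadoopFreeAssemblySettings` is a hypothetical name used for illustration.
def hadoopFreeAssemblySettings = Seq(
  excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
    // excludedJars is the set of classpath entries to drop from the assembly,
    // so the filter keeps exactly the jars that should be excluded.
    // Assumption: Hadoop jars are named hadoop-<module>-<version>.jar.
    cp filter { entry => entry.data.getName.startsWith("hadoop-") }
  }
)
```

Matching on the "hadoop-" prefix rather than on the substring avoids dropping
unrelated jars that only mention hadoop somewhere in their name. Whether that
distinction matters depends on the actual classpath.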
