@Sandy Yes, in sbt with a multi-project setup, you can easily set a variable in the Build.scala and reference the version number from all dependent projects.
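A minimal sketch of the setup described above, a shared version variable in one Build.scala referenced by every subproject. Project names and version numbers here are hypothetical (sbt 0.13-era syntax):

```scala
// project/Build.scala -- hypothetical multi-project build sharing one
// set of version numbers across all subprojects
import sbt._
import Keys._

object ExampleBuild extends Build {
  // Single source of truth: every subproject references these vals.
  val hadoopVersion = "2.2.0"

  val commonSettings = Defaults.defaultSettings ++ Seq(
    organization := "com.example",
    version      := "1.0.0",
    scalaVersion := "2.10.3"
  )

  lazy val core = Project("core", file("core"),
    settings = commonSettings ++ Seq(
      libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
    ))

  // A Java-only subproject built by the same Build.scala: sbt compiles
  // Java sources out of the box; these two keys drop the Scala suffixing.
  lazy val javaApi = Project("java-api", file("java-api"),
    settings = commonSettings ++ Seq(
      autoScalaLibrary := false,
      crossPaths       := false
    )).dependsOn(core)
}
```

Bumping `hadoopVersion` in one place then propagates to every subproject on the next build.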
Regarding a mix of Java and Scala projects: in my workplace we have both Java and Scala code. sbt can build both with the same Build.scala. We have been using this setup for the last 6 months. The build includes different versions of Hadoop as well as Spark. Hope this helps.

Chester

Sent from my iPhone

On Feb 25, 2014, at 4:36 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> To perhaps restate what some have said, Maven is by far the most common
> build tool for the Hadoop / JVM data ecosystem. While Maven is less pretty
> than SBT, expertise in it is abundant. SBT requires contributors to
> projects in the ecosystem to learn yet another tool. If we think of Spark
> as a project in that ecosystem that happens to be in Scala, as opposed to
> a Scala project that happens to be part of that ecosystem, Maven seems
> like the better choice to me.
>
> On a CDH-specific note, in building CDH, one of the reasons Maven is
> helpful to us is that it makes it easy to harmonize dependency versions
> across projects. We modify project poms to include the "CDH" pom as a
> root pom, allowing each project to reference variables defined in the
> root pom like ${cdh.slf4j.version}. Is there a way to make an SBT project
> inherit from a Maven project that would allow this kind of thing?
>
> -Sandy
>
>
> On Tue, Feb 25, 2014 at 4:23 PM, Evan Chan <e...@ooyala.com> wrote:
>
>> Hi Patrick,
>>
>> If you include shaded dependencies inside of the main Spark jar, such
>> that it would have combined classes from all dependencies, wouldn't you
>> end up with a sub-assembly jar? It would be dangerous in that, since it
>> is a single unit, it would break the normal packaging assumption that a
>> jar contains only its own classes, with maven/sbt/ivy/etc. used to
>> resolve the remaining deps... but maybe I don't know what you mean.
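One hedged, partial answer to Sandy's question about making an sbt build pick up versions defined for a Maven root pom (this is a sketch, not an established pattern): since an sbt build definition is plain Scala, it can read the same properties file that the root pom's version properties are kept in. The file name and property keys below are hypothetical:

```scala
// Hypothetical sketch: load dependency versions from a properties file
// that both the Maven root pom and the sbt build treat as the source of
// truth (e.g. a line like cdh.slf4j.version=1.7.5).
import java.util.Properties
import java.io.StringReader

object VersionSync {
  // Parse "key=value" properties text; in a real build this would come
  // from new FileInputStream("versions.properties") at the repo root.
  def load(text: String): Properties = {
    val p = new Properties()
    p.load(new StringReader(text))
    p
  }

  def main(args: Array[String]): Unit = {
    val p = load("cdh.slf4j.version=1.7.5\ncdh.hadoop.version=2.2.0")
    // An sbt setting could then use the value, e.g.
    //   "org.slf4j" % "slf4j-api" % p.getProperty("cdh.slf4j.version")
    println(p.getProperty("cdh.slf4j.version"))  // prints 1.7.5
  }
}
```

This does not give true pom inheritance (no plugin or scope inheritance), only shared version numbers.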
>>
>> The shader plugin in Maven is apparently used to
>> 1) build uber jars - this is the part that sbt-assembly also does
>> 2) "shade" existing jars, i.e. rename the classes and rewrite bytecode
>> depending on them so that they don't conflict with other jars having
>> the same classes - this is something sbt-assembly doesn't do, which,
>> as you point out, is done manually.
>>
>> On Tue, Feb 25, 2014 at 4:09 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>> What I mean is this. AFAIK the shader plug-in is primarily designed
>>> for creating uber jars which contain Spark and all dependencies. But
>>> since Spark is something people depend on in Maven, what I actually
>>> want is to create the normal old Spark jar [1], but then include
>>> shaded versions of some of our dependencies inside of it. Not sure if
>>> that's even possible.
>>>
>>> The way we do shading now is we manually publish shaded versions of
>>> some dependencies to Maven Central as their own artifacts.
>>>
>>> [1] http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar
>>>
>>> On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan <e...@ooyala.com> wrote:
>>>> Patrick -- not sure I understand your request, do you mean
>>>> - somehow creating a shaded jar (e.g. with the Maven shader plugin)
>>>> - then including it in the Spark jar (which would then be an
>>>> assembly)?
>>>>
>>>> On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell <pwend...@gmail.com>
>>>> wrote:
>>>>> Evan - this is a good thing to bring up. Wrt the shader plug-in:
>>>>> right now we don't actually use it for bytecode shading - we simply
>>>>> use it for creating the uber jar with excludes (which sbt supports
>>>>> just fine via assembly).
>>>>>
>>>>> I was wondering actually, do you know if it's possible to add shaded
>>>>> artifacts to the *Spark jar* using this plug-in (i.e. not an uber
>>>>> jar)? That's something I could see being really handy in the future.
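For context on what "shading" means mechanically: later versions of sbt-assembly (0.14 and up, well after this thread) did add class relocation via ShadeRule, backed by Jar Jar Links, closing the gap Evan describes. A build.sbt sketch, with a hypothetical target package prefix:

```scala
// build.sbt fragment (sbt-assembly 0.14+ syntax; the shaded prefix
// "org.spark.shaded" is hypothetical) -- relocate Guava's classes, and
// rewrite all bytecode that references them, so they cannot clash with
// another copy of Guava elsewhere on the classpath
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "org.spark.shaded.guava.@1").inAll
)
```

At the time of this thread, the manual publish-shaded-artifacts approach Patrick describes was the only sbt-side option.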
>>>>>
>>>>> - Patrick
>>>>>
>>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>>> The problem is that the plugins are not equivalent. There is AFAIK
>>>>>> no equivalent to the Maven shader plugin for SBT.
>>>>>> There is an SBT plugin which can apparently read POM XML files
>>>>>> (sbt-pom-reader). However, it can't possibly handle plugins, which
>>>>>> is still problematic.
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>>>>> I would prefer to keep both of them; it would be better even if
>>>>>>> that means pom.xml will be generated using sbt. Some companies,
>>>>>>> like my current one, have their own build infrastructure built on
>>>>>>> top of Maven. It is not easy to support sbt for these potential
>>>>>>> Spark clients. But I do agree to keep only one if there is a
>>>>>>> promising way to generate a correct configuration from the other.
>>>>>>>
>>>>>>> -Shengzhe
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>>>>
>>>>>>>> The correct way to exclude dependencies in SBT is actually to
>>>>>>>> declare a dependency as "provided". I'm not familiar with Maven or
>>>>>>>> its dependencySet, but "provided" will mark the entire dependency
>>>>>>>> tree as excluded. It is also possible to exclude jar by jar, but
>>>>>>>> this is pretty error-prone and messy.
>>>>>>>>
>>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>> Yes, in sbt-assembly you can exclude jars (although I never had a
>>>>>>>>> need for this) and files in jars.
>>>>>>>>>
>>>>>>>>> For example, I frequently remove log4j.properties, because for
>>>>>>>>> whatever reason Hadoop decided to include it, making it very
>>>>>>>>> difficult to use our own logging config.
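The two techniques just mentioned, "provided" scoping and stripping a file like log4j.properties out of the assembly, look roughly like this in a build.sbt of that era (sbt 0.13 with the sbt-assembly plugin settings imported; the Hadoop version is hypothetical):

```scala
// build.sbt fragment -- mark the whole Hadoop dependency tree "provided":
// it is on the compile classpath but stays out of the assembly jar
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// Discard any log4j.properties pulled in by dependencies; defer to the
// previous (default) merge strategy for everything else
mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    case PathList(ps @ _*) if ps.last == "log4j.properties" => MergeStrategy.discard
    case x => old(x)
  }
}
```

"provided" handles whole dependency trees; the merge strategy handles individual files inside jars, which is exactly the log4j.properties case Koert describes.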
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik
>>>>>>>>> <c...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
>>>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about
>>>>>>>>>>> what is available in Maven and not in sbt for these issues? I
>>>>>>>>>>> took a look at the Bigtop code relating to Spark. As far as I
>>>>>>>>>>> could tell, [1] was the main point of integration with the
>>>>>>>>>>> build system (maybe there are other integration points)?
>>>>>>>>>>>
>>>>>>>>>>>> - In order to integrate Spark well into the existing Hadoop
>>>>>>>>>>>> stack it was necessary to have a way to avoid transitive
>>>>>>>>>>>> dependency duplications and possible conflicts.
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. Maven assembly allows us to avoid adding _all_ Hadoop
>>>>>>>>>>>> libs and later merely declare the Spark package's dependency
>>>>>>>>>>>> on standard Bigtop Hadoop packages. And yes - Bigtop packaging
>>>>>>>>>>>> means the naming and layout would be standard across all
>>>>>>>>>>>> commercial Hadoop distributions that are worth mentioning: ASF
>>>>>>>>>>>> Bigtop convenience binary packages, and Cloudera or
>>>>>>>>>>>> Hortonworks packages. Hence, the downstream user doesn't need
>>>>>>>>>>>> to spend any effort to make sure that Spark "clicks in"
>>>>>>>>>>>> properly.
>>>>>>>>>>>
>>>>>>>>>>> The sbt build also allows you to plug in a Hadoop version,
>>>>>>>>>>> similar to the Maven build.
>>>>>>>>>>
>>>>>>>>>> I am actually talking about the ability to exclude a set of
>>>>>>>>>> dependencies from an assembly, similarly to what happens in the
>>>>>>>>>> dependencySet sections of
>>>>>>>>>> assembly/src/main/assembly/assembly.xml.
>>>>>>>>>> If there is comparable functionality in sbt, that would help
>>>>>>>>>> quite a bit, apparently.
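On Cos's question: sbt-assembly does have a rough analogue of Maven's dependencySet excludes, the excludedJars key, which filters whole jars out of the assembly by predicate. A sketch with a hypothetical filter:

```scala
// build.sbt fragment -- keep any hadoop-* jars out of the assembly,
// roughly analogous to an <excludes> list inside a Maven dependencySet
excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
  cp filter { entry => entry.data.getName.startsWith("hadoop-") }
}
```

Unlike a dependencySet, this matches on jar file names rather than Maven coordinates, so the predicate has to be kept in sync with the artifact naming.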
>>>>>>>>>>
>>>>>>>>>> Cos
>>>>>>>>>>
>>>>>>>>>>>> - Maven provides a relatively easy way to deal with the
>>>>>>>>>>>> jar-hell problem, although the original Maven build was just
>>>>>>>>>>>> Shader'ing everything into a huge lump of class files,
>>>>>>>>>>>> oftentimes ending up with classes slamming on top of each
>>>>>>>>>>>> other from different transitive dependencies.
>>>>>>>>>>>
>>>>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via
>>>>>>>>>>> the sbt-assembly plug-in in an identical way. Is there a
>>>>>>>>>>> difference?
>>>>>>>>>>
>>>>>>>>>> I am bringing up the Shader because it is an awful hack which
>>>>>>>>>> can't be used in a real controlled deployment.
>>>>>>>>>>
>>>>>>>>>> Cos
>>>>>>>>>>
>>>>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>>>>
>>>>>>>> --
>>>>>>>> Evan Chan
>>>>>>>> Staff Engineer
>>>>>>>> e...@ooyala.com |