To perhaps restate what some have said, Maven is by far the most common
build tool for the Hadoop / JVM data ecosystem.  While Maven is less pretty
than SBT, expertise in it is abundant.  SBT requires contributors to
projects in the ecosystem to learn yet another tool.  If we think of Spark
as a project in that ecosystem that happens to be in Scala, as opposed to a
Scala project that happens to be part of that ecosystem, Maven seems like
the better choice to me.

On a CDH-specific note, in building CDH, one of the reasons Maven is
helpful to us is that it makes it easy to harmonize dependency versions
across projects.  We modify project poms to include the "CDH" pom as a root
pom, allowing each project to reference variables defined in the root pom
like ${cdh.slf4j.version}.  Is there a way to make an SBT project inherit
from a Maven project that would allow this kind of thing?
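
The closest thing I know of on the sbt side is centralizing versions in
build code rather than true pom inheritance.  A rough sketch (the object
name and version value here are just illustrative, not actual CDH
settings):

    // project/CdhVersions.scala -- sbt compiles files under project/
    // and makes them visible to build.sbt
    object CdhVersions {
      // would have to be hand-synced with ${cdh.slf4j.version}
      val slf4j = "1.7.5"
    }

    // build.sbt
    libraryDependencies += "org.slf4j" % "slf4j-api" % CdhVersions.slf4j

That centralizes versions within one build, but it doesn't give us
inheritance from the CDH root pom the way Maven does.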

-Sandy


On Tue, Feb 25, 2014 at 4:23 PM, Evan Chan <e...@ooyala.com> wrote:

> Hi Patrick,
>
> If you include shaded dependencies inside of the main Spark jar, such
> that it would have combined classes from all dependencies, wouldn't
> you end up with a sub-assembly jar?  It would be dangerous in that,
> since it is a single unit, it would break the normal packaging
> assumption that a jar contains only its own classes, with
> maven/sbt/ivy/etc used to resolve the remaining deps... but maybe I
> don't know what you mean.
>
> The shade plugin in maven is apparently used to
> 1) build uber jars  - this is the part that sbt-assembly also does
> 2) "shade" existing jars, i.e. rename the classes and rewrite bytecode
> depending on them so that they don't conflict with other jars having
> the same classes  -- this is something sbt-assembly doesn't do, which
> you point out is done manually.
>
>
>
> On Tue, Feb 25, 2014 at 4:09 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> > What I mean is this. AFAIK the shade plug-in is primarily designed
> > for creating uber jars which contain spark and all dependencies. But
> > since Spark is something people depend on in Maven, what I actually
> > want is to create the normal old Spark jar [1], but then include
> > shaded versions of some of our dependencies inside of it. Not sure if
> > that's even possible.
> >
> > The way we do shading now is we manually publish shaded versions of
> > some dependencies to maven central as their own artifacts.
> >
> >
> > http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar
> >
> > On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan <e...@ooyala.com> wrote:
> >> Patrick -- not sure I understand your request, do you mean
> >> - somehow creating a shaded jar (eg with the maven shade plugin)
> >> - then including it in the spark jar (which would then be an assembly)?
> >>
> >> On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>> Evan - this is a good thing to bring up. Wrt the shade plug-in -
> >>> right now we don't actually use it for bytecode shading - we simply
> >>> use it for creating the uber jar with excludes (which sbt supports
> >>> just fine via assembly).
> >>>
> >>> I was wondering actually, do you know if it's possible to add shaded
> >>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
> >>> jar)? That's something I could see being really handy in the future.
> >>>
> >>> - Patrick
> >>>
> >>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
> >>>> The problem is that plugins are not equivalent.  There is AFAIK no
> >>>> equivalent to the maven shade plugin for SBT.
> >>>> There is an SBT plugin which can apparently read POM XML files
> >>>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
> >>>> is still problematic.
> >>>>
> >>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
> >>>>> I would prefer to keep both of them; it would be better even if
> >>>>> that means pom.xml will be generated using sbt. Some companies,
> >>>>> like my current one, have their own build infrastructure built on
> >>>>> top of maven. It is not easy to support sbt for these potential
> >>>>> spark clients. But I do agree to keep only one if there is a
> >>>>> promising way to generate correct configuration from the other.
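> >>>>>
> >>>>> For what it's worth, sbt's built-in makePom task can already emit a
> >>>>> POM describing the project's declared dependencies (run "sbt
> >>>>> makePom"; the .pom file lands under target/).  But it only captures
> >>>>> dependencies, not plugins or custom build logic, so it is at best a
> >>>>> partial answer.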
> >>>>>
> >>>>> -Shengzhe
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
> >>>>>
> >>>>>> The correct way to exclude dependencies in SBT is actually to
> >>>>>> declare a dependency as "provided".   I'm not familiar with Maven
> >>>>>> or its dependencySet, but provided will mark the entire dependency
> >>>>>> tree as excluded.   It is also possible to exclude jar by jar, but
> >>>>>> this is pretty error prone and messy.
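> >>>>>>
> >>>>>> Roughly, in build.sbt (the version and jar name here are just for
> >>>>>> illustration):
> >>>>>>
> >>>>>>     // "provided": stays on the compile classpath, but the whole
> >>>>>>     // hadoop-client dependency tree is left out of the assembly
> >>>>>>     libraryDependencies +=
> >>>>>>       "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
> >>>>>>
> >>>>>>     // jar-by-jar exclusion -- the error prone, messy variant
> >>>>>>     excludedJars in assembly <<= (fullClasspath in assembly) map {
> >>>>>>       cp => cp filter { _.data.getName startsWith "slf4j-log4j12" }
> >>>>>>     }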
> >>>>>>
> >>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
> >>>>>> > Yes, in sbt assembly you can exclude jars (although I never had
> >>>>>> > a need for this) and files in jars.
> >>>>>> >
> >>>>>> > For example, I frequently remove log4j.properties, because for
> >>>>>> > whatever reason hadoop decided to include it, making it very
> >>>>>> > difficult to use our own logging config.
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>> >
> >>>>>> >> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
> >>>>>> >> > Kos - thanks for chiming in. Could you be more specific
> >>>>>> >> > about what is available in maven and not in sbt for these
> >>>>>> >> > issues? I took a look at the bigtop code relating to Spark.
> >>>>>> >> > As far as I could tell [1] was the main point of integration
> >>>>>> >> > with the build system (maybe there are other integration
> >>>>>> >> > points)?
> >>>>>> >> >
> >>>>>> >> > >   - in order to integrate Spark well into the existing
> >>>>>> >> > >     Hadoop stack it was necessary to have a way to avoid
> >>>>>> >> > >     transitive dependency duplications and possible
> >>>>>> >> > >     conflicts.
> >>>>>> >> > >
> >>>>>> >> > >     E.g. Maven assembly allows us to avoid adding _all_
> >>>>>> >> > >     Hadoop libs and later merely declare the Spark package's
> >>>>>> >> > >     dependency on standard Bigtop Hadoop packages. And yes -
> >>>>>> >> > >     Bigtop packaging means the naming and layout would be
> >>>>>> >> > >     standard across all commercial Hadoop distributions that
> >>>>>> >> > >     are worth mentioning: ASF Bigtop convenience binary
> >>>>>> >> > >     packages, and Cloudera or Hortonworks packages. Hence,
> >>>>>> >> > >     the downstream user doesn't need to spend any effort to
> >>>>>> >> > >     make sure that Spark "clicks-in" properly.
> >>>>>> >> >
> >>>>>> >> > The sbt build also allows you to plug in a Hadoop version,
> >>>>>> >> > similar to the maven build.
> >>>>>> >>
> >>>>>> >> I am actually talking about an ability to exclude a set of
> >>>>>> >> dependencies from an assembly, similarly to what's happening in
> >>>>>> >> the dependencySet sections of
> >>>>>> >>     assembly/src/main/assembly/assembly.xml
> >>>>>> >> If there is comparable functionality in Sbt, that would help
> >>>>>> >> quite a bit, apparently.
> >>>>>> >>
> >>>>>> >> Cos
> >>>>>> >>
> >>>>>> >> > >   - Maven provides a relatively easy way to deal with the
> >>>>>> >> > >     jar-hell problem, although the original maven build was
> >>>>>> >> > >     just Shade'ing everything into a huge lump of class
> >>>>>> >> > >     files, oftentimes ending up with classes slamming on
> >>>>>> >> > >     top of each other from different transitive
> >>>>>> >> > >     dependencies.
> >>>>>> >> >
> >>>>>> >> > AFAIK we are only using the shade plug-in to deal with
> >>>>>> >> > conflict resolution in the assembly jar. These are dealt with
> >>>>>> >> > in sbt via the sbt assembly plug-in in an identical way. Is
> >>>>>> >> > there a difference?
> >>>>>> >>
> >>>>>> >> I am bringing up the Shade plugin because it is an awful hack
> >>>>>> >> which can't be used in a real controlled deployment.
> >>>>>> >>
> >>>>>> >> Cos
> >>>>>> >>
> >>>>>> >> > [1]
> >>>>>> >> > https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
> >>>>>> >>
> >>>>>>
> --
> Evan Chan
> Staff Engineer
> e...@ooyala.com
>
