On Feb 26, 2014 11:12 PM, "Patrick Wendell" <pwend...@gmail.com> wrote:
> @mridul - As far as I know both Maven and Sbt use fairly similar
> processes for building the assembly/uber jar. We actually used to
> package spark with sbt and there were no specific issues we
> encountered and AFAIK sbt respects versioning of transitive
> dependencies correctly. Do you have a specific bug listing for sbt
> that indicates something is broken?

Slightly longish ...

The assembled jar generated via sbt broke all over the place while I was
adding YARN support in 0.6, and I had to fix the sbt project a fair bit to get
it to work: we need the assembled jar to submit a YARN job.

When I finally submitted those changes to 0.7, it broke even more, since
dependencies had changed. Thankfully, someone else had already added Maven
support by then, which worked remarkably well out of the box (with some
minor tweaks)!

In theory, they might be expected to work the same, but in practice they
did not. As I mentioned, it must just have been luck that Maven worked
that well; but given multiple nasty past experiences with sbt, and the fact
that it does not bring anything compelling or new in comparison, I am fairly
against the idea of using only sbt, in spite of Maven being unintuitive at


> @sandy - It sounds like you are saying that the CDH build would be
> easier with Maven because you can inherit the POM. However, is this
> just a matter of convenience for packagers or would standardizing on
> sbt limit capabilities in some way? I assume that it would just mean a
> bit more manual work for packagers having to figure out how to set the
> hadoop version in SBT and exclude certain dependencies. For instance,
> > what does CDH do about other components like Impala that are not based on
> > Maven at all?
> On Wed, Feb 26, 2014 at 9:31 AM, Evan Chan <e...@ooyala.com> wrote:
> > I'd like to propose the following way to move forward, based on the
> > comments I've seen:
> >
> > 1.  Aggressively clean up the giant dependency graph.   One ticket I
> > might work on if I have time is SPARK-681 which might remove the giant
> > fastutil dependency (~15MB by itself).
> >
> > 2.  Take an intermediate step by having only ONE source of truth
> > w.r.t. dependencies and versions.  This means either:
> >    a)  Using a maven POM as the spec for dependencies, Hadoop version,
> > etc.   Then, use sbt-pom-reader to import it.
> >    b)  Using the build.scala as the spec, and "sbt make-pom" to
> > generate the pom.xml for the dependencies
> >
> >     The idea is to remove the pain and errors associated with manual
> > translation of dependency specs from one system to another, while
> > still maintaining the things which are hard to translate (plugins).
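To make option (b) concrete, here is a minimal sketch of a build.sbt acting as the single source of truth for dependencies and the Hadoop version. The `hadoop.version` system property, the 1.0.4 default and the excluded slf4j binding are illustrative assumptions, not Spark's actual build settings.

```scala
// build.sbt: a hypothetical sketch, not Spark's real build definition.
// The Hadoop version is read from an assumed "hadoop.version" JVM property
// so packagers can override it without editing the build.
val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")

libraryDependencies ++= Seq(
  // Exclude an unwanted transitive dependency where it is declared
  // (the slf4j binding here is only an example), instead of patching
  // the assembly afterwards.
  ("org.apache.hadoop" % "hadoop-client" % hadoopVersion)
    .exclude("org.slf4j", "slf4j-log4j12")
)

// Running the makePom task (`make-pom` in 2014-era sbt) writes a pom.xml
// for these dependencies under target/, so Maven-based packagers consume
// generated output rather than a hand-maintained translation.
```

Something like `sbt -Dhadoop.version=2.2.0 makePom` would then regenerate the POM for a different Hadoop version from the same spec.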
> >
> >
> > On Wed, Feb 26, 2014 at 7:17 AM, Koert Kuipers <ko...@tresata.com> wrote:
> >> We maintain an in-house Spark build using sbt. We have no problem using
> >> assembly. We did add a few exclude statements for transitive dependencies.
> >>
> >> The main enemies of assemblies are jars that include stuff they shouldn't
> >> (kryo comes to mind, I think they include logback?), new versions that
> >> change the provider/artifact without changing the package (asm), and
> >> incompatible new releases (protobuf). These break transitive dependency
> >> resolution. I imagine that's true for any build tool.
> >>
> >> Besides shading, I don't see anything Maven can do that sbt cannot, and if
> >> I understand correctly, shading is not currently done using the build
> >>
> >> Since Spark is primarily Scala/Akka based, the main developer base will be
> >> familiar with sbt (I think?). Switching build tools is always painful. I
> >> personally think it is smarter to put this burden on a limited number of
> >> upstream integrators than on the community. That said, I don't think it's
> >> a problem for us to maintain an sbt build in-house if Spark switched to
> >> Maven.
> >> The problem is, the complete Spark dependency graph is fairly large,
> >> and there are a lot of conflicting versions in there, in particular
> >> when we bump versions of dependencies, making managing this messy at best.
> >>
> >> Now, I have not looked in detail at how Maven manages this; it might
> >> just be accidental that we get a decent out-of-the-box assembled
> >> shaded jar (since we don't do anything great to configure it).
> >> With the current state of sbt in Spark, it definitely is not a good
> >> solution: if we can enhance it (or it already is?), while keeping
> >> the version/dependency graph manageable, I don't have any objections
> >> to using sbt or Maven!
> >> Too many excludes, pinned versions, etc. would just make things
> >> unmanageable in the future.
> >>
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >>
> >> On Wed, Feb 26, 2014 at 8:56 AM, Evan Chan <e...@ooyala.com> wrote:
> >>> Actually, you can control exactly how sbt assembly merges or resolves
> >>> conflicts. I believe, however, that the default settings lead to an order
> >>> which cannot be controlled.
> >>>
> >>> I do wish for a smarter fat jar plugin.
> >>>
> >>> -Evan
> >>>
> >>>> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >>>>
> >>>>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>>>> Evan, this is a good thing to bring up. Wrt the shade plug-in: right
> >>>>> now we don't actually use it for bytecode shading; we simply use it
> >>>>> for creating the uber jar with excludes (which sbt supports just fine
> >>>>> via assembly).
> >>>>
> >>>>
> >>>> Not really: as I mentioned initially in this thread, sbt's assembly
> >>>> does not take dependencies into account properly and can overwrite
> >>>> newer classes with older versions.
> >>>> From an assembly point of view, sbt is not very good; we have yet to
> >>>> try it after the 2.10 shift though (and probably won't, given the mess
> >>>> it created last time).
> >>>>
> >>>> Regards,
> >>>> Mridul
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> I was wondering actually, do you know if it's possible to add
> >>>>> artifacts to the *spark jar* using this plug-in (i.e. not an uber
> >>>>> jar)? That's something I could see being really handy in the future.
> >>>>>
> >>>>> - Patrick
> >>>>>
> >>>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
> >>>>>> The problem is that plugins are not equivalent. There is AFAIK no
> >>>>>> equivalent to the maven shade plugin for SBT.
> >>>>>> There is an SBT plugin which can apparently read POM XML files
> >>>>>> (sbt-pom-reader). However, it can't possibly handle plugins, which
> >>>>>> is still problematic.
> >>>>>>
> >>>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
> >>>>>>> I would prefer to keep both of them; it would be better even if that
> >>>>>>> means pom.xml will be generated using sbt. Some companies, like my
> >>>>>>> current one, have their own build infrastructure built on top of
> >>>>>>> Maven. It is easy to support sbt for these potential Spark clients.
> >>>>>>> But I do agree with keeping only one if there is a promising way to
> >>>>>>> generate correct configuration from the other.
> >>>>>>>
> >>>>>>> -Shengzhe
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
> >>>>>>>>
> >>>>>>>> The correct way to exclude dependencies in SBT is actually to mark
> >>>>>>>> a dependency as "provided". I'm not familiar with Maven or its
> >>>>>>>> dependencySet, but provided will mark the entire dependency tree as
> >>>>>>>> excluded. It is also possible to exclude jar by jar, but this is
> >>>>>>>> pretty error prone and messy.
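A build.sbt sketch contrasting the two approaches Evan describes: marking a dependency "provided", or excluding transitive artifacts one rule at a time. The artifacts, versions and excluded organization are illustrative assumptions, not taken from Spark's build.

```scala
// build.sbt fragment (hypothetical). "provided" keeps hadoop-client and
// everything it pulls in transitively on the compile classpath, but
// sbt-assembly builds the fat jar from the runtime classpath, so none of
// it ends up in the assembly.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"

// The jar-by-jar alternative: drop specific transitive artifacts with
// exclusion rules. Each conflicting artifact needs its own rule, which is
// why this gets messy as the dependency graph grows.
libraryDependencies += ("org.apache.hbase" % "hbase" % "0.94.6")
  .excludeAll(ExclusionRule(organization = "asm"))
```

The "provided" route scales better because one line covers a whole subtree, while exclusion rules must be kept in sync by hand whenever a dependency is bumped.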
> >>>>>>>>
> >>>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <
> >> wrote:
> >>>>>>>>> Yes, in sbt assembly you can exclude jars (although I never had a
> >>>>>>>>> need for this) and files in jars.
> >>>>>>>>>
> >>>>>>>>> For example, I frequently remove log4j.properties, because for
> >>>>>>>>> whatever reason Hadoop decided to include it, making it very
> >>>>>>>>> difficult to use our own logging config.
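For reference, here is a sketch of the two kinds of exclusion Koert describes, dropping a file inside dependency jars and dropping whole jars. It assumes sbt 1.x with a recent sbt-assembly. The plugin releases current in early 2014 used different key names, and the jar name below is hypothetical.

```scala
// build.sbt fragment, assuming sbt 1.x and a recent sbt-assembly release.
assembly / assemblyMergeStrategy := {
  // Discard any log4j.properties that dependencies bundle at the jar root,
  // so the application's own logging configuration is the one picked up.
  case "log4j.properties" => MergeStrategy.discard
  // Defer everything else to the plugin's default strategy.
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}

// Exclude whole jars from the fat jar by file name
// ("some-unwanted-lib.jar" is a hypothetical placeholder).
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp.filter(_.data.getName == "some-unwanted-lib.jar")
}
```

The merge strategy is also the hook Evan refers to for controlling how conflicting entries are resolved, rather than relying on classpath order.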
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
> >>>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is
> >>>>>>>>>>> available in maven and not in sbt for these issues? I took a look at
> >>>>>>>>>>> the bigtop code relating to Spark. As far as I could tell, [1] was the
> >>>>>>>>>>> main point of integration with the build system (maybe there are other
> >>>>>>>>>>> integration points)?
> >>>>>>>>>>>
> >>>>>>>>>>>>  - in order to integrate Spark well into the existing Hadoop stack it
> >>>>>>>>>>>>    was necessary to have a way to avoid duplicated transitive
> >>>>>>>>>>>>    dependencies and possible conflicts.
> >>>>>>>>>>>>
> >>>>>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs
> >>>>>>>>>>>>    and later merely declare a Spark package dependency on standard
> >>>>>>>>>>>>    Bigtop Hadoop packages. And yes, Bigtop packaging means the naming
> >>>>>>>>>>>>    and layout would be standard across all commercial Hadoop
> >>>>>>>>>>>>    distributions that are worth mentioning: ASF Bigtop convenience
> >>>>>>>>>>>>    binary packages, and Cloudera or Hortonworks packages. Hence, the
> >>>>>>>>>>>>    downstream user doesn't need to spend any effort to make sure
> >>>>>>>>>>>>    that Spark "clicks in" properly.
> >>>>>>>>>>>
> >>>>>>>>>>> The sbt build also allows you to plug in a Hadoop version, similar
> >>>>>>>>>>> to the maven build.
> >>>>>>>>>>
> >>>>>>>>>> I am actually talking about an ability to exclude a set of
> >>>>>>>>>> dependencies from an assembly, similar to what's happening in the
> >>>>>>>>>> dependencySet of
> >>>>>>>>>>    assembly/src/main/assembly/assembly.xml
> >>>>>>>>>> If there is comparable functionality in sbt, that would help quite
> >>>>>>>>>> a bit, apparently.
> >>>>>>>>>>
> >>>>>>>>>> Cos
> >>>>>>>>>>
> >>>>>>>>>>>>  - Maven provides a relatively easy way to deal with the problem,
> >>>>>>>>>>>>    although the original maven build was just Shader'ing everything
> >>>>>>>>>>>>    into a huge lump of class files, often ending up with classes
> >>>>>>>>>>>>    slamming on top of each other from different transitive
> >>>>>>>>>>>>    dependencies.
> >>>>>>>>>>>
> >>>>>>>>>>> AFAIK we are only using the shade plug-in to deal with
> >>>>>>>>>>> resolution in the assembly jar. These are dealt with in sbt by the
> >>>>>>>>>>> sbt assembly plug-in in an identical way. Is there a
> >>>>>>>>>>
> >>>>>>>>>> I am bringing up the Shader because it is an awful hack, which
> >>>>>>>>>> can't be used in a real controlled deployment.
> >>>>>>>>>>
> >>>>>>>>>> Cos
> >>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>
> >>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> --
> >>>>>>>> Evan Chan
> >>>>>>>> Staff Engineer
> >>>>>>>> e...@ooyala.com  |
> >>>>>>
> >>>>>>
> >>>>>>
> >
> >
> >
