Hey Jakob,

The builds in Spark are largely maintained by me, Sean, and Michael
Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
an SBT build. Maven is the build of reference for packaging Spark; it is used
by many downstream packagers and to build all Spark releases. SBT is more
often used by developers. Both builds inherit from the same pom files (and
rely on the same profiles) to minimize the maintenance burden of Spark's
very complex dependency graph.
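
To make that coupling concrete, here is a minimal, hypothetical sketch (not
our actual build definition - the flag name and version are invented) of the
pattern: a Maven profile like -Pyarn toggles optional dependencies, and the
SBT side has to honor the same switch so both builds resolve identically:

    // Illustrative only, not Spark's real SparkBuild.
    // "yarn.profile" is an invented system property standing in for -Pyarn.
    val yarnEnabled: Boolean = sys.props.contains("yarn.profile")

    libraryDependencies ++= {
      if (yarnEnabled)
        Seq("org.apache.hadoop" % "hadoop-yarn-client" % "2.4.0") // example version
      else
        Seq.empty
    }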

If you are looking to make contributions that help with the build, I am
happy to point you towards some things that are consistent maintenance
headaches. There are two major pain points right now that I'd be thrilled
to see fixes for:

1. SBT relies on a different dependency conflict resolution strategy than
Maven (Ivy resolves a conflict to the latest revision, while Maven takes the
definition nearest the root of the dependency tree), which causes all kinds
of headaches for us. I have heard that newer versions of SBT can (maybe?)
use Maven as a dependency resolver instead of Ivy. This would make our lives
much easier if it were possible, either by upgrading SBT or by wiring it up
ourselves.
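
For anyone who wants to dig in, a rough sketch of the knobs sbt exposes
today (illustrative only; the guava pin is just an example, not a
recommendation from our build):

    // Fail the build loudly whenever two dependencies disagree on a version,
    // instead of letting Ivy silently pick the latest revision.
    conflictManager := ConflictManager.strict

    // Then pin each contested artifact explicitly so sbt matches what
    // Maven would resolve:
    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1" // example pin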

2. We don't have a great way of auditing the net effect of dependency
changes when people make them in the build. I am working on a fairly clunky
patch to do this here:

https://github.com/apache/spark/pull/8531

It could be done much more nicely using SBT, but only provided (1) is
solved.
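
For a sense of what the sbt side could look like, here is a rough,
hypothetical sketch (task name invented; standard sbt 0.13 APIs): dump the
resolved external dependencies to a file, so the net effect of a change is
just a diff of that file between two commits:

    // Hypothetical audit task, not the approach in the patch above.
    lazy val dumpDeps = taskKey[File]("Write resolved external dependencies to a file")

    dumpDeps := {
      // Jar file names carry artifact name and version, enough for a diff.
      val names = (externalDependencyClasspath in Compile).value
        .map(_.data.getName)
        .distinct
        .sorted
      val out = target.value / "resolved-deps.txt"
      IO.write(out, names.mkString("\n"))
      streams.value.log.info(s"Wrote ${names.size} entries to $out")
      out
    }

Run the task on two commits and diff the two output files to see exactly
what a build change did to the resolved dependencies.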

As for a major overhaul of the SBT build to decouple it from the pom files:
I'm not sure that's the best place to start, given that we need to continue
supporting Maven - the coupling is intentional. But getting involved in the
build in general would be completely welcome.

- Patrick

On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:

> Maven isn't 'legacy', nor is it supported only for the benefit of third
> parties. SBT had some behaviors/problems, relative to what Spark needs,
> that Maven didn't. SBT is a development-time alternative only, and is
> partly generated from the Maven build.
>
> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers <ko...@tresata.com> wrote:
> > People who do upstream builds of Spark (think Bigtop and Hadoop distros)
> > are used to legacy systems like Maven, so Maven is the default build. I
> > don't think it will change.
> >
> > Any improvements for the sbt build are of course welcome (it is still
> > used by many developers), but I would not do anything that increases the
> > burden of maintaining two build systems.
> >
> > On Nov 5, 2015 18:38, "Jakob Odersky" <joder...@gmail.com> wrote:
> >>
> >> Hi everyone,
> >> in the process of learning Spark, I wanted to get an overview of the
> >> interaction between all of its sub-projects. I therefore decided to have
> >> a look at the build setup and its dependency management.
> >> Since I am a lot more comfortable using sbt than maven, I decided to try
> >> to port the maven configuration to sbt (with the help of automated
> >> tools). This led me to a couple of observations and questions about the
> >> build system design:
> >>
> >> First, there are currently two build systems, maven and sbt. Is there a
> >> preferred tool (or a future direction toward one)?
> >>
> >> Second, the sbt build also uses maven "profiles", which require specific
> >> command-line parameters when starting sbt. Furthermore, since it relies
> >> on maven poms, dependencies on the scala binary version (_2.xx) are
> >> hardcoded and require running an external script when switching versions.
> >> Sbt could leverage built-in constructs to support cross-compilation and
> >> emulate profiles with configurations and new build targets. This would
> >> remove external state from the build (in that no extra steps need to be
> >> performed in a particular order to generate artifacts for a new
> >> configuration) and therefore improve stability and build reproducibility
> >> (maybe even build performance). I was wondering whether implementing such
> >> functionality for the sbt build would be welcome.
> >>
> >> thanks,
> >> --Jakob
