Re: State of the Build

2015-11-06 Thread Jakob Odersky
Reposting to the list...

Thanks for all the feedback, everyone; I now have a clearer picture of the
reasoning and the implications.

Koert, according to your post in this thread
http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023,
it is apparently quite easy to swap the maven resolution mechanism for the
ivy one.
Patrick, would this not help with the problems you described?
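
It may also be worth noting the converse direction: sbt 0.13.8 introduced
an experimental Maven-based (Aether) resolver that can stand in for Ivy on
the sbt side. A minimal sketch of enabling it, assuming sbt 0.13.8+:

    // project/plugins.sbt: a minimal sketch, assuming sbt 0.13.8+.
    // Enables the experimental Maven-based (Aether) resolution engine in
    // place of Ivy, so dependency conflicts resolve the way maven resolves
    // them (nearest-wins) rather than via Ivy's latest-revision strategy.
    addMavenResolverPlugin

Whether the plugin is mature enough for a dependency graph of Spark's size
would need testing, but it is a concrete starting point.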

On 5 November 2015 at 23:23, Patrick Wendell  wrote:

> Hey Jakob,
>
> The builds in Spark are largely maintained by me, Sean, and Michael
> Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
> SBT build. Maven is the build of reference for packaging Spark and is used
> by many downstream packagers and to build all Spark releases. SBT is more
> often used by developers. Both builds inherit from the same pom files (and
> rely on the same profiles) to minimize maintenance complexity of Spark's
> very complex dependency graph.
>
> If you are looking to make contributions that help with the build, I am
> happy to point you towards some things that are consistent maintenance
> headaches. There are two major pain points right now that I'd be thrilled
> to see fixes for:
>
> 1. SBT relies on a different dependency conflict resolution strategy than
> maven does, causing all kinds of headaches for us. I have heard that newer
> versions of SBT can (maybe?) use Maven as a dependency resolver instead of
> Ivy. This would make our life so much better if it were possible, either by
> virtue of upgrading SBT or somehow doing this ourselves.
>
> 2. We don't have a great way of auditing the net effect of dependency
> changes when people make them in the build. I am working on a fairly clunky
> patch to do this here:
>
> https://github.com/apache/spark/pull/8531
>
> It could be done much more nicely using SBT, but only provided (1) is
> solved.
>
> As for a major overhaul of the sbt build to decouple it from the pom
> files, I'm not sure that's the best place to start, given that we need to
> continue to support maven - the coupling is intentional. But getting
> involved in the build in general would be completely welcome.
>
> - Patrick
>
> On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:
>
>> Maven isn't 'legacy', nor is it supported for the benefit of third
>> parties. Relative to what Spark needs, SBT had some behaviors/problems
>> that Maven didn't. SBT is a development-time alternative only, and is
>> partly generated from the Maven build.
>>
>> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
>> > People who do upstream builds of Spark (think Bigtop and the Hadoop
>> > distros) are used to legacy systems like maven, so maven is the default
>> > build. I don't think that will change.
>> >
>> > Any improvements to the sbt build are of course welcome (it is still
>> > used by many developers), but I would not do anything that increases
>> > the burden of maintaining two build systems.
>> >
>> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
>> >>
>> >> Hi everyone,
>> >> in the process of learning Spark, I wanted to get an overview of the
>> >> interaction between all of its sub-projects. I therefore decided to
>> >> have a look at the build setup and its dependency management.
>> >> Since I am a lot more comfortable using sbt than maven, I decided to
>> >> try to port the maven configuration to sbt (with the help of automated
>> >> tools). This led me to a couple of observations and questions on the
>> >> build system design:
>> >>
>> >> First, there are currently two build systems, maven and sbt. Is there
>> >> a preferred tool (or a future direction toward one)?
>> >>
>> >> Second, the sbt build also uses maven "profiles", requiring specific
>> >> command-line parameters when starting sbt. Furthermore, since it
>> >> relies on the maven poms, dependencies on the scala binary version
>> >> (_2.xx) are hardcoded, and switching versions requires running an
>> >> external script. Sbt could leverage built-in constructs to support
>> >> cross-compilation and could emulate profiles with configurations and
>> >> new build targets. This would remove external state from the build (no
>> >> extra steps would need to be performed in a particular order to
>> >> generate artifacts for a new configuration) and would therefore
>> >> improve stability and build reproducibility (maybe even build
>> >> performance). I was wondering whether implementing such functionality
>> >> for the sbt build would be welcome.
>> >>
>> >> thanks,
>> >> --Jakob
>>
>


Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion:
http://search-hadoop.com/m/q3RTtPnPnzwOhBr

FYI

On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch  wrote:

> Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build
> environment by updating the pom.xml in each of the subprojects. If you were
> able to come up with a structure that avoids that approach, it would be
> an improvement.
>
> 2015-11-05 15:38 GMT-08:00 Jakob Odersky :
>
>> Hi everyone,
>> in the process of learning Spark, I wanted to get an overview of the
>> interaction between all of its sub-projects. I therefore decided to have a
>> look at the build setup and its dependency management.
>> Since I am a lot more comfortable using sbt than maven, I decided to try
>> to port the maven configuration to sbt (with the help of automated tools).
>> This led me to a couple of observations and questions on the build system
>> design:
>>
>> First, there are currently two build systems, maven and sbt. Is there a
>> preferred tool (or a future direction toward one)?
>>
>> Second, the sbt build also uses maven "profiles", requiring specific
>> command-line parameters when starting sbt. Furthermore, since it relies on
>> the maven poms, dependencies on the scala binary version (_2.xx) are
>> hardcoded, and switching versions requires running an external script.
>> Sbt could leverage built-in constructs to support cross-compilation and
>> could emulate profiles with configurations and new build targets. This
>> would remove external state from the build (no extra steps would need to
>> be performed in a particular order to generate artifacts for a new
>> configuration) and would therefore improve stability and build
>> reproducibility (maybe even build performance). I was wondering whether
>> implementing such functionality for the sbt build would be welcome.
>>
>> thanks,
>> --Jakob
>>
>
>


Re: State of the Build

2015-11-05 Thread Stephen Boesch
Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build
environment by updating the pom.xml in each of the subprojects. If you were
able to come up with a structure that avoids that approach, it would be an
improvement.
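
One structure that avoids mutating the poms, at least on the sbt side, is
sbt's built-in cross-building. A minimal sketch (the versions below are
illustrative, not Spark's actual settings):

    // build.sbt: a minimal cross-building sketch; versions are illustrative.
    scalaVersion := "2.11.7"                      // default Scala version
    crossScalaVersions := Seq("2.10.6", "2.11.7") // versions to build against

    // The %% operator appends the Scala binary suffix (_2.10 / _2.11) to the
    // artifact name automatically, so nothing needs to be rewritten on disk
    // when switching Scala versions.
    libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"

Prefixing a task with '+' (for example '+package') then runs it once per
listed Scala version, with no files modified in the working tree.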

2015-11-05 15:38 GMT-08:00 Jakob Odersky :

> Hi everyone,
> in the process of learning Spark, I wanted to get an overview of the
> interaction between all of its sub-projects. I therefore decided to have a
> look at the build setup and its dependency management.
> Since I am a lot more comfortable using sbt than maven, I decided to try to
> port the maven configuration to sbt (with the help of automated tools).
> This led me to a couple of observations and questions on the build system
> design:
>
> First, there are currently two build systems, maven and sbt. Is there a
> preferred tool (or a future direction toward one)?
>
> Second, the sbt build also uses maven "profiles", requiring specific
> command-line parameters when starting sbt. Furthermore, since it relies on
> the maven poms, dependencies on the scala binary version (_2.xx) are
> hardcoded, and switching versions requires running an external script.
> Sbt could leverage built-in constructs to support cross-compilation and
> could emulate profiles with configurations and new build targets. This
> would remove external state from the build (no extra steps would need to
> be performed in a particular order to generate artifacts for a new
> configuration) and would therefore improve stability and build
> reproducibility (maybe even build performance). I was wondering whether
> implementing such functionality for the sbt build would be welcome.
>
> thanks,
> --Jakob
>


Re: State of the Build

2015-11-05 Thread Sean Owen
Maven isn't 'legacy', nor is it supported for the benefit of third
parties. Relative to what Spark needs, SBT had some behaviors/problems
that Maven didn't. SBT is a development-time alternative only, and is
partly generated from the Maven build.

On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
> People who do upstream builds of Spark (think Bigtop and the Hadoop
> distros) are used to legacy systems like maven, so maven is the default
> build. I don't think that will change.
>
> Any improvements to the sbt build are of course welcome (it is still used
> by many developers), but I would not do anything that increases the burden
> of maintaining two build systems.
>
> On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
>>
>> Hi everyone,
>> in the process of learning Spark, I wanted to get an overview of the
>> interaction between all of its sub-projects. I therefore decided to have a
>> look at the build setup and its dependency management.
>> Since I am a lot more comfortable using sbt than maven, I decided to try
>> to port the maven configuration to sbt (with the help of automated tools).
>> This led me to a couple of observations and questions on the build system
>> design:
>>
>> First, there are currently two build systems, maven and sbt. Is there a
>> preferred tool (or a future direction toward one)?
>>
>> Second, the sbt build also uses maven "profiles", requiring specific
>> command-line parameters when starting sbt. Furthermore, since it relies on
>> the maven poms, dependencies on the scala binary version (_2.xx) are
>> hardcoded, and switching versions requires running an external script.
>> Sbt could leverage built-in constructs to support cross-compilation and
>> could emulate profiles with configurations and new build targets. This
>> would remove external state from the build (no extra steps would need to
>> be performed in a particular order to generate artifacts for a new
>> configuration) and would therefore improve stability and build
>> reproducibility (maybe even build performance). I was wondering whether
>> implementing such functionality for the sbt build would be welcome.
>>
>> thanks,
>> --Jakob




Re: State of the Build

2015-11-05 Thread Koert Kuipers
People who do upstream builds of Spark (think Bigtop and the Hadoop
distros) are used to legacy systems like maven, so maven is the default
build. I don't think that will change.

Any improvements to the sbt build are of course welcome (it is still used
by many developers), but I would not do anything that increases the burden
of maintaining two build systems.
On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:

> Hi everyone,
> in the process of learning Spark, I wanted to get an overview of the
> interaction between all of its sub-projects. I therefore decided to have a
> look at the build setup and its dependency management.
> Since I am a lot more comfortable using sbt than maven, I decided to try to
> port the maven configuration to sbt (with the help of automated tools).
> This led me to a couple of observations and questions on the build system
> design:
>
> First, there are currently two build systems, maven and sbt. Is there a
> preferred tool (or a future direction toward one)?
>
> Second, the sbt build also uses maven "profiles", requiring specific
> command-line parameters when starting sbt. Furthermore, since it relies on
> the maven poms, dependencies on the scala binary version (_2.xx) are
> hardcoded, and switching versions requires running an external script.
> Sbt could leverage built-in constructs to support cross-compilation and
> could emulate profiles with configurations and new build targets. This
> would remove external state from the build (no extra steps would need to
> be performed in a particular order to generate artifacts for a new
> configuration) and would therefore improve stability and build
> reproducibility (maybe even build performance). I was wondering whether
> implementing such functionality for the sbt build would be welcome.
>
> thanks,
> --Jakob
>
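
To make the profile suggestion above concrete: a maven-style profile such
as -Phadoop-2.6 could be emulated in sbt without touching the poms, for
example by keying settings off a system property. A hypothetical sketch
(the property name and versions are illustrative):

    // build.sbt: hypothetical sketch of emulating a maven profile with a
    // system property; the property name and versions are illustrative.
    val hadoopVersion = sys.props.getOrElse("hadoop.version", "2.2.0")

    libraryDependencies +=
      "org.apache.hadoop" % "hadoop-client" % hadoopVersion

    // usage: sbt -Dhadoop.version=2.6.0 package

Whether this scales to the number of profiles in Spark's poms without
drifting from the maven build is exactly the concern about maintaining two
build systems.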


Re: State of the Build

2015-11-05 Thread Mark Hamstra
There was a lot of discussion that preceded our arriving at this statement
in the Spark documentation: "Maven is the official build tool recommended
for packaging Spark, and is the build of reference."
https://spark.apache.org/docs/latest/building-spark.html#building-with-sbt

I'm not aware of anything new, either in the way of SBT tooling or in your
post, Jakob, that would lead us to reconsider the choice of Maven over SBT
for the reference build of Spark. Of course, I'm by no means the sole and
final authority on the matter, but I am not seeing anything in your
suggested approach that hasn't already been considered. You're welcome to
review the prior discussion and try to convince us that we've made the
wrong choice, but I wouldn't expect that to be a quick and easy process.


On Thu, Nov 5, 2015 at 4:44 PM, Ted Yu  wrote:

> See previous discussion:
> http://search-hadoop.com/m/q3RTtPnPnzwOhBr
>
> FYI
>
> On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch  wrote:
>
>> Yes. The current dev/change-scala-version.sh mutates (/pollutes) the
>> build environment by updating the pom.xml in each of the subprojects. If
>> you were able to come up with a structure that avoids that approach, it
>> would be an improvement.
>>
>> 2015-11-05 15:38 GMT-08:00 Jakob Odersky :
>>
>>> Hi everyone,
>>> in the process of learning Spark, I wanted to get an overview of the
>>> interaction between all of its sub-projects. I therefore decided to have a
>>> look at the build setup and its dependency management.
>>> Since I am a lot more comfortable using sbt than maven, I decided to try
>>> to port the maven configuration to sbt (with the help of automated tools).
>>> This led me to a couple of observations and questions on the build system
>>> design:
>>>
>>> First, there are currently two build systems, maven and sbt. Is there a
>>> preferred tool (or a future direction toward one)?
>>>
>>> Second, the sbt build also uses maven "profiles", requiring specific
>>> command-line parameters when starting sbt. Furthermore, since it relies on
>>> the maven poms, dependencies on the scala binary version (_2.xx) are
>>> hardcoded, and switching versions requires running an external script.
>>> Sbt could leverage built-in constructs to support cross-compilation and
>>> could emulate profiles with configurations and new build targets. This
>>> would remove external state from the build (no extra steps would need to
>>> be performed in a particular order to generate artifacts for a new
>>> configuration) and would therefore improve stability and build
>>> reproducibility (maybe even build performance). I was wondering whether
>>> implementing such functionality for the sbt build would be welcome.
>>>
>>> thanks,
>>> --Jakob
>>>
>>
>>
>


Re: State of the Build

2015-11-05 Thread Patrick Wendell
Hey Jakob,

The builds in Spark are largely maintained by me, Sean, and Michael
Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
SBT build. Maven is the build of reference for packaging Spark and is used
by many downstream packagers and to build all Spark releases. SBT is more
often used by developers. Both builds inherit from the same pom files (and
rely on the same profiles) to minimize maintenance complexity of Spark's
very complex dependency graph.
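
(For the mechanics of that inheritance: Spark's sbt build reads the Maven
poms through the sbt-pom-reader plugin. Roughly, and with an illustrative
version number:

    // project/plugins.sbt: roughly how an sbt build can pick up the Maven
    // poms; the version number here is illustrative.
    addSbtPlugin("com.typesafe.sbt" % "sbt-pom-reader" % "1.0.0")

The poms remain the single source of truth for dependencies and profiles.)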

If you are looking to make contributions that help with the build, I am
happy to point you towards some things that are consistent maintenance
headaches. There are two major pain points right now that I'd be thrilled
to see fixes for:

1. SBT relies on a different dependency conflict resolution strategy than
maven does, causing all kinds of headaches for us. I have heard that newer
versions of SBT can (maybe?) use Maven as a dependency resolver instead of
Ivy. This would make our life so much better if it were possible, either by
virtue of upgrading SBT or somehow doing this ourselves.

2. We don't have a great way of auditing the net effect of dependency
changes when people make them in the build. I am working on a fairly clunky
patch to do this here:

https://github.com/apache/spark/pull/8531

It could be done much more nicely using SBT, but only provided (1) is
solved.
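
As a rough illustration of the kind of auditing meant here (a hypothetical
sketch, not the approach of the patch above): resolved-dependency manifests
from before and after a change can be diffed mechanically, e.g. one
artifact coordinate per line as produced by mvn dependency:list.

    // DepDiff.scala: a hypothetical sketch, not the approach of the patch
    // above. Compares two resolved-dependency manifests (one artifact
    // coordinate per line) and prints what was added or removed.
    import scala.io.Source

    object DepDiff {
      // Read one manifest, keeping only lines that look like coordinates.
      def deps(path: String): Set[String] =
        Source.fromFile(path).getLines()
          .map(_.trim)
          .filter(_.contains(":"))
          .toSet

      def main(args: Array[String]): Unit = {
        val before = deps(args(0))
        val after  = deps(args(1))
        (after -- before).toSeq.sorted.foreach(d => println(s"+ $d")) // added
        (before -- after).toSeq.sorted.foreach(d => println(s"- $d")) // removed
      }
    }

Checking a canonical manifest into the repo and failing the build when the
diff is non-empty would force dependency changes to show up explicitly in
review.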

As for a major overhaul of the sbt build to decouple it from the pom files,
I'm not sure that's the best place to start, given that we need to continue
to support maven - the coupling is intentional. But getting involved in the
build in general would be completely welcome.

- Patrick

On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen  wrote:

> Maven isn't 'legacy', nor is it supported for the benefit of third
> parties. Relative to what Spark needs, SBT had some behaviors/problems
> that Maven didn't. SBT is a development-time alternative only, and is
> partly generated from the Maven build.
>
> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers  wrote:
> > People who do upstream builds of Spark (think Bigtop and the Hadoop
> > distros) are used to legacy systems like maven, so maven is the default
> > build. I don't think that will change.
> >
> > Any improvements to the sbt build are of course welcome (it is still
> > used by many developers), but I would not do anything that increases the
> > burden of maintaining two build systems.
> >
> > On Nov 5, 2015 18:38, "Jakob Odersky"  wrote:
> >>
> >> Hi everyone,
> >> in the process of learning Spark, I wanted to get an overview of the
> >> interaction between all of its sub-projects. I therefore decided to
> >> have a look at the build setup and its dependency management.
> >> Since I am a lot more comfortable using sbt than maven, I decided to
> >> try to port the maven configuration to sbt (with the help of automated
> >> tools). This led me to a couple of observations and questions on the
> >> build system design:
> >>
> >> First, there are currently two build systems, maven and sbt. Is there
> >> a preferred tool (or a future direction toward one)?
> >>
> >> Second, the sbt build also uses maven "profiles", requiring specific
> >> command-line parameters when starting sbt. Furthermore, since it
> >> relies on the maven poms, dependencies on the scala binary version
> >> (_2.xx) are hardcoded, and switching versions requires running an
> >> external script. Sbt could leverage built-in constructs to support
> >> cross-compilation and could emulate profiles with configurations and
> >> new build targets. This would remove external state from the build (no
> >> extra steps would need to be performed in a particular order to
> >> generate artifacts for a new configuration) and would therefore
> >> improve stability and build reproducibility (maybe even build
> >> performance). I was wondering whether implementing such functionality
> >> for the sbt build would be welcome.
> >>
> >> thanks,
> >> --Jakob
>