Thanks for the update Luke.
I'm updating my local working copy to do new tests.
Regards
JB
On 11/19/2017 08:21 PM, Lukasz Cwik wrote:
The gradle build rules have been merged, I'm adding a precommit[1] to start
collecting data about the build times. It currently only mirrors the Java
mvn install precommit. I'll gather data over the next two weeks and provide
a summary here.
You can rerun the precommit by issuing "Run Java Gradle PreCommit"
1: https://github.com/apache/beam/pull/4146
On Mon, Nov 13, 2017 at 9:08 AM, Lukasz Cwik <[email protected]> wrote:
There has been plenty of time for comments on the PR and the approach.
So far Ken Knowles has provided the most feedback on the PR, Ken would you
like to finish the review?
On Fri, Nov 10, 2017 at 1:22 PM, Romain Manni-Bucau <[email protected]> wrote:
This is only a setup matter, and it is better not to break the master history for POC/tests, in particular when the change is not very localized. An alternative would be to ask infra for another temporary repo and keep the two in sync, but personally I don't think it is worth it.
On Nov 10, 2017 18:57, "Lukasz Cwik" <[email protected]> wrote:
The reason to get it on master is that master is where all the PRs are. An upstream branch without any development means no data.
Also, our Jenkins setup via job-dsl doesn't honor the Jenkins configuration on a branch because the seed job always runs against master.
On Thu, Nov 9, 2017 at 9:59 PM, Romain Manni-Bucau <[email protected]> wrote:
What about pushing it to an "upstream" branch and testing it for 1 week in parallel with the Maven reference build? If Gradle is always 50% faster on Jenkins then it could become the master setup without much discussion, I guess. We can even have 2 Jenkins jobs: one with the daemon etc. and one without.
I also noticed yesterday that the Gradle build is killing my machine (all 8 cores are at 100%) during the first minutes, whereas the Maven build lets me do something else. Beyond that, most of the time that keeps Gradle from being that fast is spent in the Python part. I will try to send figures later today.
On Nov 10, 2017 00:10, "Lukasz Cwik" <[email protected]> wrote:
I wouldn't mind merging this change in so I could set up those Gradle Jenkins precommits.
As per our contribution guidelines, is any committer willing to sign off on the PR?
On Thu, Nov 9, 2017 at 2:12 PM, Romain Manni-Bucau <[email protected]> wrote:
On Nov 9, 2017 21:31, "Kenneth Knowles" <[email protected]> wrote:
Keep in mind that a clean build is unusual during development (it is common for mvn use and that is a bug) and also not necessary for precommits if the build tool is correct enough that caching is safe. So while this number matters, it is not the most important.
Not sure. In dev you bypass the build tool most of the time anyway, thanks to the IDE or other shortcuts, but not on PRs and CI. Keep in mind that not doing a clean and not killing the Gradle daemon makes the build not reproducible and therefore not useful :(. Starting the build from a subpart of the reactor (with the mentioned mvn plugins, for instance) can be nice on some CI like Travis if the caching is well configured, but it is still not a guarantee that the build is "green".
My trade-off is to favor an easy build and a relevant result over the time criterion. Do you share that, or do you prefer time over the other criteria? The latter leads to other conclusions and options, and could explain why we are not understanding each other.
On Thu, Nov 9, 2017 at 11:30 AM, Romain Manni-Bucau <[email protected]> wrote:
I will try next week, yes, but the 2 runs I did were 28 min vs 32 min from memory, after having downloaded all deps once.
On Nov 9, 2017 19:45, "Lukasz Cwik" <[email protected]> wrote:
If Gradle was slow, do you mind running the build with --profile and sharing that and also sharing the Maven build log?
On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik <[email protected]> wrote:
Romain, I don't understand your last comment. Were you trying to say that you had the same Gradle build times as I did and it was an improvement over Maven, or that you did not and you experienced build times that were equivalent to Maven?
On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <[email protected]> wrote:
2017-11-09 18:38 GMT+01:00 Kenneth Knowles <[email protected]>:
On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <[email protected]> wrote:
(This is another topic, so we can maybe open another thread.) The issue is not so much about Python but more about the fact that the build is not self-contained. It is a Maven build, and Maven should be sufficient without having to install Python + dependencies.
Let's leave out the topic of whether our build should install things like JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue is somewhat independent of build tool, and the new build isn't worse than the old one as far as it goes.
Yep, globally the same time with clean and killing the daemon.
Kenn
I don't see any technical blockers to do it (except time ;)) but it is always a bit annoying to git clone then not be able to build.
Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn
2017-11-09 18:07 GMT+01:00 Lukasz Cwik <[email protected]>:
Hmm, I have had good luck when following the Python quick start setup <https://beam.apache.org/get-started/quickstart-py/> on multiple machines by ensuring the installed versions of setuptools, virtualenv and pip are new enough.
You can always skip the Python portion of the build by excluding the build task like so:
./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <[email protected]> wrote:
The "=1.3.5" file dates from when I installed the Python dependencies manually to make the build pass (the pip command never passed on my computer, and therefore the build was always broken until I installed them manually, independently of the build tool).
Romain Manni-Bucau
2017-11-09 17:51 GMT+01:00 Lukasz Cwik <[email protected]>:
It turns out that the Apache Rat Ant task and the Apache Rat Maven plugin differ in that the plugin automatically excludes certain files by default while the Ant task does not. See:
http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.html#useDefaultExcludes
I fixed the list to exclude ".idea/" instead of "idea/" since there was a typo.
I have no idea what the file "=1.3.5" is. Can you take a look at the contents?
On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <[email protected]> wrote:
Ok, the rat issues I got were:
== File: /home/rmannibucau/1_dev/beam/.idea/*
== File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
The first one could go in my default excludes, even if Eclipse/IDEA files should be in the default exclude set of the Beam rat config IMHO. The last one is more of a "?"; it can probably be excluded as well if it is created by the build at some point.
Romain Manni-Bucau
2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré <[email protected]>:
Thanks for the update. I was swamped with meetings. I'm back to testing the latest changes.
Regards
JB
On Nov 8, 2017, at 18:56, Lukasz Cwik <[email protected]> wrote:
Thanks everyone for trying this build out in different workspaces / configurations. This will help make sure the build works for more people and will get rid of any rough edges.
Performance (All):
Maven performs parallelization at the module level: an entire module needs to complete before any dependent modules can start, which means that all the checks like findbugs, checkstyle and tests need to finish first. Gradle has task-level parallelism between subprojects, which means that as soon as the compile and shade steps are done for a project, any dependent subprojects can typically start. This means that we get increased parallelism due to not needing to wait for findbugs, checkstyle and tests to run. I typically see ~20 tasks (at peak) running on my desktop in parallel.
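To make the difference between module-level and task-level scheduling concrete, here is a minimal sketch of a toy model. Everything in it is an assumption for illustration: the module names, the dependency chain, and the per-task durations are made up and are not measured Beam numbers, tasks inside one module are assumed to run one after another, and the number of workers is assumed to be unlimited. The only thing that changes between the two runs is what a dependent module has to wait for.

```python
# Hypothetical per-task durations in minutes (illustrative, not measured).
# Dict order is the order in which tasks run inside one module.
DURATIONS = {"compile": 2, "checkstyle": 1, "findbugs": 3, "test": 5}

# Hypothetical dependency chain: examples -> runners -> core.
DEPS = {"core": [], "runners": ["core"], "examples": ["runners"]}


def finish_times(deps, durations, gate):
    """Return the finish time of every module, assuming unlimited workers.

    `gate` lists the tasks of a dependency that must be done before a
    dependent module may start. It is assumed to be a prefix of the task
    order in `durations`.
    """
    total = sum(durations.values())
    gate_time = sum(durations[task] for task in gate)
    start = {}

    def start_of(module):
        # A module starts once every dependency has passed its gate.
        if module not in start:
            start[module] = max(
                (start_of(dep) + gate_time for dep in deps[module]), default=0
            )
        return start[module]

    return {module: start_of(module) + total for module in deps}


# Module-level model: dependents wait for the whole upstream module.
module_level = finish_times(DEPS, DURATIONS, list(DURATIONS))
# Task-level model: dependents wait only for the upstream compile step.
task_level = finish_times(DEPS, DURATIONS, ["compile"])

print("module-level makespan:", max(module_level.values()))
print("task-level makespan:", max(task_level.values()))
```

In this toy chain the checks and tests of one module overlap with the compilation of the next, which is the effect described above. Real builds have limited cores, so the gap would be smaller than an unlimited-worker model suggests.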
Apache Rat (JB / Romain):
Which files in the rat report fail (it's likely that I'm missing some exclusion for a build-time artifact)? Also, please try the build again after running `git clean -fdx` in your workspace.
Python (JB):
As for the Python SDK, you'll need to share more details about the failure.
Gradle 4.3:
I would like to defer the swap to Gradle 4.3 until after this PR since it will be a much smaller set of changes.
On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré <[email protected]> wrote:
Same for me for rat and python build too:
FAILURE: Build completed with 2 failures.
1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':rat'.
Found 905 files with unapproved/unknown licenses. See file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
==============================================================================
2: Task failed with an exception.
-----------
* Where:
Build file '/home/jbonofre/Workspace/beam/sdks/python/build.gradle' line: 64
* What went wrong:
Execution failed for task ':beam-sdks-parent:beam-sdks-python:lint'.
Process 'command 'tox'' finished with non-zero exit value 1
On 11/08/2017 09:51 AM, Romain Manni-Bucau wrote:
The gradle branch doesn't build for me (some rat issues).
Romain Manni-Bucau
2017-11-08 5:41 GMT+01:00 Jean-Baptiste Onofré <[email protected]>:
Great!
What explains these differences? I'm especially curious about the clean build of all Java modules: is it a question of parallel execution?
Regards
JB
On 11/08/2017 02:59 AM, Lukasz Cwik wrote:
The Gradle POC has made significant advances since last week (shading, Python, Go, Docker builds, ...). I believe the current state is close enough to the Maven build system to warrant a comparison.
The largest build differences I noticed are:
* Full build takes ~22mins using Gradle (parallelizing the three rounds of Python tests would reduce this to ~17mins) compared to ~38mins in Maven
* Clean build of all Java modules (skipping over Go/Python) takes ~8mins in Gradle, compared to ~36mins in Maven
* Build output is cached, allowing for faster subsequent builds with "gradle buildDependents", so most single-module changes take ~2mins to build and test without needing to rely on "mvn install"
I have opened PR 4096 <https://github.com/apache/beam/pull/4096> so that the Gradle build files can be merged and then followed up with new Jenkins precommits which are powered by Gradle. This will allow the community to continue contributing to the Gradle build and also allow for a comparison of the precommit times on the Jenkins executor when using Maven/Gradle. I suggest that those who are interested try out the PR.
On Fri, Nov 3, 2017 at 10:29 PM, Jean-Baptiste Onofré <[email protected]> wrote:
That makes sense. The point is that we have to compare like with like. I'm also curious about the Gradle PoC, assuming it performs the same actions as Maven.
Regards
JB
On Nov 3, 2017, at 20:41, Kenneth Knowles <[email protected]> wrote:
I'm confident that any choice will speed things up dramatically even beyond a fast profile, even if the new tool runs all the extra stuff. But that is a question that we can answer empirically anyhow. Let's see how it goes!
Incidentally, my experiments with Bazel have led me to the conclusion that it is not the right choice for us so I'm not going to be proposing any completed POC of that right now. I'm interested in the outcome of the Gradle POC.
Kenn
On Fri, Nov 3, 2017 at 3:30 AM, Jean-Baptiste Onofré <[email protected]> wrote:
Hi
It's what I said in a previous e-mail: I don't think that just changing the build tool will improve the build time much.
We already know (and discussed a while ago) that plugins like findbugs, checkstyle, etc. take time.
So, I think we can already have a fast profile.
Regards
JB
On Nov 3, 2017, at 11:16, Romain Manni-Bucau <[email protected]> wrote:
Hi guys,
when you check the duration of each mojo of the build (almost all of them, since the Python part of the build just breaks locally) you see that there is no real link between Maven and the perf issues Beam can encounter:
https://gist.github.com/rmannibucau/f65fdde28d5dab0fdac50633f84554c9
(generated from the profiling of tesla-profile and parsed with https://gist.github.com/rmannibucau/e329d54b8af6c009f46fd151d10037ad )
Before PoC-ing other tools, which will end up either having the same issues if they do the same things (test, checkstyle, enforcer, findbugs, ...) or giving a less reliable build (by trying to skip some parts of the build if "untouched"; note that this is a very hard problem, since static code analysis doesn't give you any guarantee about what modern code actually does), maybe some actions can be taken on the current build:
- testing https://github.com/vackosar/gitflow-incremental-builder or https://github.com/khmarbaise/incremental-module-builder, or maybe writing the same kind of extension covering Beam's needs (/!\ the previous warning still applies, and a full run is required at some point to validate that the graph detection algorithm didn't get fooled by some indirect code dependency)
- maybe try to get rid of some shades (it is a bit crazy ATM to have so many shades, no?)
- the CI can have profiles based on a PR convention (name of the branch?) to select the build profile; for instance fb/elasticsearch_super-nice-PR would build only the elasticsearch modules. Jenkins/Travis have this ability since they support scripting
- document how to set up a "fastBuild" profile in one's settings.xml which bypasses checkstyle, the enforcer plugin, findbugs, etc. for fast development iterations
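The branch-name convention in the list above could be scripted on the CI side roughly as follows. This is only a sketch under stated assumptions: the `fb/<component>_<topic>` pattern is inferred from the single example given, the `COMPONENT_MODULES` table and its module path are hypothetical placeholders, and the function name is made up. The `-pl` and `-am` options are Maven's standard "build these projects" and "also make their dependencies" flags.

```python
import re

# Hypothetical mapping from the component named in the branch to the Maven
# modules to build. The path is a placeholder for illustration only.
COMPONENT_MODULES = {
    "elasticsearch": ["sdks/java/io/elasticsearch"],
}

# Assumed convention, inferred from "fb/elasticsearch_super-nice-PR":
# fb/<component>_<free-form topic>
BRANCH_PATTERN = re.compile(r"^fb/(?P<component>[a-z0-9]+)_(?P<topic>.+)$")


def maven_args_for_branch(branch):
    """Return the mvn command line for a branch, defaulting to a full build."""
    match = BRANCH_PATTERN.match(branch)
    modules = COMPONENT_MODULES.get(match.group("component")) if match else None
    if not modules:
        # Unknown or non-conforming branch names get the full build.
        return ["mvn", "install"]
    # -pl restricts the reactor to the listed modules;
    # -am also builds the modules they depend on.
    return ["mvn", "install", "-pl", ",".join(modules), "-am"]


print(maven_args_for_branch("fb/elasticsearch_super-nice-PR"))
print(maven_args_for_branch("master"))
```

Falling back to the full build for anything that does not match keeps the shortcut opt-in, and a full run is still needed at some point, as the warning in the first bullet says.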
Romain Manni-Bucau
2017-11-01 21:02 GMT+01:00 Kenneth Knowles <[email protected]>:
I have started one, here: https://github.com/kennknowles/beam/commits/bazel
It is not nearly as far along as Luke's. For the POC I am just putting things in one root BUILD, and learning where we might find the necessary plugins as I go. I am happy to grant push access to this branch. It would be superb if you had some time to work through the Python steps.
On Wed, Nov 1, 2017 at 10:09 AM, Ahmet Altay <[email protected]> wrote:
Has anyone started a POC with Bazel? I would be interested in helping that effort.
On Wed, Nov 1, 2017 at 9:27 AM, Lukasz Cwik <[email protected]> wrote:
I have started a POC for using Gradle here: https://github.com/lukecwik/incubator-beam/tree/gradle
Things that work:
* compiling all Java code (src/main and src/test)
* generating source from protos
* generating source from avro
* running rat, checkstyle
Partially working:
* generating maven pom (albeit with wrong dependencies for some subprojects)
* running tests (~80% pass, remainder seem to be dependency related but are uninvestigated)
Things that don't work:
* anything Python/Go/Docker compilation related
* many tests fail because I messed up dependencies
* anything shading related
* minor plugins like eclipse code formatter/...
* running @NeedsRunner/@ValidatesRunner/integration tests
Feel free to reach out to me on Slack if you would like to try to tackle a piece of the POC to prevent duplication of effort from anyone working on it.
On Tue, Oct 31, 2017 at 10:25 PM, Jean-Baptiste Onofré <[email protected]> wrote:
Agree to move forward on a PoC.
Thanks Reuven for bringing the discussion to the mailing list!
Regards
JB
On Nov 1, 2017, at 03:20, Reuven Lax <[email protected]> wrote:
Some good discussion here, and thanks to JB and Romain for adding to it!
JB makes the good point that we still need to release Maven artifacts, as many Beam users want to develop using Maven. So none of this discussion will affect our release process, as we still need Maven "releases."
At this point, if people are interested, I see no harm in prototyping. Having working alternatives will give us a better basis for comparison to understand whether these other build systems give us anything over what Maven does.
Reuven
On Tue, Oct 31, 2017 at 11:05 AM, Charles Chen <[email protected]> wrote:
As a contributor to the Beam Python SDK, I noticed that many of the points above regarding Maven and Gradle pertain mostly to Java SDK development. For Python development, Maven is much less natural, and we end up just shelling out to perform builds and tests. For Python SDK (and upcoming Go SDK) development, an option to use Bazel would be quite useful.
On Tue, Oct 31, 2017 at 10:42 AM Robert Bradshaw <[email protected]> wrote:
+1, Maven is both a build tool and a repository, and the latter is essential to keep. Both Gradle and Bazel can interface with this repository.
I am, however, very supportive of moving away from Maven to a tool that supports correct incremental, hermetic, dependency-driven, multi-language, and hopefully fast builds for our own development.
On Tue, Oct 31, 2017 at 10:00 AM, Kenneth Knowles <[email protected]> wrote:
Echoing what JB and Reuven said, we absolutely must provide maven central artifacts for Java users, just as we provide pypi artifacts for Python users.
I see Maven as still a viable tool for single-module Java builds, especially considering its rich plugin ecosystem.
On Mon, Oct 30, 2017 at 11:27 PM, Reuven Lax <[email protected]> wrote:
I think that's a very good point. No matter what build system we use for our own personal development, we still need to release Maven artifacts and releases as we need to support our users using Maven.
On Mon, Oct 30, 2017 at 11:26 PM, Jean-Baptiste Onofré <[email protected]> wrote:
Generally speaking, it's interesting to evaluate alternatives, especially Gradle. My point is also to keep Maven artifacts and "releases" as most of our users will use Maven.
For incremental builds, afair, there are some enhancements in Maven but I have to take a look.
Regards
JB
On Oct 31, 2017, at 07:22, Eugene Kirpichov <[email protected]> wrote:
Hi!
Many of these points sound valid, but AFAICT Maven doesn't really do incremental builds [1]. The best it can do is, it seems, recompile only changed files, but Java compilation is a tiny part of the overall build. Almost all time is taken by other plugins, such as unit testing or findbugs - and Maven does not seem to currently support features such as "do not rerun unit tests of a module if the code didn't change". The fact that the surefire plugin has existed for 11 years (version 2.0 was released in 2006) and still doesn't have this feature makes me think that it's unlikely to be supported in the next few years either.
I suspect most PRs affect a very small number of modules, so I think the performance advantage of a build system truly supporting incremental builds may be so overwhelming as to trump many other factors. Of course, we'd need to prototype and have hard numbers in hand to discuss this with more substance.
[1] https://stackoverflow.com/questions/8918165/does-maven-support-incremental-builds
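As an illustration of the "do not rerun unit tests of a module if the code didn't change" idea, here is a minimal sketch of an input-fingerprint check. It is not how Maven, Gradle, or Bazel implement it; the function names, the JSON state file, and the choice of SHA-256 are all assumptions made for this example, and it only looks at the files it is given, not at upstream modules, tool versions, or outputs.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of the given input files in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_if_changed(task_name, inputs, action, state_file):
    """Run action() only if the inputs changed since the last recorded run.

    Returns True if the action ran, False if it was skipped as up to date.
    """
    state_path = Path(state_file)
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    current = fingerprint(inputs)
    if state.get(task_name) == current:
        return False  # inputs unchanged, skip the task
    action()
    # Record the fingerprint only after the action completed without raising.
    state[task_name] = current
    state_path.write_text(json.dumps(state))
    return True
```

A hypothetical call would be `run_if_changed("core:test", source_files, run_tests, "build-state.json")`, where `source_files` and `run_tests` are supplied by the caller. The hard part in a real build is getting the input list right: a module's tests also depend on the modules it consumes, so missing one input makes the skip unsafe.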
On Mon, Oct 30, 2017 at 10:57 PM Romain Manni-Bucau <[email protected]> wrote:
Hi
Even though I am not a committer or on the PMC, I'd like to mention a few points from an external eye:
- Maven remains the most common build tool and the easiest one for any user. It means it is the best one for attracting contributions IMHO.
- Maven has incremental support, and if there is any blocker the community is probably ready to enhance it (this has been done for the compiler plugin, for instance)
- Gradle easily hides issues with its daemon, so a build without the daemon is needed
- Gradle doesn't isolate plugins well enough, so make sure your planned plugins don't conflict
- Only Maven is correctly supported in mainstream and OS/free IDEs
These are the reasons why I think Maven is better, without even entering into the ASF points.
Now, Maven is not perfect, but some quick enhancements can be done:
- A fast build profile can be created
- The Takari scheduler can be used to enhance the parallel build
- Scripts can be provided to build a subpart of the project
- A Beam extension can surely be written to optimize or compute the reactors more easily based on module names
Romain
On Oct 31, 2017 06:42, "Jean-Baptiste Onofré" <[email protected]> wrote:
-0
For the following reasons:
- Maven is an Apache project and we can get support/improvements
- I don't see how another build tool would speed up the build by itself
- The Apache default release process is based on Maven
On the other hand, Gradle could be interesting. Anyway, it's something to evaluate.
Regards
JB
On Oct 30, 2017, at 18:46, Ted Yu <[email protected]> wrote:
I agree with Ben's comment.
Recently I have been using gradle in another Apache project and found it interesting.
Cheers
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com