Another solution would be to make the Travis builds more efficient. For example, we could write a script that determines the modified Maven module and only runs the tests for this module (and maybe its transitive dependencies). PRs for libraries such as Gelly, Table, CEP or connectors would no longer trigger a compilation of the entire stack. Of course this would not solve all problems, but many of them.
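
A minimal sketch of what such a script could look like, in Python. It assumes the set of module directories is known up front and that upstream modules are already installed or resolvable from a snapshot repository; the module names in the usage note are only illustrative. `-pl` and `-amd` are the standard Maven options for selecting projects and also building their dependents.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base="origin/master"):
    """Return paths changed relative to `base` (needs a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def owning_module(path, module_dirs):
    """Return the deepest module directory containing `path`, or None."""
    # PurePosixPath.parents is ordered from deepest to shallowest,
    # so the first match is the most specific module.
    for parent in PurePosixPath(path).parents:
        if str(parent) in module_dirs:
            return str(parent)
    return None


def maven_command(paths, module_dirs):
    """Build a Maven command that verifies only the touched modules."""
    modules = {owning_module(p, module_dirs) for p in paths}
    if not modules or None in modules:
        # No changes detected, or a file outside any known module
        # (e.g. the root pom.xml): fall back to the full build.
        return ["mvn", "verify"]
    # -pl selects the modules, -amd also builds modules depending on them.
    return ["mvn", "verify", "-pl", ",".join(sorted(modules)), "-amd"]
```

With module directories such as `flink-core` and `flink-libraries/flink-gelly`, a PR touching only Gelly sources would yield `mvn verify -pl flink-libraries/flink-gelly -amd`, while a change to the root `pom.xml` would still trigger the full build.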

What do you think about this?



On 20/03/17 at 14:02, Robert Metzger wrote:
Aljoscha, do you know how to configure Jenkins?
Is Apache INFRA doing that, or are the Beam people doing that themselves?

One downside of Jenkins is that we probably need some machines that execute
the tests. A Travis container has 2 CPU cores and 4 GB of main memory. We
currently have 10 such containers available on Travis concurrently. I think
we would need at least the same capacity on Jenkins.


On Mon, Mar 20, 2017 at 1:48 PM, Timo Walther <twal...@apache.org> wrote:

I agree with Aljoscha that we might consider moving from Travis to
Jenkins. Is there any disadvantage in using Jenkins?

I think we should structure the project according to release management
(e.g. more frequent releases of libraries) or other criteria (e.g. core and
non-core) instead of build time. What would happen if the build of another
submodule became too long? Would we split/restructure again and
again? If Jenkins solves all our problems, we should use it.

Regards,
Timo



On 20/03/17 at 12:21, Aljoscha Krettek wrote:

I prefer Jenkins to Travis by far. Working on Beam, where we have good
Jenkins integration, has opened my eyes to what is possible with good CI
integration.

For example, look at this recent Beam PR:
https://github.com/apache/beam/pull/2263
The Jenkins-GitHub integration will tell you exactly which tests failed, and
if you click on the links you can look at the log output/stdout of the tests
in question.

This is the overview page of one of the Jenkins jobs that we have in Beam:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/
This is an example of a stable build:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/
Notice how it gives you fine-grained information about the Maven run. This is
an unstable run:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/
There you can see which tests failed and you can easily drill down.

Best,
Aljoscha

On 20 Mar 2017, at 11:46, Robert Metzger <rmetz...@apache.org> wrote:
Thank you for looking into the build times.

I didn't know that the build time situation is so bad. Even with yarn,
mesos, connectors and libraries removed, we are still running into the
build timeout :(

Aljoscha told me that the Beam community is using Jenkins for running
the tests, and they are planning to completely move away from Travis. I
wonder whether we should do the same, as having our own Jenkins servers
would allow us to run tests for more than 50 minutes.

I agree with Stephan that we should keep the yarn and mesos tests in the
core for stability / testing quality purposes.


On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors, but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught cases that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling would
be that they are valuable in the core repository.

I would actually suggest to do only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we gathered experience there, we can probably easily see what
else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <c...@greghogan.com> wrote:

I’d like to use this refactoring opportunity to unsplit the Travis tests.
With 51 builds queued up for the weekend (some of which may fail or have
been force pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module groups. I like that the
modules are generic (“libraries”) so that no one module is alone and
independent.

Flink has three “libraries”: cep, ml, and gelly.

“connectors” is a hotspot due to the long-running Kafka tests (and
connectors for three Kafka versions).

Both flink-storm and flink-python have a modest number of tests
and could live with the miscellaneous modules in “contrib”.

The YARN tests are long-running and problematic (I am unable to
successfully run these locally). A “cluster” module could host flink-mesos,
flink-yarn, and flink-yarn-tests.

That gets us close to running all tests in a single Travis build.
    https://travis-ci.org/greghogan/flink/builds/212122590

I also tested (https://github.com/greghogan/flink/commits/core_build) with
a Maven parallelism of 2 and 4; the latter gave a 6.4% drop in build time.
    https://travis-ci.org/greghogan/flink/builds/212137659
    https://travis-ci.org/greghogan/flink/builds/212154470

We can run Travis CI builds nightly to guard against breaking changes.

I also wanted to get an idea of how disruptive it would be to developers
to divide the project into multiple git repos. I wrote a simple Python
script and configured it with the module partitions listed above. The usage
string from the top of the file lists commits with files from multiple
partitions as well as the modified files.
    https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897

Accounting for the merging of the batch and streaming connector modules,
and assuming that the project structure has not changed much over the past
15 months, for the following date ranges the listed number of commits would
have been split across repositories.

since "2017-01-01"
36 of 571 commits were mixed

since "2016-07-01"
155 of 1607 commits were mixed

since "2016-01-01"
272 of 2561 commits were mixed
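
For illustration, the classification behind these numbers can be sketched roughly as follows. This is not the script from the gist; the partition table and function names are hypothetical, and a commit is assumed to count as "mixed" when its files fall into more than one partition.

```python
def partition_of(path, partitions):
    """Map a file path to a partition name by its leading directory."""
    for name, prefixes in partitions.items():
        if any(path == p or path.startswith(p + "/") for p in prefixes):
            return name
    return "core"  # assumption: everything else stays in the main repo


def is_mixed(files, partitions):
    """A commit is 'mixed' if its files touch more than one partition."""
    return len({partition_of(f, partitions) for f in files}) > 1


def mixed_share(mixed, total):
    """Percentage of mixed commits, rounded to one decimal place."""
    return round(100.0 * mixed / total, 1)


# Hypothetical partition table, loosely following the groups listed above.
PARTITIONS = {
    "libraries": ["flink-libraries"],
    "connectors": ["flink-connectors"],
    "contrib": ["flink-contrib"],
    "cluster": ["flink-mesos", "flink-yarn", "flink-yarn-tests"],
}
```

Applied to the counts above, `mixed_share` gives 6.3% (36 of 571), 9.6% (155 of 1607) and 10.6% (272 of 2561).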

Greg


On Mar 15, 2017, at 1:13 PM, Stephan Ewen <se...@apache.org> wrote:

@Robert - I think once we know that a separate git repo works well, and
that it actually solves problems, I see no reason to not create a
connectors repository later. The infrastructure changes should be identical
for two or more repositories.

On Wed, Mar 15, 2017 at 5:22 PM, Till Rohrmann <trohrm...@apache.org> wrote:

I think it should not be at least the flink-dist but exactly the remaining
flink-dist module. Otherwise we do redundant work.

On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger <rmetz...@apache.org> wrote:

"flink-core" means the main repository, not the "flink-core" module.
When doing a release, we need to build the flink main code first, because
the flink-libraries depend on that.
Once the "flink-libraries" are built, we need to run the main build again
(at least the flink-dist module), so that it pulls the artifacts from the
flink-libraries to put them into the opt/ folder of the final artifact.



On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

I'm ok with point 3.
Concerning point 8: Why do we have to build flink-core twice after having
it built as a dependency for flink-libraries? This seems wrong to me.
Cheers,
Till

On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org> wrote:

Thank you. Running on AWS is a good idea!
Let me know if you (or anybody else) want to help me with the
infrastructure work! Any help is much appreciated (as I've said before, I
don't really have time for doing this, but it has to be done :) )
I'm against creating two new repositories. I fear that this introduces too
much complexity and too many repositories.
"flink" and "flink-libraries" are hopefully enough to get the build time
significantly down.
We can also consider putting the connectors into the "flink-libraries" repo
if we need to further reduce the build time.

We should probably move "flink-table" out of "flink-libraries" if we want
to keep "flink-table" in the main repo. (This would eliminate the
"flink-libraries" module from main.)

Also, I agree that "flink-statebackend-rocksdb" is not correctly placed in
contrib anymore.


On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:

Robert, appreciate your kickstarting this task.
We should compare the verification time with and without the listed
modules. I’ll try to run this by tomorrow on AWS and on Travis.

Should we maintain separate repos for flink-contrib and flink-libraries?
Are you intending that we move flink-table out of flink-libraries (and
perhaps flink-statebackend-rocksdb out of flink-contrib)?
Greg


On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org> wrote:

Thank you for looking into this Till.
I think we then have to split the repositories.
My main motivation for doing this is that it seems to be the only feasible
way of scaling the community to allow more committers working on the
libraries.
I'll take care of getting things started.

As the next steps I propose to:
1. Ask INFRA to rename
https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
to "flink-libraries"
2. Ask INFRA to set up GitHub and Travis integration for "flink-libraries"
3. Put the code of "flink-ml", "flink-gelly", "flink-python", "flink-cep",
"flink-scala-shell", "flink-storm" into the new repository. (I decided
against moving flink-contrib there, because rocksdb is in the contrib
module. For flink-table I'm undecided, but I kept it in the main repo
because it's probably going to interact more with the core code in the
future.)
I'll try to preserve the history of those modules when splitting them into
the new repo.
4. I'll close all pull requests against those modules in the main repo.
5. I'll set up a minimal documentation page for the library repository,
similar to the main documentation.
6. I'll update the documentation build process to build both documentations
& link them to each other
7. I'll update the nightly deployment process to include both repositories
8. I'll update the release script to create the Flink release out of both
repositories. In order to put the libraries into the opt/ dir of the
release, I'll need to change the build of "flink-dist" so that it first
builds flink core, then the libraries and then the core again with the
libraries as an additional dependency.
The main question for the community is: do you agree with point 3? Would
you like to include more or less?
I'll start with 1. and 2. tomorrow morning.



On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org> wrote:

In theory we could have a merging bot which solves the problem of the
"commit window". Once the PR passes all tests and has enough +1s, the bot
could do the merging and, thus, effectively linearize the merge process.

I think the second point is actually a disadvantage because there is not
such an immediate incentive/pressure to fix the broken module if it lives
in a separate repository. Furthermore, breaking API changes in the core
will most likely go unnoticed for some time in other modules which are not
developed so actively. In the worst case these things will only be noticed
when we try to make a release.

But I also agree that we are not Google and we don't have the capacity to
maintain such a smooth build process that we can keep all the code in a
single repository.

I looked a bit into Gradle and as far as I can tell it offers some nice
features wrt incrementally building projects. This would be beneficial for
local development but it would not solve our build time problems on Travis.
Gradle intends to introduce a task result cache which allows reusing
results across builds. This could help when building on Travis; however, it
is not yet fully implemented. Moreover, migrating from Maven to Gradle
won't come for free (there's simply no free lunch out there) and we might
risk introducing new bugs. Therefore, I would vote to split the repository
in order to mitigate our current problems with Travis and the build time in
general. Whether to use a different build system or not can then be
discussed as an orthogonal question.
Cheers,
Till
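
As a rough illustration of the merging-bot idea in this message: the sketch below processes PRs one at a time and tests each against everything merged before it, which is what removes the race for a "commit window". The `PullRequest` fields, the `run_tests` callback and the approval threshold are all assumptions made for the example, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int      # hypothetical PR identifier
    approvals: int   # number of +1 votes


def merge_queue(prs, run_tests, required_approvals=1):
    """Merge eligible PRs strictly one after another.

    `run_tests(pr, merged_so_far)` stands in for a CI run of the PR
    rebased onto the current head; it returns True when the tests pass.
    Returns the PR numbers in the order they were merged.
    """
    head = []  # PRs merged so far, standing in for the master branch
    for pr in prs:
        if pr.approvals < required_approvals:
            continue  # not enough +1s yet, leave it for a later round
        # Test against the head *including* all earlier merges, so a PR
        # can never land on top of a commit it was not tested with.
        if run_tests(pr, tuple(head)):
            head.append(pr.number)
    return head
```

Because every PR is tested against the exact state it will be merged onto, contributors no longer have to rebase and re-test by hand whenever someone else commits first.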

On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:

Some other thoughts on how a repository split would help. I am not sure
about all of them, so please comment:

- There is less competition for a "commit window". It happens a lot
already that you run all tests and want to commit, but there was a commit
in the meantime. You rebase, need to re-test, and again there is a commit
in the meantime.
    For a "linear" commit history, this may become a bottleneck eventually
as well.

- There is less risk of a broken master. If one repository/module breaks
its master, the others can still continue.

Stephan


On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for all your input. In order to wrap the discussion up I'd like to
summarize the mentioned points:

The problem of increasing build times and complexity of the project has
been acknowledged. Ideally we would have everything in one repository using
an incremental build tool. Since Maven does not properly support this we
would have to switch our build tool to something like Gradle, for example.
Another option is introducing build profiles for different sets of modules
as well as separating integration and unit tests. The third alternative
would be creating sub-projects with their own repositories. I actually
think that these two proposals are not necessarily exclusive and it would
also make sense to have a separation between unit and integration tests if
we split the repository.

The overall consensus seems to be that we don't want to split the community
and want to keep everything under the same umbrella. I think this is the
right way to go, because otherwise some parts of the project could become
second class citizens. Given that and that we continue using Maven, I still
think that creating sub-projects for the libraries, for example, could be
beneficial. A split could reduce the project's complexity and make it
potentially easier for libraries to get actively developed. The main
concern is setting up the build infrastructure to aggregate docs from
multiple repositories and making them publicly available.

Since I started this thread and I would really like to see Flink's ML
library being revived again, I'd volunteer investigating first whether it
is doable establishing a proper incremental build for Flink. If that should
not be possible, I will look into splitting the repository, first only for
the libraries. I'll share my results with the community once I'm done with
the investigation.

Cheers,
Till

On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org> wrote:

@Jin Mingjian: You can not use the paid Travis version for open source
projects. It only works for private repositories (at least back then when
we asked them about that).

@Stephan: I don't think that incremental builds will be available with
Maven anytime soon.

I agree that we need to fix the build time issue on Travis. I've recently
pushed a commit to now use three instead of two test groups.
But I don't think that this is a feasible long-term solution.

If this discussion is only about reducing the build and test time,
introducing build profiles for different components as Aljoscha suggested
would solve the problem Till mentioned.
Also, if we decide that Travis is not a good tool anymore for the testing,
I guess we can find a different solution. There are now competitors to
Travis that might be willing to offer a paid plan for an open source
project, or we set up our own infra on a server sponsored by one of the
contributing companies.
If we want to solve "community issues" with the change as well, then I
think it's worth the effort of splitting up Flink into different
repositories.

Splitting up repositories is not a trivial task in my opinion. As others
have mentioned before, we need to consider the following things:
- How are we going to build the documentation? Ideally every repo should
contain its docs, so we would need to pull them together when building the
main docs.
- How do we organize the dependencies? If we have library repositories
depend on snapshot Flink versions, we need to make sure that the snapshot
deployment always works. This also means that people working on a library
repository will pull from snapshot OR need to build first locally.
- We need to update the release scripts

If we commit to do these changes, we need to assign at least one committer
(yes, in this case we need somebody who can commit, for example for
updating the buildbot stuff) who volunteers to do the change.
I've done a lot of infrastructure work in the past, but I'm currently
pretty booked with many other things, so I don't realistically see myself
doing that. Max who used to work on these things is taking some time off.
I think we need, best case 3 days for the change, worst case 5 days. The
problem is that there are no "unit tests" for the infra stuff,

