Hi all,

I wanted to circle back on this. We've had continual quasi-outages, where
Jenkins occasionally schedules jobs but not reliably. Meanwhile, many of
our most important jobs continued to hit their timeouts when they did get
scheduled. On top of that, I've run the wrong command once before merging a
PR, resulting in a couple broken precommit runs.

Just so everyone knows the actions that have been taken and what we still
need to do:

 - Java postcommit and precommit timeouts bumped to 240 minutes (!!)
 - Seed job made independent of other jobs, so we can experiment safely
 - RAT config fixed

The latency of precommit is still prohibitive for effective use. Not only
are we waiting too long, but the number of workers needed to avoid backlog
is excessive. Here are some ideas for things we can do:

 - Proceed further on the pipeline job, which is what we want to do long
term
 - Precommit: run only on demand, so workers are not congested by automatic
builds; we could also pick and choose what we want to run
 - Consider non-Maven build orchestrators that can do dependency-driven
builds
 - Precommit: run fewer tests; I think this would mean leaving out some
modules that we are OK finding issues in the postcommit, or making them on
demand only
 - ValidatesRunner tests: Instead of running a single mavenJob, run
multiple, once which just installs while skipping everything else, then
follow it with just running the tests we care about
 - Examples integration tests: separate from the maven precommit; also run
as sequenced invocations

These are just some ideas; I honestly don't know. I think my upper limit
for default precommit feedback is probably 30 minutes, and even there I am
not very happy.

Any other suggestions?

Kenn

On Mon, Oct 23, 2017 at 3:29 PM, Kenneth Knowles <k...@google.com> wrote:

> I want to wait and get some green from Jenkins running against the HEAD
> groovy scripts to confirm. I haven't sat at my desk long enough to see a
> full `mvn -P release clean verify` yet.
>
> On Mon, Oct 23, 2017 at 3:27 PM, Kenneth Knowles <k...@google.com> wrote:
>
>> I was easily able to reproduce all the sorts of failures (and some more!)
>>
>> Here are some things that now work that didn't work, or didn't work
>> correctly, before
>>
>>  - mvn apache-rat:check
>>  - mvn -P release apache-rat:check
>>  - mvn -P release -f somewhere/else/pom.xml apache-rat:check
>>
>> It turns out we had these issues:
>>
>>  - items in .gitignore that our RAT config did not ignore
>>  - sub-modules actually *did* inherit the RAT config, depending on the
>> command you run
>>  - paths in the RAT exclude were relative to current dir
>>  - paths in our RAT exclude that aren't even part of our codebase or
>> generated targets, but just things that Jenkins dropped in the workspace,
>> and we were cloning directly into the workspace
>>
>> I think with the new Jenkins config and the new pom.xml things should
>> work well.
>>
>> Incidentally, Valentyn, changes to Jenkins job DSL groovy scripts do not
>> take effect from being merged but only when the seed job next runs. (TODO:
>> fix this). I am still trying to get the relevant Jenkins UI pages to load
>> to get your change incorporated into the live scripts, which were regressed
>> by being run from an old PR (TODO: fix the fact that this can happen).
>>
>> Kenn
>>
>> On Mon, Oct 23, 2017 at 2:21 PM, Valentyn Tymofieiev <
>> valen...@google.com.invalid> wrote:
>>
>>> Thanks a lot!
>>>
>>> Kenn, Were you able to reproduce RAT failures and test the fix locally? I
>>> think "mvn clean verify -Prelease" still passes for me.
>>>
>>> Timeout for the test suite has been increased in
>>> https://github.com/apache/
>>> beam/pull/4028 <https://github.com/apache/beam/pull/4028>.
>>>
>>> On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles <k...@google.com.invalid
>>> >
>>> wrote:
>>>
>>> > Wrong link - https://github.com/apache/beam/pull/4027
>>> >
>>> > On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles <k...@google.com>
>>> wrote:
>>> >
>>> > > Yea, root cause is the config bug I described. Proposed fix at
>>> > > https://github.com/apache/beam/pull/4019/files. I'm working with
>>> infra
>>> > to
>>> > > sort out other build issues that are probably not related.
>>> > >
>>> > > On Mon, Oct 23, 2017 at 1:53 PM, Lukasz Cwik
>>> <lc...@google.com.invalid>
>>> > > wrote:
>>> > >
>>> > >> The build breakage I outlined is being tracked in
>>> > >> https://issues.apache.org/jira/browse/BEAM-3092
>>> > >>
>>> > >> On Mon, Oct 23, 2017 at 11:54 AM, Lukasz Cwik <lc...@google.com>
>>> wrote:
>>> > >>
>>> > >> > Another breaking change was caused by https://github.com/apache/
>>> > >> > beam/commit/241d3cedd5a8fd3f360b8ec2f3a8ef5001cbca98 because it
>>> > changed
>>> > >> > the build layout on the Jenkins server and our RAT rules expected
>>> to
>>> > >> apply
>>> > >> > from a root directory. I pinged Kenneth Knowles about it and he
>>> said
>>> > he
>>> > >> was
>>> > >> > taking a look.
>>> > >> >
>>> > >> > On Mon, Oct 23, 2017 at 11:53 AM, Raghu Angadi
>>> > >> <rang...@google.com.invalid
>>> > >> > > wrote:
>>> > >> >
>>> > >> >> Regd (1) :
>>> > >> >>
>>> > >> >> [4] did have have a file without Apache Licence. It was fixed the
>>> > next
>>> > >> >> day (
>>> > >> >> commit
>>> > >> >> <https://github.com/apache/beam/commit/249da9b8a1e86d0fe4c3d
>>> > >> >> c7b83032ad38c3dcac0#diff-26b77e086ff8292ef54f12b22b7b767a>),
>>> > >> >> thanks to Ken Knowles who pinged me about it.
>>> > >> >>
>>> > >> >> On Mon, Oct 23, 2017 at 11:45 AM, Valentyn Tymofieiev <
>>> > >> >> valen...@google.com>
>>> > >> >> wrote:
>>> > >> >>
>>> > >> >> > Hi Beam-Dev,
>>> > >> >> >
>>> > >> >> > It's been >5 days since the last successful run of a
>>> > >> >> > beam_PreCommit_Java_MavenInstall build[1]  and >4 days since
>>> last
>>> > >> >> > successful run of beam_PreCommit_Java_MavenInstall[2].
>>> > >> >> >
>>> > >> >> > Looking at build logs I see following problems.
>>> > >> >> >
>>> > >> >> > 1. After October 17, postcommit builds started to fail with
>>> > >> >> >
>>> > >> >> > Failed to execute goal org.apache.rat:apache-rat-plug
>>> in:0.12:check
>>> > >> >> > (default) on project beam-parent: Too many files with
>>> unapproved
>>> > >> >> license: 1
>>> > >> >> > See RAT report in: /home/jenkins/jenkins-slave/wo
>>> > >> >> > rkspace/beam_PostCommit_Java_MavenInstall/target/beam-paren
>>> > >> >> > t-2.3.0-SNAPSHOT.rat
>>> > >> >> >
>>> > >> >> > The earliest build that I see this error is Postcommit #5052
>>> [3].
>>> > >> >> >
>>> > >> >> > This makes me suspect [4] or [5] as a breaking change, since
>>> they
>>> > >> change
>>> > >> >> > pom files.
>>> > >> >> >
>>> > >> >> > Questions:
>>> > >> >> > - Is there a way we can reproduce this failure locally? mvn
>>> clean
>>> > >> verify
>>> > >> >> > passes locally for me.
>>> > >> >> > - Is there a way we can see the See RAT report mentioned in the
>>> > error
>>> > >> >> > log?
>>> > >> >> >
>>> > >> >> > 2. Prior to onset of #1 Java Precommit builds no longer
>>> complete
>>> > >> within
>>> > >> >> > allotted 150 min time. Looking at [6-8] it seems the build
>>> makes
>>> > >> >> consistent
>>> > >> >> > progress, but just does not finish on time. We can also see
>>> several
>>> > >> >> recent
>>> > >> >> > successful builds with execution time very close to time out
>>> > [9-11].
>>> > >> >> >
>>> > >> >> > I'd like to propose to increase time limit for Java precommit
>>> test
>>> > >> suite
>>> > >> >> > from 2.5 to 4 hours. 4 hours is long time. I agree that we
>>> should
>>> > >> >> > definitely try to reduce the test execution time, and reduce
>>> > >> flakiness.
>>> > >> >> > However we need the tests at least pass for now. If we write
>>> off
>>> > >> failed
>>> > >> >> > test suites as 'flakes' and merge PRs without having a green
>>> test
>>> > >> >> signal,
>>> > >> >> > we will have to spend more time tracing breakages such as #1.
>>> > >> >> >
>>> > >> >> > Thoughts?
>>> > >> >> >
>>> > >> >> > Thanks,
>>> > >> >> > Valentyn
>>> > >> >> >
>>> > >> >> > [1] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > ll/
>>> > >> >> > [2] https://builds.apache.org/job/
>>> beam_PostCommit_Java_MavenInst
>>> > all/
>>> > >> >> > [3] https://builds.apache.org/job/
>>> beam_PostCommit_Java_MavenInst
>>> > >> >> > all/5052/changes
>>> > >> >> >
>>> > >> >> > [4] https://github.com/apache/beam/commit/d745cc9d8cc1735d3b
>>> > >> >> > c3c67ba3e2617cb7f11a8c
>>> > >> >> > [5] https://github.com/apache/beam
>>> /commit/0d8ab6cbbc762dd9f9be1b
>>> > >> >> > 3e9a26b6c9d0bb6dc3
>>> > >> >> >
>>> > >> >> > [6] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15222/
>>> > >> >> > [7] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15195/
>>> > >> >> > [8] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15220/
>>> > >> >> >
>>> > >> >> > [9] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15009/
>>> > >> >> > [10] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15068/
>>> > >> >> > [11] https://builds.apache.org/job/
>>> beam_PreCommit_Java_MavenInsta
>>> > >> >> ll/15016/
>>> > >> >> >
>>> > >> >> >
>>> > >> >>
>>> > >> >
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>

Reply via email to