On Wed, Oct 25, 2017 at 3:59 PM, Lukasz Cwik <lc...@google.com.invalid> wrote:
> Another suggestion would be to break apart the project into multiple maven
> projects which are released on independent schedules.

I think at some point this makes a lot of sense. Libraries of transforms
built against a stable SDK like Java should be OK, and this is a big chunk
of the build time. Fine to move to postcommit / selective precommit too.

Kenn

> Does anyone have any data points for what has worked for other open source
> communities in the past?
> I only have anecdotal evidence that Apache Ant+Ivy has worked well as it
> was much faster than Apache Maven.
>
> On Wed, Oct 25, 2017 at 1:03 PM, Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
> > Hi all,
> >
> > I wanted to circle back on this. We've had continual quasi-outages, where
> > Jenkins occasionally schedules jobs but not reliably. Meanwhile, many of
> > our most important jobs continued to hit their timeouts when they did get
> > scheduled. On top of that, I've run the wrong command once before merging
> > a PR, resulting in a couple of broken precommit runs.
> >
> > Just so everyone knows the actions that have been taken and what we still
> > need to do:
> >
> > - Java postcommit and precommit timeouts bumped to 240 minutes (!!)
> > - Seed job made independent of other jobs, so we can experiment safely
> > - RAT config fixed
> >
> > The latency of precommit is still prohibitive for effective use. Not only
> > are we waiting too long, but the number of workers needed to avoid
> > backlog is excessive.
> > Here are some ideas for things we can do:
> >
> > - Proceed further on the pipeline job, which is what we want to do long
> >   term
> > - Precommit: run only on demand, so workers are not congested by
> >   automatic builds; we could also pick and choose what we want to run
> > - Consider non-Maven build orchestrators that can do dependency-driven
> >   builds
> > - Precommit: run fewer tests; I think this would mean leaving out some
> >   modules whose issues we are OK finding in the postcommit, or making
> >   them on demand only
> > - ValidatesRunner tests: Instead of running a single mavenJob, run
> >   multiple, one which just installs while skipping everything else, then
> >   follow it with just running the tests we care about
> > - Examples integration tests: separate from the maven precommit; also run
> >   as sequenced invocations
> >
> > These are just some ideas; I honestly don't know. I think my upper limit
> > for default precommit feedback is probably 30 minutes, and even there I
> > am not very happy.
> >
> > Any other suggestions?
> >
> > Kenn
> >
> > On Mon, Oct 23, 2017 at 3:29 PM, Kenneth Knowles <k...@google.com> wrote:
> >
> > > I want to wait and get some green from Jenkins running against the HEAD
> > > groovy scripts to confirm. I haven't sat at my desk long enough to see a
> > > full `mvn -P release clean verify` yet.
> > >
> > > On Mon, Oct 23, 2017 at 3:27 PM, Kenneth Knowles <k...@google.com>
> > > wrote:
> > >
> > >> I was easily able to reproduce all the sorts of failures (and some
> > >> more!)
> > >>
> > >> Here are some things that now work that didn't work, or didn't work
> > >> correctly, before:
> > >>
> > >> - mvn apache-rat:check
> > >> - mvn -P release apache-rat:check
> > >> - mvn -P release -f somewhere/else/pom.xml apache-rat:check
> > >>
> > >> It turns out we had these issues:
> > >>
> > >> - items in .gitignore that our RAT config did not ignore
> > >> - sub-modules actually *did* inherit the RAT config, depending on the
> > >>   command you run
> > >> - paths in the RAT exclude were relative to current dir
> > >> - paths in our RAT exclude that aren't even part of our codebase or
> > >>   generated targets, but just things that Jenkins dropped in the
> > >>   workspace, and we were cloning directly into the workspace
> > >>
> > >> I think with the new Jenkins config and the new pom.xml things should
> > >> work well.
> > >>
> > >> Incidentally, Valentyn, changes to Jenkins job DSL groovy scripts do not
> > >> take effect from being merged but only when the seed job next runs
> > >> (TODO: fix this). I am still trying to get the relevant Jenkins UI pages
> > >> to load to get your change incorporated into the live scripts, which
> > >> were regressed by being run from an old PR (TODO: fix the fact that this
> > >> can happen).
> > >>
> > >> Kenn
> > >>
> > >> On Mon, Oct 23, 2017 at 2:21 PM, Valentyn Tymofieiev
> > >> <valen...@google.com.invalid> wrote:
> > >>
> > >>> Thanks a lot!
> > >>>
> > >>> Kenn, Were you able to reproduce RAT failures and test the fix
> > >>> locally? I think "mvn clean verify -Prelease" still passes for me.
> > >>>
> > >>> Timeout for the test suite has been increased in
> > >>> https://github.com/apache/beam/pull/4028.
> > >>>
> > >>> On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles
> > >>> <k...@google.com.invalid> wrote:
> > >>>
> > >>> > Wrong link - https://github.com/apache/beam/pull/4027
> > >>> >
> > >>> > On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles <k...@google.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Yea, root cause is the config bug I described. Proposed fix at
> > >>> > > https://github.com/apache/beam/pull/4019/files. I'm working with
> > >>> > > infra to sort out other build issues that are probably not related.
> > >>> > >
> > >>> > > On Mon, Oct 23, 2017 at 1:53 PM, Lukasz Cwik
> > >>> > > <lc...@google.com.invalid> wrote:
> > >>> > >
> > >>> > >> The build breakage I outlined is being tracked in
> > >>> > >> https://issues.apache.org/jira/browse/BEAM-3092
> > >>> > >>
> > >>> > >> On Mon, Oct 23, 2017 at 11:54 AM, Lukasz Cwik <lc...@google.com>
> > >>> > >> wrote:
> > >>> > >>
> > >>> > >> > Another breaking change was caused by
> > >>> > >> > https://github.com/apache/beam/commit/241d3cedd5a8fd3f360b8ec2f3a8ef5001cbca98
> > >>> > >> > because it changed the build layout on the Jenkins server and our
> > >>> > >> > RAT rules expected to apply from a root directory. I pinged
> > >>> > >> > Kenneth Knowles about it and he said he was taking a look.
> > >>> > >> >
> > >>> > >> > On Mon, Oct 23, 2017 at 11:53 AM, Raghu Angadi
> > >>> > >> > <rang...@google.com.invalid> wrote:
> > >>> > >> >
> > >>> > >> >> Regarding (1):
> > >>> > >> >>
> > >>> > >> >> [4] did have a file without the Apache License. It was fixed the
> > >>> > >> >> next day (commit
> > >>> > >> >> <https://github.com/apache/beam/commit/249da9b8a1e86d0fe4c3dc7b83032ad38c3dcac0#diff-26b77e086ff8292ef54f12b22b7b767a>),
> > >>> > >> >> thanks to Ken Knowles who pinged me about it.
> > >>> > >> >>
> > >>> > >> >> On Mon, Oct 23, 2017 at 11:45 AM, Valentyn Tymofieiev
> > >>> > >> >> <valen...@google.com> wrote:
> > >>> > >> >>
> > >>> > >> >> > Hi Beam-Dev,
> > >>> > >> >> >
> > >>> > >> >> > It's been >5 days since the last successful run of a
> > >>> > >> >> > beam_PreCommit_Java_MavenInstall build [1] and >4 days since the
> > >>> > >> >> > last successful run of beam_PostCommit_Java_MavenInstall [2].
> > >>> > >> >> >
> > >>> > >> >> > Looking at the build logs I see the following problems.
> > >>> > >> >> >
> > >>> > >> >> > 1. After October 17, postcommit builds started to fail with
> > >>> > >> >> >
> > >>> > >> >> > Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check
> > >>> > >> >> > (default) on project beam-parent: Too many files with unapproved
> > >>> > >> >> > license: 1 See RAT report in:
> > >>> > >> >> > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/target/beam-parent-2.3.0-SNAPSHOT.rat
> > >>> > >> >> >
> > >>> > >> >> > The earliest build in which I see this error is Postcommit #5052
> > >>> > >> >> > [3].
> > >>> > >> >> >
> > >>> > >> >> > This makes me suspect [4] or [5] as a breaking change, since they
> > >>> > >> >> > change pom files.
> > >>> > >> >> >
> > >>> > >> >> > Questions:
> > >>> > >> >> > - Is there a way we can reproduce this failure locally? mvn clean
> > >>> > >> >> >   verify passes locally for me.
> > >>> > >> >> > - Is there a way we can see the RAT report mentioned in the error
> > >>> > >> >> >   log?
> > >>> > >> >> >
> > >>> > >> >> > 2. Even prior to the onset of #1, Java Precommit builds stopped
> > >>> > >> >> > completing within the allotted 150 min. Looking at [6-8] it seems
> > >>> > >> >> > the build makes consistent progress, but just does not finish on
> > >>> > >> >> > time.
> > >>> > >> >> > We can also see several recent successful builds with execution
> > >>> > >> >> > time very close to the timeout [9-11].
> > >>> > >> >> >
> > >>> > >> >> > I'd like to propose increasing the time limit for the Java
> > >>> > >> >> > precommit test suite from 2.5 to 4 hours. 4 hours is a long time.
> > >>> > >> >> > I agree that we should definitely try to reduce the test execution
> > >>> > >> >> > time, and reduce flakiness. However, we need the tests to at least
> > >>> > >> >> > pass for now. If we write off failed test suites as 'flakes' and
> > >>> > >> >> > merge PRs without having a green test signal, we will have to
> > >>> > >> >> > spend more time tracing breakages such as #1.
> > >>> > >> >> >
> > >>> > >> >> > Thoughts?
> > >>> > >> >> >
> > >>> > >> >> > Thanks,
> > >>> > >> >> > Valentyn
> > >>> > >> >> >
> > >>> > >> >> > [1] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/
> > >>> > >> >> > [2] https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/
> > >>> > >> >> > [3] https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/5052/changes
> > >>> > >> >> >
> > >>> > >> >> > [4] https://github.com/apache/beam/commit/d745cc9d8cc1735d3bc3c67ba3e2617cb7f11a8c
> > >>> > >> >> > [5] https://github.com/apache/beam/commit/0d8ab6cbbc762dd9f9be1b3e9a26b6c9d0bb6dc3
> > >>> > >> >> >
> > >>> > >> >> > [6] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15222/
> > >>> > >> >> > [7] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15195/
> > >>> > >> >> > [8] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15220/
> > >>> > >> >> >
> > >>> > >> >> > [9] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15009/
> > >>> > >> >> > [10] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15068/
> > >>> > >> >> > [11] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15016/
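
For anyone reading this thread later: the "paths in the RAT exclude were relative to current dir" problem can be illustrated with a small sketch. This is not the RAT plugin's actual matching logic, only a rough model of why an exclude written for the repository root stops matching once the build starts from a different directory. The directory layout, the file name, the `.idea/*` pattern, and the `is_excluded` helper are all hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_excluded(file_path: str, patterns: Iterable[str], base_dir: str) -> bool:
    """Return True if file_path, made relative to base_dir, matches a pattern.

    Hypothetical helper: a simplified stand-in for exclude matching,
    not how apache-rat-plugin is implemented.
    """
    relative = PurePosixPath(file_path).relative_to(base_dir).as_posix()
    return any(fnmatchcase(relative, pattern) for pattern in patterns)


# Assumed layout: the repository is cloned into a subdirectory of the
# Jenkins workspace rather than directly into the workspace.
workspace = "/ws"
repo_root = "/ws/src"
ignored_file = "/ws/src/.idea/workspace.xml"
patterns = [".idea/*"]  # hypothetical exclude, written relative to the repo root

# Resolved against the repository root, the exclude applies.
print(is_excluded(ignored_file, patterns, repo_root))  # True

# Resolved against the directory the build was started from, it does not,
# so a license check would count the file as unapproved.
print(is_excluded(ignored_file, patterns, workspace))  # False
```

Under these assumptions the same file is excluded or reported depending only on which directory the patterns are resolved against, which is consistent with `mvn clean verify` passing locally while the Jenkins job failed.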