On Thu, Oct 26, 2017 at 9:14 AM, Kenneth Knowles <k...@google.com> wrote:
> On Wed, Oct 25, 2017 at 6:51 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: >> >> These are just some ideas; I honestly don't know. I think my upper limit >> for default precommit feedback is probably 30 minutes, and even there I am >> not very happy. >> >> >> We should be able to get a good signal in less than 30 minutes. Lint, >> smoke >> tests, etc. I think there is a more extensive set of tests that should >> pass >> before merging. >> > > Yea, definitely. Digging into this in particular because I think it is > quite promising as a way to reduce build congestion. > > It looks like `mvn clean test -Dfindbugs.skip` would finish in 10 minutes > if we fixed it. Currently it looks like some modules don't work under this > invocation, probably requiring some state to be established by prior bits > run by `verify`. If we iterated on that until review was done, then 30-90 > minutes to prevent breakage would be fine, especially since a failure in > that longer slower build is probably a signal to add a quicker repro unit > test. > Looked into this, and some IOs explicitly use Guava 19, so presumably that is important, while the core Java SDK requires Guava 21. This is shaded away during `mvn verify` or `mvn install` but for `mvn test` there is no packaging or shading, so when the IO tests run with Guava 19 taking precedence, checkArguments within the Java SDK fail. This one might be fixable, but my 5 second takeaway is that our codebase cannot build against itself without some shading (or equivalent). Kenn > >> >> Also it might not hurt to audit our tests and see if there are a small >> number that take a disproportionate amount of time. >> >> Any other suggestions? >> >> Kenn >> >> On Mon, Oct 23, 2017 at 3:29 PM, Kenneth Knowles <k...@google.com> wrote: >> >> > I want to wait and get some green from Jenkins running against the HEAD >> > groovy scripts to confirm. I haven't sat at my desk long enough to see a >> > full `mvn -P release clean verify` yet. >> > >> > On Mon, Oct 23, 2017 at 3:27 PM, Kenneth Knowles <k...@google.com> >> wrote: >> > >> >> I was easily able to reproduce all the sorts of failures (and some >> more!) >> >> >> >> Here are some things that now work that didn't work, or didn't work >> >> correctly, before >> >> >> >> - mvn apache-rat:check >> >> - mvn -P release apache-rat:check >> >> - mvn -P release -f somewhere/else/pom.xml apache-rat:check >> >> >> >> It turns out we had these issues: >> >> >> >> - items in .gitignore that our RAT config did not ignore >> >> - sub-modules actually *did* inherit the RAT config, depending on the >> >> command you run >> >> - paths in the RAT exclude were relative to current dir >> >> - paths in our RAT exclude that aren't even part of our codebase or >> >> generated targets, but just things that Jenkins dropped in the >> workspace, >> >> and we were cloning directly into the workspace >> >> >> >> I think with the new Jenkins config and the new pom.xml things should >> >> work well. >> >> >> >> Incidentally, Valentyn, changes to Jenkins job DSL groovy scripts do >> not >> >> take effect from being merged but only when the seed job next runs. >> (TODO: >> >> fix this). I am still trying to get the relevant Jenkins UI pages to >> load >> >> to get your change incorporated into the live scripts, which were >> regressed >> >> by being run from an old PR (TODO: fix the fact that this can happen). >> >> >> >> Kenn >> >> >> >> On Mon, Oct 23, 2017 at 2:21 PM, Valentyn Tymofieiev < >> >> valen...@google.com.invalid> wrote: >> >> >> >>> Thanks a lot! >> >>> >> >>> Kenn, Were you able to reproduce RAT failures and test the fix >> locally? >> I >> >>> think "mvn clean verify -Prelease" still passes for me. >> >>> >> >>> Timeout for the test suite has been increased in >> >>> https://github.com/apache/ >> >>> beam/pull/4028 <https://github.com/apache/beam/pull/4028>. >> >>> >> >>> On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles >> <k...@google.com.invalid >> >>> > >> >>> wrote: >> >>> >> >>> > Wrong link - https://github.com/apache/beam/pull/4027 >> >>> > >> >>> > On Mon, Oct 23, 2017 at 2:10 PM, Kenneth Knowles <k...@google.com> >> >>> wrote: >> >>> > >> >>> > > Yea, root cause is the config bug I described. Proposed fix at >> >>> > > https://github.com/apache/beam/pull/4019/files. I'm working with >> >>> infra >> >>> > to >> >>> > > sort out other build issues that are probably not related. >> >>> > > >> >>> > > On Mon, Oct 23, 2017 at 1:53 PM, Lukasz Cwik >> >>> <lc...@google.com.invalid> >> >>> > > wrote: >> >>> > > >> >>> > >> The build breakage I outlined is being tracked in >> >>> > >> https://issues.apache.org/jira/browse/BEAM-3092 >> >>> > >> >> >>> > >> On Mon, Oct 23, 2017 at 11:54 AM, Lukasz Cwik <lc...@google.com> >> >>> wrote: >> >>> > >> >> >>> > >> > Another breaking change was caused by >> https://github.com/apache/ >> >>> > >> > beam/commit/241d3cedd5a8fd3f360b8ec2f3a8ef5001cbca98 because >> it >> >>> > changed >> >>> > >> > the build layout on the Jenkins server and our RAT rules >> expected >> >>> to >> >>> > >> apply >> >>> > >> > from a root directory. I pinged Kenneth Knowles about it and he >> >>> said >> >>> > he >> >>> > >> was >> >>> > >> > taking a look. >> >>> > >> > >> >>> > >> > On Mon, Oct 23, 2017 at 11:53 AM, Raghu Angadi >> >>> > >> <rang...@google.com.invalid >> >>> > >> > > wrote: >> >>> > >> > >> >>> > >> >> Regd (1) : >> >>> > >> >> >> >>> > >> >> [4] did have have a file without Apache Licence. It was fixed >> the >> >>> > next >> >>> > >> >> day ( >> >>> > >> >> commit >> >>> > >> >> <https://github.com/apache/beam/commit/249da9b8a1e86d0fe4c3d >> >>> > >> >> c7b83032ad38c3dcac0#diff-26b77e086ff8292ef54f12b22b7b767a>), >> >>> > >> >> thanks to Ken Knowles who pinged me about it. >> >>> > >> >> >> >>> > >> >> On Mon, Oct 23, 2017 at 11:45 AM, Valentyn Tymofieiev < >> >>> > >> >> valen...@google.com> >> >>> > >> >> wrote: >> >>> > >> >> >> >>> > >> >> > Hi Beam-Dev, >> >>> > >> >> > >> >>> > >> >> > It's been >5 days since the last successful run of a >> >>> > >> >> > beam_PreCommit_Java_MavenInstall build[1] and >4 days >> since >> >>> last >> >>> > >> >> > successful run of beam_PreCommit_Java_MavenInstall[2]. >> >>> > >> >> > >> >>> > >> >> > Looking at build logs I see following problems. >> >>> > >> >> > >> >>> > >> >> > 1. After October 17, postcommit builds started to fail with >> >>> > >> >> > >> >>> > >> >> > Failed to execute goal org.apache.rat:apache-rat-plug >> >>> in:0.12:check >> >>> > >> >> > (default) on project beam-parent: Too many files with >> >>> unapproved >> >>> > >> >> license: 1 >> >>> > >> >> > See RAT report in: /home/jenkins/jenkins-slave/wo >> >>> > >> >> > rkspace/beam_PostCommit_Java_MavenInstall/target/beam-paren >> >>> > >> >> > t-2.3.0-SNAPSHOT.rat >> >>> > >> >> > >> >>> > >> >> > The earliest build that I see this error is Postcommit #5052 >> >>> [3]. >> >>> > >> >> > >> >>> > >> >> > This makes me suspect [4] or [5] as a breaking change, since >> >>> they >> >>> > >> change >> >>> > >> >> > pom files. >> >>> > >> >> > >> >>> > >> >> > Questions: >> >>> > >> >> > - Is there a way we can reproduce this failure locally? mvn >> >>> clean >> >>> > >> verify >> >>> > >> >> > passes locally for me. >> >>> > >> >> > - Is there a way we can see the See RAT report mentioned in >> the >> >>> > error >> >>> > >> >> > log? >> >>> > >> >> > >> >>> > >> >> > 2. Prior to onset of #1 Java Precommit builds no longer >> >>> complete >> >>> > >> within >> >>> > >> >> > allotted 150 min time. Looking at [6-8] it seems the build >> >>> makes >> >>> > >> >> consistent >> >>> > >> >> > progress, but just does not finish on time. We can also see >> >>> several >> >>> > >> >> recent >> >>> > >> >> > successful builds with execution time very close to time out >> >>> > [9-11]. >> >>> > >> >> > >> >>> > >> >> > I'd like to propose to increase time limit for Java >> precommit >> >>> test >> >>> > >> suite >> >>> > >> >> > from 2.5 to 4 hours. 4 hours is long time. I agree that we >> >>> should >> >>> > >> >> > definitely try to reduce the test execution time, and reduce >> >>> > >> flakiness. >> >>> > >> >> > However we need the tests at least pass for now. If we write >> >>> off >> >>> > >> failed >> >>> > >> >> > test suites as 'flakes' and merge PRs without having a green >> >>> test >> >>> > >> >> signal, >> >>> > >> >> > we will have to spend more time tracing breakages such as >> #1. >> >>> > >> >> > >> >>> > >> >> > Thoughts? >> >>> > >> >> > >> >>> > >> >> > Thanks, >> >>> > >> >> > Valentyn >> >>> > >> >> > >> >>> > >> >> > [1] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > ll/ >> >>> > >> >> > [2] https://builds.apache.org/job/ >> >>> beam_PostCommit_Java_MavenInst >> >>> > all/ >> >>> > >> >> > [3] https://builds.apache.org/job/ >> >>> beam_PostCommit_Java_MavenInst >> >>> > >> >> > all/5052/changes >> >>> > >> >> > >> >>> > >> >> > [4] https://github.com/apache/beam >> /commit/d745cc9d8cc1735d3b >> >>> > >> >> > c3c67ba3e2617cb7f11a8c >> >>> > >> >> > [5] https://github.com/apache/beam >> >>> /commit/0d8ab6cbbc762dd9f9be1b >> >>> > >> >> > 3e9a26b6c9d0bb6dc3 >> >>> > >> >> > >> >>> > >> >> > [6] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15222/ >> >>> > >> >> > [7] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15195/ >> >>> > >> >> > [8] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15220/ >> >>> > >> >> > >> >>> > >> >> > [9] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15009/ >> >>> > >> >> > [10] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15068/ >> >>> > >> >> > [11] https://builds.apache.org/job/ >> >>> beam_PreCommit_Java_MavenInsta >> >>> > >> >> ll/15016/ >> >>> > >> >> > >> >>> > >> >> > >> >>> > >> >> >> >>> > >> > >> >>> > >> > >> >>> > >> >> >>> > > >> >>> > > >> >>> > >> >>> >> >> >> >> >> > >> > >