I have been able to get every IT to pass at least once except the following:
ACCUMULO-4362 <https://issues.apache.org/jira/browse/ACCUMULO-4362>
ACCUMULO-4397 <https://issues.apache.org/jira/browse/ACCUMULO-4397>

These have been moved back to release 1.8.0 and are blockers.

On Fri, Aug 5, 2016 at 10:29 AM, Josh Elser <josh.el...@gmail.com> wrote:

> Sean Busbey wrote:
>
>> On Wed, Aug 3, 2016 at 5:17 PM, Christopher<ctubb...@apache.org> wrote:
>>
>>> On Wed, Aug 3, 2016 at 5:47 PM Sean Busbey<bus...@cloudera.com> wrote:
>>>
>>>> My understanding was that maintenance releases (aka double dot, e.g.
>>>> 1.7.2) had relaxed criteria because we expected the scope of changes
>>>> in them to be more limited. Even so, the release notes for 1.7.2,
>>>> 1.7.1, and 1.7.0 all claim the ITs passed.
>>>>
>>>
>>> Even those releases have periodic IT failure.
>>>
>>>> Is there a reason we can't parallelize the ITs?
>>>>
>>>
>>> We can. Eric's mrit effort was all intended towards that. But, that's not
>>> the same as CI passing. I don't know what it would take to parallelize
>>> them in a CI server.
>>>
>>>> What's stopping
>>>> builds.a.o from running them? Specific requests from projects to asf
>>>> infra can get us resources if that's the problem.
>>>>
>>>
>>> I spoke to infra in HipChat about this a few weeks ago, and mentioned a
>>> few things which impact builds on ASF jenkins (builds.apache.org):
>>>
>>> 1. Accumulo has an excessive number of tests to run.
>>> 2. Build timeouts with Jenkins can abort builds.
>>> 3. Tests are timing sensitive, and are affected by VM/host configuration
>>> and contention with other concurrent builds from other projects.
>>> 4. Tests need lots of RAM and storage (at least 4GB RAM, but ideally no
>>> less than 16GB, and at least 6 GB for a workspace)
>>> 5. Tests need specialized system configuration, (increasing ulimits,
>>> optimizing kernel settings for swappiness, etc.)
>>>
>>> What we really need for reliable IT passing in CI, is exclusive use of
>>> dedicated, bare-metal beefy build machines, for 6+ hours per build x 4
>>> branches minimum, plus another 6+ hours for each pull request and other
>>> builds which skipITs, so we can get immediate feedback on unit tests and
>>> compilation errors.
>>>
>>
>> I took a first pass at a nightly (~once per 12 hours) job on asf build for
>> master and it did okay, considering that I haven't spent any time trying to
>> tune anything:
>>
>> https://builds.apache.org/job/Accumulo-master-IT/1/
>>
>> 2 hr 9 min, 7 failures out of 202 tests.
>>
>> I think we can do this; if anyone else is interested I'll start a new thread
>> where we can discuss.
>>
>
> +1 it would be great to do this on ASF infra.