I went through and increased timeouts as needed to get the EUnit suite to pass on the current Jenkins setup:
https://github.com/apache/couchdb/pull/2087

Happy to do it again on emulated workers.

We aren’t green yet; Jenkins is now complaining about some timeouts in the ExUnit tests. I haven’t looked into those yet, other than to run them locally, where I also encountered a test failure (not timeout-related, possibly an eventual consistency thing). Do we want those tests to be blocking the build at this point?

Adam

> On Jul 27, 2019, at 12:43 AM, Joan Touzet <woh...@apache.org> wrote:
>
> Actually, I never committed that change. We're still on actual ARM hardware
> for the ARM build, and the couch_btree tests time out on that platform.
>
> https://github.com/apache/couchdb/blob/master/Jenkinsfile#L322
>
> -Joan
>
> On 2019-07-26 7:23 p.m., Adam Kocoloski wrote:
>> Great email.
>> As a tactical step, does it make sense to back out the qemu-based builds
>> from the main pipeline while we work on the timeout issues?
>> Adam
>>> On Jul 26, 2019, at 5:29 PM, Joan Touzet <woh...@apache.org> wrote:
>>>
>>> Hello again,
>>>
>>> Adam poked me on IRC today asking a few questions about the state of
>>> Jenkins, and why we're not generating test binaries for download.
>>>
>>> The reason is simple: the tests are failing.
>>>
>>> I've discussed this topic at length twice before, with little feedback:
>>>
>>> https://lists.apache.org/thread.html/6e2bedbbf5c2b28af4237d0936dc21f056fdafa2ea0c0b457285b9dc@%3Cdev.couchdb.apache.org%3E
>>>
>>> https://lists.apache.org/thread.html/16a310e3342d3f1ca73fb85f62829b76bbfa3759e418386b07e2827f@%3Cdev.couchdb.apache.org%3E
>>>
>>> I have 5 specific proposals to get us back on track:
>>>
>>> 1. Get more targeted build workers for ppc64le and aarch64 platforms.
>>>
>>> This is critical while we wait for #4 below. By having more than one
>>> hardware platform to build on for each of these, we can hopefully pass
>>> those architectures regularly, and start building real downloads and
>>> Docker images for each of them.
>>> I know the user community really wants this.
>>>
>>> If we get at least 2 of each worker, I'll change the Jenkinsfile to use
>>> those tagged workers rather than the qemu emulation we currently
>>> have (which is failing).
>>>
>>> 2. Receive and provision the new CouchDB Jenkins build machine. IBM is
>>> being very generous in getting this set up, and Paul Davis mentioned
>>> the machine should be ready in the very near future.
>>>
>>> Provisioning will have to include Docker + the qemu support. See
>>> https://issues.apache.org/jira/browse/INFRA-18322 for details on that
>>> and https://issues.apache.org/jira/browse/INFRA-17404 for the general
>>> provisioning approach: we download the Jenkins .jar from the ASF machine,
>>> set it up to run under `runit` on boot, run as many instances as we can
>>> on the machine (I think the HW was selected to run 8 of these at once),
>>> install the prerequisites, and request the worker name + password for
>>> each of the 8 workers from ASF Infra.
>>>
>>> We have a choice: do we set this up just as 8 Jenkins workers, or do
>>> we also start running our own Jenkins master (potentially on
>>> couchdb-vm2)? The motivation to do the latter would be to add
>>> credentials that could be used for automatic uploading of binaries to
>>> places like Bintray and Docker. (I am currently engaged with Infra in
>>> trying to solve this for many projects, including Apache OpenWhisk.
>>> One of the major limiting factors is that the shared ASF Jenkins
>>> master's credentials can be accessed by all users on the server. This
>>> is obviously a security nightmare.)
>>>
>>> At the moment, we are "OK" using the ASF Jenkins master instance. But
>>> as soon as we start depending on this service widely (see below), it'll
>>> be very disruptive to take it down, even for a day or two. So it may
>>> be best to make this decision sooner rather than later.
>>>
>>> I'll be in touch with Infra next week on the global "automated
>>> binary builds" issue, and will ask for guidance at that time.
>>>
>>> 3. Switch our PR gate on GitHub from Travis CI to Jenkins CI. This way,
>>> PRs won't be stuck waiting forever anymore, since we'll have a lot of
>>> compute resources at our disposal. That said,
>>> **PEOPLE HAVE TO START FIXING THE INTERMITTENT TEST CASE FAILURES**
>>> or we'll be right back to "Hey, it didn't pass... I'll just click
>>> Retry" again. 😒 🤢 This will have to be a team effort.
>>>
>>> 4. Get rid of all timeouts in all test cases. A few proposals for this
>>> were made in the context of ExUnit. Can we get some more progress
>>> here?
>>>
>>> https://github.com/apache/couchdb/issues/2030
>>> https://github.com/apache/couchdb/pull/2039
>>>
>>> 5. Once #4 is done, we can consider moving aarch64/ppc64le/other binary
>>> builds to qemu emulation, meaning we can test all platforms on plain
>>> x86_64 machines. It's not a required move, but if we lose access to
>>> the other platforms, or they go down, it's a backup strategy.
>>>
>>> What do people think?