Thanks Adam! I especially want to recognize the support of the Packet.net "Works on ARM" team, who drove the aarch64 CI/build support, which in turn created the scripting necessary for multi-arch CI runs.
So if I commit a PR that ups all timeouts by 10x, what's the worst that will happen? Tests that already have a very high timeout may take significantly longer to time out. For instance, attachment write timeouts are currently 10s, which would become 100s. I guess the worst that can happen is we find some badly written test cases... ;)

Since this is the easiest path forward, I'll give it a shot and report back. It may be...a large set of PRs, though, since we're talking about fixing the timeouts across all the sub-repos, unless I can work out how to tweak eunit's 5s default timeout at a global level.

-Joan

On 2019-05-09 15:34, Adam Kocoloski wrote:
> This is indeed great news.
>
> I’m afraid I don’t have a strong opinion on how to make the tests more
> resilient to running in an emulated environment. Any of your suggestions are
> acceptable to me, assuming we can keep ppc64le support in scope.
>
> Adam
>
>> On May 4, 2019, at 12:08 AM, Joan Touzet <woh...@apache.org> wrote:
>>
>> Hi again,
>>
>> With the support of ASF Infra, we now are in a position to run arbitrary
>> alternative platforms in our CI builds. This is great news for the people
>> who have been waiting for ARM (aarch64), PowerPC/POWER (ppc64le), mainframe
>> (s390x), and other architecture supports.
>>
>> However, we have a challenge. Emulating other architectures is necessarily
>> slower than running natively on provided hardware. I ran some measurements
>> today, and found that we're not able to pass our test suites because tests
>> are timing out.
>>
>> For example, on native x86_64 hardware, this test finishes in under half a
>> second:
>>
>> b64url_tests: encode_binary_test...[0.372s] ok
>>
>> Running the same test on aarch64, being emulated on the same x86_64 machine:
>>
>> b64url_tests: encode_binary_test...[4.493 s] ok
>>
>> or 12x longer. The next test (decode_iolist_test) fails, presumably because
>> it hits a 5s timeout period.
>>
>> I need advice from the list. Should we:
>>
>> * Increase the test timeouts significantly so these tests can complete
>> * Restrict ourselves to running only on actual hardware (which limits us
>> only to aarch64, and at a stretch, ppc64le)
>> * Remove test timeouts entirely, rewriting the ones that wait forever to
>> do something different
>> * Something else that I haven't mentioned
>>
>> Having regression testing on a variety of platforms is something that is of
>> benefit to the project; we've ignored it for too long.
>>
>> -Joan
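
For anyone who wants to sanity-check the arithmetic in this thread, here is a rough sketch in Python. It uses only the numbers quoted above (the 0.372s and 4.493s timings, the 5s eunit default, the 10s attachment write timeout, and the proposed 10x multiplier). The function names are made up for illustration, and the last helper assumes the emulation slowdown is roughly uniform across tests, which the thread does not establish.

```python
# Figures quoted in the thread for b64url_tests: encode_binary_test.
NATIVE_SECONDS = 0.372         # native x86_64
EMULATED_SECONDS = 4.493       # aarch64, emulated on the same x86_64 machine

EUNIT_DEFAULT_TIMEOUT = 5      # seconds, as stated in the thread
ATTACHMENT_WRITE_TIMEOUT = 10  # seconds, as stated in the thread
MULTIPLIER = 10                # the proposed "ups all timeouts by 10x"


def slowdown(native, emulated):
    """Ratio of emulated runtime to native runtime."""
    return emulated / native


def scaled_timeout(timeout, multiplier=MULTIPLIER):
    """Timeout after applying the proposed multiplier."""
    return timeout * multiplier


def max_native_runtime(timeout, factor):
    """Longest native runtime that still fits inside `timeout` when emulated.

    Assumption (not from the thread): every test slows down by the same
    factor under emulation.
    """
    return timeout / factor


if __name__ == "__main__":
    factor = slowdown(NATIVE_SECONDS, EMULATED_SECONDS)
    print(f"measured slowdown: {factor:.1f}x")
    print(f"eunit default: {EUNIT_DEFAULT_TIMEOUT}s -> "
          f"{scaled_timeout(EUNIT_DEFAULT_TIMEOUT)}s")
    print(f"attachment write: {ATTACHMENT_WRITE_TIMEOUT}s -> "
          f"{scaled_timeout(ATTACHMENT_WRITE_TIMEOUT)}s")
    budget = max_native_runtime(scaled_timeout(EUNIT_DEFAULT_TIMEOUT), factor)
    print(f"native runtime that still fits under emulation: {budget:.2f}s")
```

Under that uniform-slowdown assumption, a 10x multiplier is slightly smaller than the measured slowdown, so a test that runs close to its current timeout on native hardware could still time out under emulation.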