Thanks Adam! I especially want to recognize the support of the
Packet.net "Works on ARM" team, who drove the aarch64 CI/build support,
which in turn created the scripting necessary for multi-arch CI runs.

So if I commit a PR that ups all timeouts by 10x, what's the worst that
will happen? Tests that already have a very high timeout may take
significantly longer to time out. For instance, attachment write timeouts
are currently 10s, which would become 100s.
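
For a rough sense of the numbers, here is a minimal Python sketch of the
arithmetic. It uses only the b64url timings quoted below, the 5s eunit
default, and the 10x multiplier; the function and constant names are
hypothetical, and it does not touch eunit or our test suites at all.

```python
# Back-of-the-envelope timeout arithmetic. All names are hypothetical;
# the figures are the ones quoted in this thread.

NATIVE_SECS = 0.372            # b64url encode_binary_test, native x86_64
EMULATED_SECS = 4.493          # same test, emulated aarch64
EUNIT_DEFAULT_TIMEOUT = 5      # seconds, eunit's per-test default
ATTACHMENT_WRITE_TIMEOUT = 10  # seconds, current attachment write timeout
MULTIPLIER = 10                # the proposed across-the-board increase


def slowdown(native_secs, emulated_secs):
    """Return how many times slower the emulated run is."""
    return emulated_secs / native_secs


def scaled(timeout_secs, factor=MULTIPLIER):
    """Return a timeout after applying the multiplier."""
    return timeout_secs * factor


def native_budget(timeout_secs, slowdown_factor):
    """Return the longest native runtime that still fits the timeout
    once the test is slowed down by slowdown_factor."""
    return timeout_secs / slowdown_factor


if __name__ == "__main__":
    factor = slowdown(NATIVE_SECS, EMULATED_SECS)
    print(f"emulation slowdown: {factor:.1f}x")
    print(f"eunit default: {EUNIT_DEFAULT_TIMEOUT}s -> "
          f"{scaled(EUNIT_DEFAULT_TIMEOUT)}s")
    print(f"attachment write: {ATTACHMENT_WRITE_TIMEOUT}s -> "
          f"{scaled(ATTACHMENT_WRITE_TIMEOUT)}s")
    budget = native_budget(scaled(EUNIT_DEFAULT_TIMEOUT), factor)
    print(f"native runtime that still fits: {budget:.2f}s")
```

If that one measurement is representative (a big assumption, since it
is a single data point), a 10x bump would leave room under emulation
for tests that take up to roughly 4s natively.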

I guess the worst that can happen is we find some badly written test
cases... ;)

Since this is the easiest path forward, I'll give it a shot and report
back. It may be...a large set of PRs, though, since we're talking about
fixing the timeouts across all the sub-repos, unless I can work out how
to tweak eunit's 5s default timeout at a global level.

-Joan

On 2019-05-09 15:34, Adam Kocoloski wrote:
> This is indeed great news.
> 
> I’m afraid I don’t have a strong opinion on how to make the tests more 
> resilient to running in an emulated environment. Any of your suggestions are 
> acceptable to me, assuming we can keep ppc64le support in scope.
> 
> Adam
> 
>> On May 4, 2019, at 12:08 AM, Joan Touzet <woh...@apache.org> wrote:
>>
>> Hi again,
>>
>> With the support of ASF Infra, we now are in a position to run arbitrary 
>> alternative platforms in our CI builds. This is great news for the people 
>> who have been waiting for support for ARM (aarch64), PowerPC/POWER
>> (ppc64le), mainframe (s390x), and other architectures.
>>
>> However, we have a challenge. Emulating other architectures is necessarily 
>> slower than running natively on provided hardware. I ran some measurements 
>> today, and found that we're not able to pass our test suites because tests 
>> are timing out.
>>
>> For example, on native x86_64 hardware, this test finishes in under half a 
>> second:
>>
>>    b64url_tests: encode_binary_test...[0.372s] ok
>>
>> Running the same test on aarch64, emulated on the same x86_64 machine:
>>
>>    b64url_tests: encode_binary_test...[4.493 s] ok
>>
>> or 12x longer. The next test (decode_iolist_test) fails, presumably because 
>> it hits a 5s timeout period.
>>
>> I need advice from the list. Should we:
>>
>> * Increase the test timeouts significantly so these tests can complete
>> * Restrict ourselves to running only on actual hardware (which limits us
>>  only to aarch64, and at a stretch, ppc64le)
>> * Remove test timeouts entirely, rewriting the ones that wait forever to
>>  do something different
>> * Something else that I haven't mentioned
>>
>> Having regression testing on a variety of platforms is something that is of 
>> benefit to the project; we've ignored it for too long.
>>
>> -Joan
> 
