These look like the timeout issues I discovered in the build logs for
0.8.2-2. See [1] for Daniel's report and [2] for my analysis, plus my
two follow-up mails if jessie-backports and hurd-i386 matter. I'm going
to quote that mail below.

Just a little clarification ahead: I've since confirmed that the issue
was definitely the combination of parallel=32 and a pretty short timeout
for the flock calls. All test cases got started simultaneously but had
to wait their turn for the port(s) used in the tests (that's what the
flock calls around Apache are for), effectively making the flock timeout
(30 seconds by default) a timeout for the whole test suite except the
last test case to complete.
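
To illustrate the arithmetic, here is a rough sketch (not part of the
test suite). It assumes every test holds the lock for the same number of
seconds, which is a simplification, and the 5 second hold time in the
usage note is a made-up figure, not a measurement. The function name and
its variables are hypothetical.

```shell
# count_lock_timeouts JOBS HOLD WAIT
# All JOBS start at once and queue for a single lock. Job k (counting
# from 0) gets the lock after k * HOLD seconds. Prints how many jobs
# would give up because that wait is longer than WAIT seconds.
count_lock_timeouts() {
    n_jobs=$1
    hold_secs=$2
    wait_secs=$3
    k=0
    failed=0
    while [ "$k" -lt "$n_jobs" ]; do
        if [ $((k * hold_secs)) -gt "$wait_secs" ]; then
            failed=$((failed + 1))
        fi
        k=$((k + 1))
    done
    echo "$failed"
}
```

Under these assumptions, "count_lock_timeouts 32 5 30" prints 25, while
"count_lock_timeouts 32 5 300" prints 0: with parallel=32 most of the
queue runs into a 30 second lock timeout, and a 300 second one covers it.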

[1]
https://lists.gnupg.org/pipermail/mod_gnutls-devel/2017-February/000185.html
[2]
https://lists.gnupg.org/pipermail/mod_gnutls-devel/2017-February/000186.html


Looking at both build logs my best guess is that the tests are hitting
timeouts, both when getting the server instance locks (used to prevent
port conflicts) and running HTTPS requests against them. The log for
test-15_basic_msva.bash on sparc64 even looks like the timeout hit in
the middle of request handling.

I've pushed three related commits to master (the attached patch combines
all three), in order:

8184ad0eda43102d24d7cf158710e0109c7e293b
    Test suite: Run flock with "--verbose" to log timeouts
bbfcbb57c2cb89790316b66bdd475c3a64ca4080
    Test suite: Log if a process to be stopped by PID file is not running
6c030c14da928da3e05f232e3fb8dd5aa98c659a
    Test suite: Make timeouts for server locks and HTTPS requests
    configurable

Commits 8184ad0eda43102d24d7cf158710e0109c7e293b and
bbfcbb57c2cb89790316b66bdd475c3a64ca4080 just make the logging more
detailed to show if the tests are really hitting timeouts.
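
For reference, a minimal sketch of what a lock timeout looks like with
flock from util-linux. This is not the test suite's code: the try_lock
helper and the temporary lock file are hypothetical, and the sketch
assumes a flock that supports --verbose and --timeout.

```shell
# try_lock LOCKFILE SECONDS
# Try to take an exclusive lock on LOCKFILE, waiting at most SECONDS.
# Returns flock's exit status: 0 if the lock was acquired (and the
# placeholder command "true" ran), non-zero if the wait timed out.
try_lock() {
    flock --verbose --timeout "$2" "$1" true
}

lockfile=$(mktemp)
exec 9>"$lockfile"    # open the lock file on descriptor 9
flock 9               # hold an exclusive lock through that descriptor

# A second locker has to wait, and gives up after one second.
if try_lock "$lockfile" 1; then
    echo "lock acquired"
else
    echo "lock attempt failed (exit status $?)"
fi

flock --unlock 9      # release the lock again
exec 9>&-
rm -f "$lockfile"
```

With --verbose, flock reports on what it is doing, including the timeout,
so a failure like this is visible in the test log rather than only in
the exit status.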

To change the timeouts, you can either cherry-pick
6c030c14da928da3e05f232e3fb8dd5aa98c659a and pass TEST_LOCK_WAIT and
TEST_QUERY_TIMEOUT with large values (e.g. 300 or even 600, default is
30 [seconds]) to ./configure, or patch the values of lock_wait and
TEST_QUERY_DELAY in test/Makefile.am (I've replaced/renamed the
variables in the patch).
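
As a sketch of the first option: I'm assuming here that the two variables
are given to ./configure as VAR=value arguments, the usual way to set
configure variables. The helper name is hypothetical, it only builds the
argument string.

```shell
# timeout_configure_args [SECONDS]
# Print ./configure arguments that set both test suite timeouts to
# SECONDS (300 if no value is given).
timeout_configure_args() {
    seconds=${1:-300}
    printf 'TEST_LOCK_WAIT=%s TEST_QUERY_TIMEOUT=%s\n' "$seconds" "$seconds"
}

# Usage in a source tree that includes commit 6c030c14:
#   ./configure $(timeout_configure_args 600)
```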

> I can force the entire build to not be parallel if you think it
> would be useful to test.  While i see "parallel=32" in the sparc64
> transcript, i don't see a "parallel=" note at all in ppc64.

Running in parallel definitely makes hitting the timeouts more likely,
but shouldn't cause any additional problems. You can force just the test
part of the build to run serially by passing --disable-flock to
./configure, which will add the special target .NOTPARALLEL to
test/Makefile. I'd prefer if you try the patches first, though.
