On Mon, Feb 11, 2019 at 05:03:13PM +0000, Daniel P. Berrangé wrote:
> On Fri, Feb 08, 2019 at 11:44:42AM +0000, Peter Maydell wrote:
> > On Thu, 7 Feb 2019 at 16:06, Marc-André Lureau
> > <marcandre.lur...@redhat.com> wrote:
> > >
> > > The following changes since commit
> > > 632351e0e1a861f2eaf709b053c53f96a1225825:
> > >
> > > Merge remote-tracking branch 'remotes/elmarco/tags/dump-pull-request'
> > > into staging (2019-02-07 14:20:46 +0000)
> > >
> > > are available in the Git repository at:
> > >
> > > https://github.com/elmarco/qemu.git tags/chardev-pull-request
> > >
> > > for you to fetch changes up to df3afdedd23ade0c9de55cadeb1d85055689023f:
> > >
> > > tests/test-char: add muxed chardev testing for open/close (2019-02-07
> > > 16:18:25 +0100)
> > >
> > > ----------------------------------------------------------------
> > > Various chardev fixes
> > >
> > > ----------------------------------------------------------------
> >
> > This seems to result in 'make check' failures on some platforms.
> > I saw this on s390 and aarch32, I think.
> >
> > MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > tests/test-char -m=quick -k --tap < /dev/null |
> > ./scripts/tap-driver.pl --test-name="test-char"
> > PASS 1 test-char /char/null
> > PASS 2 test-char /char/invalid
> > PASS 3 test-char /char/ringbuf
> > PASS 4 test-char /char/mux
> > PASS 5 test-char /char/stdio
> > PASS 6 test-char /char/pipe
> > PASS 7 test-char /char/file
> > PASS 8 test-char /char/file-fifo
> > PASS 9 test-char /char/udp
> > PASS 10 test-char /char/serial
> > PASS 11 test-char /char/hotswap
> > PASS 12 test-char /char/websocket
> > PASS 13 test-char /char/socket/server/mainloop/tcp
> > PASS 14 test-char /char/socket/server/mainloop/unix
> > PASS 15 test-char /char/socket/server/wait-conn/tcp
> > PASS 16 test-char /char/socket/server/wait-conn/unix
> > PASS 17 test-char /char/socket/server/mainloop-fdpass/tcp
> > PASS 18 test-char /char/socket/server/mainloop-fdpass/unix
> > PASS 19 test-char /char/socket/server/wait-conn-fdpass/tcp
> > PASS 20 test-char /char/socket/server/wait-conn-fdpass/unix
> > PASS 21 test-char /char/socket/client/mainloop/tcp
> > PASS 22 test-char /char/socket/client/mainloop/unix
> > qemu: qemu_mutex_destroy: Device or resource busy
> > PASS 23 test-char /char/socket/client/wait-conn/tcp
> > PASS 24 test-char /char/socket/client/wait-conn/unix
> > Aborted (core dumped)
> > ERROR - too few tests run (expected 32, got 24)
> >
> > Here's a backtrace from running tests/test-char under gdb.
> > Looks like a race condition between a thread trying to
> > destroy a mutex and a different thread that is still
> > using it.
>
> Thanks, that is very useful. I can see the race condition here
> now between qio_task_thread_worker and qio_task_thread_result.
> I need to acquire the mutex in qio_task_thread_result in order
> to synchronize with completion of qio_task_thread_worker.

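To illustrate the kind of synchronization described above, here is a minimal
sketch using plain pthreads rather than QEMU's qemu_mutex_* wrappers. The
names DemoTask, demo_worker, demo_task_finish and run_demo_task are
hypothetical and are not the code from the series; the sketch only assumes
that the worker signals completion while holding the lock, and that the
result side must take that same lock before destroying it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a background task; not QEMU's QIOTask. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
    int result;
} DemoTask;

/* Worker side: publish the result and signal completion under the lock. */
static void *demo_worker(void *opaque)
{
    DemoTask *task = opaque;

    pthread_mutex_lock(&task->lock);
    task->result = 42;
    task->done = true;
    pthread_cond_signal(&task->cond);
    /* Until this unlock happens, the mutex is still in use by the worker. */
    pthread_mutex_unlock(&task->lock);
    return NULL;
}

/*
 * Result side: acquire the lock before tearing the task down. Holding the
 * lock while 'done' is true means the worker has already released it, so
 * the destroy below cannot fail with EBUSY ("Device or resource busy").
 */
static int demo_task_finish(DemoTask *task)
{
    int result;
    int rc;

    pthread_mutex_lock(&task->lock);
    while (!task->done) {
        pthread_cond_wait(&task->cond, &task->lock);
    }
    result = task->result;
    pthread_mutex_unlock(&task->lock);

    rc = pthread_cond_destroy(&task->cond);
    assert(rc == 0);
    rc = pthread_mutex_destroy(&task->lock);
    assert(rc == 0);
    (void)rc;
    return result;
}

/* Starts the worker, collects its result, and returns it. */
int run_demo_task(void)
{
    DemoTask task = { .done = false, .result = 0 };
    pthread_t tid;
    int result;
    int rc;

    pthread_mutex_init(&task.lock, NULL);
    pthread_cond_init(&task.cond, NULL);

    rc = pthread_create(&tid, NULL, demo_worker, &task);
    assert(rc == 0);
    (void)rc;

    result = demo_task_finish(&task);

    /*
     * Joined only so the stack-allocated task outlives the thread in this
     * demo; the destroy above relied on the lock handoff, not on the join.
     */
    pthread_join(tid, NULL);
    return result;
}
```

Calling run_demo_task() from a small main() and building with -pthread should
be enough to try it. If demo_task_finish read the done flag without taking
the lock, the destroy could overlap with the worker's unlock, which is the
shape of failure reported in the test output.
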
In testing this first bug, I found a second bug hiding where
tcp_chr_wait_connected forgets to de-register the pending reconnect
timer GSource, leading to a later crash. We would not have seen this
in the test suite except for me adding a sleep(1) in the right place
by chance :-)

I've sent a v3 series for Marc-André to queue for a new PULL.

> > On some other hosts I saw a similar
> > "qemu: qemu_mutex_destroy: Device or resource busy" and core dump in the
> > migration tests, I think, which is probably the same underlying bug.
>
> Yes, I expect it is the same problem

I'm now confident this is the first problem I mentioned.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|