On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote: > On 24.05.2018 17:00, Michael S. Tsirkin wrote: > > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote: > >> On 24.05.2018 16:30, Michael S. Tsirkin wrote: > >>> Right now tests report OK status if QEMU crashes during cleanup. > >>> Let's catch that case and fail the test. > >>> > >>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > >>> --- > >>> tests/libqtest.c | 9 ++++++++- > >>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/tests/libqtest.c b/tests/libqtest.c > >>> index 43fb97e..f869854 100644 > >>> --- a/tests/libqtest.c > >>> +++ b/tests/libqtest.c > >>> @@ -103,8 +103,15 @@ static int socket_accept(int sock) > >>> static void kill_qemu(QTestState *s) > >>> { > >>> if (s->qemu_pid != -1) { > >>> + int wstatus = 0; > >>> + pid_t pid; > >>> + > >>> kill(s->qemu_pid, SIGTERM); > >>> - waitpid(s->qemu_pid, NULL, 0); > >>> + pid = waitpid(s->qemu_pid, &wstatus, 0); > >>> + > >>> + if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > >>> + assert(!WCOREDUMP(wstatus)); > >>> + } > >>> } > >>> } > >> > >> That's basically a good idea ... but I've already seen yet another issue > >> in the past already: QEMU sometimes simply hangs in an endless loop > >> during clean up and never terminates. I think we should detect that > >> situation, too. So instead of killing QEMU at the end of the testing, I > >> think we should > >> rather try to terminate it with the QMP "quit" command. If QEMU does not > >> terminate with an exit code of 0, then the test should be flagged a > >> failed (and only if QEMU did not terminate at all, it should be killed > >> with SIGKILL). > >> > >> Thomas > > > > Fine but can we agree to do this as a patch on top? And do you have > > the time to implement this? > > Fine for me if we do that later. And no, I currently don't have time to > work on this (but I've got it on my TODO list somewhere, so I hope I > won't forget about it later...). > > > I'm seeing patches that cause crash on cleanup, it's not a theoretical > > problem for me, so I'd like this one to go in first. > > Ok, so here are the two problems that I remember: > > 1) > > git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1 > make -j4 subdir-nios2-softmmu > nios2-softmmu/qemu-system-nios2 -monitor stdio > > ==> You can neither "quit" from the HMP prompt, nor kill QEMU with > SIGTERM, you've got to use SIGKILL instead. > > Ok, libqtest likely would not have reported success in this case, too, > we just did not notice since there is no libqtest in place that tests > the nios2 machine in TCG mode. Anyway, it would be nice if qtest would > properly detect the situation and report an error instead of just > hanging in waitpid().
That's a tall order, it assumes we can define a reasonable time during which QEMU should exit. > 2) > > git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1 > make -j4 subdir-ppc-softmmu > ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio > > ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the > problem where libqtest did not report an error though it should have > reported one. So QEMU was not hanging in an endless loop here, but core > dumped ... Sorry, I apparently mixed this up in my mind with the first > case. That means we should be fine here with your patch. > > Thomas