On Fri, 12 Aug 2022 08:38:59 -0300 Murilo Opsfelder Araújo <muri...@linux.ibm.com> wrote:
> On 8/12/22 04:26, Claudio Imbrenda wrote:
> > On Thu, 11 Aug 2022 23:05:52 -0300
> > Murilo Opsfelder Araújo <muri...@linux.ibm.com> wrote:
> >
> >> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> >> [...]
> >>>>> Hmm, I was hoping you could just use SIGKILL to guarantee that this
> >>>>> gets killed off. Is SIGKILL delivered too soon to allow for the
> >>>>> main QEMU process to have exited quickly ?
> >>>>
> >>>> yes, I tried. qemu has not finished exiting when the signal is
> >>>> delivered, the cleanup process dies before qemu, which defeats the
> >>>> purpose
> >>>
> >>> Ok, too bad.
> >>>
> >>>>> If so I wonder what happens when systemd just delivers SIGKILL to
> >>>>> all processes in the cgroup - I'm not sure there's a guarantee it
> >>>>> will SIGKILL the main qemu before it SIGKILLs this helper
> >>>>
> >>>> I'm afraid in that case there is no guarantee.
> >>>>
> >>>> for what it's worth, both virsh shutdown and destroy seem to do things
> >>>> properly.
> >>>
> >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes
> >>> along and tells everything in the cgroup to die with SIGKILL.
> >>
> >> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
> >> seconds after Libvirt sent SIGTERM:
> >>
> >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
> >>
> >
> > but this is fine.
> >
> > with asynchronous teardown, qemu will exit almost immediately when
> > receiving SIGTERM, and the cleanup process will start cleaning up.
>
> Under normal and orderly conditions, yes.
>
> >> So I guess this patch happened to work with Libvirt because the main qemu
> >> process terminated before the timeout and before SIGKILL was delivered.
> >
> > it seems so
> >
> >>
> >> The cleanup process is trying to solve the problem where the main qemu
> >> process takes too long to terminate. However, if the cleanup process
> >> itself takes too long, SIGKILL will be sent by Libvirt anyway.
> >
> > but that is not a problem, the sole purpose of the cleanup process is
> > to terminate _after_ qemu. it doesn't matter what happens after qemu
> > has terminated. if you look at the patch, after going to great lengths
> > to assure that qemu has terminated, all the child process does is
> > _exit(0).
> >
> >>
> >> Perhaps we can describe this situation in the parameter help, e.g.: If
> >> management layer decides to send SIGKILL (e.g.: due to timeout or
> >> deliberate decision), the cleanup process can exit before the main
> >> process, deceiving its purpose.
> >
> > if the management layer (or the user) decides to send SIGKILL
> > immediately to the whole cgroup without sending SIGTERM first, then
> > this whole asynchronous teardown mechanism is defeated, yes.
>
> This situation is what we likely want to describe in the parameter help. I
> don't want to give users the false impression that this option will *always*
> behave the manner we expect it to work *most* of the time.

fair enough, I'll improve the documentation

>
> --
> Murilo
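
For readers following the thread, here is a minimal sketch of the "SIGTERM first, SIGKILL after a timeout" pattern discussed above. It is not libvirt's code; the function name `terminate_gracefully` is hypothetical, and only the 10-second default is taken from the thread. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def terminate_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Hypothetical helper for illustration only. Returns "SIGTERM" if the
    process exited within the timeout, "SIGKILL" if it had to be killed.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process did not exit in time: force it. SIGKILL cannot be
        # caught, so the target gets no chance to run any cleanup code.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

With this ordering, a process that exits promptly on SIGTERM never sees SIGKILL, which matches the case where asynchronous teardown works as intended. If SIGKILL is sent immediately to every process in the cgroup, no such ordering exists and the cleanup process may die before the main process.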