On Mon, 08/21 16:36, Dr. David Alan Gilbert wrote:
> * Fam Zheng (f...@redhat.com) wrote:
> > On Mon, 08/21 18:05, Peter Xu wrote:
> > > On Mon, Aug 21, 2017 at 04:58:51PM +0800, Fam Zheng wrote:
> > > > On Mon, 08/21 15:44, Peter Xu wrote:
> > > > > This is an extended work for migration postcopy recovery. This series
> > > > > is tested with the following series to make sure it solves the monitor
> > > > > hang problem that we have encountered for postcopy recovery:
> > > > >
> > > > >   [RFC 00/29] Migration: postcopy failure recovery
> > > > >   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
> > > > >
> > > > > The root problem is that monitor commands are all handled in the main
> > > > > loop thread now, no matter how many monitors we specify. And, if the
> > > > > main loop thread hangs for some reason, all monitors will be stuck.
> > > > > The same holds in the reverse direction as well: if any of the
> > > > > monitors hangs, it will hang the main loop, and the rest of the
> > > > > monitors (if there are any).
> > > > >
> > > > > That affects postcopy recovery, since the recovery requires user input
> > > > > on the destination side. If monitors hang, the destination VM dies and
> > > > > loses hope for even a final recovery.
> > > > >
> > > > > So, sometimes we need to make sure the monitor stays alive, at least
> > > > > one of them.
> > > > >
> > > > > The whole idea of this series is that instead of handling monitor
> > > > > commands all in the main loop thread, we do it separately in
> > > > > per-monitor threads. Then, even if the main loop thread hangs at any
> > > > > point for any reason, the per-monitor thread can still survive.
> > > > > Further, we add a hint in QMP/HMP to show whether a command can be
> > > > > executed without BQL; if so, we avoid taking BQL when running that
> > > > > command. It greatly reduces contention on BQL. Now the only user of
> > > > > that new parameter (currently I call it "without-bql") is the
> > > > > "migrate-incoming" command, which is the only command to rescue a
> > > > > paused postcopy migration.
> > > > >
> > > > > However, even with the series, it does not mean that per-monitor
> > > > > threads will never hang. One example is that we can still run "info
> > > > > cpus" in per-monitor threads during a paused postcopy (in that state,
> > > > > page faults are never handled, and "info cpus" will never return since
> > > > > it tries to sync every vcpu). So to make sure it does not hang, we
> > > > > not only need the per-monitor thread, the user should be careful as
> > > > > well on how to use it.
> > > >
> > > > I think this is like saying we expect the user to understand the
> > > > internals of QEMU, unless the "rules" are clearly documented. Taking
> > > > this into account, does it make sense to make the per-monitor thread
> > > > only allow BQL-free commands?
> > >
> > > I don't think users need to know the internals - they just need to be
> > > careful on using them. Just take the example of "info cpus": during
> > > paused postcopy it will hang, but IMHO it does not mean that it's
> > > illegal for user to send that command. It's "by-design" that it'll be
> > > stuck if one of the vcpus is stuck somewhere; it's just not the
> > > correct way to use it when the monitor is prepared for postcopy
> > > recovery.
> >
> > They still need to know "what" is the correct way to use the monitor, and
> > what I'm saying is there doesn't seem to be an easy way for users to know
> > exactly what is correct. See below.
> >
> > >
> > > And IMHO we should not treat threaded monitors special - it should be
> > > exactly the same monitor service when used with main loop thread. It
> > > just has its own thread to handle the requests, so it is less
> > > dependent on main loop thread, and that's all.
> >
> > It's not that simple, I think all non-trivial commands need very careful
> > audit before assuming they're safe. For example many block related
> > commands (qmp_transaction, for example) indirectly call
> > BDRV_POLL_WHILE(), which, if called from a per-monitor thread, will
> > enter the else branch then fail the first assert.
>
> OK, that's interesting - I'd assumed that as long as we actually held
> the bql we were reasonably safe.
> Can you explain what that assert is actually asserting?

It's not much more than asserting qemu_mutex_iothread_locked(). The
problem is that the new monitor thread breaks certain assumptions that
used to be true. What is interesting here is that the block layer's
nested aio_poll() now runs not only in the main thread but also in the
monitor thread. Bugs may hide there. :)

That's why I suggested a "safe by default" strategy.

Taking one step back, is it possible to "unblock" the main thread even
upon a network issue? What is the scenario that causes the main thread
to hang? Is there a backtrace?

Fam
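
P.S. To make the "safe by default" idea a bit more concrete, here is a
rough, self-contained sketch. It is not QEMU code: the command table,
the without_bql field (named after the "without-bql" hint in the cover
letter) and monitor_thread_may_run() are all hypothetical names made up
for illustration. The point is only that a command is refused in the
per-monitor thread unless it has been explicitly audited and flagged.

```c
/* Hypothetical sketch, not QEMU code: a "safe by default" command filter. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry per monitor command; anything not flagged counts as unsafe. */
struct cmd_info {
    const char *name;
    bool without_bql;   /* audited as safe to run without the BQL */
};

/* Illustrative table: only "migrate-incoming" is flagged. */
static const struct cmd_info cmd_table[] = {
    { "migrate-incoming", true  },
    { "info cpus",        false },
    { "transaction",      false },
};

/* Return true only for commands explicitly marked as BQL-free. */
bool monitor_thread_may_run(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
        if (strcmp(cmd_table[i].name, name) == 0) {
            return cmd_table[i].without_bql;
        }
    }
    return false;   /* unknown command: refuse by default */
}
```

With a check like this in front of the dispatch, "info cpus" or
"transaction" sent to a per-monitor thread would be rejected up front
rather than hanging or tripping the assertion, while "migrate-incoming"
would still go through.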