* Peter Xu (pet...@redhat.com) wrote: > On Thu, May 24, 2018 at 09:08:58AM +0200, Markus Armbruster wrote: > > Peter Xu <pet...@redhat.com> writes: > > > > > On Mon, May 21, 2018 at 09:13:06AM -0500, Eric Blake wrote: > > >> On 05/21/2018 03:42 AM, Peter Xu wrote: > > >> > We turned Out-Of-Band feature of monitors off for 2.12 release. Now we > > >> > try to turn that on again. > > >> > > >> "try to turn" sounds weak, like you aren't sure of this patch. If you > > >> aren't sure, then why should we feel safe in applying it? This text is > > >> going in the permanent git history, so sound bold, rather than hesitant! > > > > > > Yeah I am not really strong at turn that on by default, that's why I > > > marked the patch as RFC. I wanted to hear from your opinions. For > > > now IMHO even with x-oob, postcopy can start to work with network > > > recovery, then the requirement from my part is done. However I'm > > > thinking maybe we should still turn that on for all the people. One > > > reason is that we already have the QMP capability negotiation so it > > > seems redundant (as you mentioned before), meanwhile exposing it to > > > broader users can let broader users to leverage this new bits directly > > > (meanwhile easier to expose potential issues for OOB too). > > > > > > Meanwhile I'm not confident too on that there can be other test cases > > > that has not yet been run with Out-Of-Band, so even if we solved all > > > existing problems I can't be sure that no further test will broke. > > > However I don't see it a problem for merging, since AFAIU I can't > > > really know what will break again (if there is) unless we apply that > > > to master again... :) > > > > The time to switch it on is now. I don't think we can find remaining > > issues (if any) unless we do. > > > > The time for doubts is the day before rc0 ;) > > Haha. I bet you are right. :) > > > > > >> "We have resolved the issues from last time (commit 3fd2457d reverted by > > >> commit a4f90923): > > >> - issue 1 ... > > >> - issue 2 ... > > >> So now we are ready to enable advertisement of the feature by default" > > >> > > >> with better descriptions of the issues that you fixed (I can think of at > > >> least the fixes adding thread-safety to the current monitor, and fixing > > >> early use of the monitor before qmp_capabilities completes; there may > > >> also > > >> be other issues that you want to call out). > > > > > > Some of the monitor patches are not really related to previous OOB > > > breakage, the only one that really matters should be the ARM+Libvirt > > > one, which I will definitely mention in my next post. The rest > > > (including per-thread cur_mon, monitor thread-safety, etc.) should > > > mostly for future new commands of Out-Of-Band but not for now. For > > > example, current OOB commands are rare, now they don't use the > > > get_fd()/set_fd() interface, then the mon_fdsets won't need to be > > > protected at all. > > > > A lock works only when all its critical sections are guaranteed to > > execute quickly! Remember, for OOB to work, all OOB commands must > > execute quickly, always. Rule of thumb: treat OOB command code like > > realtime code. > > Yes indeed I suspect the realtime work will have similar requirements. > It's a funny experience that we do realtime-alike considerations in > QEMU's control path. :) > > I believe that's why Dave pointed out that when taking the mon_fdsets > lock (in the other patchset) we can't call close(). That's a good > lesson.
I don't worry too much about the length of time; what I worry about is things that might actually hang (OK, a TCP timeout is too long a length of time!); so things like close and other things that work on sockets are just asking for trouble. I wish there was a way we could annotate code called from OOB context and get some tools to automatically spot problems! Dave > (and I just noticed I forgot to CC Dave for this...; adding in) > > > > > > But we can't guarantee that new OOB commands won't > > > use them too, so we still need to protect them with locks. > > > > When creating or modifying an OOB command, you need to be extra careful > > about shared state: you may have to add suitable synchronization. > > > > Please add suitable warnings about use of allow-oob to > > docs/qapi-code-gen.txt. Perhaps even a separate document on OOB, > > pointed to by qapi-code-gen.txt. > > Maybe append another bullet to existing? > > ================== > diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt > index b9b6eabd08..13f86095d1 100644 > --- a/docs/devel/qapi-code-gen.txt > +++ b/docs/devel/qapi-code-gen.txt > @@ -680,6 +680,9 @@ OOB command handlers must satisfy the following > conditions: > - It does not invoke system calls that may block, > - It does not access guest RAM that may block when userfaultfd is > enabled for postcopy live migration. > +- It needs to protect possible shared states, since as long as a > + command supports Out-Of-Band it means the handler can be run in > + parallel with the same handler running in the other thread. > > If in doubt, do not implement OOB execution support. > ================== > > > > > >> > > > >> > Signed-off-by: Peter Xu <pet...@redhat.com> > > >> > -- > > >> > Now OOB should be okay with all known tests (except iotest qcow2, since > > >> > it is still broken on master), > > >> > > >> Which tests are still failing for you? Ideally, you can still > > >> demonstrate > > >> that the tests not failing without this patch continue to pass with this > > >> patch, even if you call out the tests that have known issues to still be > > >> resolved. > > > > > > I didn't remember. We can first settle down on whether we'd like to > > > turn on this default value, then I can perform this test for my next > > > post to make sure good tests won't break. > > > > > >> > > >> > and AFAIK now we should also be okay with > > >> > ARM+Libvirt (not testsed, but Eric Auger helped to verify that before > > >> > the release). So I think it's now safe to turn OOB on again. Please > > >> > feel free to test this against any of existing testsuites to see > > >> > whether > > >> > it'll still break any stuff. Thanks, > > >> > > > >> > Signed-off-by: Peter Xu <pet...@redhat.com> > > >> > --- > > >> > monitor.c | 13 +++---------- > > >> > tests/qmp-test.c | 2 +- > > >> > vl.c | 9 ++++----- > > >> > 3 files changed, 8 insertions(+), 16 deletions(-) > > >> > > > >> > diff --git a/monitor.c b/monitor.c > > >> > index 46814af533..ce5cc5e34e 100644 > > >> > --- a/monitor.c > > >> > +++ b/monitor.c > > >> > @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags) > > >> > bool use_readline = flags & MONITOR_USE_READLINE; > > >> > bool use_oob = flags & MONITOR_USE_OOB; > > >> > - if (use_oob) { > > >> > - if (CHARDEV_IS_MUX(chr)) { > > >> > - error_report("Monitor Out-Of-Band is not supported with " > > >> > - "MUX typed chardev backend"); > > >> > - exit(1); > > >> > - } > > >> > - if (use_readline) { > > >> > - error_report("Monitor Out-Of-band is only supported by > > >> > QMP"); > > >> > - exit(1); > > >> > - } > > >> > + if (CHARDEV_IS_MUX(chr)) { > > >> > + /* MUX is still not supported for Out-Of-Band */ > > >> > + use_oob = false; > > >> > > >> This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB > > >> when > > >> using readline (which presumably is a synonym for using HMP). > > > > It effectively is a synonm now, but exploiting it would be unclean. The > > clean test for HMP is !(flags & MONITOR_USE_CONTROL). > > > > >> Is that > > >> intentional? If so, the commit message should mention it. > > > > > > At [1] below I directly moved the chunk into "mode=control" path, so > > > the QMP check is already there. Here I turn OOB off explicitly for > > > MUX no matter HMP/QMP. It should have the same affect as 3fd2457d. > > > > Switching off OOB silently for mux'ed chardev is ugly. But I don't have > > better ideas. > > > > What's keeping us from supporting OOB there? > > Monitor is only a single chardev frontend. The whole OOB work allows > monitor frontend to run isolatedly, but it never applies to any other > frontend. Considering that MUX can support arbitrary frontends to > share the single backend chardev, I suppose "supporting OOB for MUX" > basically means to support OOB for every existing chardev frontend... > > > > > Is anyone using QMP over a mux'ed chardev in anger? I guess we don't > > know. > > IMHO it will be fine even for anyone who likes MUXed - he/she just > needs an extra QMP channel only for OOB. > > (Or would the anger be a good driving force to push someone to add > complete support for Out-Of-Band? I confess, I am not angry > enough... :) > > > > > >> > } > > >> > monitor_data_init(mon, false, use_oob); > > >> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c > > >> > index 88f867f8c0..c85a3964d9 100644 > > >> > --- a/tests/qmp-test.c > > >> > +++ b/tests/qmp-test.c > > >> > @@ -89,7 +89,7 @@ static void test_qmp_protocol(void) > > >> > g_assert(q); > > >> > test_version(qdict_get(q, "version")); > > >> > capabilities = qdict_get_qlist(q, "capabilities"); > > >> > - g_assert(capabilities && qlist_empty(capabilities)); > > >> > + g_assert(capabilities); > > >> > qobject_unref(resp); > > >> > /* Test valid command before handshake */ > > > > Let's check contents of @capabilities matches expectations. > > It'll be checked in the other OOB specified test (test_qmp_oob). > > > > > >> > diff --git a/vl.c b/vl.c > > >> > index 3b39bbd7a8..b71fb8eb25 100644 > > >> > --- a/vl.c > > >> > +++ b/vl.c > > >> > @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts > > >> > *opts, Error **errp) > > >> > flags = MONITOR_USE_READLINE; > > >> > } else if (strcmp(mode, "control") == 0) { > > >> > flags = MONITOR_USE_CONTROL; > > >> > + /* Out-Of-Band is on by default */ > > >> > + if (qemu_opt_get_bool(opts, "x-oob", 1)) { > > >> > + flags |= MONITOR_USE_OOB; > > >> > + } > > > > > > [1] > > > > > >> > > >> Do we really still need the x-oob property, vs. outright deletion of this > > >> bandaid? Then again, I guess keeping it for one more release makes it > > >> easier to forcefully turn things off for temporary testing when isolating > > >> whether OOB is a culprit in something breaking. > > > > > > Yes, since we already have it, I didn't have strong opinion to remove > > > it, so I kept it. Actually now we can't turn OOB off completely > > > already - it's fully embeded in QMP internal logic (e.g., now we'll > > > always have completely isolated parser, dispatcher, responder; we > > > can't go back to the old func-call ways any more). So maybe I can > > > remove that parameter directly next time. > > > > I'm leaning towards removing it, because: > > > > * There's no test coverage for x-oob=off > > > > * We already have a way to disable OOB: the QMP client can decline the > > capability > > > > * MONITOR_USE_OOB makes monitor flags even uglier: of the four flag > > bits' sixteen combinations, only a few actually occur: > > > > MONITOR_USE_READLINE on off > > MONITOR_USE_CONTROL off on > > MONITOR_USE_PRETTY N/A either > > MONITOR_USE_OOB N/A on unless mux'ed > > > > That's just three genuine configurations (HMP, QMP, pretty QMP), if we > > ignore the "mux'ed can't do OOB, yet" bit. > > > > Without x-oob, we can drop MONITOR_USE_OOB, I think. > > Yes, let me remove it. > > Thanks, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK