On Sun, 26 May 2013 10:33:39 -0500 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired > QEMUTimer. Due to timers being processing at the tail end of each main > loop iteration, this generally meant that such events would be emitted > within the same main loop iteration, prior any client data being read > by tcp/unix socket server backends. > > With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to > a "bottom-half" that is registered via g_idle_add(). This makes it > likely that the event won't be sent until a subsequent iteration, and > also adds the possibility that such events will race with the > processing of client data. > > In cases where we rely on the CHR_EVENT_OPENED event being delivered > prior to any client data being read, this may cause unexpected behavior. > > In the case of a QMP monitor session using a socket backend, the delayed > processing of the CHR_EVENT_OPENED event can lead to a situation where > a previous session, where 'qmp_capabilities' has already processed, can > cause the rejection of 'qmp_capabilities' for a subsequent session, > since we reset capabilities in response to CHR_EVENT_OPENED, which may > not yet have been delivered. This can be reproduced with the following > command, generally within 50 or so iterations: > > mdroth@loki:~$ cat cmds > {'execute':'qmp_capabilities'} > {'execute':'query-cpus'} > mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock > <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else > echo ok; fi; done; > ok > ok > failed > mdroth@loki:~$ > > As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED > hook, which gets called as part of the GIOChannel cb associated with the > client rather than a bottom-half, and is thus guaranteed to be delivered > prior to accepting any subsequent client sessions. > > This does not address the more general problem of whether or not there > are chardev users that rely on CHR_EVENT_OPENED being delivered prior to > client data, and whether or not we should consider moving CHR_EVENT_OPENED > "in-band" with connection establishment as a general solution, but fixes > QMP for the time being. > > Reported-by: Stefan Priebe <s.pri...@profihost.ag> > Cc: qemu-sta...@nongnu.org > Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... > --- > v1->v2: > * remove command_mode reset from CHR_EVENT_OPENED case, since this > might still cause a race > > monitor.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/monitor.c b/monitor.c > index 6ce2a4e..f1953a0 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int > event) > > switch (event) { > case CHR_EVENT_OPENED: > - mon->mc->command_mode = 0; > data = get_qmp_greeting(); > monitor_json_emitter(mon, data); > qobject_decref(data); > @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int > event) > json_message_parser_init(&mon->mc->parser, handle_qmp_command); > mon_refcount--; > monitor_fdsets_cleanup(); > + mon->mc->command_mode = 0; > break; > } > }