On 30.05.2013 00:29, mdroth wrote:
> On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
>> On Mon, 27 May 2013 12:59:25 -0500
>> mdroth <mdr...@linux.vnet.ibm.com> wrote:
>>
>>> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
>>>> Luiz Capitulino <lcapitul...@redhat.com> writes:
>>>>
>>>>> On Sun, 26 May 2013 10:33:39 -0500
>>>>> Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
>>>>>> QEMUTimer. Due to timers being processed at the tail end of each main
>>>>>> loop iteration, this generally meant that such events would be emitted
>>>>>> within the same main loop iteration, prior to any client data being
>>>>>> read by tcp/unix socket server backends.
>>>>>>
>>>>>> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
>>>>>> a "bottom half" that is registered via g_idle_add(). This makes it
>>>>>> likely that the event won't be sent until a subsequent iteration, and
>>>>>> also adds the possibility that such events will race with the
>>>>>> processing of client data.
>>>>>>
>>>>>> In cases where we rely on the CHR_EVENT_OPENED event being delivered
>>>>>> prior to any client data being read, this may cause unexpected
>>>>>> behavior.
>>>>>>
>>>>>> In the case of a QMP monitor session using a socket backend, the
>>>>>> delayed processing of the CHR_EVENT_OPENED event can lead to a
>>>>>> situation where a previous session, in which 'qmp_capabilities' has
>>>>>> already been processed, can cause the rejection of 'qmp_capabilities'
>>>>>> for a subsequent session, since we reset capabilities in response to
>>>>>> CHR_EVENT_OPENED, which may not yet have been delivered.
>>>>>> This can be reproduced with the following command, generally within
>>>>>> 50 or so iterations:
>>>>>>
>>>>>> mdroth@loki:~$ cat cmds
>>>>>> {'execute':'qmp_capabilities'}
>>>>>> {'execute':'query-cpus'}
>>>>>> mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else echo ok; fi; done
>>>>>> ok
>>>>>> ok
>>>>>> failed
>>>>>> mdroth@loki:~$
>>>>>>
>>>>>> As a work-around, we reset capabilities as part of the
>>>>>> CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel
>>>>>> cb associated with the client rather than a bottom half, and is thus
>>>>>> guaranteed to be delivered prior to accepting any subsequent client
>>>>>> sessions.
>>>>>>
>>>>>> This does not address the more general problem of whether or not
>>>>>> there are chardev users that rely on CHR_EVENT_OPENED being delivered
>>>>>> prior to client data, and whether or not we should consider moving
>>>>>> CHR_EVENT_OPENED "in-band" with connection establishment as a general
>>>>>> solution, but it fixes QMP for the time being.
>>>>>>
>>>>>> Reported-by: Stefan Priebe <s.pri...@profihost.ag>
>>>>>> Cc: qemu-sta...@nongnu.org
>>>>>> Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com>
>>>>>
>>>>> Thanks for debugging this, Michael. I'm going to apply your patch to
>>>>> the qmp branch because we can't leave this broken (esp. in -stable),
>>>>> but this is papering over the real bug...
>>>>
>>>> Do we really need OPENED to happen in a bottom half? Shouldn't the
>>>> open event handlers register the callback instead if they need it?
>>>
>>> I think that's the more general fix, but wasn't sure if there were
>>> historical reasons why we didn't do this in the first place.
>>>
>>> Looking at it more closely, it does seem like we need a general fix
>>> sooner rather than later, and I don't see any reason why we can't
>>> move BH registration into the actual OPENED hooks as you suggest:
>>>
>>> hw/usb/ccid-card-passthru.c
>>> - currently affected? no
>>>   debug hook only
>>> - needs OPENED BH? no
>>>
>>> hw/usb/redirect.c
>>> - currently affected? yes
>>>   can_read handler checks for dev->parser != NULL, which may be
>>>   true if the CLOSED BH hasn't been executed yet. In the past, OPENED
>>>   quiesced outstanding CLOSED events prior to us reading client data.
>>> - needs OPENED BH? possibly
>>>   seems to only be needed for CLOSED events, which can be issued by
>>>   USBDevice, so we do teardown/usb_detach in a BH. For OPENED, this
>>>   may not apply. If we do issue a BH, we'd need to modify the can_read
>>>   handler to avoid the race.
>>>
>>> hw/usb/dev-serial.c
>>> - currently affected? no
>>>   can_read checks for dev.attached != NULL, which is set to NULL
>>>   synchronously in CLOSED; reads won't be accepted until OPENED gets
>>>   called and sets dev.attached
>>> - needs OPENED BH? no
>>>
>>> hw/char/sclpconsole.c
>>> - currently affected? no
>>>   guarded by can_read handler
>>> - needs OPENED BH? no
>>>
>>> hw/char/ipoctal232.c
>>> - currently affected? no
>>>   debug hook only
>>> - needs OPENED BH? no
>>>
>>> hw/char/virtio-console.c
>>> - currently affected? no
>>>   OPENED/CLOSED map to vq events handled asynchronously. can_read
>>>   checks for guest_connected state rather than anything set by OPENED
>>> - needs OPENED BH? no
>>>
>>> qtest.c
>>> - currently affected? yes
>>>   can_read handler doesn't check for qtest_opened == true; can read
>>>   data before the OPENED event is processed
>>> - needs OPENED BH? no
>>>
>>> monitor.c
>>> - currently affected? yes
>>>   negotiated session state can temporarily leak into a new session
>>> - needs OPENED BH? no
>>>
>>> gdbstub.c
>>> - currently affected? yes
>>>   can fail to pause the machine prior to starting a gdb session
>>> - needs OPENED BH? no
>>>
>>> So we have a number of cases that can be fixed by avoiding the use of
>>> the generic BH, and only one possible case where we might need a
>>> device-specific BH.
>>>
>>> At first glance, anyway. So if this all seems reasonable I can send a
>>> more general fix shortly.
>>
>> Michael, I've dropped your first patch and am taking it that you're
>> going to cook this more general fix.
>
> Fix sent:
>
> http://article.gmane.org/gmane.comp.emulators.qemu/213863
Thanks, seems to work fine.

Stefan