Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
Am 30.05.2013 00:29, schrieb mdroth: On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote: On Mon, 27 May 2013 12:59:25 -0500 mdroth mdr...@linux.vnet.ibm.com wrote: On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is processed - need opened BH? no monitor.c - current affected? yes negotiated session state can temporarilly leak into a new session - need opened BH? no gdbstub.c - currently affected? yes can fail to pause machine prior to starting gdb
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On Mon, 27 May 2013 12:59:25 -0500 mdroth mdr...@linux.vnet.ibm.com wrote: On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is processed - need opened BH? no monitor.c - current affected? yes negotiated session state can temporarilly leak into a new session - need opened BH? no gdbstub.c - currently affected? yes can fail to
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote: On Mon, 27 May 2013 12:59:25 -0500 mdroth mdr...@linux.vnet.ibm.com wrote: On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote: On Mon, 27 May 2013 12:59:25 -0500 mdroth mdr...@linux.vnet.ibm.com wrote: On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... --- v1-v2: * remove command_mode reset from CHR_EVENT_OPENED case, since this might still cause a race monitor.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index 6ce2a4e..f1953a0 100644 --- a/monitor.c +++ b/monitor.c @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event) switch (event) { case CHR_EVENT_OPENED: -mon-mc-command_mode = 0; data = get_qmp_greeting(); monitor_json_emitter(mon, data); qobject_decref(data); @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event) json_message_parser_init(mon-mc-parser, handle_qmp_command); mon_refcount--; monitor_fdsets_cleanup(); +mon-mc-command_mode = 0; break; } }
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? Regards, Anthony Liguori --- v1-v2: * remove command_mode reset from CHR_EVENT_OPENED case, since this might still cause a race monitor.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index 6ce2a4e..f1953a0 100644 --- a/monitor.c +++ b/monitor.c @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event) switch (event) { case CHR_EVENT_OPENED: -mon-mc-command_mode = 0; data = get_qmp_greeting(); monitor_json_emitter(mon, data); qobject_decref(data); @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event) json_message_parser_init(mon-mc-parser, handle_qmp_command); mon_refcount--; monitor_fdsets_cleanup(); +mon-mc-command_mode = 0; break; } }
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is processed - need opened BH? no monitor.c - current affected? yes negotiated session state can temporarilly leak into a new session - need opened BH? no gdbstub.c - currently affected? yes can fail to pause machine prior to starting gdb session - need opened BH? no So we have a number of cases that can be fixed by avoiding the use of the generic BH, and only 1 possible cause where we might need a
[Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- v1-v2: * remove command_mode reset from CHR_EVENT_OPENED case, since this might still cause a race monitor.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index 6ce2a4e..f1953a0 100644 --- a/monitor.c +++ b/monitor.c @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event) switch (event) { case CHR_EVENT_OPENED: -mon-mc-command_mode = 0; data = get_qmp_greeting(); monitor_json_emitter(mon, data); qobject_decref(data); @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event) json_message_parser_init(mon-mc-parser, handle_qmp_command); mon_refcount--; monitor_fdsets_cleanup(); +mon-mc-command_mode = 0; break; } } -- 1.7.9.5