On Thu, 23 Jan 2014 08:44:02 -0500 Luiz Capitulino <lcapitul...@redhat.com> wrote:
> On Thu, 23 Jan 2014 19:23:51 +0800 > Fam Zheng <f...@redhat.com> wrote: > > > Bcc: > > Subject: Re: [Qemu-devel] Possible bug in monitor code > > Reply-To: > > In-Reply-To: <52e0ec4b.7010...@grnet.gr> > > > > On Thu, 01/23 12:17, Stratos Psomadakis wrote: > > > On 01/23/2014 05:07 AM, Fam Zheng wrote: > > > > On Wed, 01/22 17:53, Stratos Psomadakis wrote: > > > >> Hi, > > > >> > > > >> we've encountered a weird issue regarding monitor (qmp and hmp) > > > >> behavior > > > >> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the > > > >> issue: > > > >> > > > >> 1) Client A connects to qmp socket with socat > > > >> 2) Client A gets greeting message {"QMP": {"version": ..} > > > >> 3) Client A waits (select on the socket's fd) > > > >> 4) Client B tries to connect to the *same* qmp socket with socat > > QMP/HMP can only handle a single client per connection, so this is > not supported. What you could do is to create multiple QMP/HMP instances > on the command-line, but this has never been fully tested so we don't > officially support this either (although it may work). > > Now, not gracefully failing on step 4 is a real bug here. I think the > best thing to do would be to close client's B connection. Patches are > welcome :) Thinking about this again, I think this should already be the case (as I don't believe chardev code is sitting on accept()). So maybe you triggered a race on chardev code or broke something there. > > > > >> 5) Client B does *NOT* get any greating message > > > >> 6) Client B waits (select on the socket's fd) > > > >> 7) Client B closes connection (kill socat) > > > >> 8) Client A quits too > > > >> 9) Client C connects to qmp socket > > > >> 10) Client C gets *two* greeting messages!!! > > > > Hi Stratos, thank you for debugging and reporting this. > > > > > > > > I tested this sequence but can't fully reproduce this. What I see is 5) > > > > but no > > > > 10). Client C acts normally. And your patch below doesn't solve it for > > > > me. > > > > > > Hm, which qemu version (or repo branch / tag) did you use? We did a > > > quick scan of the master branch code / commits, but we didn't find > > > anything that might fix the issue. > > > > I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, > > step > > 5), B not getting any greeting message. > > > > But I get only one greeting message in step 10), which is a bit different > > from > > what you reported. > > > > And no difference with your patch applied. > > > > Cc'ing Luiz who maintains the monitor code and may have more ideas. > > > > Thanks, > > > > Fam > > > > > > > > > To submit a patch, please follow instructions as described in > > > > http://wiki.qemu.org/Contribute/SubmitAPatch > > > > so it could be picked up by maintainers. Specifically, you need to > > > > format your > > > > patch email with "git format-patch" and add a "Signed-off-by:" line in > > > > your > > > > patch email. > > > > > > Ok. If any dev can confirm that this is a bug (and that the patch below > > > is the correct way to fix it) I'll resubmit it properly. > > > > > > Thanks, > > > Stratos > > > > > > > Thanks, > > > > > > > > Fam > > > > > > > >> After some investigation, we traced it down to the monitor_flush() > > > >> function in monitor.c. Specifically, when a second client connects to > > > >> the qmp (client B), while another one is already using it (client A), > > > >> we > > > >> get the following from stracing the second client (client B): > > > >> > > > >> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0 > > > >> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0 > > > >> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3]) > > > >> select(4, [0 3], [], [], NULL > > > >> > > > >> So, the connect() syscall from client B succeeds, although client B > > > >> connection has not yet been accepted by the qmp server (it's still in > > > >> the backlog of the qmp listening socket). > > > >> > > > >> After killing client B and then client A, we see the following when > > > >> stracing the qemu proc: > > > >> > > > >> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9 > > > >> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR) > > > >> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > > > >> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 > > > >> 22363 fcntl(9, F_GETFL) = 0x802 (flags > > > >> O_RDWR|O_NONBLOCK) > > > >> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) = > > > >> -1 EPIPE (Broken pipe) > > > >> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) --- > > > >> > > > >> The qmp server / qemu accepts the connection from client B (who has now > > > >> closed the connection) and tries to write the greeting message to the > > > >> socket fd. This results in write returning an error (EPIPE). > > > >> > > > >> The monitor_flush() function doesn't seem to handle this case (write > > > >> error). Instead, it adds a watch / handler to retry the write > > > >> operation. > > > >> Thus, mon->outbuf is not cleaned up properly, which results in > > > >> duplicate > > > >> greeting messages for the next client to connect. > > > >> > > > >> The following seems to do the trick. > > > >> > > > >> diff --git a/monitor.c b/monitor.c > > > >> index 845f608..5622f20 100644 > > > >> --- a/monitor.c > > > >> +++ b/monitor.c > > > >> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon) > > > >> > > > >> if (len && !mon->mux_out) { > > > >> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len); > > > >> - if (rc == len) { > > > >> - /* all flushed */ > > > >> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) { > > > >> + /* all flushed or error */ > > > >> QDECREF(mon->outbuf); > > > >> mon->outbuf = qstring_new(); > > > >> return; > > > >> > > > >> Comments? > > > >> > > > >> Thanks, > > > >> Stratos > > > >> > > > >> -- > > > >> Stratos Psomadakis > > > >> <pso...@grnet.gr> > > > >> > > > > > > > > > > > > > -- > > > Stratos Psomadakis > > > <pso...@grnet.gr> > > > > > > > > > > >