Hi,

Also CCing Gerd.

On (Fri) 23 May 2014 [13:55:40], Stefan Hajnoczi wrote:
> On Thu, May 15, 2014 at 11:23:54AM -0600, Chris Friesen wrote:
> > I've run into a situation that seems like a bug.  I'm using qemu 1.4.2 (with
> > additional patches) from within openstack.
> >
> > I'm using virtio-serial-pci to provide a channel between the guest and host.
> >
> > On occasion when doing suspend/resume I run into a case where the main qemu
> > thread ends up chewing 100% of a cpu.
> >
> > I attached strace to the thread and it showed qemu just spitting messages:
> >
> > write(35, "HRBT\0\1\0\3d<\230k\0\0\0\0\0\0\1\330\0\0\0\0enqueue\0"..., 472)
> >     = -1 EAGAIN (Resource temporarily unavailable)
> > write(35, "HRBT\0\1\0\3d<\230k\0\0\0\0\0\0\1\330\0\0\0\0enqueue\0"..., 472)
> >     = -1 EAGAIN (Resource temporarily unavailable)
> > write(35, "HRBT\0\1\0\3d<\230k\0\0\0\0\0\0\1\330\0\0\0\0enqueue\0"..., 472)
> >     = -1 EAGAIN (Resource temporarily unavailable)
> > write(35, "HRBT\0\1\0\3d<\230k\0\0\0\0\0\0\1\330\0\0\0\0enqueue\0"..., 472)
> >     = -1 EAGAIN (Resource temporarily unavailable)
> >
> > File descriptor 35 is the unix socket corresponding to the virtio-serial
> > port.
> >
> > I broke in with gdb and got a backtrace showing it was in send_all().
> > Looking at the implementation of send_all(), the core loop looks like:
> >
> >     while (len > 0) {
> >         ret = write(fd, buf, len);
> >         if (ret < 0) {
> >             if (errno != EINTR && errno != EAGAIN)
> >                 return -1;
> >         } else if (ret == 0) {
> >             break;
> >         } else {
> >             buf += ret;
> >             len -= ret;
> >         }
> >     }
> >
> >
> > So if we get EAGAIN, we'll just immediately retry.
> >
> > I'm not sure where the unix socket would get opened, but I'm assuming it's
> > set as non-blocking?  And by default /proc/sys/net/unix/max_dgram_qlen is
> > set to 10.
> >
> > So if the other end of that unix socket is connected but isn't actually
> > paying attention to the messages then the first 10 messages will get
> > buffered but after that we'll end up with qemu spinning forever in a
> > busy-loop trying to send a message into a full buffer.
> >
> > This seems less than ideal.  Either we should block, or else we should
> > discard the data.  And I don't think discarding the data makes sense.

Chardev flow control was added to 1.5.0.  Can you re-try with that
release and let us know if it still behaves similarly?

http://wiki.qemu.org/Features/ChardevFlowControl

Thanks,

		Amit