On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote: > On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote: > > Hi Daniel & Paolo, > > > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > > > the below code segment: > > > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > > void *opaque) > > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > > +void *opaque) > > > { > > > CharDriverState *chr = opaque; > > > TCPCharDriver *s = chr->opaque; > > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, > > > GIOCondition cond, void *opaque) > > > if (len > s->max_size) > > > len = s->max_size; > > > size = tcp_chr_recv(chr, (void *)buf, len); > > > - if (size == 0 || > > > - (size < 0 && > > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > > + if (size == 0 || size == -1) { > > > /* connection closed */ > > > tcp_chr_disconnect(chr); > > > } else if (size > 0) { > > > > > > The old version will not call tcp_chr_disconnect when error is not > > > EAGAIN. > > > The new version will call tcp_chr_disconnect when size==-1. From the > > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call > > > tcp_chr_disconnect. > > > > > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit > > > since link status is not up. The link is down because > > > tcp_chr_disconnect will set it. > > > > > > Can you explain it why EAGIN here changes the behavior ? > > > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an > > event loop watch so that tcp_chr_read is called when there is incoming data > > to be read. IOW, when tcp_chr_read is invoked, I would always expect that > > at least 1 byte is available, and so EAGAIN should not occurr. > > > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind > > of race with other code also consuming bytes from the same socket. > > It could easily reproduce this when repeating 'rmmod virtio_net;modprobe > virtio_net'. > The call stack: > #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, > opaque=0x7f09d5b2d540) at qemu-char.c:2917 > #2 0x00007f09d46364fa in qio_channel_fd_source_dispatch > (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, > user_data=0x7f09d5b2d540) at io/channel-watch.c:84 > #3 0x00007f09d2e9999a in g_main_context_dispatch () from > /lib64/libglib-2.0.so.0 > #4 0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 > #5 0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at > main-loop.c:258 > #6 0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 > #7 0x00007f09d43bbdd2 in main_loop () at vl.c:2119 > #8 0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, > envp=0x7ffe7ad66b70) at vl.c:4896 > (gdb) f 1 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, > opaque=0x7f09d5b2d540) at qemu-char.c:2917 > 2917 tcp_chr_disconnect(chr); > (gdb) p errno > $1 = 11 > > EAGAIN = 11
I think what is happening is a race condition - the main event loop thread sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is triggering vhost_user_read() in a different thread which consumes the incoming I/O before tcp_chr_read can handle it. IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read to avoid the disconnects. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|