On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote:
> On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote:
> > Hi Daniel & Paolo,
> > > 
> > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > > 
> > > the below code segment:
> > > 
> > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > > void *opaque)
> > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > > +void *opaque)
> > > {
> > >      CharDriverState *chr = opaque;
> > >      TCPCharDriver *s = chr->opaque;
> > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, 
> > > GIOCondition cond, void *opaque)
> > >      if (len > s->max_size)
> > >          len = s->max_size;
> > >      size = tcp_chr_recv(chr, (void *)buf, len);
> > > -    if (size == 0 ||
> > > -        (size < 0 &&
> > > -         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > > +    if (size == 0 || size == -1) {
> > >          /* connection closed */
> > >          tcp_chr_disconnect(chr);
> > >      } else if (size > 0) {
> > > 
> > > The old version will not call tcp_chr_disconnect when error is not 
> > > EAGAIN.
> > > The new version will call tcp_chr_disconnect when size==-1. From the 
> > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > > tcp_chr_disconnect.
> > > 
> > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > > since link status is not up. The link is down because 
> > > tcp_chr_disconnect will set it.
> > > 
> > > Can you explain it why EAGIN here changes the behavior ?
> 
> > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an 
> > event loop watch so that tcp_chr_read is called when there is incoming data 
> > to be read.  IOW, when tcp_chr_read is invoked, I would always expect that 
> > at least 1 byte is available, and so EAGAIN should not occurr.
> > 
> > So I'm puzzled why you would get EAGAIN at all, unless there is some kind 
> > of race with other code also consuming bytes from the same socket.
> 
> It could easily reproduce this when repeating 'rmmod virtio_net;modprobe 
> virtio_net'.
> The call stack:
> #0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
> opaque=0x7f09d5b2d540) at qemu-char.c:2917
> #2  0x00007f09d46364fa in qio_channel_fd_source_dispatch 
> (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, 
> user_data=0x7f09d5b2d540) at io/channel-watch.c:84
> #3  0x00007f09d2e9999a in g_main_context_dispatch () from 
> /lib64/libglib-2.0.so.0
> #4  0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
> #5  0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at 
> main-loop.c:258
> #6  0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
> #7  0x00007f09d43bbdd2 in main_loop () at vl.c:2119
> #8  0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, 
> envp=0x7ffe7ad66b70) at vl.c:4896
> (gdb) f 1
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, 
> opaque=0x7f09d5b2d540) at qemu-char.c:2917
> 2917          tcp_chr_disconnect(chr);
> (gdb) p errno
> $1 = 11
> 
> EAGAIN = 11

I think what is happening is a race condition - the main event loop thread
sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is
triggering vhost_user_read() in a different thread which consumes the incoming
I/O before tcp_chr_read can handle it.

IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read
to avoid the disconnects.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

Reply via email to