On Wed, Sep 03, 2014 at 11:43:54AM +0400, Andrey Korolyov wrote:
> On Wed, Sep 3, 2014 at 10:10 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> > On Wed, Sep 03, 2014 at 02:17:02AM +0400, Andrey Korolyov wrote:
> >> On Wed, Sep 3, 2014 at 2:09 AM, Andrey Korolyov <and...@xdel.ru> wrote:
> >> > On Wed, Sep 3, 2014 at 1:51 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >> >> On Wed, Sep 03, 2014 at 01:29:29AM +0400, Andrey Korolyov wrote:
> >> >>> On Wed, Sep 3, 2014 at 1:03 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >> >>> >> bad one is the
> >> >>> >>
> >> >>> >> Author: Jason Wang <jasow...@redhat.com>
> >> >>> >> Date: Tue Sep 2 18:07:46 2014 +0300
> >> >>> >>
> >> >>> >>     vhost_net: start/stop guest notifiers properly
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > upstream has this (pull request sent today):
> >> >>> >     vhost_net: cleanup start/stop condition
> >> >>> >
> >> >>> > Could you apply it and see if it helps please?
> >> >>> >
> >> >>> > Michael, if it helps it should be before start/stop guest notifiers
> >> >>> > ideally to avoid bisect problems.
> >> >>>
> >> >>> It is already applied as shown from the list in the previous message
> >> >>> (there are some aio fixes too on top of 2.1 I picked before but they
> >> >>> should not impact vhost-net interaction in any mean). The symptoms are
> >> >>> a bit interesting - VM crashes only at PCI device initalization (e.g.
> >> >>> grub stage after reset and initrd unpacking are passing well, but then
> >> >>> things getting ugly). I am running 3.14 guest i686-pae kernel from
> >> >>> debian backports in guest, so it may be version-specific after all. If
> >> >>> it`ll be hard to reproduce, I can try 64bit, expecting same behavior.
> >> >>> Please find args in attached file.
> >> >>
> >> >>
> >> >>
> >> >> ok just to make sure - which tree do I clone exactly?
> >> >>
> >> >
> >> > https://github.com/mdroth/qemu.git stable-2.1-staging showing same
> >> > behavior for me with those patches
> >>
> >> Forgot to mention important detail - I am playing with -mq now, so
> >> actually virtio-net working in a bit different way than it may
> >> expected (it also shown in args list from above, but someone may miss
> >> it):
> >> ...
> >> qemu-system-x86_64: unable to start vhost net: 95: falling back on
> >> userspace virtio
> >> qemu-system-x86_64: unable to start vhost net: 95: falling back on
> >> userspace virtio
> >> ...
> >
> >
> > OK I see at least one obvious bug there: does the following fix the
> > crash for you?
> > Separately, we need to debug why mq vhost is broken for you.
> > Is this a regression?
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index ba5d544..1fe18c7 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -289,7 +289,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >      VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> > -    int r, i = 0;
> > +    int r, i;
> >
> >      if (!vhost_net_device_endian_ok(dev)) {
> >          error_report("vhost-net does not support cross-endian");
> > @@ -317,16 +317,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >          r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
> >
> >          if (r < 0) {
> > -            goto err;
> > +            goto err_start;
> >          }
> >      }
> >
> >      return 0;
> >
> > -err:
> > +err_start:
> >      while (--i >= 0) {
> >          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
> >      }
> > +err:
> > +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> > +    if (r < 0) {
> > +        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> > +        fflush(stderr);
> > +    }
> >      return r;
> >  }
> >
> >
>
> another bits of information:
> - the userspace fallback is not specific to mq (very unfortunately
> for me because I didn`t checked this exact regression week before when
> I saw it for mq and it is not specific for queued patches for 2.1.1),
> - bug itself is not specific to mq, reproduces every time even with
> more generic interface config without queues,
> - patch from above does not fix the issue.
>
> Strace output for all threads is available at
> http://xdel.ru/downloads/qemu.out.gz, attached just before reset.

OK, does my patch help?
Jason sent patches to fix the fallback to userspace virtio bug - does
that work for you?
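
For reference, the two-label unwinding that the quoted patch introduces
(err_start to stop the queues already started, then err to turn the guest
notifiers back off) can be sketched in isolation. This is only a minimal
illustration under stated assumptions, not QEMU code: set_notifiers(),
start_queue(), stop_queue(), start_all(), MAX_QUEUES and the fail_at knob
are all hypothetical stand-ins, and -95 is used purely as an example error
value. One deliberate difference from the quoted hunk: the sketch stores the
cleanup result in a separate variable so that the original error is what
gets returned.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_QUEUES 8

/* Hypothetical stand-ins for the real calls; none of these are QEMU APIs. */
static int started[MAX_QUEUES];   /* 1 while queue i is running */
static int notifiers_on;          /* 1 while notifiers are enabled */
static int fail_at = -1;          /* queue index forced to fail, -1 = none */

static int set_notifiers(int on)
{
    notifiers_on = on;
    return 0;
}

static int start_queue(int i)
{
    if (i == fail_at) {
        return -95;               /* illustrative error value only */
    }
    started[i] = 1;
    return 0;
}

static void stop_queue(int i)
{
    started[i] = 0;
}

int start_all(int total_queues)
{
    int r, e, i;

    assert(total_queues >= 0 && total_queues <= MAX_QUEUES);

    r = set_notifiers(1);
    if (r < 0) {
        goto err;                 /* nothing started yet, skip the loop */
    }

    for (i = 0; i < total_queues; i++) {
        r = start_queue(i);
        if (r < 0) {
            goto err_start;       /* queues 0..i-1 need stopping */
        }
    }
    return 0;

err_start:
    /* Stop only the queues that were started, in reverse order. */
    while (--i >= 0) {
        stop_queue(i);
    }
err:
    /* Undo the notifier setup; keep the original error in r. */
    e = set_notifiers(0);
    if (e < 0) {
        fprintf(stderr, "notifier cleanup failed: %d\n", e);
    }
    return r;
}
```

With fail_at set to 2, start_all(4) would stop queues 1 and 0, disable the
notifiers and return the error from start_queue(2). Whether the real code
should report the original error or the cleanup error is a separate
question that this sketch does not settle.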