> > We found an issue where a host MCE triggers an openvswitch(dpdk) restart on
> > the source host during guest migration,
>
>
> Did you mean the vhost-user netdev was deleted from the source host?


The vhost-user netdev was not deleted from the source host. What I mean is:
in the normal scenario, when OVS(DPDK) begins to restart, qemu_chr disconnects
from OVS and the link status is set to link down; once OVS(DPDK) has started
again, qemu_chr reconnects to OVS and the link status is set back to link up.
In our scenario, however, the VM migration finishes before qemu_chr reconnects
to OVS. The frontend's link_down is then loaded from n->status on the
destination, which causes the network in the guest to never come up again.
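
For context, here is a condensed sketch of how the chardev events drive the
link status, inferred from the behaviour described above and from the backtrace
below (a simplification, not a verbatim copy of net/vhost-user.c in QEMU 2.8):

static void net_vhost_user_event(void *opaque, int event)
{
    const char *name = opaque;          /* e.g. "hostnet0" */
    Error *err = NULL;

    switch (event) {
    case CHR_EVENT_OPENED:
        /* OVS(DPDK) came back: restart vhost and bring the link up. */
        /* ... vhost_user_start(...) ... */
        qmp_set_link(name, true, &err);
        break;
    case CHR_EVENT_CLOSED:
        /* OVS(DPDK) went away: link down, stop vhost (frame #9 below). */
        qmp_set_link(name, false, &err);
        /* ... vhost_user_stop(...) ... */
        break;
    }
}

If the migration completes between the CLOSED and the OPENED event, the
destination only ever sees the link-down state.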

qemu_chr disconnect:
#0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0, fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost-user.c:239
#1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost-user.c:497
#2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>, total_queues=4)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0, status=status@entry=7 '\a')
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio-net.c:177
#7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio-net.c:243
#8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
    at net/net.c:1437
#9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4)
    at net/vhost-user.c:217    // qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu-char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>, cond=<optimized out>, opaque=<optimized out>)
    at qemu-char.c:3265


>
>
> > The VM frontend is still link down after migration, which causes the network
> > in the VM to never come up again.
> >
> > virtio_net_load_device:
> >      /* nc.link_down can't be migrated, so infer link_down according
> >       * to link status bit in n->status */
> >      link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> >      for (i = 0; i < n->max_queues; i++) {
> >          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >      }
> >
> > guest:               migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
> >                                      ^                ^                ^
> >                                      |                |                |
> > openvswitch in source host:   begin to restart   restarting        started
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in source:        link down        link down        link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in destination:   link up          link up          link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > guest network:                    broken           broken           broken
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in source:         link down        link down        link up
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in destination:    link up          link up          link up
> >
> > The frontend's link_down is loaded from n->status; n->status is link down
> > on the source, so the frontend's link_down becomes true. The backend on the
> > destination host is link up, but the frontend on the destination host is
> > link down, which causes the network in the guest to never come up again
> > until a guest cold reboot.
> >
> > Is there a way to auto-fix the link status, or should we just abort the
> > migration in the virtio-net device load?
>
>
> Maybe we can try to sync link status after migration?
>
> Thanks


In an extreme scenario, the OVS(DPDK) on the source host may still not have
started even after the migration has finished.


Our plan is to check the link state of the backend when loading the link_down
of the frontend:
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type ==
+        NET_CLIENT_DRIVER_VHOST_USER) {
+        /* Only keep the frontend down if the backend is also down. */
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0 &&
+                    qemu_get_queue(n->nic)->peer->link_down;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
     for (i = 0; i < n->max_queues; i++) {
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }
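
For clarity, a small standalone illustration of the inference rule in the
vhost-user branch above (the helper name and the sample values are made up
for this demo; VIRTIO_NET_S_LINK_UP is bit 0, as in the virtio spec):

#include <stdbool.h>
#include <stdio.h>

#define VIRTIO_NET_S_LINK_UP 1

/* Proposed rule: keep the frontend link down only when the migrated
 * n->status says down AND the destination backend is also down. */
static bool infer_link_down(unsigned status, bool backend_link_down)
{
    return (status & VIRTIO_NET_S_LINK_UP) == 0 && backend_link_down;
}

int main(void)
{
    /* Case from this thread: the source OVS restart left n->status down,
     * but the backend on the destination is already up again. */
    printf("%d\n", infer_link_down(0, false));                     /* 0: comes up */
    /* Backend still down on the destination: stay down until reconnect. */
    printf("%d\n", infer_link_down(0, true));                      /* 1: stays down */
    /* n->status says up: frontend stays up regardless of the backend. */
    printf("%d\n", infer_link_down(VIRTIO_NET_S_LINK_UP, true));   /* 0 */
    return 0;
}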

Is this good enough to auto-fix the link status?

Thanks
