> > We find an issue where a host MCE triggers an openvswitch(dpdk) restart on the
> > source host during guest migration.
>
> Did you mean the vhost-user netdev was deleted from the source host?
The vhost-user netdev was not deleted from the source host. What I mean is: in the
normal scenario, OVS(DPDK) begins to restart, qemu_chr disconnects from OVS, and the
link status is set to link down; once OVS(DPDK) has started again, qemu_chr reconnects
to OVS and the link status is set back to link up. But in our scenario the migration
finishes before qemu_chr reconnects to OVS. On the destination, the frontend's
link_down is loaded from n->status, which causes the network in the guest to never
come up again.

qemu_chr disconnect:

#0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0,
    fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost-user.c:239
#1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost-user.c:497
#2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730,
    vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>,
    total_queues=4) at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0, status=status@entry=7 '\a')
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio-net.c:177
#7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio-net.c:243
#8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false,
    errp=errp@entry=0x7fff59ecd718) at net/net.c:1437
#9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4)
    at net/vhost-user.c:217    // qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu-char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>, cond=<optimized out>,
    opaque=<optimized out>) at qemu-char.c:3265

> >
> > The frontend is still link down in the VM after migration, which causes the
> > network in the VM to never come up again.
> >
> > virtio_net_load_device:
> >
> >     /* nc.link_down can't be migrated, so infer link_down according
> >      * to link status bit in n->status */
> >     link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> >     for (i = 0; i < n->max_queues; i++) {
> >         qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >     }
> >
> > guest:                         migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > openvswitch in source host:    begin to restart                       restarting          started
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > nc in frontend in source:         link down                           link down           link down
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > nc in frontend in destination:     link up                              link up            link down
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > guest network:                      broken                               broken             broken
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > nc in backend in source:          link down                            link down            link up
> >                                      ^                                    ^                  ^
> >                                      |                                    |                  |
> > nc in backend in destination:      link up                              link up             link up
> >
> > The frontend's link_down is loaded from n->status; n->status is link down on the
> > source, so the frontend's link_down becomes true. The backend on the destination
> > host is link up, but the frontend on the destination host is link down, so the
> > network in the guest never comes up again until a guest cold reboot.
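To make the quoted timeline concrete, here is a small stand-alone toy model (plain C,
not QEMU code; the src_backend_down/dst_backend_down names are invented for
illustration, only VIRTIO_NET_S_LINK_UP is the real virtio-net status bit): the
source clears the status bit when the backend disconnects, that status is migrated,
and the load path looks only at the migrated bit, never at the destination backend.

    /* Toy model of the scenario above -- illustration only, not QEMU code. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VIRTIO_NET_S_LINK_UP 1

    int main(void)
    {
        /* Source: OVS(DPDK) restarts, qemu_chr disconnects, the backend goes
         * link down and the frontend status bit is cleared. */
        uint16_t status = VIRTIO_NET_S_LINK_UP;
        bool src_backend_down = true;              /* qemu_chr disconnected */
        if (src_backend_down) {
            status &= ~VIRTIO_NET_S_LINK_UP;       /* this is what migrates */
        }

        /* Destination: the backend is already connected (link up), but the
         * load path infers link_down from the migrated status bit only. */
        bool dst_backend_down = false;
        bool link_down = (status & VIRTIO_NET_S_LINK_UP) == 0;

        printf("dst backend down=%d, frontend link_down=%d\n",
               dst_backend_down, link_down);       /* prints 0 and 1: stuck */
        return 0;
    }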
> >
> > Is there a way to auto fix the link status? Or just abort the migration in
> > the virtio net device load?
>
> Maybe we can try to sync link status after migration?
>
> Thanks

In an extreme scenario, the OVS(DPDK) on the source may still not have started even
after the migration completes. Our plan is to check the link state of the backend
when loading the frontend's link_down:

     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP |
+                     !qemu_get_queue(n->nic)->peer->link_down) == 0;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
     for (i = 0; i < n->max_queues; i++) {
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }

Is this good enough to auto-fix the link status?

Thanks
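Reading the proposed expression with C precedence ('&' binds tighter than '|'), the
frontend is restored link down only when the migrated status bit is down AND the
backend peer is down; if the destination backend is up, the frontend comes back up.
A tiny stand-alone check of that reading (illustration only, not a tested QEMU
patch; the extra parentheses just spell out the implicit precedence):

    /* Truth table for the proposed check -- not QEMU code. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VIRTIO_NET_S_LINK_UP 1

    int main(void)
    {
        for (int status_up = 0; status_up <= 1; status_up++) {
            for (int peer_down = 0; peer_down <= 1; peer_down++) {
                uint16_t status = status_up ? VIRTIO_NET_S_LINK_UP : 0;
                /* Same as the proposal, precedence made explicit:
                 * ((status & S_LINK_UP) | !peer->link_down) == 0 */
                bool link_down =
                    ((status & VIRTIO_NET_S_LINK_UP) | !peer_down) == 0;
                printf("migrated status up=%d, backend down=%d -> link_down=%d\n",
                       status_up, peer_down, link_down);
            }
        }
        return 0;
    }

So in the problem scenario (migrated status down, destination backend up) the
frontend loads as link up; the only case that still loads as link down is when the
destination backend itself is still disconnected.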