On Mon, 5 Dec 2022 at 14:24, Ming Yang via <qemu-devel@nongnu.org> wrote: > > Hot-pluging a vhost-net may cause virtual machine crash in following steps: > 1. Starting a vm without net devices. > 2. Hot-pluging 70 memory devices. > 3. Hot-pluging a vhost-net device. > > The reason is : if hotplug a vhost-net failed, the nc cannot be found via > function qemu_find_netdev, as > it has been cleaned up through function qemu_cleanup_net_client. Which leads > to the result > that assert(nc) failed, then qemu crashed. > > While, the root reason is that, in commit 46d4d36d0bf2 if not both > has_vhostforce and vhostforce flags > are true, the errp would not be set. Then net_init_tap would not return a > negative value, fallowed by founding nc > and assert nc. > > In this patch, asserting nc is replaced with setting an error message. > > Fixes: 46d4d36d0bf2("tap: setting error appropriately when calling > net_init_tap_one()") > Signed-off-by: Ming Yang <yangmin...@huawei.com> > Signed-off-by: Liang Zhang <zhanglia...@huawei.com> > --- > net/net.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/net/net.c b/net/net.c > index 840ad9dca5..1d1d7e54c4 100644 > --- a/net/net.c > +++ b/net/net.c > @@ -1103,7 +1103,16 @@ static int net_client_init1(const Netdev *netdev, bool > is_netdev, Error **errp) > > if (is_netdev) { > nc = qemu_find_netdev(netdev->id); > - assert(nc); > + /* > + * If the tap of hotpluged net device do not has both has_vhostforce > flag and vhostforce flags, > + * when error occurs, the error messags will be report but not set > to errp. Thus net_client_init_fun > + * will not return a negatave value. Therefore the value of nc might > be NULL. To make qemu robust, > + * it is better to judge if nc is NULL. > + */ > + if (!nc) { > + error_setg(errp, "Device '%s' could not be initialized", > netdev->id); > + return -1; > + } > nc->is_netdev = true; > }
This doesn't look like the right fix to me. If the net_client_init_fun doesn't correctly initialize the netdev, it should report that error back to the caller. We should make the tap init function correctly return an error in this error situation, not work around it in the caller. The assert() is correct, because it is detecting a bug elsewhere in QEMU that we need to fix. thanks -- PMM