On Thu, May 7, 2020 at 11:35 AM Dima Stepanov <dimas...@yandex-team.ru> wrote: > > What do you think? >
Apologies - I tripped over the if (dev->started && r < 0) check. Never-mind my point with race conditions and failing migrations. Rather than modifying vhost_dev_set_log(), it may be clearer to put a check after vhost_dev_log_resize()? Something like: --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -829,11 +829,22 @@ static int vhost_migration_log(MemoryListener *listener, int enable) vhost_log_put(dev, false); } else { vhost_dev_log_resize(dev, vhost_get_log_size(dev)); + /* + * A device can be stopped because of backend disconnect inside + * vhost_dev_log_resize(). In this case we should mark logging + * enabled and return without attempting to set the backend + * logging state. + */ + if (!dev->started) { + goto out_success; + } r = vhost_dev_set_log(dev, true); if (r < 0) { return r; } } + +out_success: dev->log_enabled = enable; return 0; } This seems harmless enough to me, and I see how it fixes your particular crash, but I would still prefer we worked towards a more robust solution. In particular I think we could handle this inside vhost-user-blk if we let the device state persist between connections (i.e. call vhost_dev_cleanup() inside vhost_user_blk_connect() before vhost_dev_init() on reconnect). This should also fix some of the crashes Li Feng has hit, and probably others which haven’t been reported yet. What do you think? If that’s unworkable I guess we will need to add these vhost level checks. In that case I would still prefer we add a “disconnected” flag in struct vhost_dev struct, and make sure it isn’t cleared by vhost_dev_cleanup(). That way we don’t conflate stopping a device with backend disconnect at the vhost level and potentially regress behavior for other device types.