On Thu, 17 Nov 2022 19:11:47 +0200 Avihai Horon <avih...@nvidia.com> wrote:
> On 16/11/2022 20:36, Alex Williamson wrote: > > External email: Use caution opening links or attachments > > > > > > On Thu, 3 Nov 2022 18:16:17 +0200 > > Avihai Horon <avih...@nvidia.com> wrote: > > > >> If vfio_migration_set_state() fails to set the device in the requested > >> state it tries to put it in a recover state. If setting the device in > >> the recover state fails as well, hw_error is triggered and the VM is > >> aborted. > >> > >> To improve user experience and avoid VM data loss, reset the device with > >> VFIO_RESET_DEVICE instead of aborting the VM. > >> > >> Signed-off-by: Avihai Horon <avih...@nvidia.com> > >> --- > >> hw/vfio/migration.c | 14 ++++++++++++-- > >> 1 file changed, 12 insertions(+), 2 deletions(-) > >> > >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > >> index f8c3228314..e8068b9147 100644 > >> --- a/hw/vfio/migration.c > >> +++ b/hw/vfio/migration.c > >> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice > >> *vbasedev, > >> > >> mig_state->device_state = recover_state; > >> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { > >> - hw_error("%s: Failed setting device in recover state, err: > >> %s", > >> - vbasedev->name, strerror(errno)); > >> + error_report( > >> + "%s: Failed setting device in recover state, err: %s. > >> Resetting device", > >> + vbasedev->name, strerror(errno)); > >> + > >> + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { > >> + hw_error("%s: Failed resetting device, err: %s", > >> vbasedev->name, > >> + strerror(errno)); > >> + } > >> + > >> + migration->device_state = VFIO_DEVICE_STATE_RUNNING; > >> + > >> + return -1; > >> } > >> > >> migration->device_state = recover_state; > > This addresses one of my comments on 12/ and should probably be rolled > > in there. > > Not sure to which comment you refer to. Could you elaborate? Hmm, I guess I thought this was in the section immediately following where I questioned going to recovery state. I'm still not sure why this is a separate patch from the initial implementation of the function in 12/ though. Thanks, ' Alex