On 16/06/2023 15:04, Cédric Le Goater wrote: > On 6/16/23 12:12, Joao Martins wrote: >> On 16/06/2023 10:53, Duan, Zhenzhong wrote: >>>> -----Original Message----- >>>> From: Joao Martins <joao.m.mart...@oracle.com> >>>> Sent: Friday, June 16, 2023 5:06 PM >>>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com> >>>> Cc: alex.william...@redhat.com; c...@redhat.com; qemu-devel@nongnu.org; >>>> Peng, Chao P <chao.p.p...@intel.com> >>>> Subject: Re: [PATCH] vfio/migration: Fix return value of >>>> vfio_migration_realize() >>>> >>>> On 16/06/2023 03:42, Duan, Zhenzhong wrote: >>>>>> -----Original Message----- >>>>>> From: Joao Martins <joao.m.mart...@oracle.com> >>>>>> Sent: Thursday, June 15, 2023 6:23 PM >>>>>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com> >>>>>> Cc: alex.william...@redhat.com; c...@redhat.com; >>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.p...@intel.com> >>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of >>>>>> vfio_migration_realize() >>>>>> >>>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote: >>>>>>>> -----Original Message----- >>>>>>>> From: Joao Martins <joao.m.mart...@oracle.com> >>>>>>>> Sent: Thursday, June 15, 2023 4:54 PM >>>>>>>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com> >>>>>>>> Cc: alex.william...@redhat.com; c...@redhat.com; >>>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.p...@intel.com> >>>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of >>>>>>>> vfio_migration_realize() >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote: >>>>>>>>> We should print "Migration disabled" when migration is blocked in >>>>>>>>> vfio_migration_realize(). >>>>>>>>> >>>>>>>>> Fix it by reverting return value of migrate_add_blocker(), >>>>>>>>> meanwhile error out directly once migrate_add_blocker() failed. >>>>>>>>> >>>>>>>> >>>>>>>> It wasn't immediately obvious from commit message that this is >>>>>>>> mainly about just printing an error message when blockers get added >>>>>>>> and that well >>>>>>>> migrate_add_blocker() returns 0 on success and caller of >>>>>>>> vfio_migration_realize expects the opposite when blockers are added. >>>>>>>> >>>>>>>> Perhaps better wording would be: >>>>>>>> >>>>>>>> migrate_add_blocker() returns 0 when successfully adding the >>>>>>>> migration blocker. However, the caller of vfio_migration_realize() >>>>>>>> considers that migration was blocked when the latter returned an >>>>>>>> error. Fix it by negating the return value obtained by >>>>>>>> migrate_add_blocker(). What matters for migration is that the >>>>>>>> blocker is added in core migration, so this cleans up usability >>>>>>>> such that user sees "Migrate disabled" when any of the vfio >>>>>>>> migration blockers are >>>>>> active. >>>>>>> >>>>>>> Great, I'll use your words. >>>>>>> >>>>>>>> >>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>>>>> --- >>>>>>>>> hw/vfio/common.c | 4 ++-- >>>>>>>>> hw/vfio/migration.c | 6 +++--- >>>>>>>>> 2 files changed, 5 insertions(+), 5 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index >>>>>>>>> fa8fd949b1cf..8505385798f3 100644 >>>>>>>>> --- a/hw/vfio/common.c >>>>>>>>> +++ b/hw/vfio/common.c >>>>>>>>> @@ -399,7 +399,7 @@ int >>>>>>>>> vfio_block_multiple_devices_migration(Error >>>>>>>> **errp) >>>>>>>>> multiple_devices_migration_blocker = NULL; >>>>>>>>> } >>>>>>>>> >>>>>>>>> - return ret; >>>>>>>>> + return !ret; >>>>>>>>> } >>>>>>>>> >>>>>>>>> void vfio_unblock_multiple_devices_migration(void) >>>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp) >>>>>>>>> giommu_migration_blocker = NULL; >>>>>>>>> } >>>>>>>>> >>>>>>>>> - return ret; >>>>>>>>> + return !ret; >>>>>>>>> } >>>>>>>>> >>>>>>>>> void vfio_migration_finalize(void) diff --git >>>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index >>>>>>>>> 6b58dddb8859..0146521d129a 100644 >>>>>>>>> --- a/hw/vfio/migration.c >>>>>>>>> +++ b/hw/vfio/migration.c >>>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice >>>>>>>>> *vbasedev, >>>>>>>> Error **errp) >>>>>>>>> } >>>>>>>>> >>>>>>>>> ret = vfio_block_multiple_devices_migration(errp); >>>>>>>>> - if (ret) { >>>>>>>>> + if (ret || (errp && *errp)) { >>>>>>>> >>>>>>>> Why do you need this extra clause? >>>>>>> >>>>>>> Now that error happens, no need to add other blockers which will >>>>>>> fail for same reason. >>>>>>> >>>>>> >>>>>> But you don't need the (errp && *errp) for that as that's the whole >>>>>> point of negating the result. >>>>>> >>>>>> And if there's an error set it means migrate_add_blocker failed to >>>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here >>>> unnecessarily? >>>>> >>>>> If there is an error qdev_device_add() will fail, continue execution is >>>> meaningless here? >>>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough. >>>>> >>>>> If I removed (errp && *errp) to continue, need below change to bypass >>>>> trace_vfio_migration_probe Do you prefer this way? >>>>> >>>>> if (!*errp) { >>>>> trace_vfio_migration_probe(vbasedev->name); >>>>> } >>>>> >>>> >>>> I am mainly questioning that the error testing is correct to test here. >>>> >>>> IIUC, the only one that can propagate any *new* error in >>>> vfio_migration_realize is the calls to migrate_add_blocker failing within >>>> the >>>> vfio_block* (migration code suggests that this happens on snapshotting). >>>> Failing to add migration blocker just means we haven't installed any >>>> blockers. >>>> And the current code presents that as a "Migration disabled" case. If we >>>> want >>>> to preserve that behaviour on migration_add_blocker() failures (which seems >>>> like that's what you are doing here) then instead of this: >>> >>> Current behavior(without my patch): >>> "Migration disabled" isn't printed if migrate_add_blocker succeed. >>> "Migration disabled" is printed if migrate_add_blocker fail. >>> >>> I think this behavior isn't correct, I want to revert it not preserve it, so >>> I used !ret. >>> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker >>> failure >>> will lead to hotplug fail, then the guest is still migratable as no vfio >>> plugged. >>> But we see "Migration disabled" which will confuse us. >>> >> >> /me nods > > Yes. Overall, the reason for which migration is not supported or is > disabled is not very clear today (for the user). It might need some > more adjustments, like this one, before we remove the experimental flag.
I think the "Migration Disabled" was an UX oversight when we refactored blockers into vfio_migration_realize(). The actual migration is *not* disabled, it is just that the message or tracepoint shouldn't be printed. The reasons why it is blocked / not supported are clear in code right now. > It will also help QE to define test scenarios and track expected results. > >> >> >>>> return !ret; >>>> >>>> you would do this: >>>> >>>> ret = migration_add_blocker(...); >>>> if (ret < 0) { >>>> error_free(...); >>>> blocker = NULL; >>>> + return ret; >>>> } >>>> >>>> + return 1; >>>> >>>> Or something like that. And then if you return early as you intended? >>> >>> Yes, this change make sense, I also want to add below: >>> >>> if (!pdev->failover_pair_id) { >>> ret = vfio_migration_realize(vbasedev, errp); >>> - if (ret) { >>> + if (ret > 0) { >>> error_report("%s: Migration disabled", vbasedev->name); >>> } >>> >> Makes sense. Checking errp above before printing the tracepoint (like you >> suggested) is also an option taking the discussion so far, but perhaps the >> return type to be bool to make it clear to callers that this is not no >> longer an >> error code? Maybe let's wait for others on what style is generally preferred >> in >> error propagation. > > I think the code should follow the qdev_realize() prototype, which returns > a bool. > That makes sense, it makes the decision a lot simpler. > Thanks, > > C. > >