On Thu, 19 Jan 2023 at 11:58, Anton Kuchin <antonkuc...@yandex-team.ru> wrote: > > On 19/01/2023 18:02, Stefan Hajnoczi wrote: > > On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuc...@yandex-team.ru> > > wrote: > >> On 19/01/2023 16:30, Stefan Hajnoczi wrote: > >>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuc...@yandex-team.ru> > >>> wrote: > >>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: > >>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuc...@yandex-team.ru> > >>>>> wrote: > >>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents > >>>>>> qemu update without stopping the VM. In most cases that makes sense > >>>>>> because qemu has no way to transfer FUSE session state. > >>>>>> > >>>>>> But we can give an option to orchestrator to override this if it can > >>>>>> guarantee that state will be preserved (e.g. it uses migration to > >>>>>> update qemu and dst will run on the same host as src and use the same > >>>>>> socket endpoints). > >>>>>> > >>>>>> This patch keeps default behavior that prevents migration with such > >>>>>> devices > >>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow > >>>>>> migration. > >>>>>> > >>>>>> Signed-off-by: Anton Kuchin <antonkuc...@yandex-team.ru> > >>>>>> --- > >>>>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > >>>>>> qapi/migration.json | 7 ++++++- > >>>>>> 2 files changed, 30 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > >>>>>> index f5049735ac..13d920423e 100644 > >>>>>> --- a/hw/virtio/vhost-user-fs.c > >>>>>> +++ b/hw/virtio/vhost-user-fs.c > >>>>>> @@ -24,6 +24,7 @@ > >>>>>> #include "hw/virtio/vhost-user-fs.h" > >>>>>> #include "monitor/monitor.h" > >>>>>> #include "sysemu/sysemu.h" > >>>>>> +#include "migration/migration.h" > >>>>>> > >>>>>> static const int user_feature_bits[] = { > >>>>>> VIRTIO_F_VERSION_1, > >>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev > >>>>>> *vuf_get_vhost(VirtIODevice *vdev) > >>>>>> return &fs->vhost_dev; > >>>>>> } > >>>>>> > >>>>>> +static int vhost_user_fs_pre_save(void *opaque) > >>>>>> +{ > >>>>>> + MigrationState *s = migrate_get_current(); > >>>>>> + > >>>>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) > >>>>>> { > >>>>>> + error_report("Migration of vhost-user-fs devices requires > >>>>>> internal FUSE " > >>>>>> + "state of backend to be preserved. If > >>>>>> orchestrator can " > >>>>>> + "guarantee this (e.g. dst connects to the same > >>>>>> backend " > >>>>>> + "instance or backend state is migrated) set > >>>>>> 'vhost-user-fs' " > >>>>>> + "migration capability to true to enable > >>>>>> migration."); > >>>>>> + return -1; > >>>>>> + } > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> static const VMStateDescription vuf_vmstate = { > >>>>>> .name = "vhost-user-fs", > >>>>>> - .unmigratable = 1, > >>>>>> + .minimum_version_id = 0, > >>>>>> + .version_id = 0, > >>>>>> + .fields = (VMStateField[]) { > >>>>>> + VMSTATE_VIRTIO_DEVICE, > >>>>>> + VMSTATE_END_OF_LIST() > >>>>>> + }, > >>>>>> + .pre_save = vhost_user_fs_pre_save, > >>>>>> }; > >>>>> Will it be possible to extend this vmstate when virtiofsd adds support > >>>>> for stateful migration without breaking migration compatibility? > >>>>> > >>>>> If not, then I think a marker field should be added to the vmstate: > >>>>> 0 - stateless/reconnect migration (the approach you're adding in this > >>>>> patch) > >>>>> 1 - stateful migration (future virtiofsd feature) > >>>>> > >>>>> When the field is 0 there are no further vmstate fields and we trust > >>>>> that the destination vhost-user-fs server already has the necessary > >>>>> state. > >>>>> > >>>>> When the field is 1 there are additional vmstate fields that contain > >>>>> the virtiofsd state. > >>>>> > >>>>> The goal is for QEMU to support 3 migration modes, depending on the > >>>>> vhost-user-fs server: > >>>>> 1. No migration support. > >>>>> 2. Stateless migration. > >>>>> 3. Stateful migration. > >>>> Sure. These vmstate fields are very generic and mandatory for any > >>>> virtio device. If in future more state can be transfer in migration > >>>> stream the vmstate can be extended with additional fields. This can > >>>> be done with new subsections and/or bumping version_id. > >>> My concern is that the vmstate introduced in this patch may be > >>> unusable when stateful migration is added. So additional compatibility > >>> code will need to be introduced to make your stateless migration > >>> continue working with extended vmstate. > >>> > >>> By adding a marker field in this patch it should be possible to > >>> continue using the same vmstate for stateless migration without adding > >>> extra compatibility code in the future. > >> I understand, but this fields in vmstate just packs generic virtio > >> device state that is accessible by qemu. All additional data could be > >> added later by extra fields. I believe we couldn't pull off any type > >> of virtio device migration without transferring virtqueues so more > >> sophisticated types of migration would require adding more data and > >> not modification to this part of vmstate. > > What I'm saying is that your patch could define the vmstate such that > > it that contains a field to differentiate between stateless and > > stateful migration. That way QEMU versions that only support stateless > > migration (this patch) will be able to migrate to future QEMU versions > > that support both stateless and stateful without compatibility issues. > I double-checked migration documentation for subsections at > https://www.qemu.org/docs/master/devel/migration.html#subsections > and believe it perfectly describes our case: virtio device state > should always be transferred both in stateless or stateful migration. > With stateful one we would add new subsection with extra data that > will be transferred only if stateless capability is not set. We can > connect this subsection to device property and machine type if we > need to. > On the receiving side we always need basic virtio device state and > newer versions will be able to load extra data from subsection if it > is present, while older versions will be still able to accept the > migrations that were initiated from new versions with stateless flag > set and don't have extra subsection. > > The only scenario that will fail is older qemu won't be able to load > new migration stream with additional subsection that it can't > understand - this is general limitation of migration compatibility. > So we can't completely protect older versions from future migration > stream format because we don't know what will be in that stream > but looks like we have all the tools to maintain compatibility > reasonably wide. > > I'm not sure if my suggestion to add a marker field to vuf_vmstate is > > the best way to do this, but have you thought of how to handle the > > future addition of stateful migration to the vmstate without breaking > > stateless vmstates? Maybe David Gilbert has a suggestion for how to do > > this cleanly. > > > > Stefan > > I think we'd be better without a new marker because migration code > has standard generic way of solving such puzzles that I described > above. So adding new marker would go against existing practice.
That sounds good to me, thanks! Stefan