On Wed, Dec 20, 2023 at 6:22 AM Jason Wang <jasow...@redhat.com> wrote:
>
> On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez <epere...@redhat.com> wrote:
> >
> > Callers can use this function to setup the incoming migration thread.
> >
> > This thread is able to map the guest memory while the migration is
> > ongoing, without blocking QMP or other important tasks. While this
> > allows the destination QEMU not to block, it expands the mapping time
> > during migration instead of making it pre-migration.
>
> If it's just QMP, can we simply use bh with a quota here?
>
Because QEMU cannot guarantee the quota at write(fd, VHOST_IOTLB_UPDATE, ...).
Also, synchronization with vhost_vdpa_dev_start would get more complicated,
as it would need to be re-scheduled too.

As a half-baked idea, we could split the mapping into chunks of a manageable
size, but I don't like that idea a lot.

> Btw, have you measured the hotspot that causes such slowness? Is it
> pinning or vendor specific mapping that slows down the progress? Or if
> VFIO has a similar issue?
>

Si-Wei did the actual profiling, as he is the one with the 128G guests, but
most of the time was spent in the memory pinning. Si-Wei, please correct me
if I'm wrong.

I didn't check VFIO, but I think it just maps at realize phase with
vfio_realize -> vfio_attach_device -> vfio_connect_container(). In previous
testing, this delayed VM initialization by a lot, as it moves those ~20s of
blocking to every VM start.

Looking for a way to do it only when QEMU is the destination of a live
migration, I think the right place is the .load_setup migration handler
(see the sketch at the end of this mail). But I'm ok to move it, for sure.

> >
> > This thread joins at vdpa backend device start, so it could happen that
> > the guest memory is so large that we still have guest memory to map
> > before this time.
>
> So we would still hit the QMP stall in this case?
>

This paragraph is kind of outdated, sorry. I can only trigger this if I
don't enable the switchover_ack migration capability and if I make memory
pinning in the kernel artificially slow.

But I didn't check QMP, to be honest, so I can try to test it, yes. If QMP
is not responsive in that period, it is not responsive in QEMU master
either, so we're only improving anyway.

Thanks!

> > This can be improved in later iterations, when the
> > destination device can inform QEMU that it is not ready to complete the
> > migration.
> >
> > If the device is not started, the cleanup of the mapped memory is done
> > at .load_cleanup. This is far from ideal, as the destination machine
> > has mapped all the guest RAM for nothing, and now it needs to unmap it.
> > However, we don't have information about the state of the device, so
> > it's the best we can do. Once iterative migration is supported, this
> > will be improved, as we will know the virtio state of the device.
> >
> > If the VM migrates before finishing all the maps, the source will stop
> > but the destination is still not ready to continue, and it will wait
> > until all guest RAM is mapped. It is still an improvement over doing
> > all the maps when the migration finishes, but the next patches use the
> > switchover_ack method to prevent the source from stopping until all
> > the memory is mapped at the destination.
> >
> > The memory unmapping if the device is not started is weird too, as
> > ideally nothing would be mapped. This can be fixed when we migrate the
> > device state iteratively and we know for sure whether the device is
> > started or not. At this moment we don't have such information, so
> > there is no better alternative.
> >
> > Signed-off-by: Eugenio Pérez <epere...@redhat.com>
> >
> > ---
>
> Thanks
>
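In case a concrete example helps the discussion, below is a rough, untested
sketch of the .load_setup / .load_cleanup pair I have in mind. SaveVMHandlers,
qemu_thread_create() and qemu_thread_join() are existing QEMU interfaces
(written from memory, so double-check the exact signatures); the
vhost_vdpa_map_all_guest_memory(), vhost_vdpa_unmap_all_guest_memory() and
vhost_vdpa_dev_is_started() helpers are made-up placeholders for the real
vhost-vdpa mapping code, and the actual patches may look different:

/*
 * Untested sketch: spawn a thread at incoming-migration setup time that maps
 * (and therefore pins) all guest memory, so the main loop and QMP stay
 * responsive while the migration stream is being received.
 */
#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "migration/register.h"

typedef struct VhostVDPAMapThread {
    QemuThread thread;
    bool running;
} VhostVDPAMapThread;

/* Placeholders for the real mapping / device-state helpers */
void vhost_vdpa_map_all_guest_memory(void *opaque);
void vhost_vdpa_unmap_all_guest_memory(void *opaque);
bool vhost_vdpa_dev_is_started(void *opaque);

static void *vhost_vdpa_load_map_worker(void *opaque)
{
    /*
     * Walk the guest memory sections and issue the VHOST_IOTLB_UPDATE
     * writes from here.  The memory pinning that takes tens of seconds on
     * 128G guests now happens off the main loop.
     */
    vhost_vdpa_map_all_guest_memory(opaque);
    return NULL;
}

static int vhost_vdpa_load_setup(QEMUFile *f, void *opaque)
{
    VhostVDPAMapThread *s = opaque;

    s->running = true;
    qemu_thread_create(&s->thread, "vdpa-map", vhost_vdpa_load_map_worker,
                       opaque, QEMU_THREAD_JOINABLE);
    return 0;
}

static int vhost_vdpa_load_cleanup(void *opaque)
{
    VhostVDPAMapThread *s = opaque;

    if (!s->running) {
        return 0;
    }
    qemu_thread_join(&s->thread);
    s->running = false;

    /*
     * If the device never started (e.g. the migration failed), all those
     * maps were for nothing: undo them here.
     */
    if (!vhost_vdpa_dev_is_started(opaque)) {
        vhost_vdpa_unmap_all_guest_memory(opaque);
    }
    return 0;
}

static const SaveVMHandlers vhost_vdpa_load_handlers = {
    .load_setup = vhost_vdpa_load_setup,
    .load_cleanup = vhost_vdpa_load_cleanup,
};

The handlers would be registered on the destination before the migration
stream is processed (via register_savevm_live() or the existing vhost-vdpa
init path), and vhost_vdpa_dev_start() would qemu_thread_join() the worker
before enabling the vrings, which is the "joins at vdpa backend device
start" part of the commit message.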