On Wed, Jan 3, 2024 at 7:16 AM Peter Xu <pet...@redhat.com> wrote:
>
> On Tue, Jan 02, 2024 at 12:28:48PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Jan 2, 2024 at 6:33 AM Peter Xu <pet...@redhat.com> wrote:
> > >
> > > Jason, Eugenio,
> > >
> > > Apologies for a late reply; just back from the long holiday.
> > >
> > > On Thu, Dec 21, 2023 at 09:20:40AM +0100, Eugenio Perez Martin wrote:
> > > > Si-Wei did the actual profiling as he is the one with the 128G guests,
> > > > but most of the time was spent in the memory pinning. Si-Wei, please
> > > > correct me if I'm wrong.
> > >
> > > IIUC we're talking about the no-vIOMMU use case.  The pinning should
> > > indeed take a lot of time if it's similar to what VFIO does.
> > >
> > > >
> > > > I didn't check VFIO, but I think it just maps at the realize phase with
> > > > vfio_realize -> vfio_attach_device -> vfio_connect_container(). In
> > > > previous testing, this delayed the VM initialization by a lot, as
> > > > we're moving those 20s of blocking to every VM start.
> > > >
> > > > Investigating a way to do it only when we are the destination of a
> > > > live migration, I think the right place is the .load_setup migration
> > > > handler. But I'm ok with moving it, for sure.
> > >
> > > If it's destined to map the 128G, it does sound sensible to me to do it
> > > when the VM starts, rather than anytime afterwards.
> > >
> >
> > Just for completeness, it is not 100% certain that the driver will start
> > the device, but it is very likely.
>
> My understanding is that vDPA is still quite a special device, presumably
> targeting only advanced users, and it should not appear in a default config
> for anyone.  That means the user should hopefully remove the device if the
> guest is not using it, instead of worrying about a slow boot.
>
> >
> > > Could anyone help explain what the problem is if vDPA maps the 128G at
> > > VM init, just like VFIO does?
> > >
> >
> > The main problem was the delay of VM start. In the master branch, the
> > pinning is done when the driver starts the device. Although that path
> > takes the BQL, the rest of the vCPUs can keep making progress while the
> > host is pinning, so the impact is not so evident.
> >
> > Moving it to initialization time made it very noticeable. To make things
> > worse, QEMU did not respond to QMP commands and the like. That's why it
> > was done only if the VM was the destination of a live migration.
>
> Is that a major issue for us?

To me it is a regression, but I'm certainly ok with it.

>  IIUC then VFIO shares the same condition.
> If it's a real problem, do we want to have a solution that works for both
> (or, is it possible)?
>

I would not consider it a regression for VFIO, since I think it has behaved
that way from the beginning. But yes, I'm all in favor of finding a common
solution.

> >
> > However, we've added the memory map thread in this version, so this
> > might not be a problem anymore. We could move the spawning of the thread
> > to initialization time.
> >
> > But how do we undo this pinning in case the guest does not start the
> > device? In this series, this is done at the destination with
> > vhost_vdpa_load_cleanup. Or is it ok to just keep the memory mapped as
> > long as QEMU has the vDPA device?
>
> I think even if vDPA decides to use a thread, we should keep the same
> behavior before/after the migration.  Having asymmetric behavior for DMA
> from the assigned HWs might have unpredictable implications.
>
> What I worry about is that we may over-optimize / over-engineer the case
> where the user specifies the vDPA device but does not use it, as I
> mentioned above.
>

I agree with all of the above. If it is ok to keep the memory mapped while
the guest has not started the device, I think we can move the spawning of
the thread, or even just the mapping itself, to the vDPA init.
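
Just to make it concrete, here is a rough sketch of what I mean (the names
below are placeholders, not code from this series; the only real API used is
qemu_thread_create(), QEMU's stock thread helper):

/*
 * Illustrative only: spawn the map/pin work from a thread at vDPA init
 * time, so vCPUs and QMP keep making progress while the host pins.
 * VdpaMapState, vdpa_map_all() and vdpa_start_map_thread() are made-up
 * names.
 */
#include "qemu/osdep.h"
#include "qemu/thread.h"

typedef struct VdpaMapState {
    QemuThread thread;
    void *opaque;                /* whatever the real mapping code needs */
} VdpaMapState;

/* Placeholder: the real code would walk the memory sections and issue
 * the DMA maps, which is where the pinning time is spent. */
static void vdpa_map_all(void *opaque)
{
}

static void *vdpa_map_worker(void *arg)
{
    VdpaMapState *s = arg;

    vdpa_map_all(s->opaque);
    return NULL;
}

/* Called from device init instead of from the .load_setup handler. */
static void vdpa_start_map_thread(VdpaMapState *s)
{
    qemu_thread_create(&s->thread, "vdpa-map", vdpa_map_worker, s,
                       QEMU_THREAD_JOINABLE);
}

That way the pinning stays off the vCPU / main loop path no matter whether
we are a migration destination, and undoing it would presumably stay in
vhost_vdpa_load_cleanup or move next to the device cleanup.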

> For the long term, maybe there's a chance to optimize DMA pinning for both
> the vDPA and VFIO use cases, so that we can always pin during VM start?
> Presumably the issue only exists for large VMs, which should normally be
> good candidates for huge pages already.  In that case one folio/page could
> cover a large range (e.g. 1G on x86_64) in a single pin, and the physical
> contiguity also opens up the possibility of IOMMU large page mappings.  I
> didn't check at which stage we are for VFIO on this; Alex may know better.

Sounds interesting, and I think it should be implemented. Thanks for
the pointer!
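
Just as a back-of-the-envelope illustration of the win for the 128G guest
case (my own arithmetic, not a measurement):

/* Pin/map operations needed for 128 GiB of guest RAM at two granularities:
 *   128 GiB / 4 KiB = 33,554,432 pins
 *   128 GiB / 1 GiB = 128 pins
 */
#include <stdio.h>

int main(void)
{
    const unsigned long long guest_ram = 128ULL << 30;   /* 128 GiB */

    printf("4K granularity: %llu pins\n", guest_ram / (4ULL << 10));
    printf("1G granularity: %llu pins\n", guest_ram / (1ULL << 30));
    return 0;
}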

> I'm copying Alex
> anyway since the problem seems to be a common one already, so maybe he has
> some thoughts.
>

Appreciated :).

Thanks!

