Si-Wei Liu <si-wei....@oracle.com> writes:

> On 5/26/2025 2:16 AM, Markus Armbruster wrote:
>> Si-Wei Liu <si-wei....@oracle.com> writes:
>>
>>> On 5/15/2025 11:40 PM, Markus Armbruster wrote:
>>>> Jason Wang <jasow...@redhat.com> writes:
>>>>
>>>>> On Thu, May 8, 2025 at 2:47 AM Jonah Palmer <jonah.pal...@oracle.com>
>>>>> wrote:
>>>>>> Current memory operations like pinning may take a lot of time at the
>>>>>> destination.  Currently they are done after the source of the
>>>>>> migration is stopped, and before the workload is resumed at the
>>>>>> destination.  This is a period where neither traffic can flow, nor
>>>>>> the VM workload can continue (downtime).
>>>>>>
>>>>>> We can do better, because we know the memory layout of the guest RAM
>>>>>> at the destination from the moment all devices are initialized.
>>>>>> Moving that operation earlier lets QEMU communicate the maps to the
>>>>>> kernel while the workload is still running on the source, so Linux
>>>>>> can start mapping them.
>>>>>>
>>>>>> As a small drawback, there is a period during initialization when
>>>>>> QEMU cannot respond to QMP etc.  In testing, this time is about
>>>>>> 0.2 seconds.
>>>>>
>>>>> Adding Markus to see if this is a real problem or not.
>>>>
>>>> I guess the answer is "depends", and to get a more useful one, we need
>>>> more information.
>>>>
>>>> When all you care about is the time from executing qemu-system-FOO to
>>>> the guest finishing booting, and the guest takes 10s to boot, then an
>>>> extra 0.2s won't matter much.
>>>
>>> There is no extra delay of 0.2s or more per se; the series merely
>>> shifts the page-pinning hiccup, whether it is 0.2s or something else,
>>> from the time the guest boots up to before the guest is booted.  This
>>> wins back guest boot time or start-up delay, but in turn the same
>>> delay is effectively charged to VM launch time.  We follow the same
>>> model as VFIO, which sees the same hiccup during launch (at an early
>>> stage no real mgmt software would care about).
>>>
>>>> When a management application runs qemu-system-FOO several times to
>>>> probe its capabilities via QMP, then even milliseconds can hurt.
>>>>
>>> It is not like that: the page-pinning hiccup is a one-time event that
>>> occurs at a very early stage of launching QEMU, i.e. there is no
>>> consistent delay every time a QMP command is issued.  The delay in QMP
>>> response at that point depends on how much memory the VM has, but it
>>> is specific to VMs with VFIO or vDPA devices that have to pin memory
>>> for DMA.  That said, there is no extra delay at all if the QEMU
>>> command line has no vDPA device assignment; on the other hand, the
>>> same delay or QMP hiccup already occurs when VFIO is on the QEMU
>>> command line.
>>>
>>>> In what scenarios exactly is QMP delayed?
>>>
>>> As said, this is not a new problem for QEMU in particular; this QMP
>>> delay is nothing peculiar, it exists with VFIO as well.
>>
>> In what scenarios exactly is QMP delayed compared to before the patch?
>
> The page pinning process now runs in a pretty early phase at
> qemu_init(), e.g. machine_run_board_init(),
It runs within

    qemu_init()
        qmp_x_exit_preconfig()
            qemu_init_board()
                machine_run_board_init()

Except when --preconfig is given; then it instead runs within QMP command
x-exit-preconfig.  Correct?

> before any QMP command can be serviced; the latter typically only gets
> to run from qemu_main_loop(), once the AIO gets a chance to be started,
> polled and dispatched to a bottom half.

We create the QMP monitor within qemu_create_late_backends(), which runs
before qmp_x_exit_preconfig(), but commands get processed only in the main
loop, which we enter later.  Correct?  (I've put a rough sketch of the
ordering as I understand it at the end of this mail.)

> Technically it's not a real delay for a specific QMP command, but
> rather an extended span of the initialization process that may take
> place before the very first QMP request, usually qmp_capabilities, gets
> serviced.  It's natural for mgmt software to expect an initialization
> delay in the first qmp_capabilities response if it has to issue one
> immediately after launching QEMU.  Especially with a large guest that
> has hundreds of GBs of memory and a passthrough device that has to pin
> memory for DMA, e.g. VFIO, the delayed effect of the QEMU
> initialization process is very visible too.  On the other hand, before
> the patch, if memory happens to be in the middle of being pinned, any
> ongoing QMP can't be serviced by the QEMU main loop, either.
>
> I'd also like to highlight that without this patch, the rather high
> delay due to page pinning is visible to the guest in addition to the
> QMP delay, and it already hurts guest boot time with vDPA devices.  It
> is long-standing, and every VM user with a vDPA device would like to
> avoid such a high delay on first boot, which is not seen with a similar
> device, e.g. VFIO passthrough.
>
>>
>>> Thanks,
>>> -Siwei
>>>
>>>> You told us an absolute delay you observed.  What's the relative
>>>> delay, i.e. what's the delay with and without these patches?
>>
>> Can you answer this question?
>
> I thought I already answered that in an earlier reply.  The relative
> delay depends on the size of memory.  Usually mgmt software won't be
> able to notice it, unless the guest has more than 100GB of THP memory
> to pin, for DMA or whatever reason.
>
>
>>
>>>> We need QMP to become available earlier in the startup sequence for
>>>> other reasons.  Could we bypass the delay that way?  Please understand
>>>> that this would likely be quite difficult: we know from experience
>>>> that messing with the startup sequence is prone to introduce subtle
>>>> compatibility breaks and even bugs.
>>>>
>>>>> (I remember VFIO has some optimization in the speed of the pinning,
>>>>> could vDPA do the same?)
>>>>
>>>> That's well outside my bailiwick :)
>
> Please understand that any possible optimization is out of scope for
> this patch series, though there is certainly a way around it, to be
> carried out in the future, as Peter alluded to in an earlier discussion
> thread:
>
> https://lore.kernel.org/qemu-devel/ZZT7wuq-_IhfN_wR@x1n/
> https://lore.kernel.org/qemu-devel/ZZZUNsOVxxqr-H5S@x1n/
>
> Thanks,
> -Siwei
>
>>>>
>>>> [...]
>>>>
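
For reference, here is the ordering as I understand it, as a heavily
simplified, self-contained C sketch.  The function names mirror the real
QEMU ones mentioned above (qemu_init(), qemu_create_late_backends(),
qmp_x_exit_preconfig(), qemu_init_board(), machine_run_board_init(),
qemu_main_loop()); the bodies, signatures and the preconfig_requested
flag are stand-ins that only demonstrate the order, not the actual vl.c
code:

    /* Simplified sketch of the startup ordering under discussion. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool preconfig_requested;    /* models --preconfig (illustrative) */

    static void machine_run_board_init(void)
    {
        /* With this series, vhost-vdpa memory pinning would start in here,
         * i.e. before the main loop and before any QMP command is serviced
         * (unless --preconfig defers it, see below). */
        printf("machine_run_board_init: board init (and memory pinning)\n");
    }

    static void qemu_init_board(void)
    {
        machine_run_board_init();
    }

    static void qmp_x_exit_preconfig(void)
    {
        qemu_init_board();
    }

    static void qemu_create_late_backends(void)
    {
        printf("qemu_create_late_backends: QMP monitor created, not serving yet\n");
    }

    static void qemu_main_loop(void)
    {
        printf("qemu_main_loop: QMP commands get dispatched from here on\n");
        if (preconfig_requested) {
            /* Management layer issues the x-exit-preconfig QMP command. */
            qmp_x_exit_preconfig();
        }
        /* ... run until shutdown ... */
    }

    static void qemu_init(void)
    {
        /* ... option parsing, machine creation, early backends ... */
        qemu_create_late_backends();
        if (!preconfig_requested) {
            qmp_x_exit_preconfig();     /* board init before the main loop */
        }
    }

    int main(void)
    {
        qemu_init();
        qemu_main_loop();
        return 0;
    }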