On 2026-04-21 23:46, Paolo Bonzini wrote:
On Mon, Apr 20, 2026 at 11:52 AM Florian Schmidt <[email protected]> wrote:
Here's another option, though I'm not sure you'll like it: We can get
the same information from /proc/<pid>/pagemap, right? We just have to
check whether bits 62 and 63 are both 0 or not for every page.
It's less efficient, because we have to read 8 bytes per page

It's also completely useless for the very common case of pre-allocated
hugepages. That's the case where you can get the largest benefit,
because when pre-allocating you use threads on the host and do the
zeroing before the VM starts. For non-pre-allocated pages you still
pay the price of double zeroing, but guest and host do it one after
another while the VM is already running.

Yes, that's fair. There's another reason this is dodgy, which I had forgotten about and only remembered the other week: the swap state of shared memory is not tracked properly in /proc/<pid>/pagemap, so that's another case where this won't work correctly. So I agree, that approach is useless.
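
For reference, here is a minimal sketch of the pagemap check discussed above, mostly to make the per-page cost concrete. Bit 63 (present) and bit 62 (swapped) are the documented pagemap bits; the function names are made up for illustration, and the caveats about pre-allocated hugepages and shared memory still apply:

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8      # one 64-bit entry per virtual page
PM_SWAPPED = 1 << 62        # bit 62: page is swapped out
PM_PRESENT = 1 << 63        # bit 63: page is present in RAM


def entry_untouched(entry: int) -> bool:
    """True if bits 62 and 63 are both 0 (neither present nor swapped)."""
    return (entry & (PM_PRESENT | PM_SWAPPED)) == 0


def untouched_pages(pid: int, start: int, length: int) -> list[bool]:
    """One flag per page overlapping [start, start + length).

    Reads 8 bytes per page from /proc/<pid>/pagemap (Linux only, and
    subject to the usual permission checks on that file).
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    first = start // page_size
    count = (start + length + page_size - 1) // page_size - first
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(first * PAGEMAP_ENTRY_SIZE)
        raw = f.read(count * PAGEMAP_ENTRY_SIZE)
    # Entries are native-endian unsigned 64-bit integers.
    entries = struct.unpack(f"={len(raw) // PAGEMAP_ENTRY_SIZE}Q", raw)
    return [entry_untouched(e) for e in entries]
```

A caller would pass the host virtual address range backing a guest RAM block, e.g. untouched_pages(os.getpid(), addr, size).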


So I don't think there is any option other than the ioctl. I would
suggest experimenting to understand how Windows uses the hypercall;
and possibly looking at the QEMU write tracking part. The KVM changes
are relatively simple and for quick experimentation you can operate as
if the KVM bitmap is always entirely zero, not unlike this patch.

I will look a bit more at Windows, but generally speaking, I don't think we can make any lasting assumptions about when or how often a guest uses this hypercall.


Regarding the QEMU write tracking: I'm not very familiar with this code, so I'm still working my way through it. But at this point, I wonder whether there are advantages to implementing something separate rather than extending the current dirty tracking wholesale with a fourth option. What we care about is slightly different from dirty tracking: we only care about memory that QEMU touched, not about dirty-tracking guest pages via KVM, which is quite tightly coupled with the other dirty tracking approaches.

It means we have to support another tracking mode outside the existing DIRTY_MEMORY_* ones, but on the plus side, it lets us easily set a different granularity; avoid allocating that memory unconditionally at startup when the feature isn't enabled (though once we have the different granularity, the overhead is much smaller); and skip any hotplug support logic (having to extend the bitmap), since Windows never queries hotplugged memory anyway (though that may change in the future and would be neat to have eventually).
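
To make the idea a bit more concrete, here is a rough sketch of such a separate tracker, written in Python for brevity rather than as QEMU code. Everything in it (the TouchedTracker name, the 2 MiB default granule, the method names) is an assumption for illustration only. It shows the three properties mentioned above: a coarse, configurable granularity; a bitmap that is only allocated on first use; and a fixed size with no hotplug handling.

```python
class TouchedTracker:
    """Hypothetical tracker: one bit per `granule` bytes of guest RAM."""

    def __init__(self, ram_size: int, granule: int = 2 * 1024 * 1024):
        self.ram_size = ram_size
        self.granule = granule
        self.bits = None  # allocated lazily: no cost while nothing is marked

    def _granules(self, addr: int, length: int) -> range:
        # Clamp to the initial RAM size; there is no hotplug support here.
        end = min(addr + length, self.ram_size)
        if length <= 0 or addr >= end:
            return range(0)
        return range(addr // self.granule, (end - 1) // self.granule + 1)

    def mark_touched(self, addr: int, length: int) -> None:
        """Record that [addr, addr + length) was written to."""
        span = self._granules(addr, length)
        if len(span) == 0:
            return
        if self.bits is None:
            nbits = (self.ram_size + self.granule - 1) // self.granule
            self.bits = bytearray((nbits + 7) // 8)
        for i in span:
            self.bits[i // 8] |= 1 << (i % 8)

    def is_untouched(self, addr: int, length: int) -> bool:
        """True only if no granule overlapping the range was ever marked."""
        if addr + length > self.ram_size:
            return False  # outside tracked RAM: make no claim
        if self.bits is None:
            return True
        return all(
            ((self.bits[i // 8] >> (i % 8)) & 1) == 0
            for i in self._granules(addr, length)
        )
```

With a 2 MiB granule, 16 MiB of RAM needs a single byte of bitmap, and a one-byte write marks the whole surrounding 2 MiB granule as touched. That is the trade-off of the coarser granularity: less memory and faster queries, at the price of being more pessimistic.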

For migration, I'm still thinking through the implications, but one option would be to just say "after a migration, nothing is pre-zeroed anymore".
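
As a sketch of what that option would mean in practice (again in Python, with hypothetical names throughout): the destination does not receive any tracking state in the migration stream, and instead answers "not pre-zeroed" for every range once an incoming migration has completed.

```python
class PreZeroState:
    """Hypothetical per-VM state: which granules are still known to be zero."""

    def __init__(self, n_granules: int):
        # At a fresh start, every granule is assumed to be zeroed by the host.
        self.known_zero = [True] * n_granules

    def mark_touched(self, index: int) -> None:
        self.known_zero[index] = False

    def after_incoming_migration(self) -> None:
        # Conservative policy: the destination makes no claim about any
        # granule, so nothing needs to be transferred from the source.
        self.known_zero = [False] * len(self.known_zero)

    def is_pre_zeroed(self, index: int) -> bool:
        return self.known_zero[index]
```

The cost of this policy is only a lost optimization: a guest that asks after migration zeroes the memory itself, as it would without the feature.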

I'd be interested to hear your opinions on that.

Cheers,
Florian
