On 08/05/19 01:15, Peter Xu wrote: > Design > =================== > > I started with a naive/stupid design that I always pass all 1's to the > KVM for a memory range to clear all the dirty bits within that memory > range, but then I encountered guest oops - it's simply because we > can't clear any dirty bit from QEMU if we are not _sure_ that the bit > is dirty in the kernel. Otherwise we might accidentally clear a bit > that we don't even know of (e.g., the bit was clear in migration's > dirty bitmap in QEMU) but actually that page was just being written so > QEMU will never remember to migrate that new page again. > > The new design is focused on a dirty bitmap cache within the QEMU kvm > layer (which is per kvm memory slot). With that we know what's dirty > in the kernel previously (note! the kernel bitmap is still growing all > the time so the cache will only be a subset of the realtime kernel > bitmap but that's far enough for us) and with that we'll be sure to > not accidentally clear unknown dirty pages. > > With this method, we can also avoid race when multiple users (e.g., > DIRTY_MEMORY_VGA and DIRTY_MEMORY_MIGRATION) want to clear the bit for > multiple time. If without the kvm memory slot cached dirty bitmap we > won't be able to know which bit has been cleared and then if we send > the CLEAR operation upon the same bit twice (or more) we can still > face the same issue to clear something accidentally while we > shouldn't. > > Summary: we really need to be careful on what bit to clear otherwise > we can face anything after the migration completes. And I hope this > series has considered all about this.
The disadvantage of this is that you won't clear in the kernel those dirty bits that come from other sources (e.g. vhost or address_space_map). This can lead to double-copying of pages. Migration already makes a local copy in rb->bmap, and memory_region_snapshot_and_clear_dirty can also do the clear. Would it be possible to invoke the clear using rb->bmap instead of the KVMSlot's new bitmap? Thanks, Paolo > Besides the new KVM cache layer and the new ioctl support, this series > introduced the memory_region_clear_dirty_bitmap() in the memory API > layer to allow clearing dirty bits of a specific memory range within > the memory region.