On 03.07.21 04:53, Wang, Wei W wrote:
On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
On 02.07.21 04:48, Wang, Wei W wrote:
On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
On 01.07.21 14:51, Peter Xu wrote:

I think that clearly shows the issue.

My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
VM). While processing hints, we will clear the bits from the dirty bmap in the
RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
actually clear the dirty bitmap of the memory region. When re-syncing, we will
set all bits bits in the dirty bmap again from the dirty bitmap in the memory
region. Thus, many of our hints end up being mostly ignored. The smaller the
clear bmap chunk, the more extreme the issue.

OK, that looks possible. We need to clear the related bits from the memory 
region
bitmap before skipping the free pages. Could you try with below patch:

I did a quick test (with the memhog example) and it seems like it mostly works. However, we're now doing the bitmap clearing from another, racing with the migration thread. Especially:

1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
2. Racing with migration_bitmap_clear_dirty()

So that might need some thought, if I'm not wrong.

The simplest approach would be removing/freeing the clear_bmap via PRECOPY_NOTIFY_SETUP(), similar to precopy_enable_free_page_optimization() we had before. Of course, this will skip the clear_bmap optimization.

--
Thanks,

David / dhildenb


Reply via email to