Hi Alex,

On 2021/3/11 4:24, Alex Williamson wrote:
> On Wed, 10 Mar 2021 15:19:33 +0800
> Kunkun Jiang <jiangkun...@huawei.com> wrote:
>
>> Hi Alex,
>>
>> On 2021/3/10 7:17, Alex Williamson wrote:
>>> On Thu, 4 Mar 2021 21:34:46 +0800
>>> Kunkun Jiang <jiangkun...@huawei.com> wrote:
>>>
>>>> cpu_physical_memory_set_dirty_lebitmap() can quickly deal with
>>>> the dirty pages of memory by bitmap-traveling, regardless of whether
>>>> the bitmap is aligned correctly or not.
>>>>
>>>> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> host page size. So it is better to set bitmap_pgsize to the host page
>>>> size to support more translation granule sizes.
>>>>
>>>> Fixes: 87ea529c502 (vfio: Get migration capability flags for container)
>>>> Signed-off-by: Kunkun Jiang <jiangkun...@huawei.com>
>>>> ---
>>>>   hw/vfio/common.c | 44 ++++++++++++++++++++++----------------------
>>>>   1 file changed, 22 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 6ff1daa763..69fb5083a4 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>>>   {
>>>>       struct vfio_iommu_type1_dma_unmap *unmap;
>>>>       struct vfio_bitmap *bitmap;
>>>> -    uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
>>>> +    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size;
>>>>       int ret;
>>>>
>>>>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>>>> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>>>       bitmap = (struct vfio_bitmap *)&unmap->data;
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
>>>> -     * TARGET_PAGE_SIZE.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>>>> +     * to qemu_real_host_page_size.
>>>
>>> I don't see that this change is well supported by the code,
>>> cpu_physical_memory_set_dirty_lebitmap() seems to operate on
>> Yes, cpu_physical_memory_set_dirty_lebitmap() ultimately operates on
>> TARGET_PAGE_SIZE, but it accepts a bitmap at qemu_real_host_page_size
>> granularity to mark those pages dirty. It uses
>> "hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE" to adapt to
>> different translation granule sizes (e.g. 4K, 2M, 1G).
>
> Thanks for the explanation, it becomes more clear but I'm still a
> little confused.  It appears that
> cpu_physical_memory_set_dirty_lebitmap() requires a bitmap in terms of
> qemu_real_host_page_size, which is translated to target pages using
> hpratio.  It's not clear to me from the explanation here and in the
> commit log that we're technically using the wrong page size reference
> for this function.
>
>>> TARGET_PAGE_SIZE, and the next three patch chunks take a detour through
>>> memory listener code that seem unrelated to the change described in the
>>> commit log.  This claims to fix something; what is actually broken?
>>> Thanks,
>>>
>>> Alex
>> Commit 87ea529c502 (vfio: Get migration capability flags for container)
>> is where the bug was introduced. The code at [1] (marked below) requires
>> the host page size to be TARGET_PAGE_SIZE (e.g. 4K) in order to set
>> container->dirty_pages_supported = true. It is inappropriate to limit
>> the page size to TARGET_PAGE_SIZE.
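To illustrate the hpratio mechanism described above: a minimal sketch of
how a bitmap at host-page granularity fans out to target pages. This is
not the actual memory.c code; mark_target_page_dirty() is a hypothetical
stand-in for the real dirty-bitmap update:

    #include <stdint.h>

    extern void mark_target_page_dirty(uint64_t addr);  /* hypothetical */

    static void set_dirty_from_host_bitmap(const uint64_t *bitmap,
                                           uint64_t start_addr,
                                           uint64_t host_pages,
                                           uint64_t host_page_size,
                                           uint64_t target_page_size)
    {
        /* e.g. 64K host pages, 4K target pages -> hpratio = 16 */
        uint64_t hpratio = host_page_size / target_page_size;

        for (uint64_t i = 0; i < host_pages; i++) {
            if (bitmap[i / 64] & (1ULL << (i % 64))) {
                /* One dirty host page marks hpratio target pages dirty. */
                for (uint64_t j = 0; j < hpratio; j++) {
                    mark_target_page_dirty(start_addr +
                                           (i * hpratio + j) * target_page_size);
                }
            }
        }
    }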
>
> Right, the noted code requires that vfio supports the target page size,
> which for all practical purposes implies an hpratio = 1 currently.
> That unnecessarily restricts migration support to cases where target and
> host use the same page size, but this change allows migration in any
> case where the vfio dirty bitmap supports the host page size, which is
> effectively any case where the device supports migration.  However, the
> hpratio calculation requires that qemu_real_host_page_size is >=
> TARGET_PAGE_SIZE, otherwise the value becomes zero and it appears we'd
> only ever dirty the target zero page.  Is this configuration restricted
> elsewhere, ex. 64K guest on 4K host?  Exchanging an unnecessary
> restriction for allowing a buggy configuration isn't a good trade-off.
> Thanks,

FYI, the restriction that qemu_real_host_page_size >= TARGET_PAGE_SIZE
already exists. In kvm_init() we have an assert:

    assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
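Given that assert, hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE
is always >= 1. A quick sketch of the arithmetic Alex describes, with
illustrative page sizes (integer division):

    uint64_t hpratio;

    hpratio = 4096  / 4096;   /* 4K host,  4K target  -> 1  (fine) */
    hpratio = 65536 / 4096;   /* 64K host, 4K target  -> 16 (fine) */
    hpratio = 4096  / 65536;  /* 4K host,  64K target -> 0: every dirty
                               * host page would scale to zero target
                               * pages; this is the buggy configuration
                               * that the kvm_init() assert rules out. */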
As I understand it, TARGET_PAGE_SIZE is an architecture-specific value and
is not affected by the page size of the guest OS. For arm64 it is fixed at
12 bits (4K), while qemu_real_host_page_size depends on the host kernel
configuration and can be 4K, 16K or 64K. IIUC, the kernel vfio/iommu_type1
driver reports the supported dirty-log page size as the host page size, so
if the host uses 16K or 64K pages, QEMU currently refuses to support vfio
migration.

Thanks,
Keqian

>
> Alex
>
>>>>       */
>>>> -    bitmap->pgsize = TARGET_PAGE_SIZE;
>>>> +    bitmap->pgsize = qemu_real_host_page_size;
>>>>       bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>>>>                                                            BITS_PER_BYTE;
>>>>
>>>> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>>           return;
>>>>       }
>>>>
>>>> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>>>> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
>>>> +    if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) !=
>>>> +                 (section->offset_within_region & ~qemu_real_host_page_mask))) {
>>>>           error_report("%s received unaligned region", __func__);
>>>>           return;
>>>>       }
>>>>
>>>> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>>>>       llend = int128_make64(section->offset_within_address_space);
>>>>       llend = int128_add(llend, section->size);
>>>> -    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>>>>
>>>>       if (int128_ge(int128_make64(iova), llend)) {
>>>>           return;
>>>> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>>           return;
>>>>       }
>>>>
>>>> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>>>> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
>>>> +    if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) !=
>>>> +                 (section->offset_within_region & ~qemu_real_host_page_mask))) {
>>>>           error_report("%s received unaligned region", __func__);
>>>>           return;
>>>>       }
>>>> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>>        */
>>>>       }
>>>>
>>>> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>>>>       llend = int128_make64(section->offset_within_address_space);
>>>>       llend = int128_add(llend, section->size);
>>>> -    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>>>>
>>>>       if (int128_ge(int128_make64(iova), llend)) {
>>>>           return;
>>>> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>>>       range->size = size;
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to
>>>> -     * TARGET_PAGE_SIZE.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize
>>>> +     * to qemu_real_host_page_size.
>>>>       */
>>>> -    range->bitmap.pgsize = TARGET_PAGE_SIZE;
>>>> +    range->bitmap.pgsize = qemu_real_host_page_size;
>>>>
>>>> -    pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS;
>>>> +    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size;
>>>>       range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>>>>                                                            BITS_PER_BYTE;
>>>>       range->bitmap.data = g_try_malloc0(range->bitmap.size);
>>>> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>>>>                   section->offset_within_region;
>>>>
>>>>       return vfio_get_dirty_bitmap(container,
>>>> -                   TARGET_PAGE_ALIGN(section->offset_within_address_space),
>>>> +                   REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
>>>>                                    int128_get64(section->size), ram_addr);
>>>>   }
>>>>
>>>> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>>>>                                               header);
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty.
>>>>       */
>>>> -    if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) {
>>>> +    if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) {
>> [1]
>>>>           container->dirty_pages_supported = true;
>>>>           container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
>>>>           container->dirty_pgsizes = cap_mig->pgsize_bitmap;
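P.S. To make the bitmap sizing above concrete: a 1GiB range tracked at
64K host-page granularity needs 16384 bits = 2048 bytes, versus 262144
bits = 32768 bytes at 4K. A standalone sketch of the same calculation
(hypothetical helper name; mirrors the ROUND_UP logic in the patch):

    #include <stdint.h>

    #define BITS_PER_BYTE 8
    #define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

    /* Bytes needed for a dirty bitmap covering `size` bytes of guest
     * memory at `pgsize` granularity, one bit per page, rounded up to
     * whole 64-bit words as the patch does. */
    static uint64_t dirty_bitmap_bytes(uint64_t size, uint64_t pgsize)
    {
        uint64_t pages = (size + pgsize - 1) / pgsize;
        return ROUND_UP(pages, sizeof(uint64_t) * BITS_PER_BYTE) / BITS_PER_BYTE;
    }

    /* dirty_bitmap_bytes(1ULL << 30, 64 * 1024) == 2048
     * dirty_bitmap_bytes(1ULL << 30,  4 * 1024) == 32768 */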