Hi Alex,

On 2021/3/11 4:24, Alex Williamson wrote:
> On Wed, 10 Mar 2021 15:19:33 +0800
> Kunkun Jiang <jiangkun...@huawei.com> wrote:
>
>> Hi Alex,
>>
>> On 2021/3/10 7:17, Alex Williamson wrote:
>>> On Thu, 4 Mar 2021 21:34:46 +0800
>>> Kunkun Jiang <jiangkun...@huawei.com> wrote:
>>>
>>>> cpu_physical_memory_set_dirty_lebitmap() can quickly deal with
>>>> the dirty pages of memory by bitmap-traveling, regardless of whether
>>>> the bitmap is aligned correctly or not.
>>>>
>>>> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> host page size. So it is better to set bitmap_pgsize to the host page
>>>> size to support more translation granule sizes.
>>>>
>>>> Fixes: 87ea529c502 (vfio: Get migration capability flags for container)
>>>> Signed-off-by: Kunkun Jiang <jiangkun...@huawei.com>
>>>> ---
>>>>   hw/vfio/common.c | 44 ++++++++++++++++++++++----------------------
>>>>   1 file changed, 22 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 6ff1daa763..69fb5083a4 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>>>   {
>>>>       struct vfio_iommu_type1_dma_unmap *unmap;
>>>>       struct vfio_bitmap *bitmap;
>>>> -    uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
>>>> +    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size;
>>>>       int ret;
>>>>
>>>>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>>>> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>>>       bitmap = (struct vfio_bitmap *)&unmap->data;
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
>>>> -     * TARGET_PAGE_SIZE.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>>>> +     * to qemu_real_host_page_size.
>>>
>>> I don't see that this change is well supported by the code,
>>> cpu_physical_memory_set_dirty_lebitmap() seems to operate on
>> Yes, cpu_physical_memory_set_dirty_lebitmap() ultimately operates on
>> TARGET_PAGE_SIZE, but it accepts a bitmap at qemu_real_host_page_size
>> granularity to mark those pages dirty. It uses
>> "hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE" to adapt to
>> different translation granule sizes (e.g. 4K, 2M, 1G).
>
> Thanks for the explanation, it becomes more clear but I'm still a
> little confused.  It appears that
> cpu_physical_memory_set_dirty_lebitmap() requires a bitmap in terms of
> qemu_real_host_page_size, which is translated to target pages using
> hpratio.  It's not clear to me from the explanation here and in the
> commit log that we're technically using the wrong page size reference
> for this function.
>
>>> TARGET_PAGE_SIZE, and the next three patch chunks take a detour through
>>> memory listener code that seem unrelated to the change described in the
>>> commit log.  This claims to fix something; what is actually broken?
>>> Thanks,
>>>
>>> Alex
>> Commit 87ea529c502 (vfio: Get migration capability flags for container)
>> is where the bug was introduced. The code at [1] (marked below) requires
>> the host page size to be TARGET_PAGE_SIZE (e.g. 4K) in order to set
>> container->dirty_pages_supported = true. It is inappropriate to limit
>> the page size to TARGET_PAGE_SIZE.
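To illustrate the hpratio mechanism described above: a minimal sketch of
how a bitmap at host-page granularity fans out to target pages. This is
not the actual memory.c code; mark_target_page_dirty() is a hypothetical
stand-in for the real dirty-bitmap update:

    #include <stdint.h>

    extern void mark_target_page_dirty(uint64_t addr);  /* hypothetical */

    static void set_dirty_from_host_bitmap(const uint64_t *bitmap,
                                           uint64_t start_addr,
                                           uint64_t host_pages,
                                           uint64_t host_page_size,
                                           uint64_t target_page_size)
    {
        /* e.g. 64K host pages, 4K target pages -> hpratio = 16 */
        uint64_t hpratio = host_page_size / target_page_size;

        for (uint64_t i = 0; i < host_pages; i++) {
            if (bitmap[i / 64] & (1ULL << (i % 64))) {
                /* One dirty host page marks hpratio target pages dirty. */
                for (uint64_t j = 0; j < hpratio; j++) {
                    mark_target_page_dirty(start_addr +
                                           (i * hpratio + j) * target_page_size);
                }
            }
        }
    }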
>
> Right, the noted code requires that vfio supports the target page size,
> which for all practical purposes implies an hpratio = 1 currently.
> That unnecessarily restricts migration support to cases where target and
> host use the same page size, but this change allows migration in any
> case where the vfio dirty bitmap supports the host page size, which is
> effectively any case where the device supports migration.  However, the
> hpratio calculation requires that qemu_real_host_page_size is >=
> TARGET_PAGE_SIZE, otherwise the value becomes zero and it appears we'd
> only ever dirty the target zero page.  Is this configuration restricted
> elsewhere, ex. 64K guest on 4K host?  Exchanging an unnecessary
> restriction for allowing a buggy configuration isn't a good trade-off.
> Thanks,

FYI, the restriction that qemu_real_host_page_size >= TARGET_PAGE_SIZE
already exists. In kvm_init() we have an assert:

    assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
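Given that assert, hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE
is always >= 1. A quick sketch of the arithmetic Alex describes, with
illustrative page sizes (integer division):

    uint64_t hpratio;

    hpratio = 4096  / 4096;   /* 4K host,  4K target  -> 1  (fine) */
    hpratio = 65536 / 4096;   /* 64K host, 4K target  -> 16 (fine) */
    hpratio = 4096  / 65536;  /* 4K host,  64K target -> 0: every dirty
                               * host page would scale to zero target
                               * pages; this is the buggy configuration
                               * that the kvm_init() assert rules out. */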
As I understand it, TARGET_PAGE_SIZE is an architecture-specific value and
is not affected by the page size of the guest OS. For arm64 it is fixed at
12 bits (4K), while qemu_real_host_page_size depends on the host kernel
configuration and can be 4K, 16K or 64K. IIUC, the kernel vfio/iommu_type1
driver reports the supported dirty-log page size as the host page size, so
if the host uses 16K or 64K pages, QEMU currently refuses to support vfio
migration.

Thanks,
Keqian

>
> Alex
>
>>>>       */
>>>> -    bitmap->pgsize = TARGET_PAGE_SIZE;
>>>> +    bitmap->pgsize = qemu_real_host_page_size;
>>>>       bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>>>>                                                            BITS_PER_BYTE;
>>>>
>>>> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>>           return;
>>>>       }
>>>>
>>>> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>>>> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
>>>> +    if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) !=
>>>> +                 (section->offset_within_region & ~qemu_real_host_page_mask))) {
>>>>           error_report("%s received unaligned region", __func__);
>>>>           return;
>>>>       }
>>>>
>>>> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>>>>       llend = int128_make64(section->offset_within_address_space);
>>>>       llend = int128_add(llend, section->size);
>>>> -    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>>>>
>>>>       if (int128_ge(int128_make64(iova), llend)) {
>>>>           return;
>>>> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>>           return;
>>>>       }
>>>>
>>>> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>>>> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
>>>> +    if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) !=
>>>> +                 (section->offset_within_region & ~qemu_real_host_page_mask))) {
>>>>           error_report("%s received unaligned region", __func__);
>>>>           return;
>>>>       }
>>>> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>>        */
>>>>       }
>>>>
>>>> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>>>>       llend = int128_make64(section->offset_within_address_space);
>>>>       llend = int128_add(llend, section->size);
>>>> -    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>>>>
>>>>       if (int128_ge(int128_make64(iova), llend)) {
>>>>           return;
>>>> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>>>       range->size = size;
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to
>>>> -     * TARGET_PAGE_SIZE.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize
>>>> +     * to qemu_real_host_page_size.
>>>>       */
>>>> -    range->bitmap.pgsize = TARGET_PAGE_SIZE;
>>>> +    range->bitmap.pgsize = qemu_real_host_page_size;
>>>>
>>>> -    pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS;
>>>> +    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size;
>>>>       range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>>>>                                                            BITS_PER_BYTE;
>>>>       range->bitmap.data = g_try_malloc0(range->bitmap.size);
>>>> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>>>>                   section->offset_within_region;
>>>>
>>>>       return vfio_get_dirty_bitmap(container,
>>>> -                   TARGET_PAGE_ALIGN(section->offset_within_address_space),
>>>> +                   REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
>>>>                                    int128_get64(section->size), ram_addr);
>>>>   }
>>>>
>>>> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>>>>                                               header);
>>>>
>>>>       /*
>>>> -     * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
>>>> -     * TARGET_PAGE_SIZE to mark those dirty.
>>>> +     * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
>>>> +     * qemu_real_host_page_size to mark those dirty.
>>>>       */
>>>> -    if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) {
>>>> +    if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) {
>> [1]
>>>>           container->dirty_pages_supported = true;
>>>>           container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
>>>>           container->dirty_pgsizes = cap_mig->pgsize_bitmap;
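P.S. To make the bitmap sizing above concrete: a 1GiB range tracked at
64K host-page granularity needs 16384 bits = 2048 bytes, versus 262144
bits = 32768 bytes at 4K. A standalone sketch of the same calculation
(hypothetical helper name; mirrors the ROUND_UP logic in the patch):

    #include <stdint.h>

    #define BITS_PER_BYTE 8
    #define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

    /* Bytes needed for a dirty bitmap covering `size` bytes of guest
     * memory at `pgsize` granularity, one bit per page, rounded up to
     * whole 64-bit words as the patch does. */
    static uint64_t dirty_bitmap_bytes(uint64_t size, uint64_t pgsize)
    {
        uint64_t pages = (size + pgsize - 1) / pgsize;
        return ROUND_UP(pages, sizeof(uint64_t) * BITS_PER_BYTE) / BITS_PER_BYTE;
    }

    /* dirty_bitmap_bytes(1ULL << 30, 64 * 1024) == 2048
     * dirty_bitmap_bytes(1ULL << 30,  4 * 1024) == 32768 */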