>-----Original Message-----
>From: Cédric Le Goater <[email protected]>
>Subject: Re: [PATCH 4/5] intel_iommu: Optimize unmap_bitmap during migration
>
>On 10/16/25 11:53, Yi Liu wrote:
>> On 2025/10/16 16:48, Duan, Zhenzhong wrote:
>>>
>>>>>>>>>> how about an empty iova_tree? If the guest has not mapped anything
>>>>>>>>>> for the device, the tree is empty, and it is fine to not unmap
>>>>>>>>>> anything. While, if the device is attached to an identity domain,
>>>>>>>>>> the iova_tree is empty as well. Are we sure that we need not unmap
>>>>>>>>>> anything here? It looks like the answer is yes. But I'm suspecting
>>>>>>>>>> the unmap failure will happen on the vfio side? If yes, we need to
>>>>>>>>>> consider a complete fix. :)
>>>>>>>>>
>>>>>>>>> I don't get what failure would happen, could you elaborate?
>>>>>>>>> In case of an identity domain, the IOMMU memory region is disabled,
>>>>>>>>> so no iommu notifier will ever be triggered. vfio_listener monitors
>>>>>>>>> the memory address space; if any memory region is disabled,
>>>>>>>>> vfio_listener will catch it and do dirty tracking.
>>>>>>>>
>>>>>>>> My question comes from the reason why DMA unmap fails. It is due to
>>>>>>>> a big range being given to the kernel while the kernel does not
>>>>>>>> support it. So if VFIO gives a big range as well, it should fail as
>>>>>>>> well. And this is possible when the guest (a VM with a large memory
>>>>>>>> size) switches from an identity domain to a paging domain. In this
>>>>>>>> case, vfio_listener will unmap all the system MRs, and that can be a
>>>>>>>> big range if the VM is big enough.
>>>>>>>
>>>>>>> Got your point. Yes, currently the vfio_type1 driver limits
>>>>>>> unmap_bitmap to an 8TB size. If guest memory is large enough to lead
>>>>>>> to a memory region of more than 8TB, unmap_bitmap will fail. It's a
>>>>>>> rare case to live migrate a VM with more than 8TB of memory; instead
>>>>>>> of fixing it in QEMU with a complex change, I'd suggest bumping the
>>>>>>> macro value below to enlarge the limit in the kernel, or switching to
>>>>>>> iommufd which doesn't have such a limit.
>>>>>>
>>>>>> This limit shall not affect the usage of device dirty tracking, right?
>>>>>> If yes, add something to tell the user that the iommufd backend is
>>>>>> better, e.g. if the memory size is bigger than the vfio iommu type1
>>>>>> dirty bitmap limit (query cap_mig.max_dirty_bitmap_size), then fail
>>>>>> the user if migration capability is wanted.
>>>>>
>>>>> Do you mean just dirty tracking instead of migration, like dirtyrate?
>>>>> In that case, there is the error print as above; I think that's enough
>>>>> as a hint?
>>>>
>>>> it's not related to dirtyrate.
>>>>
>>>>> I guess you mean to add a migration blocker if the limit is reached?
>>>>> It's hard because the limit only matters for an identity domain; a DMA
>>>>> domain in the guest doesn't have such a limit, and we can't know the
>>>>> guest's choice of domain type for each attached VFIO device.
>>>>
>>>> I meant a blocker to boot QEMU if there is a limit, something like below:
>>>>
>>>>     if (VM memory > 8TB && legacy_container_backend && migration_enabled)
>>>>         fail the VM boot.
>>>
>>> OK, will add the below to vfio_migration_realize() with an extra patch:
>>
>> yeah, let's see Alex and Cedric's feedback.
>>
>>>     if (!vbasedev->iommufd && current_machine->ram_size > 8 * TiB) {
>>>         /*
>>>          * The 8TB comes from the default kernel and QEMU config; it may
>>>          * be conservative here as the VM can use large pages or run with
>>>          * a vIOMMU, so the limitation may be relaxed. But 8TB is already
>>>          * quite large for live migration. One can also switch to the
>>>          * IOMMUFD backend if there is a need to migrate a large VM.
>>>          */
>>
>> instead of hard coding 8TB. May convert cap_mig.max_dirty_bitmap_size to
>> a memory size. :)
>
>yes. It would reflect better that it's a VFIO dirty tracking limitation.
>
>
>Zhenzhong,
>
>Soft freeze is w45. I plan to send a PR next week, w43, and I will be out
>w44. I will have some (limited) time to address more changes on w45.
Got it, I'll send a new version soon. For the extra patch, roughly what I
have in mind is sketched below.
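A minimal sketch, assuming the limit is derived from the kernel-reported
cap_mig.max_dirty_bitmap_size as Yi suggests instead of hard-coding 8TB.
The helper name, the way the cap value is passed in, and the error handling
are placeholders, not the final code:

    /*
     * Sketch only: vfio_dirty_tracking_limit_check() and the plumbing of
     * max_dirty_bitmap_size are placeholders.  max_dirty_bitmap_size is
     * what the type1 backend reports in VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
     * each bitmap bit covers one host page, so the largest range a single
     * unmap_bitmap call can handle is max_dirty_bitmap_size * 8 * host page
     * size (256MiB * 8 * 4KiB = 8TiB with the current kernel default).
     */
    static bool vfio_dirty_tracking_limit_check(VFIODevice *vbasedev,
                                                uint64_t max_dirty_bitmap_size,
                                                Error **errp)
    {
        uint64_t limit = max_dirty_bitmap_size * BITS_PER_BYTE *
                         qemu_real_host_page_size();

        /* Only the legacy type1 container has this limit; iommufd does not. */
        if (!vbasedev->iommufd && current_machine->ram_size > limit) {
            error_setg(errp,
                       "%s: VM RAM size exceeds the legacy VFIO container's "
                       "dirty tracking limit (%" PRIu64 " bytes); use the "
                       "iommufd backend to migrate a VM this large",
                       vbasedev->name, limit);
            return false;
        }
        return true;
    }

The intent is to call this from vfio_migration_realize() so an oversized VM
fails cleanly at realize time (or gets a migration blocker) instead of
hitting the unmap_bitmap failure in the middle of migration.

Thanks,
Zhenzhong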
