>-----Original Message-----
>From: Cédric Le Goater <[email protected]>
>Subject: Re: [PATCH 4/5] intel_iommu: Optimize unmap_bitmap during migration
>
>On 10/16/25 11:53, Yi Liu wrote:
>> On 2025/10/16 16:48, Duan, Zhenzhong wrote:
>>>
>>>>>>>>>> how about an empty iova_tree? If the guest has not mapped anything
>>>>>>>>>> for the device, the tree is empty, and it is fine to not unmap
>>>>>>>>>> anything. While, if the device is attached to an identity domain,
>>>>>>>>>> the iova_tree is empty as well. Are we sure that we need not unmap
>>>>>>>>>> anything here? It looks like the answer is yes. But I'm suspecting
>>>>>>>>>> the unmap failure will happen on the vfio side? If yes, we need to
>>>>>>>>>> consider a complete fix. :)
>>>>>>>>>
>>>>>>>>> I don't get what failure would happen, could you elaborate?
>>>>>>>>> In case of an identity domain, the IOMMU memory region is disabled,
>>>>>>>>> so no iommu notifier will ever be triggered. vfio_listener monitors
>>>>>>>>> the memory address space; if any memory region is disabled,
>>>>>>>>> vfio_listener will catch it and do dirty tracking.
>>>>>>>>
>>>>>>>> My question comes from the reason why DMA unmap fails. It is due to
>>>>>>>> a big range being given to the kernel while the kernel does not
>>>>>>>> support it. So if VFIO gives a big range as well, it should fail as
>>>>>>>> well. And this is possible when the guest (a VM with a large memory
>>>>>>>> size) switches from an identity domain to a paging domain. In this
>>>>>>>> case, vfio_listener will unmap all the system MRs, and that can be a
>>>>>>>> big range if the VM is big enough.
>>>>>>>
>>>>>>> Got your point. Yes, currently the vfio_type1 driver limits
>>>>>>> unmap_bitmap to an 8TB size. If guest memory is large enough to lead
>>>>>>> to a memory region of more than 8TB, unmap_bitmap will fail. It's a
>>>>>>> rare case to live migrate a VM with more than 8TB of memory; instead
>>>>>>> of fixing it in QEMU with a complex change, I'd suggest bumping the
>>>>>>> macro value below to enlarge the limit in the kernel, or switching to
>>>>>>> iommufd which doesn't have such a limit.
>>>>>>
>>>>>> This limit shall not affect the usage of device dirty tracking, right?
>>>>>> If yes, add something to tell the user that the iommufd backend is
>>>>>> better, e.g. if the memory size is bigger than the vfio iommu type1
>>>>>> dirty bitmap limit (query cap_mig.max_dirty_bitmap_size), then fail
>>>>>> the user if migration capability is wanted.
>>>>>
>>>>> Do you mean just dirty tracking instead of migration, like dirtyrate?
>>>>> In that case, there is the error print as above; I think that's enough
>>>>> as a hint?
>>>>
>>>> it's not related to dirtyrate.
>>>>
>>>>> I guess you mean to add a migration blocker if the limit is reached?
>>>>> It's hard because the limit only matters for an identity domain; a DMA
>>>>> domain in the guest doesn't have such a limit, and we can't know the
>>>>> guest's choice of domain type for each attached VFIO device.
>>>>
>>>> I meant a blocker to boot QEMU if there is a limit, something like below:
>>>>
>>>>     if (VM memory > 8TB && legacy_container_backend && migration_enabled)
>>>>         fail the VM boot.
>>>
>>> OK, will add the below to vfio_migration_realize() with an extra patch:
>>
>> yeah, let's see Alex and Cedric's feedback.
>>
>>>     if (!vbasedev->iommufd && current_machine->ram_size > 8 * TiB) {
>>>         /*
>>>          * The 8TB comes from the default kernel and QEMU config; it may
>>>          * be conservative here as the VM can use large pages or run with
>>>          * a vIOMMU, so the limitation may be relaxed. But 8TB is already
>>>          * quite large for live migration. One can also switch to the
>>>          * IOMMUFD backend if there is a need to migrate a large VM.
>>>          */
>>
>> instead of hard coding 8TB. May convert cap_mig.max_dirty_bitmap_size to
>> a memory size. :)
>
>yes. It would reflect better that it's a VFIO dirty tracking limitation.
>
>
>Zhenzhong,
>
>Soft freeze is w45. I plan to send a PR next week, w43, and I will be out
>w44. I will have some (limited) time to address more changes on w45.
Got it, I'll send a new version soon. For the extra patch, roughly what I
have in mind is sketched below.
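A minimal sketch, assuming the limit is derived from the kernel-reported
cap_mig.max_dirty_bitmap_size as Yi suggests instead of hard-coding 8TB.
The helper name, the way the cap value is passed in, and the error handling
are placeholders, not the final code:

    /*
     * Sketch only: vfio_dirty_tracking_limit_check() and the plumbing of
     * max_dirty_bitmap_size are placeholders.  max_dirty_bitmap_size is
     * what the type1 backend reports in VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
     * each bitmap bit covers one host page, so the largest range a single
     * unmap_bitmap call can handle is max_dirty_bitmap_size * 8 * host page
     * size (256MiB * 8 * 4KiB = 8TiB with the current kernel default).
     */
    static bool vfio_dirty_tracking_limit_check(VFIODevice *vbasedev,
                                                uint64_t max_dirty_bitmap_size,
                                                Error **errp)
    {
        uint64_t limit = max_dirty_bitmap_size * BITS_PER_BYTE *
                         qemu_real_host_page_size();

        /* Only the legacy type1 container has this limit; iommufd does not. */
        if (!vbasedev->iommufd && current_machine->ram_size > limit) {
            error_setg(errp,
                       "%s: VM RAM size exceeds the legacy VFIO container's "
                       "dirty tracking limit (%" PRIu64 " bytes); use the "
                       "iommufd backend to migrate a VM this large",
                       vbasedev->name, limit);
            return false;
        }
        return true;
    }

The intent is to call this from vfio_migration_realize() so an oversized VM
fails cleanly at realize time (or gets a migration blocker) instead of
hitting the unmap_bitmap failure in the middle of migration.

Thanks,
Zhenzhong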
