Hi Eric,

> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Thursday, October 18, 2018 11:16 PM
>
> Hi Yi,
>
> On 10/18/18 12:30 PM, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: Friday, September 21, 2018 4:18 PM
> >> Subject: [RFC v2 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration
> >>
> >> Up to now vSMMUv3 has not been integrated with VFIO. VFIO integration
> >> requires programming the physical IOMMU consistently with the guest
> >> mappings. However, as opposed to VT-d, SMMUv3 has no "Caching Mode"
> >> which allows easy trapping of guest mappings.
> >> This means the vSMMUv3 cannot use the same VFIO integration as VT-d.
> >>
> >> However SMMUv3 has 2 translation stages. This was devised with the
> >> virtualization use case in mind, where stage 1 is "owned" by the
> >> guest whereas the host uses stage 2 for VM isolation.
> >>
> >> This series sets up this nested translation stage. It only works if
> >> there is one physical SMMUv3 used along with the QEMU vSMMUv3 (in
> >> other words, it does not work if there is a physical SMMUv2).
> >>
> >> The series uses a new kernel user API [1], still under definition.
> >>
> >> - We force the host to use stage 2 instead of stage 1 when we
> >>   detect a vSMMUv3 is behind a VFIO device. For a VFIO device
> >>   without any virtual IOMMU, we still use stage 1 as many existing
> >>   SMMUs expect this behavior.
> >> - We introduce new IOTLB "config" notifiers, which are called upon
> >>   changes in the config of a given iommu memory region. So now
> >>   we have notifiers for IOTLB changes and config changes.
> >> - The vSMMUv3 calls the config notifiers when STEs (Stream Table
> >>   Entries) are updated by the guest.
> >> - We implement a specific UNMAP notifier that conveys guest
> >>   IOTLB invalidations to the host.
> >> - We implement a new MAP notifier, only used for MSI IOVAs, so
> >>   that the host can build a nested stage translation for MSI IOVAs.
> >> - As the legacy MAP notifier is not called anymore, we must make
> >>   sure stage 2 mappings are set. This is achieved through another
> >>   memory listener.
> >> - Physical SMMU faults are reported to the guest via an eventfd
> >>   mechanism and reinjected into the guest.
> >>
> >> Note: some iommu memory notifier rework related patches are close to
> >> those previously published by Peter and Liu. I will be pleased to add
> >> their Signed-off-by if they agree/wish.
> >
> > Yeah, feel free to add mine if it's originated from our previous work.
> OK
> >
> > BTW. Curiously, are you planning to implement all vIOMMU related
> > operations through MemoryRegion notifiers? Honestly, I did it that
> > way in an early RFC for the vSVA work. However, we encountered two
> > issues at that time. First, how to check whether the notifier should
> > be registered.
>
> On my side I think I resolved this by querying the iommu mr about the
> new IOMMU_ATTR_VFIO_NESTED IOMMU attribute in vfio_connect_container.
So it's detected by checking the iommu mr? I think it is also similar to
my early RFC:
https://patchwork.kernel.org/patch/9701003/

+    /* Check if vIOMMU exists */
+    QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+        if (memory_region_is_iommu(subregion)) {
+            IOMMUNotifier n1;
+
+            /*
+             FIXME: current iommu notifier is actually designed for
+             IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+             notifiers other than MAP/UNMAP, so it'll be better to
+             split the non-IOMMUTLB notifier from the current IOMMUTLB
+             notifier framework.
+             */
+            iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+                                IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+                                0,
+                                0);
+            vfio_register_notifier(group->container,
+                                   subregion,
+                                   0,
+                                   &n1);
+        }
+    }

For VT-d, we only have one iommu mr. But I was told there may be multiple
iommu mrs on other platforms. So I switched to using PCISVAOps. Thoughts?

> See patches 5 and 6, 8
>
> This tells me whether the nested mode must be set up, and chooses the
> right container->iommu_type, which is then used in
> vfio_listener_region_add() to decide whether the specific notifiers
> must be registered.
>
> > Second, there are cases in which vfio_listener_region_add is not
> > triggered but a vIOMMU exists. Details can be found in the link below:
> > (http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html)
>
> Yes I remember this, due to the PT=1 case. On ARM we don't have this
> specificity, hence this integration.
>
> > Thus, we had some discussions in the community. Last time, PCIPASIDOps
> > was proposed. It is to add callbacks in PCIDevice. VFIO would register
> > its implementations in vfio_realize(). Supposedly, pasid_table_bind,
> > page_table_bind, sva_tlb_invalidation_passdown and other vIOMMU
> > related operations can be done that way. The sample patch below may
> > show how it looks. (The full patch is in my sandbox, planned to be
> > sent out with the Scalable Mode emulation patch.)
> To be honest, I lost track of the series and did not see this
> PCIPASIDOps proposal. I will study whether this can fit my needs.

It's proposed in the link below. The name at that time was PCISVAOps. I
renamed it to PCIPASIDOps to be more generic.
http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00081.html

Regards,
Yi Liu