On Fri, Jan 14, 2022 at 11:31 AM Peter Xu <pet...@redhat.com> wrote:
>
> On Fri, Jan 14, 2022 at 10:47:44AM +0800, Jason Wang wrote:
> >
> > On 2022/1/13 1:06 PM, Peter Xu wrote:
> > > On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > > > @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >           cc_entry->context_cache_gen = s->context_cache_gen;
> > > >       }
> > > > +    /* Try to fetch slpte from IOTLB */
> > > > +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > > > +        pasid = VTD_CE_GET_RID2PASID(&ce);
> > > > +    }
> > > > +
> > > >       /*
> > > >        * We don't need to translate for pass-through context entries.
> > > >        * Also, let's ignore IOTLB caching as well for PT devices.
> > > >        */
> > > > -    if (vtd_dev_pt_enabled(s, &ce)) {
> > > > +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> > > >           entry->iova = addr & VTD_PAGE_MASK_4K;
> > > >           entry->translated_addr = entry->iova;
> > > >           entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > > > @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >           return true;
> > > >       }
> > > > +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > > > +    if (iotlb_entry) {
> > > > +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > > > +                                 iotlb_entry->domain_id);
> > > > +        slpte = iotlb_entry->slpte;
> > > > +        access_flags = iotlb_entry->access_flags;
> > > > +        page_mask = iotlb_entry->mask;
> > > > +        goto out;
> > > > +    }
> > > IIUC the iotlb lookup moved down just because in the pasid==NO_PASID case
> > > we'll need to fetch the default pasid from the context entry.  That looks
> > > reasonable.
> > >
> > > It's just a bit of a pity, because logically it'll slow down iotlb hits due
> > > to context entry operations.  When NO_PASID we could have looked up iotlb
> > > without checking pasid at all, assuming that "default pasid" will always
> > > match.  But that is a little bit hacky.
> > >
> >
> > Right, but I think you meant to do this only when scalable mode is disabled.
>
> Yes IMHO it will definitely suit the !scalable case since that's exactly what
> we did before.  What I'm also wondering is: even if scalable mode is enabled but
> no "real" pasid is used, so that all the translations go through the default
> pasid that is stored in the device context entry, then maybe we can skip
> checking it.  The latter is the "hacky" part mentioned above.
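For illustration only, a toy model of the ordering question above: once the IOTLB key includes the PASID, a request that carries no PASID cannot be looked up in scalable mode until the default PASID (RID2PASID) has been read from the context entry. This is a minimal sketch, not QEMU code; it assumes a small linearly scanned table in place of the s->iotlb hash table, and every Model*/model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the names in the patch; not QEMU code. */
#define MODEL_NO_PASID    UINT32_MAX   /* plays the role of PCI_NO_PASID */
#define MODEL_PAGE_SHIFT  12           /* 4K pages */
#define MODEL_IOTLB_SLOTS 8

typedef struct {
    bool     valid;
    uint16_t source_id;  /* requester ID (bus/devfn) */
    uint32_t pasid;      /* PASID after resolution (MODEL_NO_PASID in legacy mode) */
    uint64_t gfn;        /* IOVA page frame number */
    uint64_t slpte;      /* cached leaf entry */
} ModelIotlbEntry;

typedef struct {
    bool     root_scalable;  /* scalable mode enabled */
    uint32_t rid2pasid;      /* default PASID, taken from the context entry */
    ModelIotlbEntry slots[MODEL_IOTLB_SLOTS];
} ModelIommu;

/*
 * Requests without a PASID use the context entry's RID2PASID in scalable
 * mode; this read is the extra step that now precedes every lookup.
 */
static uint32_t model_resolve_pasid(const ModelIommu *s, uint32_t pasid)
{
    if (pasid == MODEL_NO_PASID && s->root_scalable) {
        return s->rid2pasid;
    }
    return pasid;
}

/* Returns the cached entry for (source_id, resolved pasid, page), or NULL. */
static const ModelIotlbEntry *model_lookup(const ModelIommu *s,
                                           uint16_t source_id,
                                           uint64_t addr, uint32_t pasid)
{
    uint32_t key_pasid = model_resolve_pasid(s, pasid);
    uint64_t gfn = addr >> MODEL_PAGE_SHIFT;

    assert(s != NULL);
    for (size_t i = 0; i < MODEL_IOTLB_SLOTS; i++) {
        const ModelIotlbEntry *e = &s->slots[i];
        if (e->valid && e->source_id == source_id &&
            e->pasid == key_pasid && e->gfn == gfn) {
            return e;
        }
    }
    return NULL;
}
```

In legacy (non-scalable) mode the PASID stays at the no-PASID value, so the key degenerates to what was used before the patch, which is why skipping the context entry read is safe only in that case.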
The problem I see is that we can't know which PASID is used as the default
without reading the context entry?

>
> The other thing to mention is, if we postpone the iotlb lookup to be after
> context entry, then logically we can have per-device iotlb, that means we can
> replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> which can also be more efficient.

Right, but we still need to limit the total number of slots, and ATS is
actually a better way to deal with the IOTLB bottleneck.

>
> Not sure whether Michael will have a preference, for me I think either way can
> be done on top.
>
> >
> > >
> > > vIOMMU seems to be mostly used for assigned devices and dpdk in production
> > > in the future due to its slowness otherwise..  so maybe not a big deal at all.
> > >
> > > [...]
> > >
> > > > @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >       vtd_iommu_lock(s);
> > > >       g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> > > >       vtd_iommu_unlock(s);
> > > > -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> > > > +}
> > > > +
> > > > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > > > +                                            uint16_t domain_id,
> > > > +                                            hwaddr addr, uint8_t am,
> > > > +                                            uint32_t pasid)
> > > > +{
> > > > +    VTDIOTLBPageInvInfo info;
> > > > +
> > > > +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > > > +
> > > > +    assert(am <= VTD_MAMV);
> > > > +    info.domain_id = domain_id;
> > > > +    info.addr = addr;
> > > > +    info.mask = ~((1 << am) - 1);
> > > > +    info.pasid = pasid;
> > > > +    vtd_iommu_lock(s);
> > > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
> > > > +    vtd_iommu_unlock(s);
> > > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> > > Hmm, I think indeed we need a notification, but it'll be unnecessary for
> > > e.g. vfio map notifiers, because this is 1st level invalidation and at least
> > > so far vfio map notifiers are rewalking only the 2nd level page table, so
> > > it'll be destined to be a no-op and pure overhead.
> >
> >
> > Right, considering we don't implement L1 and we don't have a 1st level
> > abstraction in either vhost or vfio, we can simply remove this.
>
> We probably still need the real pasid invalidation parts in the future?

Yes.

> Either
> for vhost (if vhost is going to cache pasid-based translations), or for
> compatible assigned devices in the future where the HW can cache it.

Vhost plans to support ASID; see:

https://patchwork.kernel.org/project/kvm/patch/20201216064818.48239-11-jasow...@redhat.com/#23866593

>
> I'm not sure what's the best way to do this, yet.  Perhaps adding a new field
> to vtd_iotlb_page_invalidate_notify() telling whether this is pasid-based or
> not (basically, an invalidation for 1st or 2nd level pgtable)?

AFAIK there's no L1 in the device IOTLB abstraction, only a combined
translation result (IOVA->GPA).

> Then if it is
> pasid-based, we could opt out of the shadow page walking.
>
> But as you mentioned we could also postpone it to the future.  Your call. :-)

Right, I tend to defer it; otherwise there seems to be no way to test this.
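A hypothetical sketch of the page-selective PASID invalidation quoted above: am selects a naturally aligned block of 2^am 4K pages, and an entry is dropped only if its domain, PASID and masked page frame all match. It also models the flag Peter suggests, so that notifiers which only shadow the 2nd-level table could skip the walk for a 1st-level (PASID-based) flush. All names here, including the first_level flag and MODEL_MAMV, are assumptions rather than the patch's API; the mask is built from 1ULL so it stays 64 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAMV 9  /* hypothetical max address-mask value, like VTD_MAMV */

typedef struct {
    uint16_t domain_id;
    uint32_t pasid;
    uint64_t gfn;        /* cached IOVA page frame number */
} ModelEntry;

typedef struct {
    uint16_t domain_id;
    uint32_t pasid;
    uint64_t gfn_base;   /* first page frame of the invalidated block */
    uint64_t mask;       /* ~((1 << am) - 1): clears the low am bits */
    bool     first_level;/* hypothetical: PASID-based (1st-level) flush */
} ModelPageInvInfo;

static ModelPageInvInfo model_page_inv_info(uint16_t domain_id, uint64_t addr,
                                            uint8_t am, uint32_t pasid,
                                            bool first_level)
{
    assert(am <= MODEL_MAMV);
    ModelPageInvInfo info = {
        .domain_id = domain_id,
        .pasid = pasid,
        .mask = ~((1ULL << am) - 1),
        .first_level = first_level,
    };
    info.gfn_base = (addr >> 12) & info.mask;
    return info;
}

/* Same shape as a hash-table remove callback: true means "drop it". */
static bool model_remove_by_page_pasid(const ModelEntry *e,
                                       const ModelPageInvInfo *info)
{
    return e->domain_id == info->domain_id &&
           e->pasid == info->pasid &&
           (e->gfn & info->mask) == info->gfn_base;
}

/*
 * Shadow-page (map) notifiers rewalk only the 2nd-level table, so a
 * 1st-level invalidation would let them skip the walk.
 */
static bool model_needs_shadow_walk(const ModelPageInvInfo *info)
{
    return !info->first_level;
}
```

With am = 2 and addr = 0x40000, the block covers page frames 0x40 to 0x43, so an entry at 0x42 with the same domain and PASID is dropped while one at 0x44 is kept.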
Thanks

>
> Thanks,
>
> >
> > >
> > > > +}
> > > > +
> > > > +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > > +                                       uint32_t pasid)
> > > > +{
> > > > +    VTDIOTLBPageInvInfo info;
> > > > +    VTDAddressSpace *vtd_as;
> > > > +    VTDContextEntry ce;
> > > > +
> > > > +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> > > > +
> > > > +    info.domain_id = domain_id;
> > > > +    info.pasid = pasid;
> > > > +    vtd_iommu_lock(s);
> > > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> > > > +    vtd_iommu_unlock(s);
> > > > +
> > > > +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> > > > +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > > +                                      vtd_as->devfn, &ce) &&
> > > > +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> > > > +            pasid == vtd_as->pasid) {
> > > > +            vtd_sync_shadow_page_table(vtd_as);
> > > Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict)
> > > even if we got the 1st level pgtable invalidated?
> >
> >
> > Seems not, and this makes me think we should remove the whole PASID-based
> > invalidation logic, since it is for L1, which is not implemented in this
> > series.
>
> --
> Peter Xu
>
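A similarly hedged model of the vtd_iotlb_pasid_invalidate() logic quoted above: drop every cached entry tagged with the (domain, PASID) pair, then select the address spaces with notifiers whose domain and PASID both match. This is a sketch with hypothetical names and a plain array in place of the GLib hash table and QLIST; as Peter notes, the resync it selects for would rewalk the 2nd-level shadow table, so for a pure 1st-level invalidation it would be overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint16_t domain_id;
    uint32_t pasid;
} ModelCacheEntry;

typedef struct {
    uint16_t domain_id;     /* domain found via the device's context entry */
    uint32_t pasid;         /* PASID this address space is bound to */
    bool     has_notifiers; /* e.g. vfio shadow-page (map) notifiers */
} ModelAddrSpace;

/* Invalidate every cached translation tagged with (domain_id, pasid). */
static size_t model_iotlb_pasid_invalidate(ModelCacheEntry *tlb, size_t n,
                                           uint16_t domain_id, uint32_t pasid)
{
    size_t removed = 0;

    assert(n == 0 || tlb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].domain_id == domain_id &&
            tlb[i].pasid == pasid) {
            tlb[i].valid = false;
            removed++;
        }
    }
    return removed;
}

/*
 * Mirrors the QLIST_FOREACH filter: only address spaces with notifiers
 * that match both the domain and the PASID would be resynced.
 */
static bool model_needs_resync(const ModelAddrSpace *as,
                               uint16_t domain_id, uint32_t pasid)
{
    return as->has_notifiers && as->domain_id == domain_id &&
           as->pasid == pasid;
}
```

Entries in the same domain but under another PASID, or under the same PASID in another domain, survive the flush.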