On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:

> What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> not suffice, but that's at least totally new area to me because I never
> used >1 IOMMUs even bare metal (excluding the case where I'm aware that
> e.g. a GPU could have its own IOMMU-like dma translator).

Even x86 systems are multi-iommu, one iommu per physical CPU socket.

I'm not sure how they model this though - Kevin do you know? Do we get
multiple iommu instances in Linux or is all the broadcasting of
invalidates and sharing of tables hidden?

> What's the system layout of your multi-vIOMMU world?  Is there still a
> centric vIOMMU, or multi-vIOMMUs can run fully in parallel, so that e.g. we
> can have DEV1,DEV2 under vIOMMU1 and DEV3,DEV4 under vIOMMU2?

Just like physical, each viommu is parallel and independent. Each has
its own caches, ASIDs, DIDs/etc and thus invalidation domains.

The separated caches are the motivating reason to do this, as something
like vCMDQ is a direct command channel that delivers invalidations only
to the caches of a single IOMMU block.

> Is it a common hardware layout or nVidia specific?

I think it is pretty normal, you have multiple copies of the IOMMU and
its caches for physical reasons.

The only design choice is whether the platform HW somehow routes
invalidations to all IOMMUs, or requires SW to route/replicate the
invalidates itself.

ARM's IP seems to be designed toward the latter so I expect it is
going to be common on ARM.
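To make the two routing models above concrete, here is a minimal sketch
(all names are invented for illustration, not any real kernel or vIOMMU
API): each vIOMMU instance has its own caches, so an invalidation
delivered over a direct channel (vCMDQ-style) touches only one block,
while the "SW routes/replicates" model has software resend the same
invalidate to every instance.

```python
class VIommu:
    """One parallel, independent vIOMMU block with its own caches.
    Hypothetical model, not a real driver interface."""
    def __init__(self, name):
        self.name = name
        self.tlb = {}  # (domain_id, iova) -> phys; stand-in for HW caches

    def cache(self, domain_id, iova, phys):
        self.tlb[(domain_id, iova)] = phys

    def invalidate(self, domain_id, iova):
        # Only this block's caches are touched; other vIOMMUs keep
        # their (now possibly stale) entries.
        self.tlb.pop((domain_id, iova), None)


def sw_replicate_invalidate(viommus, domain_id, iova):
    """The 'SW routes/replicates invalidates' model: software sends
    the same invalidation to every IOMMU instance."""
    for v in viommus:
        v.invalidate(domain_id, iova)


# Two parallel vIOMMUs, e.g. DEV1/DEV2 under viommu1, DEV3/DEV4 under
# viommu2, each caching the same translation.
viommu1, viommu2 = VIommu("viommu1"), VIommu("viommu2")
viommu1.cache(1, 0x1000, 0x8000)
viommu2.cache(1, 0x1000, 0x8000)

# Direct command channel: the invalidate reaches only one block, so the
# other block's cached entry survives.
viommu1.invalidate(1, 0x1000)
assert (1, 0x1000) not in viommu1.tlb
assert (1, 0x1000) in viommu2.tlb

# Without HW broadcast, software must replicate to keep all blocks
# coherent.
sw_replicate_invalidate([viommu1, viommu2], 1, 0x1000)
assert (1, 0x1000) not in viommu2.tlb
```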

Jason
