Hi Shameer,

On 5/22/26 12:01 PM, Shameer Kolothum Thodi wrote:
>>> On 5/19/26 12:36 PM, Shameer Kolothum wrote:
>>> Actually is the whole host passthrough principle is not really explained
>>> anywhere. At least that's my feeling. It would be nice to have a summary
>>> for it, in the coverletter and in individual patch. Or maybe we can link
>>> to another doc. Reading the kernel uapi does not really provide the full
>>> picture, at least that's my own feeling.
>> Fair point. A summary of operation is useful. I am thinking of adding
>> it at the top of hw/arm/tegra241-cmdqv.c:
>>
>> /*
>>  * Tegra241 CMDQV - overview
>>  * ---------------------------------------
>>  *...
>>  */
>>
>> I will populate the details and share it for review before v6.
> Please find below. Hopefully, I have captured all the important details.
> Please take a look and let me know.
>
> Thanks,
> Shameer
>
> diff --git a/hw/arm/tegra241-cmdqv.c b/hw/arm/tegra241-cmdqv.c
> index ebf12d0597..5a103f37b8 100644
> --- a/hw/arm/tegra241-cmdqv.c
> +++ b/hw/arm/tegra241-cmdqv.c
> @@ -7,6 +7,84 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>
> +/*
> + * Tegra241 CMDQV - overview
> + * =========================
> + *
> + * NVIDIA Tegra241 extends SMMUv3 with a Command Queue Virtualization (CMDQ-V)
> + * block. It lets a guest issue SMMU invalidation commands directly to
> + * dedicated hardware queues (vCMDQs) without trapping into the hypervisor on
> + * the fast path. vCMDQs are grouped into Virtual Interfaces (VINTFs); the
> + * host kernel allocates one VINTF per emulated SMMUv3 instance via iommufd.
> + * QEMU emulates the CMDQV MMIO region and drives the host kernel calls
> + * (VIOMMU_ALLOC, HW_QUEUE_ALLOC, mmap); the actual command processing happens
> + * on real hardware.
> + *
> + * MMIO layout (64KB pages, total TEGRA241_CMDQV_IO_LEN)
> + * -----------------------------------------------------
> + * The direct vCMDQ apertures (0x10000/0x20000) are HW aliases of the VINTF
> + * apertures (0x30000/0x40000); they expose the same per-vCMDQ register slots
> + * under different addressing.
> + *
> + *   0x00000  CMDQV Config page: QEMU-trapped.
> + *   0x10000  Direct vCMDQ Page 0 (control/status): QEMU-trapped and routed
> + *            via vintf_ptr() to either the mmap'd VINTF page (allocated
> + *            slot) or a per-vCMDQ register cache (unallocated slot).
> + *   0x20000  Direct vCMDQ Page 1 (BASE / DRAM addresses): QEMU-trapped.
> + *   0x30000  VINTF Page 0 (per-VINTF control/status): mmap'd from the host
> + *            via iommufd and installed into guest MMIO as a RAM-device
> + *            subregion after the first HW_QUEUE_ALLOC; subsequent accesses
> + *            bypass QEMU.
> + *   0x40000  VINTF Page 1 (per-VINTF BASE): QEMU-trapped.
> + *
> + * The direct vCMDQ aperture stays trapped (rather than aliased to the VINTF
direct vCMDQ aperture Page 0 stays trapped as opposed to the VINTF Page 0
> + * mmap) to preserve the spec's R/W register semantics for unallocated
> + * vCMDQs: the direct aperture allows programming before VINTF allocation,
> + * while aliasing would route through the VINTF drop path instead.

See last discussion.

> + *
> + * Lifecycle (driven by guest events)
> + * ----------------------------------
> + * 1. First vfio-pci device attach (.set_iommu_device) triggers:
> + *    - tegra241_cmdqv_probe(): IOMMU_GET_HW_INFO confirms host CMDQV support.
> + *    - IOMMU_VIOMMU_ALLOC: the kernel allocates a VINTF for this VM,
> + *      configures the VM's VMID (from its stage-2 HWPT) in VINTF_CONFIG,
> + *      forces HYP_OWN=0, and returns the mmap offset/length for VINTF Page 0.

What about the v/p SID mapping? How does the kernel know which SIDs are
supposed to write into that VINTF? Where do we pass this info?

> + *
> + * 2. Guest writes VINTF_CONFIG.ENABLE = 1:
> + *    QEMU mmap()s the offset from step 1 into its address space and reports
> + *    STATUS.ENABLE_OK = 1. The host VINTF was already enabled by
> + *    IOMMU_VIOMMU_ALLOC; QEMU only acks back.
> + *
> + * 3. Guest completes vCMDQ setup (BASE, CMDQ_ALLOC_MAP.ALLOC, CMDQV_EN,
> + *    VINTF.ENABLE, in any order; each precondition write retries the
> + *    allocation):
> + *    IOMMU_HW_QUEUE_ALLOC binds the guest BASE GPA (translated through
> + *    stage-2 and pinned by the kernel) to a host vCMDQ in this VM's VINTF.
> + *
> + * 4. After the first successful HW_QUEUE_ALLOC, the mmap'd VINTF Page 0 is
> + *    installed into guest MMIO as a RAM-device subregion. Guest vCMDQ Page 0
> + *    accesses (CMDQ_EN, PROD/CONS_INDX, STATUS, GERROR/GERRORN) thereafter
> + *    go straight to host hardware, bypassing QEMU.
> + *
> + * Per-VM isolation
> + * ----------------
> + * - Each VM has its own iommufd FD; all iommufd objects (VINTF, hw_queues,
> + *   mmap regions) belong to that FD.
> + *   Cross-FD lookups fail, so one VM cannot reach another VM's IDs.
> + * - IOMMU_VIOMMU_ALLOC configures the VM's VMID in VINTF_CONFIG; the CMDQV
> + *   hardware substitutes / checks VMID on every command the guest issues.
> + * - The kernel allocates the VINTF with HYP_OWN = 0, which restricts the
> + *   guest to a safe subset of commands.
> + * - Per VINTF, the kernel programs SID_MATCH and SID_REPLACE to restrict
> + *   invalidations to the StreamIDs assigned to this VM.
> + * - IOMMU_HW_QUEUE_ALLOC binds each vCMDQ to a single VINTF, so a guest
> + *   cannot reach a vCMDQ that belongs to another VM.
> + *
> + * Limits exposed to the guest
> + * ---------------------------
> + * One VINTF per emulated SMMUv3 and two vCMDQs per VINTF.
> + */
> +
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
>  #include "qemu/error-report.h"

Eric
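
As a reading aid for the MMIO layout in the quoted comment, the page decode could be sketched roughly as below. This is only an illustration under stated assumptions: the enum, the macro and both function names are hypothetical and do not come from the patch, and the sketch assumes that VINTF Page 0 accesses trap until the RAM-device subregion has been installed and that out-of-range offsets are simply treated as trapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; they are not taken from the patch. */
#define CMDQV_PAGE_SIZE 0x10000u /* 64KB pages, as in the quoted comment */

enum cmdqv_page {
    CMDQV_PAGE_CONFIG = 0, /* 0x00000: CMDQV Config page */
    CMDQV_PAGE_VCMDQ_P0,   /* 0x10000: direct vCMDQ Page 0 */
    CMDQV_PAGE_VCMDQ_P1,   /* 0x20000: direct vCMDQ Page 1 */
    CMDQV_PAGE_VINTF_P0,   /* 0x30000: VINTF Page 0 */
    CMDQV_PAGE_VINTF_P1,   /* 0x40000: VINTF Page 1 */
    CMDQV_PAGE_INVALID,    /* anything beyond the five pages */
};

/* Map a byte offset within the CMDQV MMIO region to its 64KB page. */
static enum cmdqv_page cmdqv_page_of(uint64_t offset)
{
    uint64_t idx = offset / CMDQV_PAGE_SIZE;

    return idx < CMDQV_PAGE_INVALID ? (enum cmdqv_page)idx
                                    : CMDQV_PAGE_INVALID;
}

/*
 * Return true if an access at this offset would be handled by QEMU.
 * Only VINTF Page 0 can bypass QEMU, and (by assumption) only once the
 * mmap'd page has been installed as a RAM-device subregion.
 */
static bool cmdqv_access_traps(uint64_t offset, bool vintf_page0_installed)
{
    if (cmdqv_page_of(offset) == CMDQV_PAGE_VINTF_P0) {
        return !vintf_page0_installed;
    }
    return true;
}
```

With this decode, an access at 0x30010 traps before the first successful HW_QUEUE_ALLOC and bypasses QEMU afterwards, while an access at 0x10000 traps in both cases, which is the behaviour the layout table describes.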
