Add an overview describing the Tegra241 CMDQV passthrough model, MMIO
layout, guest-driven lifecycle, and per-VM isolation.

Signed-off-by: Shameer Kolothum <[email protected]>
---
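The trap/mmap split in the overview's MMIO layout section can be read as a small decode function over the 64KB pages. This is a minimal sketch, not QEMU's implementation. `cmdqv_route()`, the `ROUTE_*` classes and `CMDQV_PAGE_SIZE` are hypothetical names. The sketch assumes the region spans exactly the five pages listed. It also assumes that VINTF Page 0 accesses made before the RAM-device subregion is installed are served from the register cache, which the overview does not state.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMDQV_PAGE_SIZE 0x10000u /* 64KB pages, as in the layout table */

/* Hypothetical route classes; QEMU's real code may be structured otherwise. */
enum cmdqv_route {
    ROUTE_TRAP_CONFIG,   /* 0x00000: CMDQV Config page, emulated by QEMU */
    ROUTE_TRAP_TO_VINTF, /* 0x10000, allocated vCMDQ: forwarded to the mmap */
    ROUTE_TRAP_TO_CACHE, /* trapped and served from the per-vCMDQ cache */
    ROUTE_TRAP_PAGE1,    /* 0x20000 / 0x40000: BASE pages, always trapped */
    ROUTE_DIRECT_HW,     /* 0x30000 once installed: never reaches QEMU */
    ROUTE_INVALID,       /* outside the five pages (assumed region size) */
};

/*
 * Classify a guest access at @offset within the CMDQV region.
 * @vcmdq_allocated: the addressed vCMDQ has completed HW_QUEUE_ALLOC.
 * @vintf_page0_installed: the mmap'd VINTF Page 0 subregion is live.
 */
static enum cmdqv_route cmdqv_route(uint64_t offset, bool vcmdq_allocated,
                                    bool vintf_page0_installed)
{
    switch (offset / CMDQV_PAGE_SIZE) {
    case 0:
        return ROUTE_TRAP_CONFIG;
    case 1:
        return vcmdq_allocated ? ROUTE_TRAP_TO_VINTF : ROUTE_TRAP_TO_CACHE;
    case 2:
    case 4:
        return ROUTE_TRAP_PAGE1;
    case 3:
        /* Assumption: before installation this page is trapped too. */
        return vintf_page0_installed ? ROUTE_DIRECT_HW : ROUTE_TRAP_TO_CACHE;
    default:
        return ROUTE_INVALID;
    }
}
```

Only the direct Page 0 decision depends on per-vCMDQ allocation state. Both Page 1 apertures stay trapped regardless, which matches the overview's note that the kernel exposes mmap only for VINTF Page 0.
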
 hw/arm/tegra241-cmdqv.c | 93 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
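
Step 3 of the lifecycle says the four setup writes may arrive in any order and each one retries the allocation. A minimal sketch of that "retry on every precondition write" pattern follows. `struct vcmdq_state` and `vcmdq_try_alloc()` are hypothetical. The sketch treats a non-zero BASE as "BASE written", which is an assumption. `fake_hw_queue_alloc()` is a stand-in counter, not the IOMMU_HW_QUEUE_ALLOC ioctl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCMDQ state mirrored by the emulation. */
struct vcmdq_state {
    uint64_t base;           /* guest-written BASE (0 = not yet written) */
    bool alloc_map_alloc;    /* CMDQ_ALLOC_MAP.ALLOC */
    bool cmdqv_en;           /* CMDQV_EN */
    bool vintf_enable;       /* VINTF_CONFIG.ENABLE */
    bool hw_queue_allocated; /* set once the allocation has succeeded */
    unsigned alloc_calls;    /* how many times the allocation was attempted */
};

/* Stand-in for IOMMU_HW_QUEUE_ALLOC: only counts calls and succeeds. */
static bool fake_hw_queue_alloc(struct vcmdq_state *q)
{
    q->alloc_calls++;
    return true;
}

/* Call after every write to one of the four precondition registers. */
static void vcmdq_try_alloc(struct vcmdq_state *q)
{
    if (q->hw_queue_allocated) {
        return; /* already bound to a host vCMDQ */
    }
    if (!q->base || !q->alloc_map_alloc || !q->cmdqv_en || !q->vintf_enable) {
        return; /* a later precondition write will retry */
    }
    q->hw_queue_allocated = fake_hw_queue_alloc(q);
}
```

Because every precondition write calls `vcmdq_try_alloc()`, the allocation is attempted exactly once, on whichever write completes the set, independent of the order the guest driver chooses.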

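The CMDQS cap in the "Limits exposed to the guest" section is plain arithmetic. SMMUv3 commands are 16 bytes, and CMDQS is log2 of the number of queue entries, so the 8MiB maximum corresponds to CMDQS = 19 (8MiB / 16 B = 2^19 entries). Below is a sketch of the cap, assuming the queue must fit inside a single memory-backend page. `cmdqs_for_page_size()` is a hypothetical helper, not QEMU's code.

```c
#include <stdint.h>

#define CMDQ_ENT_BYTES  16u          /* SMMUv3 command size in bytes */
#define VCMDQ_MAX_BYTES (8ull << 20) /* 8MiB limit from the overview */

/* log2 of @v; @v is expected to be a power of two (0 and 1 give 0). */
static unsigned log2_u64(uint64_t v)
{
    unsigned n = 0;

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: largest CMDQS (log2 of queue entries) such that the
 * queue fits in one physically contiguous memory-backend page.
 */
static unsigned cmdqs_for_page_size(uint64_t page_size)
{
    uint64_t bytes = page_size < VCMDQ_MAX_BYTES ? page_size : VCMDQ_MAX_BYTES;

    return log2_u64(bytes / CMDQ_ENT_BYTES);
}
```

With 4KiB pages this yields CMDQS = 8 (256 entries), and 2MiB hugepages give 17. Any page of 8MiB or more, such as a 1GiB hugepage, reaches the maximum of 19.
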
diff --git a/hw/arm/tegra241-cmdqv.c b/hw/arm/tegra241-cmdqv.c
index 4dba1783fa..5dbf68b421 100644
--- a/hw/arm/tegra241-cmdqv.c
+++ b/hw/arm/tegra241-cmdqv.c
@@ -7,6 +7,99 @@
  * SPDX-License-Identifier: GPL-2.0-or-later
  */
 
+/*
+ * Tegra241 CMDQV - overview
+ * =========================
+ *
+ * NVIDIA Tegra241 extends SMMUv3 with a Command Queue Virtualization (CMDQ-V)
+ * block. It lets a guest issue SMMU invalidation commands directly to
+ * dedicated hardware queues (vCMDQs) without trapping into the hypervisor on
+ * the fast path. vCMDQs are grouped into Virtual Interfaces (VINTFs); the
+ * host kernel allocates one VINTF per emulated SMMUv3 instance via iommufd.
+ * QEMU emulates the CMDQV MMIO region and issues the host kernel calls
+ * (VIOMMU_ALLOC, HW_QUEUE_ALLOC, mmap); the actual command processing
+ * happens on real hardware.
+ *
+ * A vCMDQ becomes functional only once allocated to a VINTF; until then,
+ * register accesses only touch the QEMU-side cache and no command
+ * processing happens. After allocation, commands run on host hardware,
+ * and guest accesses to the vCMDQ's control/status registers through
+ * VINTF Page 0 bypass QEMU and reach the host directly.
+ *
+ * MMIO layout (64KB pages, total TEGRA241_CMDQV_IO_LEN)
+ * -----------------------------------------------------
+ * The direct vCMDQ apertures (0x10000/0x20000) are HW aliases of the VINTF
+ * apertures (0x30000/0x40000); they expose the same per-vCMDQ register slots
+ * under different addressing.
+ *
+ *   0x00000  CMDQV Config page: QEMU-trapped.
+ *   0x10000  Direct vCMDQ Page 0 (control/status): QEMU-trapped and routed
+ *            to either the mmap'd VINTF Page 0 (if the vCMDQ has been
+ *            allocated to a VINTF) or a per-vCMDQ register cache (otherwise).
+ *   0x20000  Direct vCMDQ Page 1 (BASE / DRAM addresses): QEMU-trapped.
+ *   0x30000  VINTF Page 0 (per-VINTF control/status): mmap'd from the host
+ *            via iommufd and installed into guest MMIO as a RAM-device
+ *            subregion after the first HW_QUEUE_ALLOC; subsequent accesses
+ *            bypass QEMU.
+ *   0x40000  VINTF Page 1 (per-VINTF BASE): QEMU-trapped. Although this is
+ *            a HW alias of the direct Page 1, the kernel only exposes mmap
+ *            for VINTF Page 0; VINTF Page 1 is not mmap'd and stays trapped.
+ *
+ * The direct vCMDQ Page 0 stays trapped rather than aliased to the VINTF
+ * Page 0 mmap. Guests are not expected to use the direct aperture much,
+ * but trapping keeps accesses well-defined for an unallocated vCMDQ.
+ *
+ * Lifecycle (driven by guest events)
+ * ----------------------------------
+ * 1. First vfio-pci device attach (.set_iommu_device) triggers:
+ *    - tegra241_cmdqv_probe(): IOMMU_GET_HW_INFO confirms host CMDQV support.
+ *    - IOMMU_VIOMMU_ALLOC: the kernel allocates a VINTF for this VM,
+ *      configures the VM's VMID (from its stage-2 HWPT) in VINTF_CONFIG,
+ *      forces HYP_OWN=0, and returns the mmap offset/length for VINTF Page 0.
+ *
+ * 2. Guest writes VINTF_CONFIG.ENABLE = 1:
+ *    QEMU mmap()s VINTF Page 0 (offset from step 1) into its own address
+ *    space and reports STATUS.ENABLE_OK = 1. The host VINTF was already
+ *    enabled by IOMMU_VIOMMU_ALLOC, so QEMU only acknowledges the write.
+ *
+ * 3. Guest completes vCMDQ setup (BASE, CMDQ_ALLOC_MAP.ALLOC, CMDQV_EN,
+ *    VINTF_CONFIG.ENABLE, in any order; each of these writes retries the
+ *    allocation, which proceeds once all of them are in place):
+ *    IOMMU_HW_QUEUE_ALLOC binds the guest BASE GPA (translated through
+ *    stage-2 and pinned by the kernel) to a host vCMDQ in this VM's VINTF.
+ *
+ * 4. After the first successful HW_QUEUE_ALLOC, the mmap'd VINTF Page 0 is
+ *    installed into guest MMIO as a RAM-device subregion. Guest VINTF Page 0
+ *    accesses (CMDQ_EN, PROD/CONS_INDX, STATUS, GERROR/GERRORN) thereafter
+ *    go straight to host hardware, bypassing QEMU.
+ *
+ * 5. Guest SMMU driver programs a Stream Table Entry for a passthrough
+ *    device: IOMMU_VDEVICE_ALLOC populates SID_MATCH/SID_REPLACE in this
+ *    VM's VINTF so the device's guest vSID translates to its host pSID.
+ *    Commands referencing unmapped SIDs are rejected by HW.
+ *
+ * Per-VM isolation
+ * ----------------
+ * - Each VM has its own iommufd FD; all iommufd objects (VINTF, vdevices,
+ *   hw_queues, mmap regions) belong to that FD. Cross-FD lookups fail, so
+ *   one VM cannot reach another VM's IDs.
+ * - IOMMU_VIOMMU_ALLOC configures the VM's VMID in VINTF_CONFIG; the CMDQV
+ *   hardware substitutes or checks the VMID on every guest-issued command.
+ * - The kernel allocates the VINTF with HYP_OWN = 0, which restricts the
+ *   guest to a safe subset of commands.
+ * - IOMMU_VDEVICE_ALLOC populates SID_MATCH/SID_REPLACE so invalidations
+ *   only reach the host StreamIDs assigned to this VM (see step 5).
+ * - IOMMU_HW_QUEUE_ALLOC binds each vCMDQ to a single VINTF, so a guest
+ *   cannot reach a vCMDQ that belongs to another VM.
+ *
+ * Limits exposed to the guest
+ * ---------------------------
+ * One VINTF per emulated SMMUv3 and two vCMDQs per VINTF. Maximum vCMDQ
+ * size is 8MiB. The queue must be physically contiguous (the HW reads it
+ * via host PA), so QEMU caps it at the host memory-backend page size. Use
+ * hugepage backing of at least 8MiB to keep CMDQS at the HW maximum.
+ */
+
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qemu/error-report.h"
-- 
2.43.0

