Hi Shameer,

On 6/1/26 1:42 PM, Shameer Kolothum wrote:
> Add an overview describing the Tegra241 CMDQV passthrough model, MMIO
> layout, guest-driven lifecycle, and per-VM isolation.
Some suggestions inline. Thank you very much for the documentation
effort! It is really helpful.
>
> Signed-off-by: Shameer Kolothum <[email protected]>
> ---
>  hw/arm/tegra241-cmdqv.c | 93 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>
> diff --git a/hw/arm/tegra241-cmdqv.c b/hw/arm/tegra241-cmdqv.c
> index 4dba1783fa..5dbf68b421 100644
> --- a/hw/arm/tegra241-cmdqv.c
> +++ b/hw/arm/tegra241-cmdqv.c
> @@ -7,6 +7,99 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>  
> +/*
> + * Tegra241 CMDQV - overview
> + * =========================
> + *
> + * NVIDIA Tegra241 extends SMMUv3 with a Command Queue Virtualization (CMDQ-V)
> + * block. It lets a guest issue SMMU invalidation commands directly to
> + * dedicated hardware queues (vCMDQs) without trapping into the hypervisor on
> + * the fast path. vCMDQs are grouped into Virtual Interfaces (VINTFs); the
s/are grouped into/are exclusively allocated to/ ?
> + * host kernel allocates one VINTF per emulated SMMUv3 instance via iommufd.
> + * QEMU emulates the CMDQV MMIO region and drives the host kernel calls
> + * (VIOMMU_ALLOC, HW_QUEUE_ALLOC, mmap); the actual command processing
> + * happens on real hardware.
> + *
> + * A vCMDQ becomes functional only once allocated to a VINTF; until then,
> + * register accesses only touch the QEMU-side cache and no command
> + * processing happens. After allocation, command processing runs on the
Suggest: "After allocation, state is migrated from the cache to HW and
command processing ..."
> + * host hardware, and guest accesses to the live control/status registers
> + * bypass QEMU and reach the host directly.
> + *
> + * MMIO layout (64KB pages, total TEGRA241_CMDQV_IO_LEN)
> + * -----------------------------------------------------
> + * The direct vCMDQ apertures (0x10000/0x20000) are HW aliases of the VINTF
> + * apertures (0x30000/0x40000); they expose the same per-vCMDQ register slots
> + * under different addressing.
I would put the above sentence after the layout description.
> + *
> + *   0x00000  CMDQV Config page: QEMU-trapped.
> + *   0x10000  Direct vCMDQ Page 0 (control/status): QEMU-trapped and routed
> + *            to either the mmap'd VINTF Page 0 (if the vCMDQ has been
> + *            allocated to a VINTF) or a per-vCMDQ register cache (otherwise).
> + *   0x20000  Direct vCMDQ Page 1 (BASE / DRAM addresses): QEMU-trapped.
> + *   0x30000  VINTF Page 0 (per-VINTF control/status): mmap'd from the host
> + *            via iommufd and installed into guest MMIO as a RAM-device
> + *            subregion after the first HW_QUEUE_ALLOC; subsequent accesses
> + *            bypass QEMU.
> + *   0x40000  VINTF Page 1 (per-VINTF BASE): QEMU-trapped. Although this is
> + *            a HW alias of the direct Page 1, the kernel only exposes mmap
> + *            for VINTF Page 0; VINTF Page 1 is not mmap'd and stays trapped.
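To make the layout concrete, here is a minimal sketch of how an offset maps to a page and whether the access traps. All names are hypothetical and not taken from the patch, and it assumes the region is exactly the five 64KB pages listed (the value of TEGRA241_CMDQV_IO_LEN is not quoted here):

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch. */
#define CMDQV_PAGE_SZ 0x10000u /* 64KB pages, per the layout above */

enum cmdqv_page {
    PAGE_CONFIG,         /* 0x00000: QEMU-trapped */
    PAGE_VCMDQ_DIRECT0,  /* 0x10000: QEMU-trapped, routed to mmap or cache */
    PAGE_VCMDQ_DIRECT1,  /* 0x20000: QEMU-trapped */
    PAGE_VINTF0,         /* 0x30000: mmap'd after first HW_QUEUE_ALLOC */
    PAGE_VINTF1,         /* 0x40000: QEMU-trapped */
    PAGE_INVALID,        /* beyond the five pages assumed here */
};

/* Map an MMIO offset to the page it falls in. */
static enum cmdqv_page cmdqv_page_of(uint64_t offset)
{
    uint64_t idx = offset / CMDQV_PAGE_SZ;

    return idx < PAGE_INVALID ? (enum cmdqv_page)idx : PAGE_INVALID;
}

/*
 * Returns 1 if the access is handled by QEMU, 0 if it bypasses QEMU.
 * Only VINTF Page 0 bypasses QEMU, and only once it has been installed
 * as a RAM-device subregion.
 */
static int cmdqv_access_traps(uint64_t offset, int vintf_page0_mapped)
{
    return !(cmdqv_page_of(offset) == PAGE_VINTF0 && vintf_page0_mapped);
}
```

For example, cmdqv_access_traps(0x30010, 1) is 0 while cmdqv_access_traps(0x40000, 1) is still 1, which matches the "VINTF Page 1 stays trapped" statement.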
> + *
> + * The direct vCMDQ Page 0 stays trapped rather than aliased to the VINTF
> + * Page 0 mmap. Direct aperture read/write is not an expected common usecase,
You can justify it as you did in the individual patch (to keep the
management of unmapped vCMDQs in the direct page compliant with the spec).
> + * but trapping keeps accesses well-defined for an unallocated vCMDQ.
> + *
> + * Lifecycle (driven by guest events)
> + * ----------------------------------
> + * 1. First vfio-pci device attach (.set_iommu_device) triggers:
> + *    - tegra241_cmdqv_probe(): IOMMU_GET_HW_INFO confirms host CMDQV support.
> + *    - IOMMU_VIOMMU_ALLOC: the kernel allocates a VINTF for this VM,
> + *      configures the VM's VMID (from its stage-2 HWPT) in VINTF_CONFIG,
> + *      forces HYP_OWN=0, and returns the mmap offset/length for VINTF Page 0.
> + *
> + * 2. Guest writes VINTF_CONFIG.ENABLE = 1:
> + *    QEMU mmap()s the offset from step 1 into its address space and reports
> + *    STATUS.ENABLE_OK = 1. The host VINTF was already enabled by
> + *    IOMMU_VIOMMU_ALLOC; QEMU only acks back.
"The host VINTF ...": I would move this sentence to step 1, and I would
remove "QEMU only acks back."
> + *
> + * 3. Guest completes vCMDQ setup (BASE, CMDQ_ALLOC_MAP.ALLOC, CMDQV_EN,
> + *    VINTF.ENABLE, in any order; each precondition write retries the
s/the allocation/the HW queue allocation/ ?
> + *    allocation):
> + *    IOMMU_HW_QUEUE_ALLOC binds the guest BASE GPA (translated through
> + *    stage-2 and pinned by the kernel) to a host vCMDQ in this VM's VINTF.
Maybe: the guest is granted a new host vCMDQ in its VINTF?
> + *
> + * 4. After the first successful HW_QUEUE_ALLOC, the mmap'd VINTF Page 0 is
> + *    installed into guest MMIO as a RAM-device subregion. Guest VINTF Page 0
> + *    accesses (CMDQ_EN, PROD/CONS_INDX, STATUS, GERROR/GERRORN) thereafter
> + *    go straight to host hardware, bypassing QEMU.
> + *
> + * 5. Guest SMMU driver programs a Stream Table Entry for a passthrough
> + *    device: IOMMU_VDEVICE_ALLOC programs SID_MATCH/SID_REPLACE in this
> + *    VM's VINTF so the device's guest vSID translates to its host pSID.
> + *    Commands referencing unmapped SIDs are rejected by HW.
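The "in any order; each precondition write retries" behaviour in step 3 could be illustrated with a small model like the one below. This is only a sketch under my own assumptions: the struct, its fields and the function name are hypothetical, and the real code would issue IOMMU_HW_QUEUE_ALLOC where the comment says so.

```c
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct vcmdq_model {
    bool base_set;      /* guest programmed BASE */
    bool alloc_mapped;  /* CMDQ_ALLOC_MAP.ALLOC is set */
    bool cmdqv_en;      /* CMDQV_EN is set */
    bool vintf_en;      /* VINTF.ENABLE is set */
    bool hw_allocated;  /* a host vCMDQ has been allocated */
};

/*
 * Called after every precondition write. Returns true once the queue is
 * backed by host hardware; until then it returns false and does nothing.
 */
static bool vcmdq_try_alloc(struct vcmdq_model *q)
{
    if (q->hw_allocated) {
        return true; /* already allocated, nothing to retry */
    }
    if (!(q->base_set && q->alloc_mapped && q->cmdqv_en && q->vintf_en)) {
        return false; /* a precondition is still missing */
    }
    /* Real code would issue IOMMU_HW_QUEUE_ALLOC here; we only record it. */
    q->hw_allocated = true;
    return true;
}
```

Whichever precondition the guest writes last is the one that triggers the allocation, so the order of the four writes does not matter.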
> + *
> + * Per-VM isolation
> + * ----------------
> + * - Each VM has its own iommufd FD; all iommufd objects (VINTF, vdevices,
> + *   hw_queues, mmap regions) belong to that FD. Cross-FD lookups fail, so
> + *   one VM cannot reach another VM's IDs.
> + * - IOMMU_VIOMMU_ALLOC configures the VM's VMID in VINTF_CONFIG; the CMDQV
> + *   hardware substitutes / checks VMID on every command the guest issues.
> + * - The kernel allocates the VINTF with HYP_OWN = 0, which restricts the
> + *   guest to a safe subset of commands.
> + * - IOMMU_VDEVICE_ALLOC populates SID_MATCH/SID_REPLACE so invalidations
> + *   only reach the host StreamIDs assigned to this VM (see step 5).
> + * - IOMMU_HW_QUEUE_ALLOC binds each vCMDQ to a single VINTF, so a guest
> + *   cannot reach a vCMDQ that belongs to another VM.
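For readers unfamiliar with SID_MATCH/SID_REPLACE, a software model of the vSID to pSID lookup may help. This is a sketch of the behaviour as described above, not of the register format; the table layout and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one SID_MATCH/SID_REPLACE pair. */
struct sid_entry {
    uint32_t match;    /* guest vSID to match */
    uint32_t replace;  /* host pSID substituted on a match */
    int valid;         /* entry has been programmed */
};

/*
 * Translate a guest vSID to a host pSID. Returns 0 and fills *psid on a
 * match, or -1 if the vSID is unmapped (the command would be rejected).
 */
static int sid_translate(const struct sid_entry *tbl, size_t n,
                         uint32_t vsid, uint32_t *psid)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].match == vsid) {
            *psid = tbl[i].replace;
            return 0;
        }
    }
    return -1;
}
```

Since only entries populated through IOMMU_VDEVICE_ALLOC are valid, a guest cannot name a host StreamID that was not assigned to it.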
> + *
> + * Limits exposed to the guest
> + * ---------------------------
> + * One VINTF per emulated SMMUv3 and two vCMDQs per VINTF. Maximum vCMDQ
> + * size is 8MiB. The queue must be physically contiguous (the HW reads it
> + * via host PA), so QEMU caps it to the host memory-backend page size. Use
> + * hugepage backing large enough to keep CMDQS at the HW maximum.
> + */
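A worked example of the size cap might help users pick a hugepage size. The sketch below assumes the queue size is 16 bytes (one SMMUv3 command) times 2^log2size and that the cap is simply the smaller of the backend page size and 8MiB; the function name is hypothetical and this is not the patch's code.

```c
#include <stdint.h>

#define CMD_ENTRY_BYTES 16u                   /* one SMMUv3 command */
#define VCMDQ_MAX_BYTES (8u * 1024u * 1024u)  /* 8MiB HW maximum */

/*
 * Largest log2(number of entries) such that the whole queue fits in one
 * physically contiguous backing page and within the HW maximum.
 */
static unsigned vcmdq_max_log2size(uint64_t backend_page_bytes)
{
    uint64_t cap = backend_page_bytes < VCMDQ_MAX_BYTES ?
                   backend_page_bytes : VCMDQ_MAX_BYTES;
    unsigned log2size = 0;

    while (((uint64_t)CMD_ENTRY_BYTES << (log2size + 1)) <= cap) {
        log2size++;
    }
    return log2size;
}
```

Under these assumptions a 4KiB backend gives log2size 8 (256 entries), a 2MiB hugepage gives 17, and anything of 8MiB or more reaches the maximum of 19.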
> +
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
>  #include "qemu/error-report.h"
Thanks

Eric

