Hi Shameer,

On 5/22/26 12:01 PM, Shameer Kolothum Thodi wrote:
>>> On 5/19/26 12:36 PM, Shameer Kolothum wrote:
>>> Actually, the whole host passthrough principle is not really explained
>>> anywhere. At least that's my feeling. It would be nice to have a summary
>>> of it, in the cover letter and in the individual patches. Or maybe we can
>>> link to another doc. Reading the kernel uapi does not really provide the
>>> full picture, at least that's my own feeling.
>> Fair point. A summary of operation is useful. I am thinking of adding
>> it at the top of hw/arm/tegra241-cmdqv.c:
>>
>> /*
>>    * Tegra241 CMDQV - overview
>>    * ---------------------------------------
>>    *...
>>    */
>>
>> I will populate the details and share it for review before v6.
> Please find it below. Hopefully, I have captured all the important details.
> Please take a look and let me know.
>
> Thanks,
> Shameer
>
> diff --git a/hw/arm/tegra241-cmdqv.c b/hw/arm/tegra241-cmdqv.c
> index ebf12d0597..5a103f37b8 100644
> --- a/hw/arm/tegra241-cmdqv.c
> +++ b/hw/arm/tegra241-cmdqv.c
> @@ -7,6 +7,84 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>
> +/*
> + * Tegra241 CMDQV - overview
> + * =========================
> + *
> + * NVIDIA Tegra241 extends SMMUv3 with a Command Queue Virtualization (CMDQ-V)
> + * block. It lets a guest issue SMMU invalidation commands directly to
> + * dedicated hardware queues (vCMDQs) without trapping into the hypervisor on
> + * the fast path. vCMDQs are grouped into Virtual Interfaces (VINTFs); the
> + * host kernel allocates one VINTF per emulated SMMUv3 instance via iommufd.
> + * QEMU emulates the CMDQV MMIO region and drives the host kernel calls
> + * (VIOMMU_ALLOC, HW_QUEUE_ALLOC, mmap); the actual command processing happens
> + * on real hardware.
> + *
> + * MMIO layout (64KB pages, total TEGRA241_CMDQV_IO_LEN)
> + * -----------------------------------------------------
> + * The direct vCMDQ apertures (0x10000/0x20000) are HW aliases of the VINTF
> + * apertures (0x30000/0x40000); they expose the same per-vCMDQ register slots
> + * under different addressing.
> + *
> + *   0x00000  CMDQV Config page: QEMU-trapped.
> + *   0x10000  Direct vCMDQ Page 0 (control/status): QEMU-trapped and routed
> + *            via vintf_ptr() to either the mmap'd VINTF page (allocated
> + *            slot) or a per-vCMDQ register cache (unallocated slot).
> + *   0x20000  Direct vCMDQ Page 1 (BASE / DRAM addresses): QEMU-trapped.
> + *   0x30000  VINTF Page 0 (per-VINTF control/status): mmap'd from the host
> + *            via iommufd and installed into guest MMIO as a RAM-device
> + *            subregion after the first HW_QUEUE_ALLOC; subsequent accesses
> + *            bypass QEMU.
> + *   0x40000  VINTF Page 1 (per-VINTF BASE): QEMU-trapped.
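To make the layout concrete, here is a minimal C sketch (not the QEMU code) that maps an offset to its 64KB page, pairs each direct vCMDQ page with its VINTF alias, and reports whether an access is trapped. It assumes the region consists of exactly the five pages listed; `CmdqvPage`, `cmdqv_page_of()`, `cmdqv_vintf_alias()` and `cmdqv_access_trapped()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CMDQV_PAGE_SIZE 0x10000u  /* 64KB pages */

/* Hypothetical page index enum; values equal offset / 64KB. */
typedef enum {
    CMDQV_CONFIG_PAGE = 0,  /* 0x00000: trapped */
    CMDQV_VCMDQ_PAGE0,      /* 0x10000: trapped, routed via vintf_ptr() */
    CMDQV_VCMDQ_PAGE1,      /* 0x20000: trapped */
    CMDQV_VINTF_PAGE0,      /* 0x30000: mmap'd after first HW_QUEUE_ALLOC */
    CMDQV_VINTF_PAGE1,      /* 0x40000: trapped */
    CMDQV_PAGE_INVALID,     /* outside the five pages assumed here */
} CmdqvPage;

static CmdqvPage cmdqv_page_of(uint64_t offset)
{
    uint64_t idx = offset / CMDQV_PAGE_SIZE;
    return idx < CMDQV_PAGE_INVALID ? (CmdqvPage)idx : CMDQV_PAGE_INVALID;
}

/* Direct vCMDQ pages are HW aliases of the corresponding VINTF pages. */
static CmdqvPage cmdqv_vintf_alias(CmdqvPage direct)
{
    switch (direct) {
    case CMDQV_VCMDQ_PAGE0:
        return CMDQV_VINTF_PAGE0;
    case CMDQV_VCMDQ_PAGE1:
        return CMDQV_VINTF_PAGE1;
    default:
        return CMDQV_PAGE_INVALID;
    }
}

/* Only VINTF Page 0 bypasses QEMU, and only once it has been mapped. */
static bool cmdqv_access_trapped(uint64_t offset, bool vintf_page0_mapped)
{
    if (cmdqv_page_of(offset) == CMDQV_VINTF_PAGE0) {
        return !vintf_page0_mapped;
    }
    return true;
}
```

The alias helper only pairs pages; it deliberately does not translate offsets within a page, since the per-vCMDQ slots are exposed "under different addressing".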
> + *
> + * The direct vCMDQ aperture stays trapped (rather than aliased to the VINTF

direct vCMDQ aperture page 0 stays trapped, as opposed to the VINTF page 0

> + * mmap) to preserve the spec's R/W register semantics for unallocated
> + * vCMDQs: the direct aperture allows programming before VINTF allocation,
> + * while aliasing would route through the VINTF drop path instead.
see last discussion
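A minimal sketch of the Direct vCMDQ Page 0 routing described above, assuming a hypothetical per-vCMDQ slot of 16 32-bit registers laid out back to back in the mmap'd VINTF page. `vintf_ptr_sketch()` only stands in for vintf_ptr(); its signature, the slot size and the cache layout are illustrative. It does not model what happens to cached values once a slot becomes allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VCMDQ       2   /* "two vCMDQs per VINTF" */
#define VCMDQ_REG_WORDS 16  /* hypothetical slot size, in 32-bit registers */

typedef struct {
    uint32_t *vintf_page0;                       /* mmap'd host page, or NULL */
    bool allocated[NUM_VCMDQ];                   /* HW_QUEUE_ALLOC succeeded */
    uint32_t cache[NUM_VCMDQ][VCMDQ_REG_WORDS];  /* backs unallocated slots */
} CmdqvSketch;

/* Pick the backing store for register @reg of vCMDQ @q. */
static uint32_t *vintf_ptr_sketch(CmdqvSketch *s, unsigned q, unsigned reg)
{
    assert(q < NUM_VCMDQ && reg < VCMDQ_REG_WORDS);
    if (s->allocated[q] && s->vintf_page0) {
        /* Allocated slot: hit the real host register. */
        return &s->vintf_page0[q * VCMDQ_REG_WORDS + reg];
    }
    /* Unallocated slot: keep plain R/W semantics in a QEMU-side cache. */
    return &s->cache[q][reg];
}

static void direct_page0_write(CmdqvSketch *s, unsigned q, unsigned reg,
                               uint32_t val)
{
    *vintf_ptr_sketch(s, q, reg) = val;
}

static uint32_t direct_page0_read(CmdqvSketch *s, unsigned q, unsigned reg)
{
    return *vintf_ptr_sketch(s, q, reg);
}
```

Because the trap handler chooses per slot, vCMDQ 0 can still be programmed through the cache while vCMDQ 1 already talks to hardware.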
> + *
> + * Lifecycle (driven by guest events)
> + * ----------------------------------
> + * 1. First vfio-pci device attach (.set_iommu_device) triggers:
> + *    - tegra241_cmdqv_probe(): IOMMU_GET_HW_INFO confirms host CMDQV support.
> + *    - IOMMU_VIOMMU_ALLOC: the kernel allocates a VINTF for this VM,
> + *      configures the VM's VMID (from its stage-2 HWPT) in VINTF_CONFIG,
> + *      forces HYP_OWN=0, and returns the mmap offset/length for VINTF Page 0.
What about the virtual/physical SID mapping? How does the kernel know which
SIDs are supposed to write into that VINTF? Where do we pass this info?
> + *
> + * 2. Guest writes VINTF_CONFIG.ENABLE = 1:
> + *    QEMU mmap()s the offset from step 1 into its address space and reports
> + *    STATUS.ENABLE_OK = 1. The host VINTF was already enabled by
> + *    IOMMU_VIOMMU_ALLOC; QEMU only acks back.
> + *
> + * 3. Guest completes vCMDQ setup (BASE, CMDQ_ALLOC_MAP.ALLOC, CMDQV_EN,
> + *    VINTF.ENABLE, in any order; each precondition write retries the
> + *    allocation):
> + *    IOMMU_HW_QUEUE_ALLOC binds the guest BASE GPA (translated through
> + *    stage-2 and pinned by the kernel) to a host vCMDQ in this VM's VINTF.
> + *
> + * 4. After the first successful HW_QUEUE_ALLOC, the mmap'd VINTF Page 0 is
> + *    installed into guest MMIO as a RAM-device subregion. Guest vCMDQ Page 0
> + *    accesses (CMDQ_EN, PROD/CONS_INDX, STATUS, GERROR/GERRORN) thereafter
> + *    go straight to host hardware, bypassing QEMU.
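The precondition handling in steps 2 to 4 can be sketched as follows. This is a hypothetical model, not the QEMU implementation: IOMMU_HW_QUEUE_ALLOC is replaced by a stub that always succeeds, and all struct and function names are invented for illustration. Every precondition write calls try_alloc_all(), so the order of the writes does not matter, and the VINTF Page 0 subregion is installed on the first successful allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VCMDQ 2   /* "two vCMDQs per VINTF" */

typedef struct {
    bool base_valid;      /* guest wrote BASE */
    uint64_t base_gpa;
    bool alloc_mapped;    /* CMDQ_ALLOC_MAP.ALLOC */
    bool allocated;       /* HW_QUEUE_ALLOC succeeded */
} VcmdqState;

typedef struct {
    bool cmdqv_en;        /* CMDQV_EN */
    bool vintf_enabled;   /* VINTF_CONFIG.ENABLE */
    bool enable_ok;       /* STATUS.ENABLE_OK reported to the guest */
    bool page0_installed; /* VINTF Page 0 mapped into guest MMIO */
    unsigned alloc_calls; /* number of HW_QUEUE_ALLOC attempts */
    VcmdqState q[NUM_VCMDQ];
} CmdqvState;

/* Stub standing in for IOMMU_HW_QUEUE_ALLOC; always succeeds here. */
static bool hw_queue_alloc_stub(CmdqvState *s, uint64_t base_gpa)
{
    (void)base_gpa;
    s->alloc_calls++;
    return true;
}

/* Retry allocation for every vCMDQ whose preconditions are all met. */
static void try_alloc_all(CmdqvState *s)
{
    for (unsigned i = 0; i < NUM_VCMDQ; i++) {
        VcmdqState *q = &s->q[i];
        if (q->allocated || !q->base_valid || !q->alloc_mapped ||
            !s->cmdqv_en || !s->vintf_enabled) {
            continue;
        }
        if (!hw_queue_alloc_stub(s, q->base_gpa)) {
            continue;  /* a later precondition write retries */
        }
        q->allocated = true;
        s->page0_installed = true;  /* idempotent after the first success */
    }
}

static void write_vintf_enable(CmdqvState *s)
{
    s->vintf_enabled = true;
    s->enable_ok = true;  /* host VINTF already enabled: only ack */
    try_alloc_all(s);
}

static void write_cmdqv_en(CmdqvState *s)
{
    s->cmdqv_en = true;
    try_alloc_all(s);
}

static void write_base(CmdqvState *s, unsigned idx, uint64_t gpa)
{
    assert(idx < NUM_VCMDQ);
    s->q[idx].base_gpa = gpa;
    s->q[idx].base_valid = true;
    try_alloc_all(s);
}

static void write_alloc_map(CmdqvState *s, unsigned idx)
{
    assert(idx < NUM_VCMDQ);
    s->q[idx].alloc_mapped = true;
    try_alloc_all(s);
}
```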
> + *
> + * Per-VM isolation
> + * ----------------
> + * - Each VM has its own iommufd FD; all iommufd objects (VINTF, hw_queues,
> + *   mmap regions) belong to that FD. Cross-FD lookups fail, so one VM
> + *   cannot reach another VM's IDs.
> + * - IOMMU_VIOMMU_ALLOC configures the VM's VMID in VINTF_CONFIG; the CMDQV
> + *   hardware substitutes / checks VMID on every command the guest issues.
> + * - The kernel allocates the VINTF with HYP_OWN = 0, which restricts the
> + *   guest to a safe subset of commands.
> + * - Per VINTF, the kernel programs SID_MATCH and SID_REPLACE to restrict
> + *   invalidations to the StreamIDs assigned to this VM.
> + * - IOMMU_HW_QUEUE_ALLOC binds each vCMDQ to a single VINTF, so a guest
> + *   cannot reach a vCMDQ that belongs to another VM.
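A simplified model of the per-VINTF SID_MATCH/SID_REPLACE filtering and VMID substitution described above. The slot count, the struct layouts and the rule that a command with an unmatched SID is rejected are assumptions for illustration, not documented hardware behavior. Which iommufd call programs the table is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SID_SLOTS 4   /* hypothetical number of SID_MATCH/SID_REPLACE pairs */

typedef struct {
    bool valid;
    uint32_t vsid;    /* SID_MATCH: StreamID as seen by the guest */
    uint32_t psid;    /* SID_REPLACE: host StreamID */
} SidSlot;

typedef struct {
    uint16_t vmid;    /* programmed in VINTF_CONFIG by IOMMU_VIOMMU_ALLOC */
    SidSlot slot[SID_SLOTS];
} VintfSketch;

/* A guest command reduced to the fields this model cares about. */
typedef struct {
    uint16_t vmid;
    uint32_t sid;
} CmdSketch;

/*
 * Returns false (command left untouched) if the SID matches no slot.
 * On a match, rewrites the command with the host SID and this VINTF's VMID.
 */
static bool vintf_filter_cmd(const VintfSketch *v, CmdSketch *cmd)
{
    for (unsigned i = 0; i < SID_SLOTS; i++) {
        if (v->slot[i].valid && v->slot[i].vsid == cmd->sid) {
            cmd->sid = v->slot[i].psid;
            cmd->vmid = v->vmid;
            return true;
        }
    }
    return false;
}
```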
> + *
> + * Limits exposed to the guest
> + * ---------------------------
> + * One VINTF per emulated SMMUv3 and two vCMDQs per VINTF.
> + */
> +
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
>  #include "qemu/error-report.h"

Eric

