On 5/21/26 2:26 PM, Shameer Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <[email protected]>
>> Sent: 21 May 2026 10:30
>> To: Shameer Kolothum Thodi <[email protected]>; qemu-
>> [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; [email protected]; Nicolin
>> Chen <[email protected]>; Nathan Chen <[email protected]>; Matt
>> Ochs <[email protected]>; Jiandi An <[email protected]>; Jason Gunthorpe
>> <[email protected]>; [email protected]; Krishnakant Jaju
>> <[email protected]>; [email protected]
>> Subject: Re: [PATCH v5 19/32] hw/arm/tegra241-cmdqv: Allocate HW
>> VCMDQs once configured
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Hi,
>>
>> On 5/19/26 12:36 PM, Shameer Kolothum wrote:
>>> From: Nicolin Chen <[email protected]>
>>>
>>> Add support for allocating IOMMUFD hardware queues when the guest
>>> programs the VCMDQ BASE registers.
>> You should add the fact that
>>
>> iommufd_backend_alloc_hw_queue() requires the GPA of the VCMDQ. This
>> explains why you need to those operations once the GPA has been populated.
> Ok.
>
>>> VCMDQ_EN lives in VCMDQ_CONFIG, which is on the VINTF Page0 region
>>> that a later patch maps directly into the guest — so QEMU won't trap
>>> its writes. Allocate the hardware queue instead once all of these
>>> are set: a RAM-backed BASE, CMDQ_ALLOC_MAP.ALLOC, and CMDQV /
>> VINTF
>>> enabled. Each precondition write retries the allocation, so the
>>> guest may program them in any order.
>>> If a hardware queue was previously allocated for the same VCMDQ,
>>> free it before reallocation.
>>>
>>> Writes with invalid addresses are ignored.
>>>
>>> All allocated VCMDQs are freed when CMDQV or VINTF is disabled, or
>>> when the ALLOC bit is cleared.
>>>
>>> Signed-off-by: Nicolin Chen <[email protected]>
>>> Co-developed-by: Shameer Kolothum <[email protected]>
>>> Signed-off-by: Shameer Kolothum <[email protected]>
>>> ---
>>> hw/arm/tegra241-cmdqv.h | 11 ++++
>>> hw/arm/tegra241-cmdqv.c | 138
>> ++++++++++++++++++++++++++++++++++++++--
>>> 2 files changed, 143 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/arm/tegra241-cmdqv.h b/hw/arm/tegra241-cmdqv.h
>>> index f0b031b4dc..7a8cb2ebc7 100644
>>> --- a/hw/arm/tegra241-cmdqv.h
>>> +++ b/hw/arm/tegra241-cmdqv.h
>>> @@ -45,6 +45,7 @@ typedef struct Tegra241CMDQV {
>>> MemoryRegion mmio_cmdqv;
>>> qemu_irq irq;
>>> IOMMUFDVeventq *veventq;
>>> + IOMMUFDHWqueue *vcmdq[TEGRA241_CMDQV_MAX_CMDQ];
>>> void *vintf_page0;
>>>
>>> /* CMDQ-V Config page register cache */
>>> @@ -361,6 +362,16 @@
>> SMMU_CMDQV_VI_VCMDQi_CONS_INDX_BASE_DRAM_L_(1)
>>> SMMU_CMDQV_VI_VCMDQi_CONS_INDX_BASE_DRAM_H_(0)
>>> SMMU_CMDQV_VI_VCMDQi_CONS_INDX_BASE_DRAM_H_(1)
>>>
>>> +static inline bool tegra241_cmdqv_enabled(Tegra241CMDQV *cmdqv)
>>> +{
>>> + return cmdqv->status & R_STATUS_CMDQV_ENABLED_MASK;
>>> +}
>>> +
>>> +static inline bool tegra241_vintf_enabled(Tegra241CMDQV *cmdqv)
>>> +{
>>> + return cmdqv->vintf_status & R_VINTF0_STATUS_ENABLE_OK_MASK;
>>> +}
>>> +
>>> const SMMUv3AccelCmdqvOps *tegra241_cmdqv_get_ops(void);
>>>
>>> #endif /* HW_ARM_TEGRA241_CMDQV_H */
>>> diff --git a/hw/arm/tegra241-cmdqv.c b/hw/arm/tegra241-cmdqv.c
>>> index 4d9f094b2a..f4968520f3 100644
>>> --- a/hw/arm/tegra241-cmdqv.c
>>> +++ b/hw/arm/tegra241-cmdqv.c
>>> @@ -18,6 +18,95 @@
>>> #include "tegra241-cmdqv.h"
>>> #include "trace.h"
>>>
>>> +static void tegra241_cmdqv_free_vcmdq(Tegra241CMDQV *cmdqv, int
>> index)
>>> +{
>>> + IOMMUFDViommu *viommu = cmdqv->s_accel->viommu;
>>> + IOMMUFDHWqueue *vcmdq = cmdqv->vcmdq[index];
>>> +
>>> + if (!vcmdq) {
>>> + return;
>>> + }
>>> + iommufd_backend_free_id(viommu->iommufd, vcmdq->hw_queue_id);
>>> + g_free(vcmdq);
>>> + cmdqv->vcmdq[index] = NULL;
>>> +}
>>> +
>>> +static void tegra241_cmdqv_free_all_vcmdq(Tegra241CMDQV *cmdqv)
>>> +{
>>> + /* uapi/linux/iommufd.h: hw_queue destroy must be in descending
>> @index. */
>>> + for (int i = (TEGRA241_CMDQV_MAX_CMDQ - 1); i >= 0; i--) {
>>> + tegra241_cmdqv_free_vcmdq(cmdqv, i);
>>> + }
>>> +}
>>> +
>>> +/*
>>> + * Allocate a host HW VCMDQ from the current cached BASE / size for
>> @index.
>>> + *
>>> + * Setup is a no-op (returns true) when any of the following preconditions
>>> + * isn't met yet:
>>> + * - BASE not prgrammed yet.
>>> + * - VCMDQ is not mapped to a VINTF (CMDQ_ALLOC_MAP.ALLOC == 0)
>>> + * - BASE / size don't resolve to a RAM region
>>> + * - CMDQV global enable or VINTF enable is not yet asserted
>>> + */
>>> +static bool tegra241_cmdqv_setup_vcmdq(Tegra241CMDQV *cmdqv, int
>> index,
>>> + Error **errp)
>>> +{
>>> + SMMUv3AccelState *accel = cmdqv->s_accel;
>>> + uint64_t base_mask = (uint64_t)R_VCMDQ0_BASE_L_ADDR_MASK |
>>> + (uint64_t)R_VCMDQ0_BASE_H_ADDR_MASK << 32;
>>> + uint64_t addr = cmdqv->vcmdq_base[index] & base_mask;
>>> + uint64_t log2 = cmdqv->vcmdq_base[index] &
>> R_VCMDQ0_BASE_L_LOG2SIZE_MASK;
>>> + uint64_t size = 1ULL << (log2 + 4);
>>> + IOMMUFDViommu *viommu = accel->viommu;
>>> + IOMMUFDHWqueue *hw_queue;
>>> + uint32_t hw_queue_id;
>>> +
>>> + /* BASE not programmed yet. */
>>> + if (!cmdqv->vcmdq_base[index]) {
>>> + return true;
>>> + }
>>> +
>>> + /* VCMDQ not yet mapped to a VINTF. */
>>> + if (!(cmdqv->cmdq_alloc_map[index] &
>> R_CMDQ_ALLOC_MAP_0_ALLOC_MASK)) {
>>> + return true;
>>> + }
>>> +
>>> + /* Ignore any invalid address. This may come as part of reset etc. */
>>> + if (!address_space_range_is_ram(&address_space_memory, addr, size)) {
>> why do you need to test the whole range and not only the start address
> That was to make it more defensive based on v4 discussion here:
> https://lore.kernel.org/qemu-devel/aft%[email protected]/
>
> Please let me know if there is a better way.
Well it is unclear to me. You want to make sure the GPA was fully set
through the H and L regs. To me that's enough. As far as I understand
Nicolin's reply, the kernel checks the rest. But please sync with Jason
or Nicolin
Eric
>
>>> + return true;
>>> + }
>>> +
>>> + if (!tegra241_cmdqv_enabled(cmdqv) ||
>> !tegra241_vintf_enabled(cmdqv)) {
>>> + return true;
>>> + }
>>> +
>>> + tegra241_cmdqv_free_vcmdq(cmdqv, index);
>>> +
>>> + if (!iommufd_backend_alloc_hw_queue(viommu->iommufd, viommu-
>>> viommu_id,
>>> + IOMMU_HW_QUEUE_TYPE_TEGRA241_CMDQV,
>>> + index, addr, size, &hw_queue_id,
>>> + errp)) {
>>> + return false;
>>> + }
>>> + hw_queue = g_new(IOMMUFDHWqueue, 1);
>>> + hw_queue->hw_queue_id = hw_queue_id;
>>> + hw_queue->viommu = viommu;
>>> + cmdqv->vcmdq[index] = hw_queue;
>>> +
>>> + return true;
>>> +}
>>> +
>>> +static void tegra241_cmdqv_setup_all_vcmdq(Tegra241CMDQV *cmdqv,
>>> + Error **errp)
>>> +{
>>> + for (int i = 0; i < TEGRA241_CMDQV_MAX_CMDQ; i++) {
>>> + if (!tegra241_cmdqv_setup_vcmdq(cmdqv, i, errp)) {
>> if case of failure,what is the functional consequence. Can you comment
>> on this in the commit msg or here.
> Today QEMU only logs the error:
>
> qemu-system-aarch64: IOMMU_HW_QUEUE_ALLOC failed: error -22
>
> The guest visible behaviour then depends on whether VINTF0 Page 0 was
> mapped into guest memory before this failure:
>
> Case 1: setup_vcmdq() succeeds for vcmdq0, fails for vcmdq1:
>
> - VINTF0 Page 0 was mapped to the guest after vcmdq0's success.
> - Guest writes VCMDQ1_CONFIG go directly to HW.
> - Guest reads VCMDQ1_STATUS directly from HW:
>
> ...
> tegra241_cmdqv: VINTF0: VCMDQ1/LVCMDQ1: failed to enable,
> STATUS=0x00000000
> tegra241_cmdqv: VINTF0: VCMDQ1/LVCMDQ1: GERRORN=0x0, GERROR=0x0, CONS=0x0
> ...
>
> Case 2: setup_vcmdq() fails for vcmdq0:
>
> - VINTF0 Page 0 is not mapped to the guest.
> - Guest writes VCMDQ0_CONFIG. Trapped by QEMU.
> - Guest reads VCMDQ0_STATUS. Trapped by QEMU.
> - Both served from the register cache. The cache currently doesn't
> reflect the failure, so the guest thinks the queue is enabled and
> proceeds to issue commands, eventually:
>
> arm-smmu-v3 arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x00000002 [hwprod
> 0x00000002, hwcons 0x00000000]
> arm-smmu-v3 arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x00000004 [hwprod
> 0x00000004, hwcons 0x00000000]
>
> Case 2 needs fixing. I'll fix the register cache on the error path
> for v6 and document it.
OK ;-)
Thanks
Eric
>
> Thanks,
> Shameer