On 11/4/25 1:26 PM, Shameer Kolothum wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <[email protected]>
>> Sent: 04 November 2025 11:06
>> To: Shameer Kolothum <[email protected]>; qemu-
>> [email protected]; [email protected]
>> Cc: [email protected]; Jason Gunthorpe <[email protected]>; Nicolin
>> Chen <[email protected]>; [email protected]; [email protected];
>> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>;
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> Krishnakant Jaju <[email protected]>
>> Subject: Re: [PATCH v5 13/32] hw/arm/smmuv3-accel: Add nested vSTE
>> install/uninstall support
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Hi Shameer,
>>
>> On 10/31/25 11:49 AM, Shameer Kolothum wrote:
>>> From: Nicolin Chen <[email protected]>
>>>
>>> A device placed behind a vSMMU instance must have corresponding vSTEs
>>> (bypass, abort, or translate) installed. The bypass and abort proxy
>>> nested HWPTs are pre-allocated.
>>>
>>> For translat HWPT, a vDEVICE object is allocated and associated with
>>> the vIOMMU for each guest device. This allows the host kernel to
>>> establish a virtual SID to physical SID mapping, which is required for
>>> handling invalidations and event reporting.
>>>
>>> An translate HWPT is allocated based on the guest STE configuration
>>> and attached to the device when the guest issues SMMU_CMD_CFGI_STE or
>>> SMMU_CMD_CFGI_STE_RANGE, provided the STE enables S1 translation.
>>>
>>> If the guest STE is invalid or S1 translation is disabled, the device
>>> is attached to one of the pre-allocated ABORT or BYPASS HWPTs instead.
>>>
>>> While at it, export both smmu_find_ste() and smmuv3_flush_config() for
>>> use here.
>>>
>>> Signed-off-by: Nicolin Chen <[email protected]>
>>> Signed-off-by: Shameer Kolothum
>> <[email protected]>
>>> Reviewed-by: Jonathan Cameron <[email protected]>
>>> Signed-off-by: Shameer Kolothum <[email protected]>
>>> ---
>>> hw/arm/smmuv3-accel.c | 193
>> +++++++++++++++++++++++++++++++++++++++
>>> hw/arm/smmuv3-accel.h | 23 +++++
>>> hw/arm/smmuv3-internal.h | 20 ++++
>>> hw/arm/smmuv3.c | 18 +++-
>>> hw/arm/trace-events | 2 +
>>> 5 files changed, 253 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c index
>>> d4d65299a8..c74e95a0ea 100644
>>> --- a/hw/arm/smmuv3-accel.c
>>> +++ b/hw/arm/smmuv3-accel.c
>>> @@ -28,6 +28,191 @@ MemoryRegion root; MemoryRegion sysmem;
>> static
>>> AddressSpace *shared_as_sysmem;
>>>
>>> +static bool
>>> +smmuv3_accel_alloc_vdev(SMMUv3AccelDevice *accel_dev, int sid, Error
>>> +**errp) {
>>> + SMMUViommu *vsmmu = accel_dev->vsmmu;
>>> + IOMMUFDVdev *vdev;
>>> + uint32_t vdevice_id;
>>> +
>>> + if (!accel_dev->idev || accel_dev->vdev) {
>>> + return true;
>>> + }
>>> +
>>> + if (!iommufd_backend_alloc_vdev(vsmmu->iommufd, accel_dev->idev-
>>> devid,
>>> + vsmmu->viommu.viommu_id, sid,
>>> + &vdevice_id, errp)) {
>>> + return false;
>>> + }
>>> + if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
>>> + vsmmu->bypass_hwpt_id,
>>> errp)) {
>>> + iommufd_backend_free_id(vsmmu->iommufd, vdevice_id);
>>> + return false;
>>> + }
>>> +
>>> + vdev = g_new(IOMMUFDVdev, 1);
>>> + vdev->vdevice_id = vdevice_id;
>>> + vdev->virt_id = sid;
>>> + accel_dev->vdev = vdev;
>>> + return true;
>>> +}
>>> +
>>> +static bool
>>> +smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev,
>> bool abort,
>>> + Error **errp)+{
>>> + HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
>>> + SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
>>> + uint32_t hwpt_id;
>>> +
>>> + if (!s1_hwpt || !accel_dev->vsmmu) {
>>> + return true;
>>> + }
>>> +
>>> + if (abort) {
>>> + hwpt_id = accel_dev->vsmmu->abort_hwpt_id;
>>> + } else {
>>> + hwpt_id = accel_dev->vsmmu->bypass_hwpt_id;
>>> + }
>>> +
>>> + if (!host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> + trace_smmuv3_accel_uninstall_nested_ste(smmu_get_sid(&accel_dev-
>>> sdev),
>>> + abort ? "abort" : "bypass",
>>> + hwpt_id);
>>> +
>>> + iommufd_backend_free_id(s1_hwpt->iommufd, s1_hwpt->hwpt_id);
>>> + accel_dev->s1_hwpt = NULL;
>>> + g_free(s1_hwpt);
>>> + return true;
>>> +}
>>> +
>>> +static bool
>>> +smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data, Error **errp)
>> the name is very close to the caller function, ie.
>> smmuv3_accel_install_nested_ste which also takes a sdev.
>> I would rename to smmuv3_accel_install_hwpt() or something alike
> This one is going to change a bit based on Nicolin's feedback on taking
> care of SMMUEN/GBPA values.
> https://lore.kernel.org/all/aQVLzfaxxSfw1HBL@Asurada-Nvidia/
OK
>
> Probably smmuv3_accel_attach_nested_hwpt() suits better considering
> that’s what it finally ends up doing.
>
>>> +{
>>> + SMMUViommu *vsmmu = accel_dev->vsmmu;
>>> + SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
>>> + HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
>>> + uint32_t flags = 0;
>>> +
>>> + if (!idev || !vsmmu) {
>>> + error_setg(errp, "Device 0x%x has no associated IOMMU dev or
>> vIOMMU",
>>> + smmu_get_sid(&accel_dev->sdev));
>>> + return false;
>>> + }
>>> +
>>> + if (s1_hwpt) {
>>> + if (!smmuv3_accel_dev_uninstall_nested_ste(accel_dev, true, errp))
>>> {
>>> + return false;
>>> + }
>>> + }
>>> +
>>> + s1_hwpt = g_new0(SMMUS1Hwpt, 1);
>>> + s1_hwpt->iommufd = idev->iommufd;
>>> + if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
>>> + vsmmu->viommu.viommu_id, flags,
>>> + data_type, data_len, data,
>>> + &s1_hwpt->hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + if (!host_iommu_device_iommufd_attach_hwpt(idev, s1_hwpt-
>>> hwpt_id, errp)) {
>>> + iommufd_backend_free_id(idev->iommufd, s1_hwpt->hwpt_id);
>>> + return false;
>>> + }
>>> + accel_dev->s1_hwpt = s1_hwpt;
>>> + return true;
>>> +}
>>> +
>>> +bool
>>> +smmuv3_accel_install_nested_ste(SMMUv3State *s, SMMUDevice *sdev,
>> int sid,
>>> + Error **errp) {
>>> + SMMUv3AccelDevice *accel_dev;
>>> + SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
>>> + .inval_ste_allowed = true};
>>> + struct iommu_hwpt_arm_smmuv3 nested_data = {};
>>> + uint64_t ste_0, ste_1;
>>> + uint32_t config;
>>> + STE ste;
>>> + int ret;
>>> +
>>> + if (!s->accel) {
>> don't you want to check !s->vsmmu as well done in
>> smmuv3_accel_install_nested_ste_range()
> Nicolin has a suggestion to merge struct SMMUViommu and
> SMMUv3AccelState into one and avoid the extra layering. I will
> attempt that and all these checking might change as a result.
interesting idea indeed.
>
>>> + return true;
>>> + }
>>> +
>>> + accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
>>> + if (!accel_dev->vsmmu) {
>>> + return true;
>>> + }
>>> +
>>> + if (!smmuv3_accel_alloc_vdev(accel_dev, sid, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
>>> + if (ret) {
>>> + error_setg(errp, "Failed to find STE for Device 0x%x", sid);
>>> + return true;
>> returning true while setting errp looks wrong to me.
> Right, will just return true from here. I am not sure under what circumstances
> we will hit here though.
>
>> + }
>> +
>> + config = STE_CONFIG(&ste);
>> + if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {
>> + if (!smmuv3_accel_dev_uninstall_nested_ste(accel_dev,
>> + STE_CFG_ABORT(config),
>> + errp)) {
>> + return false;
>> + }
>> + smmuv3_flush_config(sdev);
>> + return true;
>> + }
>> +
>> + ste_0 = (uint64_t)ste.word[0] | (uint64_t)ste.word[1] << 32;
>> + ste_1 = (uint64_t)ste.word[2] | (uint64_t)ste.word[3] << 32;
>> + nested_data.ste[0] = cpu_to_le64(ste_0 & STE0_MASK);
>> + nested_data.ste[1] = cpu_to_le64(ste_1 & STE1_MASK);
>> +
>> + if (!smmuv3_accel_dev_install_nested_ste(accel_dev,
>> + IOMMU_HWPT_DATA_ARM_SMMUV3,
>> + sizeof(nested_data),
>> + &nested_data, errp)) {
>> + error_append_hint(errp, "Unable to install sid=0x%x nested STE="
>> + "0x%"PRIx64":=0x%"PRIx64"", sid,
> nit: why ":=" between both 64b?
>> + (uint64_t)le64_to_cpu(nested_data.ste[1]),
>> + (uint64_t)le64_to_cpu(nested_data.ste[0]));
>> + return false;
> in case of various failure cases, do we need to free the vdev?
>
> I don't think we need to fee the vdev corresponding to this vSID on failures
> here. I think the association between vSID and host SID can remain
> intact even if the nested HWPT alloc/attach fails for whatever reason.
OK then worth a comment
Eric
>
> Thanks,
> Shameer