RE: [PATCH v4 02/11] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-07-06 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, July 6, 2022 10:55 AM
> 
> The disable_dmar_iommu() is called when IOMMU initialization fails or
> the IOMMU is hot-removed from the system. In both cases, there is no
> need to clear the IOMMU translation data structures for devices.
> 
> On the initialization path, the device probing only happens after the
> IOMMU is initialized successfully, hence there're no translation data
> structures.
> 
> On the hot-remove path, there is no real use case where the IOMMU is
> hot-removed, but the devices that it manages are still alive in the
> system. The translation data structures were torn down during device
> release, hence there's no need to repeat it in IOMMU hot-remove path
> either. This removes the unnecessary code and only leaves a check.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 02/11] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-07-06 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Sunday, July 3, 2022 12:34 PM
> 
> On 2022/7/1 15:58, Tian, Kevin wrote:
> >> From: Lu Baolu  Sent: Wednesday, June 29,
> >> 2022 3:47 PM
> >>
> >> The disable_dmar_iommu() is called when IOMMU initialization fails
> >> or the IOMMU is hot-removed from the system. In both cases, there
> >> is no need to clear the IOMMU translation data structures for
> >> devices.
> >>
> >> On the initialization path, the device probing only happens after
> >> the IOMMU is initialized successfully, hence there're no
> >> translation data structures.
> >>
> >> On the hot-remove path, there is no real use case where the IOMMU
> >> is hot-removed, but the devices that it manages are still alive in
> >> the system. The translation data structures were torn down during
> >> device release, hence there's no need to repeat it in IOMMU
> >> hot-remove path either. This removes the unnecessary code and only
> >> leaves a check.
> >>
> >> Signed-off-by: Lu Baolu 
> >
> > You probably overlooked my last comment on kexec:
> >
> >
> https://lore.kernel.org/lkml/BL1PR11MB52711A71AD9F11B7AE42694C8CAC9@BL1PR11MB5271.namprd11.prod.outlook.com/
> >
> >  I think my question is still not answered.
> 
> Sorry! I did overlook that comment. I can see your points now, though it
> seems to be irrelevant to the problems that this series tries to solve.
> 
> The failure path of copying tables still needs some improvement. At least
> the pages allocated for the root/context tables should be freed in the
> failure path. Even worse, the software occupies a bit of the page table
> entry, which was feasible for the old ECS but no longer works for the new
> scalable mode.
> 
> All these problems deserve a separate series. We could address your
> concerns there. Does this work for you?

Yes, this makes sense to me.


RE: [PATCH v2 4/6] iommu/vt-d: Remove unnecessary check in intel_iommu_add()

2022-07-06 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, July 2, 2022 9:56 AM
> 
> The Intel IOMMU hot-add process starts from dmar_device_hotplug(). It
> uses the global dmar_global_lock to synchronize all the hot-add and
> hot-remove paths. In the hot-add path, the new IOMMU data structures
> are allocated firstly by dmar_parse_one_drhd() and then initialized by
> dmar_hp_add_drhd(). All the IOMMU units are allocated and initialized
> in the same synchronized path. There is no case where any IOMMU unit
> is created and then initialized multiple times.
> 
> This removes the unnecessary check in intel_iommu_add() which is the
> last reference place of the global IOMMU array.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v2 3/6] iommu/vt-d: Refactor iommu information of each domain

2022-07-06 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, July 2, 2022 9:56 AM
> 
> -out_unlock:
> + set_bit(num, iommu->domain_ids);
> + info->refcnt = 1;
> + info->did   = num;
> + info->iommu = iommu;
> + domain->nid = iommu->node;

One nit: this line should be removed, as it's incorrect to blindly update
domain->nid; we should just leave it to domain_update_iommu_cap() to
decide the right node. Otherwise this causes a policy conflict: here the
last attached device decides the node, which differs from
domain_update_iommu_cap(), which picks the node of the first attached
device.

Otherwise,

Reviewed-by: Kevin Tian 


RE: [PATCH v2 2/6] iommu/vt-d: Use IDA interface to manage iommu sequence id

2022-07-06 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, July 2, 2022 9:56 AM
> 
> Switch dmar unit sequence id allocation and release from bitmap to IDA
> interface.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 11/11] iommu/vt-d: Convert global spinlock into per domain lock

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> Using a global device_domain_lock spinlock to protect per-domain device
> tracking lists is an inefficient way, especially considering this lock
> is also needed in the hot paths. This optimizes the locking mechanism
> by converting the global lock to per domain lock.
> 
> On the other hand, as the device tracking lists are never accessed in
> any interrupt context, there is no need to disable interrupts while
> spinning. Replace irqsave variant with spinlock calls.
> 
> Signed-off-by: Lu Baolu 

Except for the previous comment on where to convert spin_lock_irqsave(),
the rest looks good to me.

Reviewed-by: Kevin Tian 


RE: [PATCH v3 10/11] iommu/vt-d: Use device_domain_lock accurately

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> + spin_lock_irqsave(&device_domain_lock, flags);
>   list_for_each_entry(info, &domain->devices, link) {
> - if (!info->dev)
> - continue;
> -

I suppose you can replace all spin_lock_irqsave() calls with spin_lock()
in patch 5 instead of leaving some replacements to the next patch.


RE: [PATCH v3 09/11] iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info()
> which
> is its only caller. Make the spin lock critical range only cover the
> device list change code and remove some unnecessary checks.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 06/11] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Moving the lock into the ID alloc/free helpers makes the code more
> compact. At the same time, the device_domain_lock is irrelevant to
> the domain ID resource, remove its assertion as well.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 02/11] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The disable_dmar_iommu() is called when IOMMU initialization fails or
> the IOMMU is hot-removed from the system. In both cases, there is no
> need to clear the IOMMU translation data structures for devices.
> 
> On the initialization path, the device probing only happens after the
> IOMMU is initialized successfully, hence there're no translation data
> structures.
> 
> On the hot-remove path, there is no real use case where the IOMMU is
> hot-removed, but the devices that it manages are still alive in the
> system. The translation data structures were torn down during device
> release, hence there's no need to repeat it in IOMMU hot-remove path
> either. This removes the unnecessary code and only leaves a check.
> 
> Signed-off-by: Lu Baolu 

You probably overlooked my last comment on kexec:

https://lore.kernel.org/lkml/bl1pr11mb52711a71ad9f11b7ae42694c8c...@bl1pr11mb5271.namprd11.prod.outlook.com/

I think my question is still not answered.


RE: [PATCH v3 01/11] iommu/vt-d: debugfs: Remove device_domain_lock usage

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The domain_translation_struct debugfs node is used to dump the DMAR page
> tables for the PCI devices. It potentially races with setting domains to
> devices. The existing code uses the global spinlock device_domain_lock to
> avoid the races.
> 
> This removes the use of device_domain_lock outside of iommu.c by replacing
> it with the group mutex lock. Using the group mutex lock is cleaner and
> more compatible to following cleanups.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 00/11] iommu/vt-d: Optimize the use of locks

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> v3:
>  - Split reduction of lock ranges from changing irqsave.
> https://lore.kernel.org/linux-iommu/BN9PR11MB52760A3D7C6BF1AF9C9D34658CAA9@BN9PR11MB5276.namprd11.prod.outlook.com/
>  - Fully initialize the dev_info before adding it to the list.
> https://lore.kernel.org/linux-iommu/BN9PR11MB52764D7CD86448C5E4EB46668CAA9@BN9PR11MB5276.namprd11.prod.outlook.com/
>  - Various code and comments refinement.
> 

This doesn't say why original patch2 was removed:

"iommu/vt-d: Remove for_each_device_domain()"

It took me a while to realize that it's already covered by another patch
of yours fixing RID2PASID. 😊

RE: [PATCH v1 4/6] iommu/vt-d: Add VTD_FLAG_IOMMU_PROBED flag

2022-06-30 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Friday, July 1, 2022 11:13 AM
> 
> On 6/30/22 4:29 PM, Tian, Kevin wrote:
> >> From: Lu Baolu 
> >> Sent: Saturday, June 25, 2022 8:52 PM
> >>
> >> In the IOMMU hot-add path, there's a need to check whether an IOMMU
> >> has been probed. Instead of checking the IOMMU pointer in the global
> >> list, it's better to allocate a flag bit in iommu->flags for this
> >> purpose.
> >
> > Sorry I didn't get the point of original check. This is the hotplug path
> > hence the caller of this function should already figure out it's a new
> > iommu before reaching this point?
> >
> 
> Neither did I. It was added by the commit below without any comment about
> this check.
> 
> commit ffebeb46dd34736c90ffbca1ccb0bef8f4827c44
> Author: Jiang Liu 
> Date:   Sun Nov 9 22:48:02 2014 +0800
> 
>  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
> 
>  Implement required callback functions for intel-iommu driver
>  to support DMAR unit hotplug.
> 
>  Signed-off-by: Jiang Liu 
>  Reviewed-by: Yijing Wang 
>  Signed-off-by: Joerg Roedel 
> 
> I went through the whole hot-add process and found this check seems to
> be a duplicate.
> 
> Hot-add process starts from dmar_device_hotplug(), it uses a rwlock to
> synchronize the hot-add paths.
> 
> 2386 down_write(&dmar_global_lock);
> 2387 if (insert)
> 2388 ret = dmar_hotplug_insert(tmp);
> 2389 else
> 2390 ret = dmar_hotplug_remove(tmp);
> 2391 up_write(&dmar_global_lock);
> 
> dmar_device_hotplug()
> ->dmar_hotplug_insert()
> -->dmar_parse_one_drhd()   /* the added intel_iommu is allocated here*/
> -->dmar_hp_add_drhd()/* the intel_iommu is about to bring up */
> --->intel_iommu_add()
> 
> The duplicate check here:
> 
>  if (g_iommus[iommu->seq_id])
>  return 0;
> 
> All the iommu units are allocated and then initialized in the same
> synchronized path. There is no need to check for duplicate initialization.
> 
> I would like to remove this check if no objection.
> 

This matches my impression.


RE: [PATCH v2 4/4] vfio: Require that devices support DMA cache coherence

2022-06-30 Thread Tian, Kevin
> From: Alexey Kardashevskiy 
> Sent: Friday, July 1, 2022 12:58 PM
> 
> On 4/8/22 01:23, Jason Gunthorpe via iommu wrote:
> > IOMMU_CACHE means that normal DMAs do not require any additional coherency
> > mechanism and is the basic uAPI that VFIO exposes to userspace. For
> > instance VFIO applications like DPDK will not work if additional coherency
> > operations are required.
> >
> > Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before
> > allowing an IOMMU backed VFIO device to be created.
> 
> 
> This just broke VFIO on POWER which does not use iommu_ops.

In this case the check below would be more reasonably placed in the type1
attach_group(). Do an iommu_group_for_each_dev() to verify
CACHE_COHERENCY, similar to what Robin did for INTR_REMAP.

(Sorry, I have no access to my build machine now, but I suppose Jason
can soon work out a fix once he sees this. 😊)

> 
> 
> >
> > Signed-off-by: Jason Gunthorpe 
> > ---
> >   drivers/vfio/vfio.c | 7 +++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index a4555014bd1e72..9edad767cfdad3 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> >
> >   int vfio_register_group_dev(struct vfio_device *device)
> >   {
> > +	/*
> > +	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> > +	 * restore cache coherency.
> > +	 */
> > +	if (!iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY))
> > +		return -EINVAL;
> > +
> >	return __vfio_register_dev(device,
> >				   vfio_group_find_or_alloc(device->dev));
> >   }
> 
> --
> Alexey

RE: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-30 Thread Tian, Kevin
> From: Robin Murphy 
> Sent: Thursday, June 30, 2022 4:22 PM
> 
> On 2022-06-29 20:47, Nicolin Chen wrote:
> > On Fri, Jun 24, 2022 at 03:19:43PM -0300, Jason Gunthorpe wrote:
> >> On Fri, Jun 24, 2022 at 06:35:49PM +0800, Yong Wu wrote:
> >>
> > It's not used in VFIO context. "return 0" just satisfy the iommu
> > framework to go ahead. and yes, here we only allow the shared
> > "mapping-domain" (All the devices share a domain created
> > internally).
> >>
> >> What part of the iommu framework is trying to attach a domain and
> >> wants to see success when the domain was not actually attached ?
> >>
>  What prevent this driver from being used in VFIO context?
> >>>
> >>> Nothing prevent this. Just I didn't test.
> >>
> >> This is why it is wrong to return success here.
> >
> > Hi Yong, would you or someone you know be able to confirm whether
> > this "return 0" is still a must or not?
> 
>  From memory, it is unfortunately required, due to this driver being in
> the rare position of having to support multiple devices in a single
> address space on 32-bit ARM. Since the old ARM DMA code doesn't
> understand groups, the driver sets up its own canonical
> dma_iommu_mapping to act like a default domain, but then has to politely
> say "yeah OK" to arm_setup_iommu_dma_ops() for each device so that they
> do all end up with the right DMA ops rather than dying in screaming
> failure (the ARM code's per-device mappings then get leaked, but we
> can't really do any better).
> 
> The whole mess disappears in the proper default domain conversion, but
> in the meantime, it's still safe to assume that nobody's doing VFIO with
> embedded display/video codec/etc. blocks that don't even have reset drivers.
> 

Probably the above is worth a comment in the mtk code so we don't always
need to dig it out from memory when a similar question arises in the
future. 😊

RE: [PATCH v1 6/6] iommu/vt-d: Make DMAR_UNITS_SUPPORTED default 1024

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
> to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> remapping doesn't support X2APIC mode x2apic disabled"; and the system
> fails to boot properly.
> 
> To support up to 64 sockets with 10 DMAR units each (640), make the
> value of DMAR_UNITS_SUPPORTED default 1024.
> 
> Signed-off-by: Steve Wahl
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v1 5/6] iommu/vt-d: Remove global g_iommus array

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> The g_iommus is not used anywhere. Remove it to avoid dead code.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v1 4/6] iommu/vt-d: Add VTD_FLAG_IOMMU_PROBED flag

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> In the IOMMU hot-add path, there's a need to check whether an IOMMU
> has been probed. Instead of checking the IOMMU pointer in the global
> list, it's better to allocate a flag bit in iommu->flags for this
> purpose.

Sorry, I didn't get the point of the original check. This is the hotplug
path, hence the caller of this function should already have figured out
that it's a new iommu before reaching this point?



RE: [PATCH v1 3/6] iommu/vt-d: Refactor iommu information of each domain

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> +struct iommu_domain_info {
> + struct intel_iommu *iommu;
> + unsigned int refcnt;
> + u16 did;
> +};
> +
>  struct dmar_domain {
>   int nid;/* node id */
> -
> - unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
> - /* Refcount of devices per iommu */
> -
> -
> - u16 iommu_did[DMAR_UNITS_SUPPORTED];
> - /* Domain ids per IOMMU. Use u16
> since
> -  * domain ids are 16 bit wide
> according
> -  * to VT-d spec, section 9.3 */

It'd make sense to keep the comments when moving the above fields.

> + spin_lock(&iommu->lock);
> + curr = xa_load(&domain->iommu_array, iommu->seq_id);
> + if (curr) {
> + curr->refcnt++;
> + kfree(info);
> + goto success;

Not a fan of adding a label for success. Just putting the two lines
(unlock + return) here is clearer.

> + ret = xa_err(xa_store(&domain->iommu_array, iommu->seq_id,
> +   info, GFP_ATOMIC));

Check xa_err() on a separate line.

> 
>  static void domain_detach_iommu(struct dmar_domain *domain,
>   struct intel_iommu *iommu)
>  {
> - int num;
> + struct iommu_domain_info *info;
> 
>   spin_lock(&iommu->lock);
> - domain->iommu_refcnt[iommu->seq_id] -= 1;
> - if (domain->iommu_refcnt[iommu->seq_id] == 0) {
> - num = domain->iommu_did[iommu->seq_id];
> - clear_bit(num, iommu->domain_ids);
> + info = xa_load(&domain->iommu_array, iommu->seq_id);
> + if (--info->refcnt == 0) {
> + clear_bit(info->did, iommu->domain_ids);
> + xa_erase(&domain->iommu_array, iommu->seq_id);
>   domain_update_iommu_cap(domain);
> - domain->iommu_did[iommu->seq_id] = 0;
> + kfree(info);

domain->nid may still point to the node of the removed iommu.


RE: [PATCH v1 2/6] iommu/vt-d: Use IDA interface to manage iommu sequence id

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> @@ -1062,11 +1040,14 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
>   if (!iommu)
>   return -ENOMEM;
> 
> - if (dmar_alloc_seq_id(iommu) < 0) {
> + iommu->seq_id = ida_alloc_range(&dmar_seq_ids, 0,
> + DMAR_UNITS_SUPPORTED,

should be "DMAR_UNITS_SUPPORTED - 1"

> GFP_KERNEL);
> + if (iommu->seq_id < 0) {
>   pr_err("Failed to allocate seq_id\n");
>   err = -ENOSPC;
>   goto error;
>   }

ida_alloc_range() already returns an error code. No need to overwrite it
with -ENOSPC.



RE: [PATCH v1 1/6] iommu/vt-d: Remove unused domain_get_iommu()

2022-06-30 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, June 25, 2022 8:52 PM
> 
> It is not used anywhere. Remove it to avoid dead code.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v9 04/11] iommu: Add sva iommu_domain support

2022-06-28 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 28, 2022 7:34 PM
> 
> On 2022/6/28 16:50, Tian, Kevin wrote:
> >> From: Baolu Lu
> >> Sent: Tuesday, June 28, 2022 1:41 PM
> >>>>struct iommu_domain {
> >>>>  unsigned type;
> >>>>  const struct iommu_domain_ops *ops;
> >>>>  unsigned long pgsize_bitmap;/* Bitmap of page sizes in use 
> >>>> */
> >>>> -iommu_fault_handler_t handler;
> >>>> -void *handler_token;
> >>>>  struct iommu_domain_geometry geometry;
> >>>>  struct iommu_dma_cookie *iova_cookie;
> >>>> +union {
> >>>> +struct {/* IOMMU_DOMAIN_DMA */
> >>>> +iommu_fault_handler_t handler;
> >>>> +void *handler_token;
> >>>> +};
> >>> why is it DMA domain specific? What about unmanaged
> >>> domain? Unrecoverable fault can happen on any type
> >>> including SVA. Hence I think above should be domain type
> >>> agnostic.
> >> The report_iommu_fault() should be replaced by the new
> >> iommu_report_device_fault(). Jean has already started this work.
> >>
> >> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
> >>
> >> Currently this is only for DMA domains, hence Robin suggested to make it
> >> exclude with the SVA domain things.
> >>
> >> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-68305f683...@arm.com/
> > Then it's worthy a comment that those two fields are for
> > some legacy fault reporting stuff and DMA type only.
> 
> The iommu_fault and SVA fields are exclusive. The former is used for
> unrecoverable DMA remapping faults, while the latter is only interested
> in the recoverable page faults.
> 
> I will update the commit message with above explanation. Does this work
> for you?
> 

Not exactly. Your earlier explanation is about the old vs. new API, and
leaving the existing fault handler with its current only user is fine.

But this is not related to unrecoverable vs. recoverable. As I said, an
unrecoverable fault could happen on all domain types. Tying it to
DMA-only doesn't make sense, and I think in the end the new
iommu_report_device_fault() will need to support both. Is that not the
case?


RE: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling

2022-06-28 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Tuesday, June 28, 2022 5:44 PM
> 
> On Tue, Jun 28, 2022 at 08:39:36AM +, Tian, Kevin wrote:
> > > From: Lu Baolu 
> > > Sent: Tuesday, June 21, 2022 10:44 PM
> > >
> > > Tweak the I/O page fault handling framework to route the page faults to
> > > the domain and call the page fault handler retrieved from the domain.
> > > This makes the I/O page fault handling framework possible to serve more
> > > usage scenarios as long as they have an IOMMU domain and install a page
> > > fault handler in it. Some unused functions are also removed to avoid
> > > dead code.
> > >
> > > The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> > > for a {device, PASID} pair is used. It will be used by the page fault
> > > handling framework which knows {device, PASID} reported from the iommu
> > > driver. We have a guarantee that the SVA domain doesn't go away during
> > > IOPF handling, because unbind() waits for pending faults with
> > > iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
> > > to synchronize life cycle of the iommu domains between the unbind() and
> > > the interrupt threads.
> >
> > I found iopf_queue_flush_dev() is only called in intel-iommu driver. Did
> > I overlook anything?
> 
> The SMMU driver will need it as well when we upstream PRI support.
> Currently it only supports stall, and that requires the device driver to
> flush all DMA including stalled transactions *before* calling unbind(), so
> no need for iopf_queue_flush_dev() in this case.
>

Then it makes sense. Probably Baolu can add this information to the
commit msg so others with a similar question can quickly get the point
here.


RE: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-06-28 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 28, 2022 1:54 PM
> >> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> >> +{
> >> +  struct iommu_domain *domain =
> >> +  container_of(handle, struct iommu_domain, bond);
> >> +
> >> +  return domain->mm->pasid;
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> >
> > Looks this is only used by unbind_device. Just open code it.
> 
> It's part of current IOMMU/SVA interfaces for the device drivers, and
> has been used in various drivers.
> 
> $ git grep iommu_sva_get_pasid
> drivers/dma/idxd/cdev.c:pasid = iommu_sva_get_pasid(sva);
> drivers/iommu/iommu-sva-lib.c:  ioasid_t pasid = iommu_sva_get_pasid(handle);
> drivers/iommu/iommu-sva-lib.c:u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> drivers/iommu/iommu-sva-lib.c:EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> drivers/misc/uacce/uacce.c: pasid = iommu_sva_get_pasid(handle);
> include/linux/iommu.h:u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> include/linux/iommu.h:static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> 
> Or, I missed anything?
> 

Forget it. I thought it was a new function introduced in this series. :/


RE: [PATCH v9 04/11] iommu: Add sva iommu_domain support

2022-06-28 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 28, 2022 1:41 PM
> >
> >>   struct iommu_domain {
> >>unsigned type;
> >>const struct iommu_domain_ops *ops;
> >>unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
> >> -  iommu_fault_handler_t handler;
> >> -  void *handler_token;
> >>struct iommu_domain_geometry geometry;
> >>struct iommu_dma_cookie *iova_cookie;
> >> +  union {
> >> +  struct {/* IOMMU_DOMAIN_DMA */
> >> +  iommu_fault_handler_t handler;
> >> +  void *handler_token;
> >> +  };
> >
> > why is it DMA domain specific? What about unmanaged
> > domain? Unrecoverable fault can happen on any type
> > including SVA. Hence I think above should be domain type
> > agnostic.
> 
> The report_iommu_fault() should be replaced by the new
> iommu_report_device_fault(). Jean has already started this work.
> 
> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
> 
> Currently this is only for DMA domains, hence Robin suggested to make it
> exclude with the SVA domain things.
> 
> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-68305f683...@arm.com/

Then it's worth a comment that those two fields are for the legacy fault
reporting path and the DMA type only.

> >
> >> +
> >> +  mutex_lock(&group->mutex);
> >> +  curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> >> GFP_KERNEL);
> >> +  if (curr)
> >> +  goto out_unlock;
> >
> > Need check xa_is_err(old).
> 
> Either
> 
> (1) old entry is a valid pointer, or

return -EBUSY in this case

> (2) xa_is_err(curr)

return xa_err(cur)

> 
> are failure cases. Hence, "curr == NULL" is the only check we need. Did
> I miss anything?
> 

But now you always return -EBUSY for all kinds of xa errors.


RE: [PATCH v9 11/11] iommu: Rename iommu-sva-lib.{c,h}

2022-06-28 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
> for SVA implementation in iommu core.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 

Reviewed-by: Kevin Tian 


RE: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling

2022-06-28 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Tweak the I/O page fault handling framework to route the page faults to
> the domain and call the page fault handler retrieved from the domain.
> This makes the I/O page fault handling framework possible to serve more
> usage scenarios as long as they have an IOMMU domain and install a page
> fault handler in it. Some unused functions are also removed to avoid
> dead code.
> 
> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> for a {device, PASID} pair is used. It will be used by the page fault
> handling framework which knows {device, PASID} reported from the iommu
> driver. We have a guarantee that the SVA domain doesn't go away during
> IOPF handling, because unbind() waits for pending faults with
> iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
> to synchronize life cycle of the iommu domains between the unbind() and
> the interrupt threads.

I found that iopf_queue_flush_dev() is only called in the intel-iommu
driver. Did I overlook anything?

>  static void iopf_handle_group(struct work_struct *work)
>  {
>   struct iopf_group *group;
> + struct iommu_domain *domain;
>   struct iopf_fault *iopf, *next;
>   enum iommu_page_response_code status =
> IOMMU_PAGE_RESP_SUCCESS;
> 
>   group = container_of(work, struct iopf_group, work);
> + domain = iommu_get_domain_for_dev_pasid(group->dev,
> + group->last_fault.fault.prm.pasid);
> + if (!domain || !domain->iopf_handler)
> + status = IOMMU_PAGE_RESP_INVALID;

Missing a comment on why no refcount is required on the domain, as
explained in the commit msg.


RE: [PATCH v9 09/11] iommu: Prepare IOMMU domain for IOPF

2022-06-28 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> +/*
> + * I/O page fault handler for SVA
> + */
> +enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + vm_fault_t ret;
> + struct mm_struct *mm;
> + struct vm_area_struct *vma;
> + unsigned int access_flags = 0;
> + struct iommu_domain *domain = data;
> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
> + struct iommu_fault_page_request *prm = &fault->prm;
> + enum iommu_page_response_code status =
> IOMMU_PAGE_RESP_INVALID;
> +
> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> + return status;
> +
> + mm = domain->mm;

What about directly passing domain->mm in as the fault data?

The entire logic here is only about the mm, not the domain.



RE: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
> mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
> +
> + /* Allocate mm->pasid if necessary. */

This comment belongs with the iommu_sva_alloc_pasid() call below.

> + max_pasids = dev->iommu->max_pasids;
> + if (!max_pasids)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +


...
> +void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> + struct device *dev = handle->dev;
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> + ioasid_t pasid = iommu_sva_get_pasid(handle);
> +
> + mutex_lock(&iommu_sva_lock);
> + if (refcount_dec_and_test(&domain->bond.users)) {
> + iommu_detach_device_pasid(domain, dev, pasid);
> + iommu_domain_free(domain);
> + }
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> +
> + return domain->mm->pasid;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);

Looks like this is only used by iommu_sva_unbind_device(). Just open-code it there.


RE: [PATCH v9 05/11] iommu/vt-d: Add SVA domain support

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v9 04/11] iommu: Add sva iommu_domain support

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> The sva iommu_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
> 
> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA
> domain
>   type. The IOMMU drivers that support SVA should provide the sva
>   domain specific iommu_domain_ops.
> - Add a helper to allocate an SVA domain. The iommu_domain_free()
>   is still used to free an SVA domain.
> - Add helpers to attach an SVA domain to a device and the reverse
>   operation.
> 
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, the attach/detach interfaces only apply to devices
> belonging to the singleton groups, and the singleton is immutable in
> fabric i.e. not affected by hotplug.
> 
> The iommu_attach/detach_device_pasid() can be used for other purposes,
> such as kernel DMA with pasid, mediation device, etc.

I'd split this into two patches: one for adding
iommu_attach/detach_device_pasid() and the set/block_dev_pasid ops,
and the other for adding SVA.

>  struct iommu_domain {
>   unsigned type;
>   const struct iommu_domain_ops *ops;
>   unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
> - iommu_fault_handler_t handler;
> - void *handler_token;
>   struct iommu_domain_geometry geometry;
>   struct iommu_dma_cookie *iova_cookie;
> + union {
> + struct {/* IOMMU_DOMAIN_DMA */
> + iommu_fault_handler_t handler;
> + void *handler_token;
> + };

Why is this DMA-domain specific? What about the unmanaged
domain? Unrecoverable faults can happen on any domain type,
including SVA. Hence I think the above should be domain-type

> + struct {/* IOMMU_DOMAIN_SVA */
> + struct mm_struct *mm;
> + };
> + };
>  };
> 



> +
> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> + struct mm_struct *mm)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_domain *domain;
> +
> + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
> + if (!domain)
> + return NULL;
> +
> + domain->type = IOMMU_DOMAIN_SVA;

It's a bit weird that the type has been specified when calling
ops->domain_alloc() while it is still left to the caller to set the
type. But this is not caused by this series; it could be cleaned
up separately.

> +
> + mutex_lock(&group->mutex);
> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> GFP_KERNEL);
> + if (curr)
> + goto out_unlock;

Need to check xa_is_err(curr) as well; xa_cmpxchg() can return an
error entry on failure, which is not the same as a colliding entry.


RE: [PATCH v9 02/11] iommu: Add max_pasids field in struct dev_iommu

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Use this field to save the number of PASIDs that a device is able to
> consume. It is a generic attribute of a device and lifting it into the
> per-device dev_iommu struct could help to avoid the boilerplate code
> in various IOMMU drivers.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h |  2 ++
>  drivers/iommu/iommu.c | 20 
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 03fbb1b71536..d50afb2c9a09 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -364,6 +364,7 @@ struct iommu_fault_param {
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
>   * @priv: IOMMU Driver private data
> + * @max_pasids:  number of PASIDs device can consume

... PASIDs *this* device can consume

Reviewed-by: Kevin Tian 


RE: [PATCH v9 01/11] iommu: Add max_pasids field in struct iommu_device

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Use this field to keep the number of supported PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver that supports PASID related features should set this
> field before enabling them on the devices.
> 
> In the Intel IOMMU driver, intel_iommu_sm is moved to
> CONFIG_INTEL_IOMMU
> enclave so that the pasid_supported() helper could be used in dmar.c
> without compilation errors.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 2/2] vfio: Use device_iommu_capable()

2022-06-27 Thread Tian, Kevin
> From: Robin Murphy
> Sent: Saturday, June 25, 2022 2:00 AM
> 
> Use the new interface to check the capabilities for our device
> specifically.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Robin Murphy 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination

2022-06-27 Thread Tian, Kevin
> From: Robin Murphy
> Sent: Saturday, June 25, 2022 1:52 AM
> 
> Since IOMMU groups are mandatory for drivers to support, it stands to
> reason that any device which has been successfully added to a group
> must be on a bus supported by that IOMMU driver, and therefore a domain
> viable for any device in the group must be viable for all devices in
> the group. This already has to be the case for the IOMMU API's internal
> default domain, for instance. Thus even if the group contains devices on
> different buses, that can only mean that the IOMMU driver actually
> supports such an odd topology, and so without loss of generality we can
> expect the bus type of any device in a group to be suitable for IOMMU
> API calls.
> 
> Furthermore, scrutiny reveals a lack of protection for the bus being
> removed while vfio_iommu_type1_attach_group() is using it; the reference
> that VFIO holds on the iommu_group ensures that data remains valid, but
> does not prevent the group's membership changing underfoot.
> 
> We can address both concerns by recycling vfio_bus_type() into some
> superficially similar logic to indirect the IOMMU API calls themselves.
> Each call is thus protected from races by the IOMMU group's own locking,
> and we no longer need to hold group-derived pointers beyond that scope.
> It also gives us an easy path for the IOMMU API's migration of bus-based
> interfaces to device-based, of which we can already take the first step
> with device_iommu_capable(). As with domains, any capability must in
> practice be consistent for devices in a given group - and after all it's
> still the same capability which was expected to be consistent across an
> entire bus! - so there's no need for any complicated validation.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> v3: Complete rewrite yet again, and finally it doesn't feel like we're
> stretching any abstraction boundaries the wrong way, and the diffstat
> looks right too. I did think about embedding IOMMU_CAP_INTR_REMAP
> directly in the callback, but decided I like the consistency of minimal
> generic wrappers. And yes, if the capability isn't supported then it
> does end up getting tested for the whole group, but meh, it's harmless.
> 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-26 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Friday, June 24, 2022 4:00 AM
> 
> Un-inline the domain specific logic from the attach/detach_group ops into
> two paired functions vfio_iommu_alloc_attach_domain() and
> vfio_iommu_detach_destroy_domain() that strictly deal with creating and
> destroying struct vfio_domains.
> 
> Add the logic to check for EMEDIUMTYPE return code of
> iommu_attach_group()
> and avoid the extra domain allocations and attach/detach sequences of the
> old code. This allows properly detecting an actual attach error, like
> -ENOMEM, vs treating all attach errors as an incompatible domain.
> 
> Co-developed-by: Jason Gunthorpe 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 1/1] iommu/vt-d: Fix RID2PASID setup/teardown failure

2022-06-23 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, June 23, 2022 2:57 PM
> 
> The IOMMU driver shares the pasid table for PCI alias devices. When the
> RID2PASID entry of the shared pasid table has been filled by the first
> device, the subsequent device will encounter the "DMAR: Setup RID2PASID
> failed" failure as the pasid entry has already been marked as present.
> As the result, the IOMMU probing process will be aborted.
> 
> On the contrary, when any alias device is hot-removed from the system,
> for example, by writing to /sys/bus/pci/devices/.../remove, the shared
> RID2PASID will be cleared without any notifications to other devices.
> As the result, any DMAs from those rest devices are blocked.
> 
> Sharing pasid table among PCI alias devices could save two memory pages
> for devices underneath the PCIe-to-PCI bridges. Anyway, considering that
> those devices are rare on modern platforms that support VT-d in scalable
> mode and the saved memory is negligible, it's reasonable to remove this
> part of immature code to make the driver feasible and stable.
> 
> Fixes: ef848b7e5a6a0 ("iommu/vt-d: Setup pasid entry for RID2PASID
> support")
> Reported-by: Chenyi Qiang 
> Reported-by: Ethan Zhao 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Lu Baolu 

Looks good.

Reviewed-by: Kevin Tian 


RE: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Tian, Kevin
> From: Yong Wu
> Sent: Friday, June 24, 2022 1:39 PM
> 
> On Thu, 2022-06-23 at 19:44 -0700, Nicolin Chen wrote:
> > On Fri, Jun 24, 2022 at 09:35:49AM +0800, Baolu Lu wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On 2022/6/24 04:00, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/mtk_iommu_v1.c
> > > > b/drivers/iommu/mtk_iommu_v1.c
> > > > index e1cb51b9866c..5386d889429d 100644
> > > > --- a/drivers/iommu/mtk_iommu_v1.c
> > > > +++ b/drivers/iommu/mtk_iommu_v1.c
> > > > @@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct
> > > > iommu_domain *domain, struct device
> > > >   /* Only allow the domain created internally. */
> > > >   mtk_mapping = data->mapping;
> > > >   if (mtk_mapping->domain != domain)
> > > > - return 0;
> > > > + return -EMEDIUMTYPE;
> > > >
> > > >   if (!data->m4u_dom) {
> > > >   data->m4u_dom = dom;
> > >
> > > This change looks odd. It turns the return value from success to
> > > failure. Is it a bug? If so, it should go through a separated fix
> > > patch.
> 
> Thanks for the review:)
> 
> >
> > Makes sense.
> >
> > I read the commit log of the original change:
> >
> https://lore.kernel.org/r/1589530123-30240-1-git-send-email-
> yong...@mediatek.com
> >
> > It doesn't seem to allow devices to get attached to different
> > domains other than the shared mapping->domain, created in the
> > in the mtk_iommu_probe_device(). So it looks like returning 0
> > is intentional. Though I am still very confused by this return
> > value here, I doubt it has ever been used in a VFIO context.
> 
> It's not used in VFIO context. "return 0" just satisfy the iommu
> framework to go ahead. and yes, here we only allow the shared "mapping-
> >domain" (All the devices share a domain created internally).
> 
> thus I think we should still keep "return 0" here.
> 

What prevents this driver from being used in a VFIO context?

And why would we want to go ahead when an obvious error occurs,
i.e. when a device is attached to an unexpected domain?


RE: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Thursday, June 23, 2022 6:17 AM
> 
> >
> > ret = -EIO;
> > -   domain->domain = iommu_domain_alloc(bus);
> > +   domain->domain = iommu_domain_alloc(iommu_api_dev->dev-
> >bus);
> 
> It makes sense to move away from a bus centric interface to iommu ops
> and I can see that having a device interface when we have device level
> address-ability within a group makes sense, but does it make sense to
> only have that device level interface?  For example, if an iommu_group
> is going to remain an aspect of the iommu subsystem, shouldn't we be
> able to allocate a domain and test capabilities based on the group and
> the iommu driver should have enough embedded information reachable
> from
> the struct iommu_group to do those things?  This "perform group level
> operations based on an arbitrary device in the group" is pretty klunky.
> Thanks,
> 

This sounds like the right thing to do.

Btw, another alternative I'm thinking of is whether vfio_group
can record the bus info when the first device is added to it in
__vfio_register_dev(). Then we wouldn't need a group interface from
the iommu layer, given that vfio is the only user with such a requirement.


RE: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-22 Thread Tian, Kevin
> From: Robin Murphy 
> Sent: Wednesday, June 22, 2022 3:55 PM
> 
> On 2022-06-16 23:23, Nicolin Chen wrote:
> > On Thu, Jun 16, 2022 at 06:40:14AM +, Tian, Kevin wrote:
> >
> >>> The domain->ops validation was added, as a precaution, for mixed-
> driver
> >>> systems. However, at this moment only one iommu driver is possible. So
> >>> remove it.
> >>
> >> It's true on a physical platform. But I'm not sure whether a virtual
> platform
> >> is allowed to include multiple e.g. one virtio-iommu alongside a virtual 
> >> VT-
> d
> >> or a virtual smmu. It might be clearer to claim that (as Robin pointed out)
> >> there is plenty more significant problems than this to solve instead of
> simply
> >> saying that only one iommu driver is possible if we don't have explicit
> code
> >> to reject such configuration. 😊
> >
> > Will edit this part. Thanks!
> 
> Oh, physical platforms with mixed IOMMUs definitely exist already. The
> main point is that while bus_set_iommu still exists, the core code
> effectively *does* prevent multiple drivers from registering - even in
> emulated cases like the example above, virtio-iommu and VT-d would both
> try to bus_set_iommu(&pci_bus_type), and one of them will lose. The
> aspect which might warrant clarification is that there's no combination
> of supported drivers which claim non-overlapping buses *and* could
> appear in the same system - even if you tried to contrive something by
> emulating, say, VT-d (PCI) alongside rockchip-iommu (platform), you
> could still only describe one or the other due to ACPI vs. Devicetree.
> 

This explanation is much clearer! Thanks.

RE: [PATCH v2 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-21 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 22, 2022 12:41 PM
> 
> The IOMMU driver shares the pasid table for PCI alias devices. When the
> RID2PASID entry of the shared pasid table has been filled by the first
> device, the subsequent devices will encounter the "DMAR: Setup RID2PASID
> failed" failure as the pasid entry has already been marked as present. As
> the result, the IOMMU probing process will be aborted.
> 
> This fixes it by skipping RID2PASID setting if the pasid entry has been
> populated. This works because the IOMMU core ensures that only the same
> IOMMU domain can be attached to all PCI alias devices at the same time.
> Therefore the subsequent devices just try to setup the RID2PASID entry
> with the same domain, which is negligible. This also adds domain validity
> checks for more confidence anyway.
> 
> Fixes: ef848b7e5a6a0 ("iommu/vt-d: Setup pasid entry for RID2PASID
> support")
> Reported-by: Chenyi Qiang 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-21 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 22, 2022 11:28 AM
> 
> On 2022/6/22 11:06, Tian, Kevin wrote:
> >> From: Baolu Lu
> >> Sent: Tuesday, June 21, 2022 5:04 PM
> >>
> >> On 2022/6/21 13:48, Tian, Kevin wrote:
> >>>> From: Baolu Lu
> >>>> Sent: Tuesday, June 21, 2022 12:28 PM
> >>>>
> >>>> On 2022/6/21 11:46, Tian, Kevin wrote:
> >>>>>> From: Baolu Lu
> >>>>>> Sent: Tuesday, June 21, 2022 11:39 AM
> >>>>>>
> >>>>>> On 2022/6/21 10:54, Tian, Kevin wrote:
> >>>>>>>> From: Lu Baolu
> >>>>>>>> Sent: Monday, June 20, 2022 4:17 PM
> >>>>>>>> @@ -2564,7 +2564,7 @@ static int domain_add_dev_info(struct
> >>>>>>>> dmar_domain *domain, struct device *dev)
> >>>>>>>>  ret = 
> >>>>>>>> intel_pasid_setup_second_level(iommu,
> >>>>>>>> domain,
> >>>>>>>>  dev, PASID_RID2PASID);
> >>>>>>>>  spin_unlock_irqrestore(&iommu->lock, flags);
> >>>>>>>> -if (ret) {
> >>>>>>>> +if (ret && ret != -EBUSY) {
> >>>>>>>>  dev_err(dev, "Setup RID2PASID 
> >>>>>>>> failed\n");
> >>>>>>>>  dmar_remove_one_dev_info(dev);
> >>>>>>>>  return ret;
> >>>>>>>> --
> >>>>>>>> 2.25.1
> >>>>>>> It's cleaner to avoid this error at the first place, i.e. only do the
> >>>>>>> setup when the first device is attached to the pasid table.
> >>>>>> The logic that identifies the first device might introduce additional
> >>>>>> unnecessary complexity. Devices that share a pasid table are rare. I
> >>>>>> even prefer to give up sharing tables so that the code can be
> >>>>>> simpler.:-)
> >>>>>>
> >>>>> It's not that complex if you simply move device_attach_pasid_table()
> >>>>> out of intel_pasid_alloc_table(). Then do the setup if
> >>>>> list_empty(&pasid_table->dev) and then attach device to the
> >>>>> pasid table in domain_add_dev_info().
> >>>> The pasid table is part of the device, hence a better place to
> >>>> allocate/free the pasid table is in the device probe/release paths.
> >>>> Things will become more complicated if we change relationship
> between
> >>>> device and it's pasid table when attaching/detaching a domain. That's
> >>>> the reason why I thought it was additional complexity.
> >>>>
> >>> If you do want to follow current route it’s still cleaner to check
> >>> whether the pasid entry has pointed to the domain in the individual
> >>> setup function instead of blindly returning -EBUSY and then ignoring
> >>> it even if a real busy condition occurs. The setup functions can
> >>> just return zero for this benign alias case.
> >> Kevin, how do you like this one?
> >>
> >> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> >> index cb4c1d0cf25c..ecffd0129b2b 100644
> >> --- a/drivers/iommu/intel/pasid.c
> >> +++ b/drivers/iommu/intel/pasid.c
> >> @@ -575,6 +575,16 @@ static inline int pasid_enable_wpe(struct
> >> pasid_entry *pte)
> >>return 0;
> >>};
> >>
> >> +/*
> >> + * Return true if @pasid is RID2PASID and the domain @did has already
> >> + * been setup to the @pte. Otherwise, return false.
> >> + */
> >> +static inline bool
> >> +rid2pasid_domain_valid(struct pasid_entry *pte, u32 pasid, u16 did)
> >> +{
> >> +  return pasid == PASID_RID2PASID && pasid_get_domain_id(pte) ==
> >> did;
> >> +}
> > better this is not restricted to RID2PASID only, e.g.
> pasid_pte_match_domain()
> > and then read pasid from the pte to compare with the pasid argument.
> >
> 
> The pasid value is not encoded in the pasid table entry. This validity
> check is only for RID2PASID as alias devices share the single RID2PASID
> entry. For other cases, we should always return -EBUSY as what the code
> is doing now.
> 

You are right.

RE: [PATCH 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-21 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 21, 2022 5:04 PM
> 
> On 2022/6/21 13:48, Tian, Kevin wrote:
> >> From: Baolu Lu 
> >> Sent: Tuesday, June 21, 2022 12:28 PM
> >>
> >> On 2022/6/21 11:46, Tian, Kevin wrote:
> >>>> From: Baolu Lu 
> >>>> Sent: Tuesday, June 21, 2022 11:39 AM
> >>>>
> >>>> On 2022/6/21 10:54, Tian, Kevin wrote:
> >>>>>> From: Lu Baolu 
> >>>>>> Sent: Monday, June 20, 2022 4:17 PM
> >>>>>> @@ -2564,7 +2564,7 @@ static int domain_add_dev_info(struct
> >>>>>> dmar_domain *domain, struct device *dev)
> >>>>>>ret = intel_pasid_setup_second_level(iommu,
> >>>>>> domain,
> >>>>>>dev, PASID_RID2PASID);
> >>>>>>spin_unlock_irqrestore(&iommu->lock, flags);
> >>>>>> -  if (ret) {
> >>>>>> +  if (ret && ret != -EBUSY) {
> >>>>>>dev_err(dev, "Setup RID2PASID failed\n");
> >>>>>>dmar_remove_one_dev_info(dev);
> >>>>>>return ret;
> >>>>>> --
> >>>>>> 2.25.1
> >>>>>
> >>>>> It's cleaner to avoid this error at the first place, i.e. only do the
> >>>>> setup when the first device is attached to the pasid table.
> >>>>
> >>>> The logic that identifies the first device might introduce additional
> >>>> unnecessary complexity. Devices that share a pasid table are rare. I
> >>>> even prefer to give up sharing tables so that the code can be
> >>>> simpler.:-)
> >>>>
> >>>
> >>> It's not that complex if you simply move device_attach_pasid_table()
> >>> out of intel_pasid_alloc_table(). Then do the setup if
> >>> list_empty(&pasid_table->dev) and then attach device to the
> >>> pasid table in domain_add_dev_info().
> >>
> >> The pasid table is part of the device, hence a better place to
> >> allocate/free the pasid table is in the device probe/release paths.
> >> Things will become more complicated if we change relationship between
> >> device and it's pasid table when attaching/detaching a domain. That's
> >> the reason why I thought it was additional complexity.
> >>
> >
> > If you do want to follow current route it’s still cleaner to check
> > whether the pasid entry has pointed to the domain in the individual
> > setup function instead of blindly returning -EBUSY and then ignoring
> > it even if a real busy condition occurs. The setup functions can
> > just return zero for this benign alias case.
> 
> Kevin, how do you like this one?
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index cb4c1d0cf25c..ecffd0129b2b 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -575,6 +575,16 @@ static inline int pasid_enable_wpe(struct
> pasid_entry *pte)
>   return 0;
>   };
> 
> +/*
> + * Return true if @pasid is RID2PASID and the domain @did has already
> + * been setup to the @pte. Otherwise, return false.
> + */
> +static inline bool
> +rid2pasid_domain_valid(struct pasid_entry *pte, u32 pasid, u16 did)
> +{
> + return pasid == PASID_RID2PASID && pasid_get_domain_id(pte) ==
> did;
> +}

Better if this were not restricted to RID2PASID only, e.g. a
pasid_pte_match_domain() which reads the pasid from the pte and
compares it with the pasid argument.

> +
>   /*
>* Set up the scalable mode pasid table entry for first only
>* translation type.
> @@ -595,9 +605,8 @@ int intel_pasid_setup_first_level(struct intel_iommu
> *iommu,
>   if (WARN_ON(!pte))
>   return -EINVAL;
> 
> - /* Caller must ensure PASID entry is not in use. */
>   if (pasid_pte_is_present(pte))
> - return -EBUSY;
> + return rid2pasid_domain_valid(pte, pasid, did) ? 0: -EBUSY;
> 
>   pasid_clear_entry(pte);
> 
> @@ -698,9 +707,8 @@ int intel_pasid_setup_second_level(struct
> intel_iommu *iommu,
>   return -ENODEV;
>   }
> 
> - /* Caller must ensure PASID entry is not in use. */
>   if (pasid_pte_is_present(pte))
> - return -EBUSY;
> + return rid2pasid_domain_valid(pte, pasid, did) ? 0: -EBUSY;
> 
>   pasid_clear_entry(pte);
>   pasid_set_domain_id(pte, did);
> @@ -738,9 +746,8 @@ int intel_pasid_setup_pass_through(struct
> intel_iommu *iommu,
>   return -ENODEV;
>   }
> 
> - /* Caller must ensure PASID entry is not in use. */
>   if (pasid_pte_is_present(pte))
> - return -EBUSY;
> + return rid2pasid_domain_valid(pte, pasid, did) ? 0: -EBUSY;
> 
>   pasid_clear_entry(pte);
>   pasid_set_domain_id(pte, did);
> 
> --
> Best regards,
> baolu

RE: [PATCH 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-20 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 21, 2022 12:28 PM
> 
> On 2022/6/21 11:46, Tian, Kevin wrote:
> >> From: Baolu Lu 
> >> Sent: Tuesday, June 21, 2022 11:39 AM
> >>
> >> On 2022/6/21 10:54, Tian, Kevin wrote:
> >>>> From: Lu Baolu 
> >>>> Sent: Monday, June 20, 2022 4:17 PM
> >>>> @@ -2564,7 +2564,7 @@ static int domain_add_dev_info(struct
> >>>> dmar_domain *domain, struct device *dev)
> >>>>  ret = intel_pasid_setup_second_level(iommu,
> >>>> domain,
> >>>>  dev, PASID_RID2PASID);
> >>>>  spin_unlock_irqrestore(&iommu->lock, flags);
> >>>> -if (ret) {
> >>>> +if (ret && ret != -EBUSY) {
> >>>>  dev_err(dev, "Setup RID2PASID failed\n");
> >>>>  dmar_remove_one_dev_info(dev);
> >>>>  return ret;
> >>>> --
> >>>> 2.25.1
> >>>
> >>> It's cleaner to avoid this error at the first place, i.e. only do the
> >>> setup when the first device is attached to the pasid table.
> >>
> >> The logic that identifies the first device might introduce additional
> >> unnecessary complexity. Devices that share a pasid table are rare. I
> >> even prefer to give up sharing tables so that the code can be
> >> simpler.:-)
> >>
> >
> > It's not that complex if you simply move device_attach_pasid_table()
> > out of intel_pasid_alloc_table(). Then do the setup if
> > list_empty(&pasid_table->dev) and then attach device to the
> > pasid table in domain_add_dev_info().
> 
> The pasid table is part of the device, hence a better place to
> allocate/free the pasid table is in the device probe/release paths.
> Things will become more complicated if we change relationship between
> device and it's pasid table when attaching/detaching a domain. That's
> the reason why I thought it was additional complexity.
> 

If you do want to follow the current route, it's still cleaner to check
whether the pasid entry already points to the domain in the individual
setup functions, instead of blindly returning -EBUSY and then ignoring
it even when a real busy condition occurs. The setup functions can
just return zero for this benign alias case.

RE: [PATCH 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-20 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 21, 2022 11:39 AM
> 
> On 2022/6/21 10:54, Tian, Kevin wrote:
> >> From: Lu Baolu 
> >> Sent: Monday, June 20, 2022 4:17 PM
> >> @@ -2564,7 +2564,7 @@ static int domain_add_dev_info(struct
> >> dmar_domain *domain, struct device *dev)
> >>ret = intel_pasid_setup_second_level(iommu,
> >> domain,
> >>dev, PASID_RID2PASID);
> >>spin_unlock_irqrestore(&iommu->lock, flags);
> >> -  if (ret) {
> >> +  if (ret && ret != -EBUSY) {
> >>dev_err(dev, "Setup RID2PASID failed\n");
> >>dmar_remove_one_dev_info(dev);
> >>return ret;
> >> --
> >> 2.25.1
> >
> > It's cleaner to avoid this error at the first place, i.e. only do the
> > setup when the first device is attached to the pasid table.
> 
> The logic that identifies the first device might introduce additional
> unnecessary complexity. Devices that share a pasid table are rare. I
> even prefer to give up sharing tables so that the code can be
> simpler.:-)
> 

It's not that complex if you simply move device_attach_pasid_table()
out of intel_pasid_alloc_table(). Then do the setup only if
list_empty(&pasid_table->dev), and attach the device to the
pasid table in domain_add_dev_info().


RE: [PATCH 1/1] iommu/vt-d: Fix RID2PASID setup failure

2022-06-20 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Monday, June 20, 2022 4:17 PM
> @@ -2564,7 +2564,7 @@ static int domain_add_dev_info(struct
> dmar_domain *domain, struct device *dev)
>   ret = intel_pasid_setup_second_level(iommu,
> domain,
>   dev, PASID_RID2PASID);
>   spin_unlock_irqrestore(&iommu->lock, flags);
> - if (ret) {
> + if (ret && ret != -EBUSY) {
>   dev_err(dev, "Setup RID2PASID failed\n");
>   dmar_remove_one_dev_info(dev);
>   return ret;
> --
> 2.25.1

It's cleaner to avoid this error in the first place, i.e. only do the
setup when the first device is attached to the pasid table.


RE: [PATCH v8 05/11] iommu/vt-d: Add SVA domain support

2022-06-17 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 7, 2022 9:50 AM
> 
> +
> +static const struct iommu_domain_ops intel_svm_domain_ops = {
> + .set_dev_pasid  = intel_svm_attach_dev_pasid,
> + .block_dev_pasid= intel_svm_detach_dev_pasid,
> + .free   = intel_svm_domain_free,
> +};
> +

It's clearer to use set/block for the Intel callbacks.


RE: [PATCH v8 04/11] iommu: Add sva iommu_domain support

2022-06-17 Thread Tian, Kevin
> From: Baolu Lu
> Sent: Friday, June 10, 2022 3:16 PM
> >
> >> +#define __IOMMU_DOMAIN_HOST_VA(1U << 5)  /* Host CPU virtual
> address */
> >
> > Do you mean general CPU VA? or Host CPU VA, I'm reading the latter as
> 2nd
> > stage?
> 
> Host CPU VA. In the near future, we will add another flag _GUEST_VA, so
> that the shared page table types are distinguished.

How does the kernel know that a user page table translates guest VA?
IMHO the kernel shouldn't care about that, beyond managing all the
aspects related to the user page table itself...

> 
> >
> >> +
> >>   /*
> >>* This are the possible domain-types
> >>*
> >> @@ -86,15 +89,24 @@ struct iommu_domain_geometry {
> >>   #define IOMMU_DOMAIN_DMA_FQ
>   (__IOMMU_DOMAIN_PAGING |\
> >> __IOMMU_DOMAIN_DMA_API |   \
> >> __IOMMU_DOMAIN_DMA_FQ)
> >> +#define IOMMU_DOMAIN_SVA  (__IOMMU_DOMAIN_SHARED |
>   \
> >> +   __IOMMU_DOMAIN_HOST_VA)
> >
> > Doesn't shared automatically mean CPU VA? Do we need another flag?
> 
> Yes. Shared means CPU VA, but there're many types. Besides above two, we
> also see the shared KVM/EPT.
> 

Will the two sharing scenarios share any common code? If not then
having a separate flag bit is meaningless.

It might be more straightforward to be:

#define IOMMU_DOMAIN_SVA  __IOMMU_DOMAIN_SVA
#define IOMMU_DOMAIN_KVM __IOMMU_DOMAIN_KVM
#define IOMMU_DOMAIN_USER __IOMMU_DOMAIN_USER

Thanks
Kevin


RE: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Friday, June 17, 2022 6:41 AM
> 
> > ...
> > > - if (resv_msi) {
> > > + if (resv_msi && !domain->msi_cookie) {
> > >   ret = iommu_get_msi_cookie(domain->domain,
> > > resv_msi_base);
> > >   if (ret && ret != -ENODEV)
> > >   goto out_detach;
> > > + domain->msi_cookie = true;
> > >   }
> >
> > Why not move it to alloc_attach_domain()? Then there's no need for the
> > new domain field. It's required only when a new domain is allocated.
> 
> When reusing an existing domain that doesn't have an msi_cookie,
> we can do iommu_get_msi_cookie() if resv_msi is found. So it is
> not limited to a new domain.

Looks like the msi_cookie requirement is per platform (currently only
for smmu; see arm_smmu_get_resv_regions()). If there is no mixed case
then the above check is not required.

But let's hear whether Robin has a different thought here.

> 
> > ...
> > > - if (list_empty(&domain->group_list)) {
> > > - if (list_is_singular(&iommu->domain_list)) {
> > > - if (list_empty(&iommu-
> > > >emulated_iommu_groups)) {
> > > - WARN_ON(iommu->notifier.head);
> > > -
> > >   vfio_iommu_unmap_unpin_all(iommu);
> > > - } else {
> > > -
> > >   vfio_iommu_unmap_unpin_reaccount(iommu);
> > > - }
> > > - }
> > > - iommu_domain_free(domain->domain);
> > > - list_del(&domain->next);
> > > - kfree(domain);
> > > - vfio_iommu_aper_expand(iommu, &iova_copy);
> >
> > Previously the aperture is adjusted when a domain is freed...
> >
> > > - vfio_update_pgsize_bitmap(iommu);
> > > - }
> > > - /*
> > > -  * Removal of a group without dirty tracking may allow
> > > -  * the iommu scope to be promoted.
> > > -  */
> > > - if (!group->pinned_page_dirty_scope) {
> > > - iommu->num_non_pinned_groups--;
> > > - if (iommu->dirty_page_tracking)
> > > - vfio_iommu_populate_bitmap_full(iommu);
> > > - }
> > > + vfio_iommu_detach_destroy_domain(domain, iommu,
> > > group);
> > >   kfree(group);
> > >   break;
> > >   }
> > >
> > > + vfio_iommu_aper_expand(iommu, &iova_copy);
> >
> > but now it's done for every group detach. The aperture is decided
> > by domain geometry which is not affected by attached groups.
> 
> Yea, I've noticed this part. Actually Jason did this change for
> simplicity, and I think it'd be safe to do so?

Perhaps detach_destroy() can return a Boolean to indicate whether
a domain is destroyed.
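One way that suggestion could look, as a userspace sketch with illustrative names (not the actual VFIO code): the detach helper reports whether it actually destroyed the domain, and the caller re-expands the aperture only in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for struct vfio_domain with a group refcount. */
struct vfio_domain { int ngroups; };

/* Report whether the domain was actually destroyed. */
static bool detach_destroy_domain(struct vfio_domain *d)
{
    if (--d->ngroups > 0)
        return false;   /* other groups still attached, domain kept */
    /* iommu_domain_free() / list_del() / kfree() would happen here */
    return true;
}

/* Caller expands the aperture only when a domain actually went away. */
static void detach_group(struct vfio_domain *d, int *expansions)
{
    if (detach_destroy_domain(d))
        (*expansions)++;    /* vfio_iommu_aper_expand() analogue */
}
```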


RE: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> Un-inline the domain specific logic from the attach/detach_group ops into
> two paired functions vfio_iommu_alloc_attach_domain() and
> vfio_iommu_detach_destroy_domain() that strictly deal with creating and
> destroying struct vfio_domains.
> 
> Add the logic to check for EMEDIUMTYPE return code of
> iommu_attach_group()
> and avoid the extra domain allocations and attach/detach sequences of the
> old code. This allows properly detecting an actual attach error, like
> -ENOMEM, vs treating all attach errors as an incompatible domain.
> 
> Co-developed-by: Jason Gunthorpe 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 298 +---
>  1 file changed, 163 insertions(+), 135 deletions(-)
> 

...
> +static struct vfio_domain *
> +vfio_iommu_alloc_attach_domain(struct bus_type *bus, struct vfio_iommu
> *iommu,
> +struct vfio_iommu_group *group)
> +{
> + struct iommu_domain *new_domain;
> + struct vfio_domain *domain;
> + int ret = 0;
> +
> + /* Try to match an existing compatible domain */
> + list_for_each_entry (domain, &iommu->domain_list, next) {
> + ret = iommu_attach_group(domain->domain, group-
> >iommu_group);
> + if (ret == -EMEDIUMTYPE)
> + continue;

Probably good to add a one-line comment here for what EMEDIUMTYPE
represents. It's not a widely used retry code like EAGAIN. A comment
can save readers the time of digging out the fact by jumping to the
iommu file.
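For reference, the soft-failure search that comment would describe can be sketched like this (illustrative stand-ins, not the actual VFIO code): -EMEDIUMTYPE means "this domain is incompatible, keep looking", any other error is a hard failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EMEDIUMTYPE
#define EMEDIUMTYPE 124     /* Linux errno value, defined for the sketch */
#endif

/* Illustrative domain whose attach result is precomputed for the test. */
struct domain { int attach_rc; struct domain *next; };

/*
 * Walk the existing domains. -EMEDIUMTYPE from attach means "domain is
 * incompatible, keep searching"; any other nonzero rc is a hard failure
 * to report; 0 means the group attached successfully.
 */
static struct domain *match_domain(struct domain *head, int *rc)
{
    struct domain *d;

    for (d = head; d; d = d->next) {
        *rc = d->attach_rc;     /* iommu_attach_group() stand-in */
        if (*rc == -EMEDIUMTYPE)
            continue;           /* soft failure: try the next domain */
        if (*rc)
            return NULL;        /* hard failure, e.g. -ENOMEM */
        return d;               /* compatible domain found */
    }
    *rc = 0;
    return NULL;                /* none compatible: allocate a new one */
}
```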

...
> - if (resv_msi) {
> + if (resv_msi && !domain->msi_cookie) {
>   ret = iommu_get_msi_cookie(domain->domain,
> resv_msi_base);
>   if (ret && ret != -ENODEV)
>   goto out_detach;
> + domain->msi_cookie = true;
>   }

why not moving to alloc_attach_domain() then no need for the new
domain field? It's required only when a new domain is allocated.

...
> - if (list_empty(&domain->group_list)) {
> - if (list_is_singular(&iommu->domain_list)) {
> - if (list_empty(&iommu-
> >emulated_iommu_groups)) {
> - WARN_ON(iommu->notifier.head);
> -
>   vfio_iommu_unmap_unpin_all(iommu);
> - } else {
> -
>   vfio_iommu_unmap_unpin_reaccount(iommu);
> - }
> - }
> - iommu_domain_free(domain->domain);
> - list_del(&domain->next);
> - kfree(domain);
> - vfio_iommu_aper_expand(iommu, &iova_copy);

Previously the aperture is adjusted when a domain is freed...

> - vfio_update_pgsize_bitmap(iommu);
> - }
> - /*
> -  * Removal of a group without dirty tracking may allow
> -  * the iommu scope to be promoted.
> -  */
> - if (!group->pinned_page_dirty_scope) {
> - iommu->num_non_pinned_groups--;
> - if (iommu->dirty_page_tracking)
> - vfio_iommu_populate_bitmap_full(iommu);
> - }
> + vfio_iommu_detach_destroy_domain(domain, iommu,
> group);
>   kfree(group);
>   break;
>   }
> 
> + vfio_iommu_aper_expand(iommu, &iova_copy);

but now it's done for every group detach. The aperture is decided
by domain geometry which is not affected by attached groups.



RE: [PATCH v2 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group()

2022-06-15 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> All devices in emulated_iommu_groups have pinned_page_dirty_scope
> set, so the update_dirty_scope in the first list_for_each_entry
> is always false. Clean it up, and move the "if update_dirty_scope"
> part from the detach_group_done routine to the domain_list part.
> 
> Rename the "detach_group_done" goto label accordingly.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian , with one nit:

> @@ -2469,7 +2467,7 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
>   WARN_ON(iommu->notifier.head);
>   vfio_iommu_unmap_unpin_all(iommu);
>   }
> - goto detach_group_done;
> + goto out_unlock;
...
> +out_unlock:
>   mutex_unlock(&iommu->lock);
>  }
> 

I'd just replace the goto with a direct unlock and return there.
The readability is slightly better.
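The suggested shape, as a standalone sketch (a pthread mutex stands in for iommu->lock; names are illustrative):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t iommu_lock = PTHREAD_MUTEX_INITIALIZER;
static int unmapped;

/* Stand-in for vfio_iommu_unmap_unpin_all(). */
static void unmap_unpin_all(void)
{
    unmapped = 1;
}

/*
 * The early-exit path unlocks and returns at the call site instead of
 * jumping to a shared out_unlock label.
 */
static void detach_group(int emulated)
{
    pthread_mutex_lock(&iommu_lock);
    if (emulated) {
        unmap_unpin_all();
        pthread_mutex_unlock(&iommu_lock);
        return;     /* direct unlock + return, no goto */
    }
    /* domain_list teardown would go here */
    pthread_mutex_unlock(&iommu_lock);
}
```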



RE: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-15 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> The domain->ops validation was added, as a precaution, for mixed-driver
> systems. However, at this moment only one iommu driver is possible. So
> remove it.

It's true on a physical platform. But I'm not sure whether a virtual
platform is allowed to include multiple, e.g. one virtio-iommu alongside
a virtual VT-d or a virtual SMMU. It might be clearer to state that (as
Robin pointed out) there are plenty more significant problems than this
to solve, instead of simply saying that only one iommu driver is possible
when we don't have explicit code to reject such a configuration. 😊

> 
> Per discussion with Robin, in future when many can be permitted we will
> rely on the IOMMU core code to check the domain->ops:
> https://lore.kernel.org/linux-iommu/6575de6d-94ba-c427-5b1e-
> 967750ddf...@arm.com/
> 
> Signed-off-by: Nicolin Chen 

Apart from that,

Reviewed-by: Kevin Tian 

RE: [PATCH v2 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-15 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> From: Jason Gunthorpe 
> 
> The KVM mechanism for controlling wbinvd is based on OR of the coherency
> property of all devices attached to a guest, no matter those devices are
> attached to a single domain or multiple domains.
> 
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain
> which is non-coherent since KVM won't be able to take advantage of it.
> This just wastes domain memory.
> 
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
> 
> It's unclear whether we want to further optimize the Intel driver to
> update the domain coherency after a device is detached from it, at
> least not before KVM can be verified to handle such dynamics in related
> emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
> we don't see a usage requiring such optimization as the only device
> which imposes such non-coherency is Intel GPU which even doesn't
> support hotplug/hot remove.
> 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 


RE: [PATCH v2 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-15 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> Cases like VFIO wish to attach a device to an existing domain that was
> not allocated specifically from the device. This raises a condition
> where the IOMMU driver can fail the domain attach because the domain and
> device are incompatible with each other.
> 
> This is a soft failure that can be resolved by using a different domain.
> 
> Provide a dedicated errno from the IOMMU driver during attach that the
> reason attached failed is because of domain incompatibility. EMEDIUMTYPE
> is chosen because it is never used within the iommu subsystem today and
> evokes a sense that the 'medium' aka the domain is incompatible.
> 
> VFIO can use this to know attach is a soft failure and it should continue
> searching. Otherwise the attach will be a hard failure and VFIO will
> return the code to userspace.
> 
> Update all drivers to return EMEDIUMTYPE in their failure paths that are
> related to domain incompatibility. Also turn adjacent error prints into
> debug prints, for these soft failures, to prevent a kernel log spam.
> 
> Add kdocs describing this behavior.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 


RE: [PATCH v2 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-06-15 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 15, 2022 9:10 PM
> 
> On 2022/6/15 14:22, Tian, Kevin wrote:
> >> From: Baolu Lu 
> >> Sent: Tuesday, June 14, 2022 3:21 PM
> >>
> >> On 2022/6/14 14:49, Tian, Kevin wrote:
> >>>> From: Lu Baolu
> >>>> Sent: Tuesday, June 14, 2022 10:51 AM
> >>>>
> >>>> The disable_dmar_iommu() is called when IOMMU initialization fails or
> >>>> the IOMMU is hot-removed from the system. In both cases, there is no
> >>>> need to clear the IOMMU translation data structures for devices.
> >>>>
> >>>> On the initialization path, the device probing only happens after the
> >>>> IOMMU is initialized successfully, hence there're no translation data
> >>>> structures.
> >>> Out of curiosity. With kexec the IOMMU may contain stale mappings
> >>> from the old kernel. Then is it meaningful to disable IOMMU after the
> >>> new kernel fails to initialize it properly?
> >>
> >> For kexec kernel, if the IOMMU is detected to be pre-enabled, the IOMMU
> >> driver will try to copy tables from the old kernel. If copying table
> >> fails, the IOMMU driver will disable IOMMU and do the normal
> >> initialization.
> >>
> >
> > What about an error that occurs after copying the table on the
> > initialization path? The new kernel will be in a state assuming the
> > iommu is disabled, but it is still enabled and using an old mapping
> > for certain devices...
> >
> 
> If copying table failed, the translation will be disabled and a clean
> root table will be used.
> 
> if (translation_pre_enabled(iommu)) {
>  pr_info("Translation already enabled - trying to copy
> translation structures\n");
> 
>  ret = copy_translation_tables(iommu);
>  if (ret) {
>  /*
>   * We found the IOMMU with translation
>   * enabled - but failed to copy over the
>   * old root-entry table. Try to proceed
>   * by disabling translation now and
>   * allocating a clean root-entry table.
>   * This might cause DMAR faults, but
>   * probably the dump will still succeed.
>   */
>  pr_err("Failed to copy translation tables from previous
> kernel for %s\n",
> iommu->name);
>  iommu_disable_translation(iommu);
>  clear_translation_pre_enabled(iommu);
>  } else {
>  pr_info("Copied translation tables from previous kernel
> for %s\n",
>  iommu->name);
>  }
> }
> 

I meant the case where copying the table succeeds but another error
occurs in the remaining initialization path...


RE: [PATCH 3/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-15 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Wednesday, June 15, 2022 4:45 AM
> 
> Hi Kevin,
> 
> On Wed, Jun 08, 2022 at 11:48:27PM +, Tian, Kevin wrote:
> > > > > The KVM mechanism for controlling wbinvd is only triggered during
> > > > > kvm_vfio_group_add(), meaning it is a one-shot test done once the
> > > devices
> > > > > are setup.
> > > >
> > > > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > > > group_add() and group_del(). Then the coherency property is
> > > > checked dynamically in wbinvd emulation:
> > >
> > > From the perspective of managing the domains that is still
> > > one-shot. It doesn't get updated when individual devices are
> > > added/removed to domains.
> >
> > It's unchanged per-domain but dynamic per-vm when multiple
> > domains are added/removed (i.e. kvm->arch.noncoherent_dma_count).
> > It's the latter being checked in the kvm.
> 
> I am going to send a v2, yet not quite getting the point here.
> Meanwhile, Jason is on leave.
> 
> What, in your opinion, would be an accurate description here?
> 

Something like below:
--
The KVM mechanism for controlling wbinvd is based on OR of
the coherency property of all devices attached to a guest, no matter
those devices are attached to a single domain or multiple domains.

So, there is no value in trying to push a device that could do enforced
cache coherency to a dedicated domain vs re-using an existing domain
which is non-coherent since KVM won't be able to take advantage of it. 
This just wastes domain memory.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in related
emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
we don't see a usage requiring such optimization as the only device
which imposes such non-coherency is Intel GPU which even doesn't
support hotplug/hot remove.
--

Thanks
Kevin


RE: [PATCH v2 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-06-14 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 14, 2022 3:21 PM
> 
> On 2022/6/14 14:49, Tian, Kevin wrote:
> >> From: Lu Baolu
> >> Sent: Tuesday, June 14, 2022 10:51 AM
> >>
> >> The disable_dmar_iommu() is called when IOMMU initialization fails or
> >> the IOMMU is hot-removed from the system. In both cases, there is no
> >> need to clear the IOMMU translation data structures for devices.
> >>
> >> On the initialization path, the device probing only happens after the
> >> IOMMU is initialized successfully, hence there're no translation data
> >> structures.
> > Out of curiosity. With kexec the IOMMU may contain stale mappings
> > from the old kernel. Then is it meaningful to disable IOMMU after the
> > new kernel fails to initialize it properly?
> 
> For kexec kernel, if the IOMMU is detected to be pre-enabled, the IOMMU
> driver will try to copy tables from the old kernel. If copying table
> fails, the IOMMU driver will disable IOMMU and do the normal
> initialization.
>

What about an error that occurs after copying the table on the
initialization path? The new kernel will be in a state assuming the
iommu is disabled, but it is still enabled and using an old mapping for
certain devices...
 


RE: [PATCH v2 01/12] iommu/vt-d: debugfs: Remove device_domain_lock usage

2022-06-14 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 15, 2022 9:54 AM
> 
> On 2022/6/14 14:43, Tian, Kevin wrote:
> >> From: Lu Baolu
> >> Sent: Tuesday, June 14, 2022 10:51 AM
> >>
> >> The domain_translation_struct debugfs node is used to dump the DMAR
> >> page
> >> tables for the PCI devices. It potentially races with setting domains to
> >> devices. The existing code uses a global spinlock device_domain_lock to
> >> avoid the races, but this is problematical as this lock is only used to
> >> protect the device tracking lists of each domain.
> > is it really problematic at this point? Before following patches are applied
> > using device_domain_lock should have similar effect as holding the group
> > lock.
> >
> > Here it might make more sense to just focus on removing the use of
> > device_domain_lock outside of iommu.c. Just that using group lock is
> > cleaner and more compatible to following cleanups.
> >
> > and it's worth mentioning that racing with page table updates is out
> > of the scope of this series. Probably also add a comment in the code
> > to clarify this point.
> >
> 
> Hi Kevin,
> 
> How do you like below updated patch?

Yes, this is better.

> 
>  From cecc9a0623780a11c4ea4d0a15aa6187f01541c4 Mon Sep 17 00:00:00
> 2001
> From: Lu Baolu 
> Date: Sun, 29 May 2022 10:18:56 +0800
> Subject: [PATCH 1/1] iommu/vt-d: debugfs: Remove device_domain_lock
> usage
> 
> The domain_translation_struct debugfs node is used to dump the DMAR
> page
> tables for the PCI devices. It potentially races with setting domains to
> devices. The existing code uses the global spinlock device_domain_lock to
> avoid the races.
> 
> This removes the use of device_domain_lock outside of iommu.c by replacing
> it with the group mutex lock. Using the group mutex lock is cleaner and
> more compatible to following cleanups.
> 
> Signed-off-by: Lu Baolu 
> ---
>   drivers/iommu/intel/debugfs.c | 42 +--
>   drivers/iommu/intel/iommu.c   |  2 +-
>   drivers/iommu/intel/iommu.h   |  1 -
>   3 files changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
> index d927ef10641b..f4acd8993f60 100644
> --- a/drivers/iommu/intel/debugfs.c
> +++ b/drivers/iommu/intel/debugfs.c
> @@ -342,13 +342,13 @@ static void pgtable_walk_level(struct seq_file *m,
> struct dma_pte *pde,
>   }
>   }
> 
> -static int show_device_domain_translation(struct device *dev, void *data)
> +static int __show_device_domain_translation(struct device *dev, void *data)
>   {
> - struct device_domain_info *info = dev_iommu_priv_get(dev);
> - struct dmar_domain *domain = info->domain;
> + struct dmar_domain *domain;
>   struct seq_file *m = data;
>   u64 path[6] = { 0 };
> 
> + domain = to_dmar_domain(iommu_get_domain_for_dev(dev));
>   if (!domain)
>   return 0;
> 
> @@ -359,20 +359,38 @@ static int show_device_domain_translation(struct
> device *dev, void *data)
>   pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path);
>   seq_putc(m, '\n');
> 
> - return 0;
> + return 1;
>   }
> 
> -static int domain_translation_struct_show(struct seq_file *m, void *unused)
> +static int show_device_domain_translation(struct device *dev, void *data)
>   {
> - unsigned long flags;
> - int ret;
> + struct iommu_group *group;
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - ret = bus_for_each_dev(&pci_bus_type, NULL, m,
> -show_device_domain_translation);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> + group = iommu_group_get(dev);
> + if (group) {
> + /*
> +  * The group->mutex is held across the callback, which will
> +  * block calls to iommu_attach/detach_group/device. Hence,
> +  * the domain of the device will not change during traversal.
> +  *
> +  * All devices in an iommu group share a single domain,
> hence
> +  * we only dump the domain of the first device. Even though,

bus_for_each_dev() will still lead to duplicated dumps in the same group,
but probably we can live with it for a debug interface.
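The early-stop behavior the patch relies on, where a nonzero callback return ends the group walk after the first device, can be sketched like this (illustrative stand-ins for the group iterator, not the core iommu code):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative device list standing in for an iommu group's members. */
struct gdev { int id; struct gdev *next; };

static int dumped;

/* All group members share one domain, so dump once and return nonzero
 * to stop the walk -- the "return 1" in the patch above. */
static int dump_first_dev(struct gdev *dev, void *data)
{
    (void)data;
    dumped = dev->id;
    return 1;
}

/* iommu_group_for_each_dev() analogue: stop when the callback returns
 * nonzero and propagate that value. */
static int group_for_each_dev(struct gdev *head, void *data,
                              int (*fn)(struct gdev *, void *))
{
    struct gdev *d;
    int ret;

    for (d = head; d; d = d->next) {
        ret = fn(d, data);
        if (ret)
            return ret;
    }
    return 0;
}
```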

> +  * this code still possibly races with the iommu_unmap()
> +  * interface. This could be solved by RCU-freeing the page
> +  * table pages in the iommu_unmap() path.
> +  */
> + iommu_group_for_each_dev(group, data,
> +  

RE: [PATCH 1/1] iommu/vt-d: Add set_dev_pasid callbacks for default domain

2022-06-14 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 14, 2022 2:13 PM
> 
> On 2022/6/14 13:36, Tian, Kevin wrote:
> >> From: Baolu Lu
> >> Sent: Tuesday, June 14, 2022 12:48 PM
> >>
> >> On 2022/6/14 12:02, Tian, Kevin wrote:
> >>>> From: Lu Baolu
> >>>> Sent: Tuesday, June 14, 2022 11:44 AM
> >>>>
> >>>> This allows the upper layers to set a domain to a PASID of a device
> >>>> if the PASID feature is supported by the IOMMU hardware. The typical
> >>>> use cases are, for example, kernel DMA with PASID and hardware
> >>>> assisted mediated device drivers.
> >>>>
> >>> why is it not part of the series for those use cases? There is no consumer
> >>> of added callbacks in this patch...
> >> It could be. I just wanted to maintain the integrity of Intel IOMMU
> >> driver implementation.
> > but let's not add dead code. and this patch is actually a right step
> > simply from set_dev_pasid() p.o.v hence you should include in any
> > series which first tries to use that interface.
> >
> 
> Yes, that's my intention. If it reviews well, we can include it in the
> driver's implementation.
> 

Then you should make that clear in the first place. Otherwise a patch
like this implies a review for merge. 😊

RE: [PATCH v2 11/12] iommu/vt-d: Use device_domain_lock accurately

2022-06-14 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:52 AM
> 
> The device_domain_lock is used to protect the device tracking list of
> a domain. Remove unnecessary spin_lock/unlock()'s and move the necessary
> ones around the list access.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 68 +++--
>  1 file changed, 27 insertions(+), 41 deletions(-)
> 
[...]
> +iommu_support_dev_iotlb(struct dmar_domain *domain, struct
> intel_iommu *iommu,
> + u8 bus, u8 devfn)
>  {
> - struct device_domain_info *info;
> -
> - assert_spin_locked(&device_domain_lock);
> + struct device_domain_info *info = NULL, *tmp;
> + unsigned long flags;
> 
>   if (!iommu->qi)
>   return NULL;
> 
> - list_for_each_entry(info, &domain->devices, link)
> - if (info->iommu == iommu && info->bus == bus &&
> - info->devfn == devfn) {
> - if (info->ats_supported && info->dev)
> - return info;
> + spin_lock_irqsave(&device_domain_lock, flags);
> + list_for_each_entry(tmp, &domain->devices, link) {
> + if (tmp->iommu == iommu && tmp->bus == bus &&
> + tmp->devfn == devfn) {
> + if (tmp->ats_supported)
> + info = tmp;

Directly unlocking and returning here is clearer than adding
another tmp variable...
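The direct-return shape, as a userspace sketch with illustrative names (a pthread mutex stands in for device_domain_lock):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-in for struct device_domain_info. */
struct dev_info { int bus, devfn, ats_supported; struct dev_info *next; };

static pthread_mutex_t device_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unlock and return as soon as the match is found, rather than stashing
 * it in a tmp variable and falling out of the loop. */
static struct dev_info *find_ats_info(struct dev_info *head, int bus, int devfn)
{
    struct dev_info *info;

    pthread_mutex_lock(&device_list_lock);
    for (info = head; info; info = info->next) {
        if (info->bus == bus && info->devfn == devfn &&
            info->ats_supported) {
            pthread_mutex_unlock(&device_list_lock);
            return info;    /* unlock + return at the match site */
        }
    }
    pthread_mutex_unlock(&device_list_lock);
    return NULL;
}
```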

> @@ -2460,15 +2450,14 @@ static int domain_add_dev_info(struct
> dmar_domain *domain, struct device *dev)
>   if (!iommu)
>   return -ENODEV;
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - info->domain = domain;
>   ret = domain_attach_iommu(domain, iommu);
> - if (ret) {
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> + if (ret)
>   return ret;
> - }
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
>   list_add(&info->link, &domain->devices);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
> + info->domain = domain;
> 

This is incorrect. You need to fully initialize the object before adding
it to the list. Otherwise a search right after the unlock above and
before info->domain is assigned will get wrong data.
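A sketch of the required ordering (illustrative structures; the locking is shown as comments since a userspace sketch has no spinlocks):

```c
#include <assert.h>
#include <stddef.h>

struct domain;
struct dev_info { struct domain *domain; struct dev_info *next; };
struct domain { struct dev_info *devices; };

/*
 * Fully initialize the object before publishing it on the shared list;
 * a reader that takes the lock right after the insertion must never
 * observe info->domain == NULL.
 */
static void add_dev_info(struct domain *d, struct dev_info *info)
{
    info->domain = d;       /* initialize first ... */
    /* spin_lock(&device_domain_lock); */
    info->next = d->devices;
    d->devices = info;      /* ... then make it reachable */
    /* spin_unlock(&device_domain_lock); */
}
```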


RE: [PATCH v2 10/12] iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller

2022-06-14 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:52 AM
> 
> Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info()
> which
> is its only caller. Make the spin lock critical range only cover the
> device list change code and remove some unnecessary checks.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 34 +-
>  1 file changed, 9 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index af22690f44c8..8345e0c0824c 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -295,7 +295,6 @@ static LIST_HEAD(dmar_satc_units);
>  static int g_num_of_iommus;
> 
>  static void dmar_remove_one_dev_info(struct device *dev);
> -static void __dmar_remove_one_dev_info(struct device_domain_info *info);
> 
>  int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
>  int intel_iommu_sm =
> IS_ENABLED(CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON);
> @@ -4141,20 +4140,14 @@ static void domain_context_clear(struct
> device_domain_info *info)
>  &domain_context_clear_one_cb, info);
>  }
> 
> -static void __dmar_remove_one_dev_info(struct device_domain_info *info)
> +static void dmar_remove_one_dev_info(struct device *dev)
>  {
> - struct dmar_domain *domain;
> - struct intel_iommu *iommu;
> -
> - assert_spin_locked(&device_domain_lock);
> -
> - if (WARN_ON(!info))
> - return;
> -
> - iommu = info->iommu;
> - domain = info->domain;
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *domain = info->domain;

This local variable is not required as there is just one reference
to info->domain.

Btw, I didn't see info->domain cleared in this path. Is that correct?

> + struct intel_iommu *iommu = info->iommu;
> + unsigned long flags;
> 
> - if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
> + if (!dev_is_real_dma_subdevice(info->dev)) {
>   if (dev_is_pci(info->dev) && sm_supported(iommu))
>   intel_pasid_tear_down_entry(iommu, info->dev,
>   PASID_RID2PASID, false);
> @@ -4164,20 +4157,11 @@ static void
> __dmar_remove_one_dev_info(struct device_domain_info *info)
>   intel_pasid_free_table(info->dev);
>   }
> 
> - list_del(&info->link);
> - domain_detach_iommu(domain, iommu);
> -}
> -
> -static void dmar_remove_one_dev_info(struct device *dev)
> -{
> - struct device_domain_info *info;
> - unsigned long flags;
> -
>   spin_lock_irqsave(&device_domain_lock, flags);
> - info = dev_iommu_priv_get(dev);
> - if (info)
> - __dmar_remove_one_dev_info(info);
> + list_del(&info->link);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + domain_detach_iommu(domain, iommu);
>  }
> 
>  static int md_domain_init(struct dmar_domain *domain, int guest_width)
> --
> 2.25.1



RE: [PATCH v2 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:52 AM
> 
> When the IOMMU domain is about to be freed, it should not be set on any
> device. Instead of silently dealing with some bug cases, it's better to
> trigger a warning to report and fix any potential bugs at the first time.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH v2 08/12] iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:52 AM
> 
> The iommu->lock is used to protect changes in root/context/pasid tables
> and domain ID allocation. There's no use case to change these resources
> in any interrupt context. Hence there's no need to disable interrupts
> when holding the spinlock.
> 
> Signed-off-by: Lu Baolu 

With this as the one place changing irqsave for all previous patches:

Reviewed-by: Kevin Tian 



RE: [PATCH v2 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:52 AM
> 
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Moving the lock into the ID alloc/free helpers makes the code more
> compact. At the same time, the device_domain_lock is irrelevant to
> the domain ID resource, remove its assertion as well.
> 
> On the other hand, the iommu->lock is never used in interrupt context,
> there's no need to use the irqsave variant of the spinlock calls.

I still prefer separating the reduction of lock ranges from changing irqsave.
Locking is tricky. From a bisect p.o.v. it will be a lot easier if we change
only one piece of logic per patch.



RE: [PATCH v2 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:51 AM
> 
> The disable_dmar_iommu() is called when IOMMU initialization fails or
> the IOMMU is hot-removed from the system. In both cases, there is no
> need to clear the IOMMU translation data structures for devices.
> 
> On the initialization path, the device probing only happens after the
> IOMMU is initialized successfully, hence there're no translation data
> structures.

Out of curiosity: with kexec the IOMMU may contain stale mappings
from the old kernel. Is it then meaningful to disable the IOMMU after the
new kernel fails to initialize it properly?

> 
> On the hot-remove path, there is no real use case where the IOMMU is
> hot-removed, but the devices that it manages are still alive in the
> system. The translation data structures were torn down during device
> release, hence there's no need to repeat it in IOMMU hot-remove path
> either. This removes the unnecessary code and only leaves a check.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.h |  1 +
>  drivers/iommu/intel/iommu.c | 21 +++--
>  2 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 583ea67fc783..2afbb2afe8cc 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -39,6 +39,7 @@
>   * only and pass-through transfer modes.
>   */
>  #define FLPT_DEFAULT_DID 1
> +#define NUM_RESERVED_DID 2
> 
>  /*
>   * The SUPERVISOR_MODE flag indicates a first level translation which
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ff6018f746e0..695aed474e5d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1715,23 +1715,16 @@ static int iommu_init_domains(struct
> intel_iommu *iommu)
> 
>  static void disable_dmar_iommu(struct intel_iommu *iommu)
>  {
> - struct device_domain_info *info, *tmp;
> - unsigned long flags;
> -
>   if (!iommu->domain_ids)
>   return;
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry_safe(info, tmp, &device_domain_list, global) {
> - if (info->iommu != iommu)
> - continue;
> -
> - if (!info->dev || !info->domain)
> - continue;
> -
> - __dmar_remove_one_dev_info(info);
> - }
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> + /*
> +  * All iommu domains must have been detached from the devices,
> +  * hence there should be no domain IDs in use.
> +  */
> + if (WARN_ON(bitmap_weight(iommu->domain_ids,
> cap_ndoms(iommu->cap))
> + != NUM_RESERVED_DID))
> + return;
> 
>   if (iommu->gcmd & DMA_GCMD_TE)
>   iommu_disable_translation(iommu);
> --
> 2.25.1



RE: [PATCH v2 01/12] iommu/vt-d: debugfs: Remove device_domain_lock usage

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 10:51 AM
> 
> The domain_translation_struct debugfs node is used to dump the DMAR
> page
> tables for the PCI devices. It potentially races with setting domains to
> devices. The existing code uses a global spinlock device_domain_lock to
> avoid the races, but this is problematical as this lock is only used to
> protect the device tracking lists of each domain.

Is it really problematic at this point? Before the following patches are
applied, using device_domain_lock should have a similar effect to holding
the group lock.

Here it might make more sense to just focus on removing the use of
device_domain_lock outside of iommu.c. It's just that using the group lock
is cleaner and more compatible with the following cleanups.

And it's worth mentioning that racing with page table updates is out
of the scope of this series. Probably also add a comment in the code
to clarify this point.

> 
> This replaces device_domain_lock with group->mutex to protect page tables
> from setting a new domain. This also makes device_domain_lock static as
> it is now only used inside the file.

s/the file/iommu.c/

> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.h   |  1 -
>  drivers/iommu/intel/debugfs.c | 49 +--
>  drivers/iommu/intel/iommu.c   |  2 +-
>  3 files changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index a22adfbdf870..8a6d64d726c0 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -480,7 +480,6 @@ enum {
>  #define VTD_FLAG_SVM_CAPABLE (1 << 2)
> 
>  extern int intel_iommu_sm;
> -extern spinlock_t device_domain_lock;
> 
>  #define sm_supported(iommu)  (intel_iommu_sm &&
> ecap_smts((iommu)->ecap))
>  #define pasid_supported(iommu)   (sm_supported(iommu) &&
>   \
> diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
> index d927ef10641b..5ebfe32265d5 100644
> --- a/drivers/iommu/intel/debugfs.c
> +++ b/drivers/iommu/intel/debugfs.c
> @@ -342,15 +342,23 @@ static void pgtable_walk_level(struct seq_file *m,
> struct dma_pte *pde,
>   }
>  }
> 
> -static int show_device_domain_translation(struct device *dev, void *data)
> +struct show_domain_opaque {
> + struct device *dev;
> + struct seq_file *m;
> +};

Sounds redundant, as both bus_for_each_dev() and
iommu_group_for_each_dev() declare the same fn type, which
accepts a device pointer and void *data.

> +
> +static int __show_device_domain_translation(struct device *dev, void *data)
>  {
> - struct device_domain_info *info = dev_iommu_priv_get(dev);
> - struct dmar_domain *domain = info->domain;
> - struct seq_file *m = data;
> + struct show_domain_opaque *opaque = data;
> + struct device_domain_info *info;
> + struct seq_file *m = opaque->m;
> + struct dmar_domain *domain;
>   u64 path[6] = { 0 };
> 
> - if (!domain)
> + if (dev != opaque->dev)
>   return 0;

not required.

> + info = dev_iommu_priv_get(dev);
> + domain = info->domain;
> 
>   seq_printf(m, "Device %s @0x%llx\n", dev_name(dev),
>  (u64)virt_to_phys(domain->pgd));
> @@ -359,20 +367,33 @@ static int show_device_domain_translation(struct
> device *dev, void *data)
>   pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path);
>   seq_putc(m, '\n');
> 
> - return 0;
> + return 1;
>  }
> 
> -static int domain_translation_struct_show(struct seq_file *m, void *unused)
> +static int show_device_domain_translation(struct device *dev, void *data)
>  {
> - unsigned long flags;
> - int ret;
> + struct show_domain_opaque opaque = {dev, data};
> + struct iommu_group *group;
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - ret = bus_for_each_dev(&pci_bus_type, NULL, m,
> -show_device_domain_translation);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> + group = iommu_group_get(dev);
> + if (group) {
> + /*
> +  * The group->mutex is held across the callback, which will
> +  * block calls to iommu_attach/detach_group/device. Hence,
> +  * the domain of the device will not change during traversal.
> +  */
> + iommu_group_for_each_dev(group, &opaque,
> +  __show_device_domain_translation);
> + iommu_group_put(group);
> + }
> 
> - return ret;
> + return 0;
> +}
> +
> +static int domain_translation_struct_show(struct seq_file *m, void *unused)
> +{
> + return bus_for_each_dev(&pci_bus_type, NULL, m,
> + show_device_domain_translation);
>  }
>  DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 19024dc52735..a39d72a9d1cf 100644
> --- a/drivers/iommu/intel/iommu.c
> +

RE: [PATCH 1/1] iommu/vt-d: Add set_dev_pasid callbacks for default domain

2022-06-13 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Tuesday, June 14, 2022 12:48 PM
> 
> On 2022/6/14 12:02, Tian, Kevin wrote:
> >> From: Lu Baolu 
> >> Sent: Tuesday, June 14, 2022 11:44 AM
> >>
> >> This allows the upper layers to set a domain to a PASID of a device
> >> if the PASID feature is supported by the IOMMU hardware. The typical
> >> use cases are, for example, kernel DMA with PASID and hardware
> >> assisted mediated device drivers.
> >>
> >
> > why is it not part of the series for those use cases? There is no consumer
> > of added callbacks in this patch...
> 
> It could be. I just wanted to maintain the integrity of Intel IOMMU
> driver implementation.

But let's not add dead code. And this patch is actually a right step
simply from a set_dev_pasid() p.o.v., hence you should include it in any
series which first tries to use that interface.

> 
> >
> >> +/* PCI domain-subdevice relationship */
> >> +struct subdev_domain_info {
> >> +  struct list_head link_domain;   /* link to domain siblings */
> >> +  struct device *dev; /* physical device derived from */
> >> +  ioasid_t pasid; /* PASID on physical device */
> >> +};
> >> +
> >
> > It's not subdev. Just dev+pasid in iommu's context.
> 
> How about struct device_pasid_info?
> 

this is better.


RE: [PATCH 1/1] iommu/vt-d: Add set_dev_pasid callbacks for default domain

2022-06-13 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 14, 2022 11:44 AM
> 
> This allows the upper layers to set a domain to a PASID of a device
> if the PASID feature is supported by the IOMMU hardware. The typical
> use cases are, for example, kernel DMA with PASID and hardware
> assisted mediated device drivers.
> 

why is it not part of the series for those use cases? There is no consumer
of added callbacks in this patch...

> +/* PCI domain-subdevice relationship */
> +struct subdev_domain_info {
> + struct list_head link_domain;   /* link to domain siblings */
> + struct device *dev; /* physical device derived from */
> + ioasid_t pasid; /* PASID on physical device */
> +};
> +

It's not subdev. Just dev+pasid in iommu's context.


RE: [PATCH v8 02/11] iommu: Add max_pasids field in struct dev_iommu

2022-06-10 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Friday, June 10, 2022 2:47 PM
> 
> On 2022/6/10 03:01, Raj, Ashok wrote:
> > On Tue, Jun 07, 2022 at 09:49:33AM +0800, Lu Baolu wrote:
> >> @@ -218,6 +219,30 @@ static void dev_iommu_free(struct device *dev)
> >>kfree(param);
> >>   }
> >>
> >> +static u32 dev_iommu_get_max_pasids(struct device *dev)
> >> +{
> >> +  u32 max_pasids = dev->iommu->iommu_dev->max_pasids;
> >> +  u32 num_bits;
> >> +  int ret;
> >> +
> >> +  if (!max_pasids)
> >> +  return 0;
> >> +
> >> +  if (dev_is_pci(dev)) {
> >> +  ret = pci_max_pasids(to_pci_dev(dev));
> >> +  if (ret < 0)
> >> +  return 0;
> >> +
> >> +  return min_t(u32, max_pasids, ret);
> >
> > Ah.. that answers my other question to consider device pasid-max. I guess
> > if we need any enforcement of restricting devices that aren't supporting
> > the full PASID, that will be done by some higher layer?
> 
> The mm->pasid style of SVA is explicitly enabled through
> iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA). The IOMMU driver
> specific
> restriction might be put there?
> 
> >
> > too many returns in this function, maybe setup all returns to the end of
> > the function might be elegant?
> 
> I didn't find cleanup room after a quick scan of the code. But sure, let
> me go through code again offline.
>

If we do care:

+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+   u32 max_pasids = 0;
+   u32 num_bits = 0;
+   int ret;
+
+   if (dev_is_pci(dev)) {
+   ret = pci_max_pasids(to_pci_dev(dev));
+   if (ret > 0)
+   max_pasids = ret;
+   } else {
+   ret = device_property_read_u32(dev, "pasid-num-bits", &num_bits);
+   if (!ret)
+   max_pasids = 1UL << num_bits;
+   }
+
+   return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
+}

 


RE: [PATCH v8 01/11] iommu: Add max_pasids field in struct iommu_device

2022-06-10 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Friday, June 10, 2022 7:53 AM
> 
> On Thu, Jun 09, 2022 at 05:25:42PM +, Raj, Ashok wrote:
> >
> > On Tue, Jun 07, 2022 at 09:49:32AM +0800, Lu Baolu wrote:
> > > Use this field to keep the number of supported PASIDs that an IOMMU
> > > hardware is able to support. This is a generic attribute of an IOMMU
> > > and lifting it into the per-IOMMU device structure makes it possible
> >
> > There is also a per-device attribute that tells what width is supported on
> > each device. When a device enables SVM, for simplicity we were proposing
> to
> > disable SVM on devices that don't support the full width, since it adds
> > additional complexity on the allocation interface. Is that factored into
> > this?
> 
> I would like to see the concept of a 'global PASID' and this is the
> only place we'd union all the per-device limits together. SVM can
> optionally use a global PASID and ENQCMD requires it, but I don't want
> to see the core code assuming everything is ENQCMD.
> 

Agree, and I think this is what this v8 is leaning toward. The core
code simply populates the pasid entry of the target device w/o
assuming the pasid is 'local' or 'global'. Then the sva helpers actually
decide how the pasid is allocated.

Currently only global pasids are supported, which is how sva worked
before. We don't plan to change it in this series.

In parallel Jacob is working on per-device local pasids which will
then be used by his DMA API pasid work and also iommufd.

Thanks
Kevin


RE: [PATCH 3/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-08 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 8, 2022 7:17 PM
> 
> On Wed, Jun 08, 2022 at 08:28:03AM +, Tian, Kevin wrote:
> > > From: Nicolin Chen
> > > Sent: Monday, June 6, 2022 2:19 PM
> > >
> > > From: Jason Gunthorpe 
> > >
> > > The KVM mechanism for controlling wbinvd is only triggered during
> > > kvm_vfio_group_add(), meaning it is a one-shot test done once the
> devices
> > > are setup.
> >
> > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > group_add() and group_del(). Then the coherency property is
> > checked dynamically in wbinvd emulation:
> 
> From the perspective of managing the domains that is still
> one-shot. It doesn't get updated when individual devices are
> added/removed to domains.

It's unchanged per-domain but dynamic per-VM when multiple
domains are added/removed (i.e. kvm->arch.noncoherent_dma_count).
It's the latter that is checked in KVM.

> 
> > given that I'm fine with the change in this patch. Even more probably
> > we really want an explicit one-shot model so KVM can lock down
> > the property once it starts to consume it then further adding a new
> > group which would change the coherency is explicitly rejected and
> > removing an existing group leaves it intact.
> 
> Why? Once wbinvd is enabled it is compatible with all domain
> configurations, so just leave it on and ignore everything at that
> point.
> 

More than that. My point was to make it a static policy, so even if
wbinvd is disabled at the start we want to leave it off and not have it
affected by adding a device which doesn't have coherency. 'wbinvd off' is
not a compatible configuration, hence IMO we need a way to reject adding
an incompatible device.

Thanks
Kevin


RE: [PATCH 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group()

2022-06-08 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Monday, June 6, 2022 2:19 PM
> 
> All devices in emulated_iommu_groups have pinned_page_dirty_scope
> set, so the update_dirty_scope in the first list_for_each_entry
> is always false. Clean it up, and move the "if update_dirty_scope"
> part from the detach_group_done routine to the domain_list part.
> 
> Rename the "detach_group_done" goto label accordingly.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 27 ---
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index f4e3b423a453..b45b1cc118ef 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2463,14 +2463,12 @@ static void
> vfio_iommu_type1_detach_group(void *iommu_data,
>   struct vfio_iommu *iommu = iommu_data;
>   struct vfio_domain *domain;
>   struct vfio_iommu_group *group;
> - bool update_dirty_scope = false;
>   LIST_HEAD(iova_copy);
> 
>   mutex_lock(&iommu->lock);
>   list_for_each_entry(group, &iommu->emulated_iommu_groups,
> next) {
>   if (group->iommu_group != iommu_group)
>   continue;
> - update_dirty_scope = !group->pinned_page_dirty_scope;
>   list_del(&group->next);
>   kfree(group);
> 
> @@ -2479,7 +2477,7 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
>   WARN_ON(iommu->notifier.head);
>   vfio_iommu_unmap_unpin_all(iommu);
>   }
> - goto detach_group_done;
> + goto out_unlock;
>   }
> 
>   /*
> @@ -2495,9 +2493,7 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
>   continue;
> 
>   iommu_detach_group(domain->domain, group-
> >iommu_group);
> - update_dirty_scope = !group->pinned_page_dirty_scope;
>   list_del(&group->next);
> - kfree(group);
>   /*
>* Group ownership provides privilege, if the group list is
>* empty, the domain goes away. If it's the last domain with
> @@ -2519,7 +2515,17 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
>   kfree(domain);
>   vfio_iommu_aper_expand(iommu, &iova_copy);
>   vfio_update_pgsize_bitmap(iommu);
> + /*
> +  * Removal of a group without dirty tracking may
> allow
> +  * the iommu scope to be promoted.
> +  */
> + if (!group->pinned_page_dirty_scope) {
> + iommu->num_non_pinned_groups--;
> + if (iommu->dirty_page_tracking)
> +
>   vfio_iommu_populate_bitmap_full(iommu);

This doesn't look correct. The old code decrements
num_non_pinned_groups for every detached group without dirty
tracking. But now it's only done when the domain is about to
be released...

> + }
>   }
> + kfree(group);
>   break;
>   }
> 
> @@ -2528,16 +2534,7 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
>   else
>   vfio_iommu_iova_free(&iova_copy);
> 
> -detach_group_done:
> - /*
> -  * Removal of a group without dirty tracking may allow the iommu
> scope
> -  * to be promoted.
> -  */
> - if (update_dirty_scope) {
> - iommu->num_non_pinned_groups--;
> - if (iommu->dirty_page_tracking)
> - vfio_iommu_populate_bitmap_full(iommu);
> - }
> +out_unlock:
>   mutex_unlock(&iommu->lock);
>  }
> 
> --
> 2.17.1
> 


RE: [PATCH 3/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-08 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Monday, June 6, 2022 2:19 PM
> 
> From: Jason Gunthorpe 
> 
> The KVM mechanism for controlling wbinvd is only triggered during
> kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> are setup.

It's not one-shot. kvm_vfio_update_coherency() is called in both
group_add() and group_del(). Then the coherency property is
checked dynamically in wbinvd emulation:

kvm_emulate_wbinvd()
  kvm_emulate_wbinvd_noskip()
need_emulate_wbinvd()
  kvm_arch_has_noncoherent_dma()

It's also checked when a vcpu is scheduled to a new cpu for
tracking dirty cpus which requires cache flush when emulating
wbinvd on that vcpu. See kvm_arch_vcpu_load().

/* Address WBINVD may be executed by guest */
if (need_emulate_wbinvd(vcpu)) {
if (static_call(kvm_x86_has_wbinvd_exit)())
cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);

In addition, it's also checked when deciding the effective memory
type of EPT entry. See vmx_get_mt_mask().

if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | 
VMX_EPT_IPAT_BIT;

But I doubt the above can work reliably when the property is changed
on the fly, given the above paths are triggered at different points. The
guest may end up in a mixed state where inconsistent coherency
is assumed in different emulation paths.

And in reality I don't think such a niche scenario is even tested,
given the only device imposing such a trick is the integrated Intel GPU,
which IIUC no one would try to hotplug/hot-remove to/from
a guest.

given that I'm fine with the change in this patch. Even more probably
we really want an explicit one-shot model so KVM can lock down
the property once it starts to consume it then further adding a new
group which would change the coherency is explicitly rejected and
removing an existing group leaves it intact.

> 
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain since

"an existing domain (even if it doesn't enforce coherency)", otherwise if
it's already compatible there is no question here.

> KVM won't be able to take advantage of it. This just wastes domain memory.
> 
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
> 
> If someday we want to try and optimize this further the better approach is
> to update the Intel driver so that enforce_cache_coherency() can work on a
> domain that already has IOPTEs and then call the enforce_cache_coherency()
> after detaching a device from a domain to upgrade the whole domain to
> enforced cache coherency mode.
> 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..f4e3b423a453 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
>* testing if they're on the same bus_type.
>*/
>   list_for_each_entry(d, &iommu->domain_list, next) {
> - if (d->domain->ops == domain->domain->ops &&
> - d->enforce_cache_coherency ==
> - domain->enforce_cache_coherency) {
> + if (d->domain->ops == domain->domain->ops) {
>   iommu_detach_group(domain->domain, group-
> >iommu_group);
>   if (!iommu_attach_group(d->domain,
>   group->iommu_group)) {
> --
> 2.17.1
> 


RE: [PATCH 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-08 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Monday, June 6, 2022 2:19 PM
> 
> Cases like VFIO wish to attach a device to an existing domain that was
> not allocated specifically from the device. This raises a condition
> where the IOMMU driver can fail the domain attach because the domain and
> device are incompatible with each other.
> 
> This is a soft failure that can be resolved by using a different domain.
> 
> Provide a dedicated errno from the IOMMU driver during attach that the
> reason attached failed is because of domain incompatability. EMEDIUMTYPE
> is chosen because it is never used within the iommu subsystem today and
> evokes a sense that the 'medium' aka the domain is incompatible.
> 
> VFIO can use this to know attach is a soft failure and it should continue
> searching. Otherwise the attach will be a hard failure and VFIO will
> return the code to userspace.
> 
> Update all drivers to return EMEDIUMTYPE in their failure paths that are
> related to domain incompatability.

Seems not all drivers are converted, e.g.:

mtk_iommu_v1_attach_device():
/* Only allow the domain created internally. */
mtk_mapping = data->mapping;
if (mtk_mapping->domain != domain)
return 0;
** the current code looks incorrect; it should return an error


s390_iommu_attach_device():
/* Allow only devices with identical DMA range limits */
} else if (domain->geometry.aperture_start != zdev->start_dma ||
domain->geometry.aperture_end != zdev->end_dma) {
rc = -EINVAL;


sprd_iommu_attach_device():
if (dom->sdev) {
pr_err("There's already a device attached to this domain.\n");
return -EINVAL;
}


gart_iommu_attach_dev():
if (gart->active_domain && gart->active_domain != domain) {
ret = -EBUSY;


arm_smmu_attach_dev():
if (!fwspec || fwspec->ops != &arm_smmu_ops) {
dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
return -ENXIO;
}
** probably this check can be covered by the next patch, which moves the
bus ops check into the iommu core?

Thanks
Kevin


RE: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-06-01 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Wednesday, May 25, 2022 3:30 PM
> 
> On Wed, May 25, 2022 at 02:04:49AM +, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker 
> > > Sent: Tuesday, May 24, 2022 6:58 PM
> > >
> > > On Tue, May 24, 2022 at 10:22:28AM +, Tian, Kevin wrote:
> > > > > From: Lu Baolu 
> > > > > Sent: Thursday, May 19, 2022 3:21 PM
> > > > >
> > > > > The existing iommu SVA interfaces are implemented by calling the SVA
> > > > > specific iommu ops provided by the IOMMU drivers. There's no need
> for
> > > > > any SVA specific ops in iommu_ops vector anymore as we can achieve
> > > > > this through the generic attach/detach_dev_pasid domain ops.
> > > >
> > > > set/block_pasid_dev, to be consistent.
> > > >
> > > > > +
> > > > > + mutex_lock(&iommu_sva_lock);
> > > > > + /* Search for an existing domain. */
> > > > > + domain = iommu_get_domain_for_dev_pasid(dev, mm-
> >pasid);
> > > > > + if (domain) {
> > > > > + sva_domain = to_sva_domain(domain);
> > > > > + refcount_inc(&sva_domain->bond.users);
> > > > > + goto out_success;
> > > > > + }
> > > > > +
> > > >
> > > > why would one device/pasid be bound to a mm more than once?
> > >
> > > Device drivers can call bind() multiple times for the same device and mm,
> > > for example if one process wants to open multiple accelerator queues.
> > >
> >
> > Is it clearer to have a sva_bond_get/put() pair instead of calling
> > bind() multiple times here?
> 
> I don't think it's clearer, and it would force device drivers to keep
> track of {dev, mm} pairs, when the IOMMU subsystem already does that.
> At the moment a device driver calls
> 
>   bond = iommu_sva_bind_device(dev, mm)
> 
> for each ADI that it wants to assign to userspace. If a process happens to
> want multiple ADIs on one device, then the {dev, mm} parameters are the
> same and bind() returns the same bond. Since the IOMMU driver needs to
> track these anyway, it might as well refcount them.
> 

My impression was that when an interface returns an object,
further references to this object are usually acquired directly
on the object, hence requiring the caller to track it. But no
strong opinion here if others all agree to favor simplicity on
the caller side.

Thanks
Kevin


RE: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-06-01 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 1, 2022 7:02 PM
> 
> On 2022/6/1 17:28, Tian, Kevin wrote:
> >> From: Lu Baolu 
> >> Sent: Friday, May 27, 2022 2:30 PM
> >>
> >> When the IOMMU domain is about to be freed, it should not be set on
> any
> >> device. Instead of silently dealing with some bug cases, it's better to
> >> trigger a warning to report and fix any potential bugs at the first time.
> >>
> >
> >
> >>   static void domain_exit(struct dmar_domain *domain)
> >>   {
> >> -
> >> -  /* Remove associated devices and clear attached or cached domains
> >> */
> >> -  domain_remove_dev_info(domain);
> >> +  if (WARN_ON(!list_empty(&domain->devices)))
> >> +  return;
> >>
> >
> > warning is good but it doesn't mean the driver shouldn't deal with
> > that situation to make it safer e.g. blocking DMA from all attached
> > device...
> 
> I have ever thought the same thing. :-)
> 
> Blocking DMA from attached device should be done when setting blocking
> domain to the device. It should not be part of freeing a domain.

Yes, but here we are talking about a bug scenario.

> 
> Here, the caller asks the driver to free the domain, but the driver
> finds that something is wrong. Therefore, it warns and returns directly.
> The domain will still be there in use until the next set_domain().
> 

At least it'd look safer if we always try to unmap the entire domain, i.e.:

static void domain_exit(struct dmar_domain *domain)
{
-
-   /* Remove associated devices and clear attached or cached domains */
-   domain_remove_dev_info(domain);

if (domain->pgd) {
LIST_HEAD(freelist);

domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
put_pages_list(&freelist);
}

+   if (WARN_ON(!list_empty(&domain->devices)))
+   return;

kfree(domain);
}

Thanks
Kevin


RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-06-01 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 1, 2022 5:37 PM
> 
> On 2022/6/1 09:43, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Wednesday, June 1, 2022 1:30 AM
> >>>> In both cases the pasid is stored in the attach data instead of the
> >>>> domain.
> >>>>
> >> So during IOTLB flush for the domain, do we loop through the attach data?
> > Yes and it's required.
> 
> What does the attach data mean here? Do you mean group->pasid_array?

any structure describing the attaching relationship from {device, pasid} to
a domain

> 
> Why not tracking the {device, pasid} info in the iommu driver when
> setting domain to {device, pasid}? We have tracked device in a list when
> setting a domain to device.
> 

Yes, that tracking structure is the attach data. 😊

RE: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> When the IOMMU domain is about to be freed, it should not be set on any
> device. Instead of silently dealing with some bug cases, it's better to
> trigger a warning to report and fix any potential bugs at the first time.
> 


>  static void domain_exit(struct dmar_domain *domain)
>  {
> -
> - /* Remove associated devices and clear attached or cached domains
> */
> - domain_remove_dev_info(domain);
> + if (WARN_ON(!list_empty(&domain->devices)))
> + return;
> 

A warning is good, but it doesn't mean the driver shouldn't deal with
that situation to make it safer, e.g. by blocking DMA from all attached
devices...


RE: [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The iommu->lock is used to protect the per-IOMMU pasid directory table
> and pasid table. Move the spinlock acquisition/release into the helpers
> to make the code self-contained.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian , with one nit

> 
> - /* Caller must ensure PASID entry is not in use. */
> - if (pasid_pte_is_present(pte))
> - return -EBUSY;
> + spin_lock(&iommu->lock);
> + pte = get_non_present_pasid_entry(dev, pasid);
> + if (!pte) {
> + spin_unlock(&iommu->lock);
> + return -ENODEV;
> + }

I don't think the above is a good abstraction, and it changes the error
code for a present entry from -EBUSY to -ENODEV.



RE: [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Move the spinlock acquisition/release into the helpers where domain
> IDs are allocated and freed. The device_domain_lock is irrelevant to
> domain ID resources, remove its assertion as well.

While moving the lock you also replace spin_lock_irqsave() with spin_lock().
It'd be cleaner to just do the movement here and then replace all of the
_irqsave() variants in patch 8.

> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2d5f02b85de8..0da937ce0534 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1774,16 +1774,13 @@ static struct dmar_domain
> *alloc_domain(unsigned int type)
>   return domain;
>  }
> 
> -/* Must be called with iommu->lock */
>  static int domain_attach_iommu(struct dmar_domain *domain,
>  struct intel_iommu *iommu)
>  {
>   unsigned long ndomains;
> - int num;
> -
> - assert_spin_locked(&device_domain_lock);
> - assert_spin_locked(&iommu->lock);
> + int num, ret = 0;
> 
> + spin_lock(&iommu->lock);
>   domain->iommu_refcnt[iommu->seq_id] += 1;
>   if (domain->iommu_refcnt[iommu->seq_id] == 1) {
>   ndomains = cap_ndoms(iommu->cap);
> @@ -1792,7 +1789,8 @@ static int domain_attach_iommu(struct
> dmar_domain *domain,
>   if (num >= ndomains) {
>   pr_err("%s: No free domain ids\n", iommu->name);
>   domain->iommu_refcnt[iommu->seq_id] -= 1;
> - return -ENOSPC;
> + ret = -ENOSPC;
> + goto out_unlock;
>   }
> 
>   set_bit(num, iommu->domain_ids);
> @@ -1801,7 +1799,9 @@ static int domain_attach_iommu(struct
> dmar_domain *domain,
>   domain_update_iommu_cap(domain);
>   }
> 
> - return 0;
> +out_unlock:
> + spin_unlock(&iommu->lock);
> + return ret;
>  }
> 
>  static void domain_detach_iommu(struct dmar_domain *domain,
> @@ -1809,9 +1809,7 @@ static void domain_detach_iommu(struct
> dmar_domain *domain,
>  {
>   int num;
> 
> - assert_spin_locked(&device_domain_lock);
> - assert_spin_locked(&iommu->lock);
> -
> + spin_lock(&iommu->lock);
>   domain->iommu_refcnt[iommu->seq_id] -= 1;
>   if (domain->iommu_refcnt[iommu->seq_id] == 0) {
>   num = domain->iommu_did[iommu->seq_id];
> @@ -1819,6 +1817,7 @@ static void domain_detach_iommu(struct
> dmar_domain *domain,
>   domain_update_iommu_cap(domain);
>   domain->iommu_did[iommu->seq_id] = 0;
>   }
> + spin_unlock(&iommu->lock);
>  }
> 
>  static inline int guestwidth_to_adjustwidth(int gaw)
> @@ -2471,9 +2470,7 @@ static int domain_add_dev_info(struct
> dmar_domain *domain, struct device *dev)
> 
>   spin_lock_irqsave(&device_domain_lock, flags);
>   info->domain = domain;
> - spin_lock(&iommu->lock);
>   ret = domain_attach_iommu(domain, iommu);
> - spin_unlock(&iommu->lock);
>   if (ret) {
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   return ret;
> @@ -4158,7 +4155,6 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
>  {
>   struct dmar_domain *domain;
>   struct intel_iommu *iommu;
> - unsigned long flags;
> 
>   assert_spin_locked(&device_domain_lock);
> 
> @@ -4179,10 +4175,7 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
>   }
> 
>   list_del(&info->link);
> -
> - spin_lock_irqsave(&iommu->lock, flags);
>   domain_detach_iommu(domain, iommu);
> - spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> 
>  static void dmar_remove_one_dev_info(struct device *dev)
> --
> 2.25.1



RE: [PATCH 05/12] iommu/vt-d: Unncessary spinlock for root table alloc and free

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The IOMMU root table is allocated and freed in the IOMMU initialization
> code in static boot or hot-plug paths. There's no need for a spinlock.

s/hot-plug/hot-remove/

> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index bbdd3417a1b1..2d5f02b85de8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -809,14 +809,12 @@ static int device_context_mapped(struct
> intel_iommu *iommu, u8 bus, u8 devfn)
> 
>  static void free_context_table(struct intel_iommu *iommu)
>  {
> - int i;
> - unsigned long flags;
>   struct context_entry *context;
> + int i;
> +
> + if (!iommu->root_entry)
> + return;
> 
> - spin_lock_irqsave(&iommu->lock, flags);
> - if (!iommu->root_entry) {
> - goto out;
> - }
>   for (i = 0; i < ROOT_ENTRY_NR; i++) {
>   context = iommu_context_addr(iommu, i, 0, 0);
>   if (context)
> @@ -828,12 +826,10 @@ static void free_context_table(struct intel_iommu
> *iommu)
>   context = iommu_context_addr(iommu, i, 0x80, 0);
>   if (context)
>   free_pgtable_page(context);
> -
>   }
> +
>   free_pgtable_page(iommu->root_entry);
>   iommu->root_entry = NULL;
> -out:
> - spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> 
>  #ifdef CONFIG_DMAR_DEBUG
> @@ -1232,7 +1228,6 @@ static void domain_unmap(struct dmar_domain
> *domain, unsigned long start_pfn,
>  static int iommu_alloc_root_entry(struct intel_iommu *iommu)
>  {
>   struct root_entry *root;
> - unsigned long flags;
> 
>   root = (struct root_entry *)alloc_pgtable_page(iommu->node);
>   if (!root) {
> @@ -1242,10 +1237,7 @@ static int iommu_alloc_root_entry(struct
> intel_iommu *iommu)
>   }
> 
>   __iommu_flush_cache(iommu, root, ROOT_SIZE);
> -
> - spin_lock_irqsave(&iommu->lock, flags);
>   iommu->root_entry = root;
> - spin_unlock_irqrestore(&iommu->lock, flags);
> 
>   return 0;
>  }
> --
> 2.25.1



RE: [PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> Use pci_get_domain_bus_and_slot() instead of searching the global list
> to retrieve the pci device pointer. This removes device_domain_list
> global list as there are no consumers anymore.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.h |  1 -
>  drivers/iommu/intel/iommu.c | 33 ++---
>  2 files changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 2f4a5b9509c0..6724703d573b 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -609,7 +609,6 @@ struct intel_iommu {
>  /* PCI domain-device relationship */
>  struct device_domain_info {
>   struct list_head link;  /* link to domain siblings */
> - struct list_head global; /* link to global list */
>   struct list_head table; /* link to pasid table */
>   u32 segment;/* PCI segment number */
>   u8 bus; /* PCI bus number */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 25d4c5200526..bbdd3417a1b1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -131,8 +131,6 @@ static struct intel_iommu **g_iommus;
> 
>  static void __init check_tylersburg_isoch(void);
>  static int rwbf_quirk;
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn);
> 
>  /*
>   * set to 1 to panic kernel if can't successfully enable VT-d
> @@ -315,7 +313,6 @@ static int iommu_skip_te_disable;
>  #define IDENTMAP_AZALIA  4
> 
>  static DEFINE_SPINLOCK(device_domain_lock);
> -static LIST_HEAD(device_domain_list);
>  const struct iommu_ops intel_iommu_ops;
> 
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> @@ -845,9 +842,14 @@ static void pgtable_walk(struct intel_iommu
> *iommu, unsigned long pfn, u8 bus, u
>   struct device_domain_info *info;
>   struct dma_pte *parent, *pte;
>   struct dmar_domain *domain;
> + struct pci_dev *pdev;
>   int offset, level;
> 
> - info = dmar_search_domain_by_dev_info(iommu->segment, bus,
> devfn);
> + pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
> + if (!pdev)
> + return;
> +
> + info = dev_iommu_priv_get(&pdev->dev);
>   if (!info || !info->domain) {
>   pr_info("device [%02x:%02x.%d] not probed\n",
>   bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> @@ -2348,19 +2350,6 @@ static void domain_remove_dev_info(struct
> dmar_domain *domain)
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>  }
> 
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
> -{
> - struct device_domain_info *info;
> -
> - list_for_each_entry(info, &device_domain_list, global)
> - if (info->segment == segment && info->bus == bus &&
> - info->devfn == devfn)
> - return info;
> -
> - return NULL;
> -}
> -
>  static int domain_setup_first_level(struct intel_iommu *iommu,
>   struct dmar_domain *domain,
>   struct device *dev,
> @@ -4564,7 +4553,6 @@ static struct iommu_device
> *intel_iommu_probe_device(struct device *dev)
>   struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
>   struct device_domain_info *info;
>   struct intel_iommu *iommu;
> - unsigned long flags;
>   u8 bus, devfn;
> 
>   iommu = device_to_iommu(dev, &bus, &devfn);
> @@ -4607,10 +4595,7 @@ static struct iommu_device
> *intel_iommu_probe_device(struct device *dev)
>   }
>   }
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_add(&info->global, &device_domain_list);
>   dev_iommu_priv_set(dev, info);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> 
>   return &iommu->iommu;
>  }
> @@ -4618,15 +4603,9 @@ static struct iommu_device
> *intel_iommu_probe_device(struct device *dev)
>  static void intel_iommu_release_device(struct device *dev)
>  {
>   struct device_domain_info *info = dev_iommu_priv_get(dev);
> - unsigned long flags;
> 
>   dmar_remove_one_dev_info(dev);
> -
> - spin_lock_irqsave(&device_domain_lock, flags);
>   dev_iommu_priv_set(dev, NULL);
> - list_del(&info->global);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> -
>   kfree(info);
>   set_dma_ops(dev, NULL);
>  }
> --
> 2.25.1



RE: [PATCH 02/12] iommu/vt-d: Remove for_each_device_domain()

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The per-device device_domain_info data could be retrieved from the
> device itself. There's no need to search a global list.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.h |  2 --
>  drivers/iommu/intel/iommu.c | 25 -
>  drivers/iommu/intel/pasid.c | 37 +++--
>  3 files changed, 11 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 8a6d64d726c0..2f4a5b9509c0 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -727,8 +727,6 @@ extern int dmar_ir_support(void);
>  void *alloc_pgtable_page(int node);
>  void free_pgtable_page(void *vaddr);
>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -  void *data), void *data);
>  void iommu_flush_write_buffer(struct intel_iommu *iommu);
>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device
> *dev);
>  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8
> *devfn);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cacae8bdaa65..6549b09d7f32 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -316,31 +316,6 @@ static int iommu_skip_te_disable;
> 
>  static DEFINE_SPINLOCK(device_domain_lock);
>  static LIST_HEAD(device_domain_list);
> -
> -/*
> - * Iterate over elements in device_domain_list and call the specified
> - * callback @fn against each element.
> - */
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -  void *data), void *data)
> -{
> - int ret = 0;
> - unsigned long flags;
> - struct device_domain_info *info;
> -
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry(info, &device_domain_list, global) {
> - ret = fn(info, data);
> - if (ret) {
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> - return ret;
> - }
> - }
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> -
> - return 0;
> -}
> -
>  const struct iommu_ops intel_iommu_ops;
> 
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index b2ac5869286f..0627d6465f25 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -103,36 +103,20 @@ device_detach_pasid_table(struct
> device_domain_info *info,
>  }
> 
>  struct pasid_table_opaque {
> - struct pasid_table  **pasid_table;
> - int segment;
> - int bus;
> - int devfn;
> + struct pasid_table  *pasid_table;
>  };
> 
> -static int search_pasid_table(struct device_domain_info *info, void *opaque)
> -{
> - struct pasid_table_opaque *data = opaque;
> -
> - if (info->iommu->segment == data->segment &&
> - info->bus == data->bus &&
> - info->devfn == data->devfn &&
> - info->pasid_table) {
> - *data->pasid_table = info->pasid_table;
> - return 1;
> - }
> -
> - return 0;
> -}
> -
>  static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void
> *opaque)
>  {
>   struct pasid_table_opaque *data = opaque;
> + struct device_domain_info *info;
> 
> - data->segment = pci_domain_nr(pdev->bus);
> - data->bus = PCI_BUS_NUM(alias);
> - data->devfn = alias & 0xff;
> + info = dev_iommu_priv_get(&pdev->dev);
> + if (!info || !info->pasid_table)
> + return 0;
> 
> - return for_each_device_domain(&search_pasid_table, data);
> + data->pasid_table = info->pasid_table;
> + return 1;
>  }
> 
>  /*
> @@ -141,9 +125,9 @@ static int get_alias_pasid_table(struct pci_dev *pdev,
> u16 alias, void *opaque)
>   */
>  int intel_pasid_alloc_table(struct device *dev)
>  {
> + struct pasid_table_opaque data = { NULL };
>   struct device_domain_info *info;
>   struct pasid_table *pasid_table;
> - struct pasid_table_opaque data;
>   struct page *pages;
>   u32 max_pasid = 0;
>   int ret, order;
> @@ -155,11 +139,12 @@ int intel_pasid_alloc_table(struct device *dev)
>   return -EINVAL;
> 
>   /* DMA alias device already has a pasid table, use it: */
> - data.pasid_table = &pasid_table;
>   ret = pci_for_each_dma_alias(to_pci_dev(dev),
>&get_alias_pasid_table, &data);
> - if (ret)
> + if (ret) {
> + pasid_table = data.pasid_table;
>   goto attach_out;
> + }
> 
>   pasid_table = kzalloc(sizeof(*pasid_table), GFP_KERNEL);
>   if (!pasid_table)
> --
> 2.25.1


RE: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 1, 2022 7:11 AM
> 
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
> 
> > There are only 3 instances where we'll free a table while the domain is
> > live. The first is the one legitimate race condition, where two map requests
> > targeting relatively nearby PTEs both go to fill in an intermediate level of
> > table; whoever loses that race frees the table they allocated, but it was
> > never visible to anyone else so that's definitely fine. The second is if
> > we're mapping a block entry, and find that there's already a table entry
> > there, wherein we assume the table must be empty, clear the entry,
> > invalidate any walk caches, install the block entry, then free the orphaned
> > table; since we're mapping the entire IOVA range covered by that table,
> > there should be no other operations on that IOVA range attempting to walk
> > the table at the same time, so it's fine.
> 
> I saw these two in the Intel driver
> 
> > The third is effectively the inverse, if we get a block-sized unmap
> > but find a table entry rather than a block at that point (on the
> > assumption that it's de-facto allowed for a single unmap to cover
> > multiple adjacent mappings as long as it does so exactly); similarly
> > we assume that the table must be full, and no other operations
> > should be racing because we're unmapping its whole IOVA range, so we
> > remove the table entry, invalidate, and free as before.
> 
> Not sure I noticed this one though
> 
> This all it all makes sense though.

The Intel driver also does this. See dma_pte_clear_level():

/* If range covers entire pagetable, free it */
if (start_pfn <= level_pfn &&
 last_pfn >= level_pfn + level_size(level) - 1) {
...


RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Wednesday, June 1, 2022 4:44 AM
> 
> Hi Jason,
> 
> On Tue, 31 May 2022 16:05:50 -0300, Jason Gunthorpe 
> wrote:
> 
> > On Tue, May 31, 2022 at 10:29:55AM -0700, Jacob Pan wrote:
> >
> > > The reason why I store PASID at IOMMU domain is for IOTLB flush within
> > > the domain. Device driver is not aware of domain level IOTLB flush. We
> > > also have iova_cookie for each domain which essentially is for
> > > RIDPASID.
> >
> > You need to make the PASID stuff work generically.
> >
> > The domain needs to hold a list of all the places it needs to flush
> > and that list needs to be maintained during attach/detach.
> >
> > A single PASID on the domain is obviously never going to work
> > generically.
> >
> I agree, I did it this way really meant to be part of iommu_domain's
> iommu_dma_cookie, not meant to be global. But for the lack of common
> storage between identity domain and dma domain, I put it here as global.
> 
> Then should we also extract RIDPASID to become part of the generic API?
> i.e. set pasid, flush IOTLB etc. Right? RIDPASID is not in group's
> pasid_array today.
> 

RIDPASID is just an alias to RID in the PASID table (similar to pasid#0
on other platforms). It's reserved and not exposed outside the
intel-iommu driver. So I don't think it should be managed via the
generic iommu API. But probably you can generalize it with other
pasids within intel-iommu driver if doing so can simplify the logic
there.


RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Wednesday, June 1, 2022 1:30 AM
> > >
> > > In both cases the pasid is stored in the attach data instead of the
> > > domain.
> > >
> So during IOTLB flush for the domain, do we loop through the attach data?

Yes and it's required.

> 
> > > DMA API pasid is no special from above except it needs to allow
> > > one device attached to the same domain twice (one with RID
> > > and the other with RID+PASID).
> > >
> > > for iommufd those operations are initiated by userspace via
> > > iommufd uAPI.
> >
> > My understanding is that device driver owns its PASID policy. If ENQCMD
> > is supported on the device, the PASIDs should be allocated through
> > ioasid_alloc(). Otherwise, the whole PASID pool is managed by the device
> > driver.
> >
> It seems the changes we want for this patchset are:
> 1. move ioasid_alloc() from the core to device (allocation scope will be
> based on whether ENQCMD is intended or not)

yes, and the driver can specify whether the allocation is system-wide
or per-device.

> 2. store pasid in the attach data
> 3. use the same iommufd api to attach/set pasid on its default domain

s/iommufd/iommu/


RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Monday, May 30, 2022 8:23 PM
> 
> On Tue, May 24, 2022 at 08:17:27AM -0700, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Tue, 24 May 2022 10:50:34 -0300, Jason Gunthorpe 
> wrote:
> >
> > > On Wed, May 18, 2022 at 11:21:15AM -0700, Jacob Pan wrote:
> > > > DMA requests tagged with PASID can target individual IOMMU domains.
> > > > Introduce a domain-wide PASID for DMA API, it will be used on the
> same
> > > > mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
> > > > identity domain.
> > >
> > > Huh? I can't understand what this is trying to say or why this patch
> > > makes sense.
> > >
> > > We really should not have pasid's like this attached to the domains..
> > >
> > This is the same "DMA API global PASID" you reviewed in v3, I just
> > singled it out as a standalone patch and renamed it. Here is your previous
> > review comment.
> >
> > > +++ b/include/linux/iommu.h
> > > @@ -105,6 +105,8 @@ struct iommu_domain {
> > >   enum iommu_page_response_code (*iopf_handler)(struct
> iommu_fault *fault,
> > > void *data);
> > >   void *fault_data;
> > > + ioasid_t pasid; /* Used for DMA requests with PASID */
> > > + atomic_t pasid_users;
> >
> > These are poorly named, this is really the DMA API global PASID and
> > shouldn't be used for other things.
> >
> >
> >
> > Perhaps I misunderstood, do you mind explaining more?
> 
> You still haven't really explained what this is for in this patch,
> maybe it just needs a better commit message, or maybe something is
> wrong.
> 
> I keep saying the DMA API usage is not special, so why do we need to
> create a new global pasid and refcount? Realistically this is only
> going to be used by IDXD, why can't we just allocate a PASID and
> return it to the driver every time a driver asks for DMA API on PASID
> mode? Why does the core need to do anything special?
> 

Agree. I guess it was a mistake caused by treating ENQCMD as the
only user, although the actual semantics of the invented interfaces
have already evolved to be quite general.

This is very similar to what we have been discussing for iommufd.
a PASID is just an additional routing info when attaching a device
to an I/O address space (DMA API in this context) and by default
it should be a per-device resource except when ENQCMD is
explicitly opt in.

Hence it's right time for us to develop common facility working
for both this DMA API usage and iommufd, i.e.:

for normal PASID attach to a domain, driver:

allocates a local pasid from device local space;
attaches the local pasid to a domain;

for PASID attach in particular for ENQCMD, driver:

allocates a global pasid system-wide;
attaches the global pasid to a domain;
set the global pasid in PASID_MSR;

In both cases the pasid is stored in the attach data instead of the
domain.

The DMA API pasid is not special compared to the above, except that it
needs to allow one device to be attached to the same domain twice (once
with RID and once with RID+PASID).

for iommufd those operations are initiated by userspace via
iommufd uAPI.

Thanks
Kevin


RE: [PATCH 1/3] iommu: Add Visconti5 IOMMU driver

2022-05-25 Thread Tian, Kevin
> From: Jason Gunthorpe
> Sent: Thursday, May 26, 2022 2:27 AM
> 
> On Wed, May 25, 2022 at 02:26:37PM +0800, Baolu Lu wrote:
> > On 2022/5/25 09:31, Nobuhiro Iwamatsu wrote:
> > > +static const struct iommu_ops visconti_atu_ops = {
> > > + .domain_alloc = visconti_atu_domain_alloc,
> > > + .probe_device = visconti_atu_probe_device,
> > > + .release_device = visconti_atu_release_device,
> > > + .device_group = generic_device_group,
> > > + .of_xlate = visconti_atu_of_xlate,
> > > + .pgsize_bitmap = ATU_IOMMU_PGSIZE_BITMAP,
> > > + .default_domain_ops = &(const struct iommu_domain_ops) {
> > > + .attach_dev = visconti_atu_attach_device,
> > > + .detach_dev = visconti_atu_detach_device,
> >
> > The detach_dev callback is about to be deprecated. The new drivers
> > should implement the default domain and blocking domain instead.
> 
> Yes please, new drivers need to use default_domains.
> 
> It is very strange that visconti_atu_detach_device() does nothing.  It
> is not required that a domain is fully unmapped before being
> destructed, I think detach should set ATU_AT_EN to 0.

Looks like the ATU is shared by all devices behind it and can only serve
one I/O address space. The ATU registers only keep information about
IOVA ranges without any device association. That is probably the reason
why both attach/detach() don't touch the hardware.

IIUC, this suggests there should be only one iommu group per ATU,
instead of using generic_device_group() to create one group per
device.

> 
> What behavior does the HW have when ATU_AT_ENTRY_EN == 0? If DMA is

I guess it's blocking behavior, since that register tracks which IOVA
range registers are valid.

> rejected then this driver should have a IOMMU_DOMAIN_BLOCKING and
> return that from ops->def_domain_type().

BLOCKING should not be used as a default domain type for DMA API
which needs either a DMA or IDENTITY type.

> 
> Attaching a the blocking domain should set ATU_AT_ENTRY_EN = 0

Agree

> 
> Also, if I surmise how this works properly, it is not following the
> iommu API to halt all DMA during map/unmap operations. Should at least
> document this and explain why it is OK..
> 
> I'm feeling like these "special" drivers need some kind of handshake
> with their only users because they don't work with things like VFIO..
> 
> Jason


RE: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-05-24 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Tuesday, May 24, 2022 6:58 PM
> 
> On Tue, May 24, 2022 at 10:22:28AM +, Tian, Kevin wrote:
> > > From: Lu Baolu 
> > > Sent: Thursday, May 19, 2022 3:21 PM
> > >
> > > The existing iommu SVA interfaces are implemented by calling the SVA
> > > specific iommu ops provided by the IOMMU drivers. There's no need for
> > > any SVA specific ops in iommu_ops vector anymore as we can achieve
> > > this through the generic attach/detach_dev_pasid domain ops.
> >
> > set/block_pasid_dev, to be consistent.
> >
> > > +
> > > + mutex_lock(&iommu_sva_lock);
> > > + /* Search for an existing domain. */
> > > + domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
> > > + if (domain) {
> > > + sva_domain = to_sva_domain(domain);
> > > + refcount_inc(&sva_domain->bond.users);
> > > + goto out_success;
> > > + }
> > > +
> >
> > why would one device/pasid be bound to a mm more than once?
> 
> Device drivers can call bind() multiple times for the same device and mm,
> for example if one process wants to open multiple accelerator queues.
> 

Would it be clearer to have a sva_bond_get/put() pair instead of calling
bind() multiple times here?


RE: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, May 24, 2022 9:39 PM
> 
> On Tue, May 24, 2022 at 09:39:52AM +, Tian, Kevin wrote:
> > > From: Lu Baolu 
> > > Sent: Thursday, May 19, 2022 3:21 PM
> > >
> > > The iommu_sva_domain represents a hardware pagetable that the
> IOMMU
> > > hardware could use for SVA translation. This adds some infrastructure
> > > to support SVA domain in the iommu common layer. It includes:
> > >
> > > - Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA
> > > domain
> > >   type.
> > > - Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
> > >   support SVA should provide the callbacks.
> > > - Add helpers to allocate and free an SVA domain.
> > > - Add helpers to set an SVA domain to a device and the reverse
> > >   operation.
> > >
> > > Some buses, like PCI, route packets without considering the PASID value.
> > > Thus a DMA target address with PASID might be treated as P2P if the
> > > address falls into the MMIO BAR of other devices in the group. To make
> > > things simple, the attach/detach interfaces only apply to devices
> > > belonging to the singleton groups, and the singleton is immutable in
> > > fabric i.e. not affected by hotplug.
> > >
> > > The iommu_set/block_device_pasid() can be used for other purposes,
> > > such as kernel DMA with pasid, mediation device, etc. Hence, it is put
> > > in the iommu.c.
> >
> > usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
> > with 'block' doesn't read very clearly.
> 
> I thought we agreed we'd use the blocking domain for this? Why did it
> go back to an op?
> 

Probably it's based on following discussion:

https://lore.kernel.org/all/c8492b29-bc27-ae12-d5c4-9fbbc797e...@linux.intel.com/

--
> FWIW from my point of view I'm happy with having a .detach_dev_pasid op
> meaning implicitly-blocked access for now. 

If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.
--

Looks Baolu chooses this path and plans to use the blocking domain
later.

Thanks
Kevin


RE: [PATCH v7 07/10] iommu: Remove SVA related callbacks from iommu ops

2022-05-24 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, May 19, 2022 3:21 PM
> 
> These ops'es have been replaced with the dev_attach/detach_pasid domain
> ops'es. There's no need for them anymore. Remove them to avoid dead
> code.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 

Reviewed-by: Kevin Tian 

> ---
>  include/linux/intel-iommu.h   |  3 --
>  include/linux/iommu.h |  7 ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 16 --
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 40 ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 --
>  drivers/iommu/intel/iommu.c   |  3 --
>  drivers/iommu/intel/svm.c | 49 ---
>  7 files changed, 121 deletions(-)
> 
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 5e88eaa245aa..536f229fd274 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -739,9 +739,6 @@ struct intel_iommu *device_to_iommu(struct device
> *dev, u8 *bus, u8 *devfn);
>  extern void intel_svm_check(struct intel_iommu *iommu);
>  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> *mm);
> -void intel_svm_unbind(struct iommu_sva *handle);
> -u32 intel_svm_get_pasid(struct iommu_sva *handle);
>  int intel_svm_page_response(struct device *dev, struct iommu_fault_event
> *evt,
>   struct iommu_page_response *msg);
>  int intel_svm_attach_dev_pasid(struct iommu_domain *domain,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d9ac5ebe5bbb..e4ce2fe0e144 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -212,9 +212,6 @@ struct iommu_iotlb_gather {
>   * @dev_has/enable/disable_feat: per device entries to
> check/enable/disable
>   *   iommu specific features.
>   * @dev_feat_enabled: check enabled feature
> - * @sva_bind: Bind process address space to device
> - * @sva_unbind: Unbind process address space from device
> - * @sva_get_pasid: Get PASID associated to a SVA handle
>   * @page_response: handle page request response
>   * @def_domain_type: device default domain type, return value:
>   *   - IOMMU_DOMAIN_IDENTITY: must use an identity domain
> @@ -248,10 +245,6 @@ struct iommu_ops {
>   int (*dev_enable_feat)(struct device *dev, enum
> iommu_dev_features f);
>   int (*dev_disable_feat)(struct device *dev, enum
> iommu_dev_features f);
> 
> - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct
> *mm);
> - void (*sva_unbind)(struct iommu_sva *handle);
> - u32 (*sva_get_pasid)(struct iommu_sva *handle);
> -
>   int (*page_response)(struct device *dev,
>struct iommu_fault_event *evt,
>struct iommu_page_response *msg);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index ec77f6a51ff9..0f0f5ba26dd5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -754,9 +754,6 @@ bool arm_smmu_master_sva_enabled(struct
> arm_smmu_master *master);
>  int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
>  int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
>  bool arm_smmu_master_iopf_supported(struct arm_smmu_master
> *master);
> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct
> mm_struct *mm);
> -void arm_smmu_sva_unbind(struct iommu_sva *handle);
> -u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
>  void arm_smmu_sva_notifier_synchronize(void);
>  int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
> struct device *dev, ioasid_t id);
> @@ -793,19 +790,6 @@ static inline bool
> arm_smmu_master_iopf_supported(struct arm_smmu_master *master
>   return false;
>  }
> 
> -static inline struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> -{
> - return ERR_PTR(-ENODEV);
> -}
> -
> -static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
> -
> -static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - return IOMMU_PASID_INVALID;
> -}
> -
>  static inline void arm_smmu_sva_notifier_synchronize(void) {}
>  #endif /* CONFIG_ARM_SMMU_V3_SVA */
>  #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 6969974ca89e..8290d66569f3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -344,11 +344,6 @@ __arm_smmu_sva_bind(struct device *dev, struct
> mm_struct *mm)
>   if (!bond)
>   return ERR_PTR(-ENOMEM);
> 
> - /* Allocate a PASID for this mm if 

RE: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-05-24 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, May 19, 2022 3:21 PM
> 
> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.

set/block_pasid_dev, to be consistent.

> +
> + mutex_lock(&iommu_sva_lock);
> + /* Search for an existing domain. */
> + domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
> + if (domain) {
> + sva_domain = to_sva_domain(domain);
> + refcount_inc(&sva_domain->bond.users);
> + goto out_success;
> + }
> +

why would one device/pasid be bound to a mm more than once?
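For context, the refcounting under discussion can be sketched as follows: a second bind of the same device/mm pair reuses the existing SVA domain and only bumps a user count. The struct and helpers below are simplified illustrative stand-ins, not the kernel's real types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: one bond per device/mm, shared via a user count. */
struct sva_bond {
	void *mm;	/* the bound address space */
	int users;	/* how many binds share this domain */
};

static struct sva_bond *sva_bind(struct sva_bond *existing, void *mm)
{
	if (existing && existing->mm == mm) {
		existing->users++;	/* reuse: no second domain allocated */
		return existing;
	}
	return NULL;	/* caller would allocate a fresh domain here */
}

static int sva_unbind(struct sva_bond *bond)
{
	return --bond->users;	/* 0 means the domain can be freed */
}
```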


RE: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Monday, May 23, 2022 3:13 PM
> > @@ -254,6 +259,7 @@ struct iommu_ops {
> > int (*def_domain_type)(struct device *dev);
> >
> > const struct iommu_domain_ops *default_domain_ops;
> > +   const struct iommu_domain_ops *sva_domain_ops;
> 
> Per Joerg's comment in anther thread,
> 
> https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/
> 
> adding a sva_domain_ops here is not the right way to go.
> 
> If no objection, I will make the sva domain go through the
> generic domain_alloc/free() callbacks in the next version.
> 

Suppose it's just back to what v1-v6 did, which all registered the ops
in the domain_alloc() callback?


RE: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, May 19, 2022 3:21 PM
> 
> The iommu_sva_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
> 
> - Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA
> domain
>   type.
> - Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
>   support SVA should provide the callbacks.
> - Add helpers to allocate and free an SVA domain.
> - Add helpers to set an SVA domain to a device and the reverse
>   operation.
> 
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, the attach/detach interfaces only apply to devices
> belonging to the singleton groups, and the singleton is immutable in
> fabric i.e. not affected by hotplug.
> 
> The iommu_set/block_device_pasid() can be used for other purposes,
> such as kernel DMA with pasid, mediation device, etc. Hence, it is put
> in the iommu.c.

Usually we have a 'set/clear' pair or 'allow/block'. Having 'set' paired
with 'block' doesn't read very clearly.

> +static bool device_group_immutable_singleton(struct device *dev)
> +{
> + struct iommu_group *group = iommu_group_get(dev);

What about passing the group as the parameter, since the caller will
get the group again right after calling this function? In that case
the function could be renamed as:

iommu_group_immutable_singleton()

or be shorter:

iommu_group_fixed_singleton()

> + int count;
> +
> + if (!group)
> + return false;
> +
> + mutex_lock(&group->mutex);
> + count = iommu_group_device_count(group);
> + mutex_unlock(&group->mutex);
> + iommu_group_put(group);
> +
> + if (count != 1)
> + return false;

For non-PCI devices, the above doesn't check anything regarding immutability.
Please add a comment to explain why doing so is correct.

> +
> + /*
> +  * The PCI device could be considered to be fully isolated if all
> +  * devices on the path from the device to the host-PCI bridge are
> +  * protected from peer-to-peer DMA by ACS.
> +  */
> + if (dev_is_pci(dev))
> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
> + REQ_ACS_FLAGS);
> +
> + return true;
> +}
> +
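The suggested signature change can be sketched like this: take the group directly so the caller doesn't look it up twice, and name the helper accordingly. The struct layout and the lock stand-in below are illustrative only, not the kernel's real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of an iommu group; mutex_held stands in for group->mutex. */
struct iommu_group {
	int mutex_held;
	int device_count;
};

static bool iommu_group_fixed_singleton(struct iommu_group *group)
{
	int count;

	group->mutex_held = 1;	/* mutex_lock(&group->mutex) in real code */
	count = group->device_count;
	group->mutex_held = 0;	/* mutex_unlock(&group->mutex) */

	/* Real code would additionally do the ACS path check for PCI devices. */
	return count == 1;
}
```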


RE: [PATCH v7 02/10] iommu: Remove SVM_FLAG_SUPERVISOR_MODE support

2022-05-24 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, May 19, 2022 3:21 PM
> 
> The current kernel DMA with PASID support is based on the SVA with a flag
> SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel
> memory address
> space to a PASID of the device. The device driver programs the device with
> kernel virtual address (KVA) for DMA access. There have been security and
> functional issues with this approach:
> 
> - The lack of IOTLB synchronization upon kernel page table updates.
>   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
> - Other than slightly more protection, using kernel virtual address (KVA)
>   has little advantage over physical address. There are also no use
>   cases yet where DMA engines need kernel virtual addresses for in-kernel
>   DMA.
> 
> This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU
> interface.
> The device drivers are suggested to handle kernel DMA with PASID through
> the kernel DMA APIs.
> 
> The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
> needed anymore. Cleanup them as well.
> 
> Link: https://lore.kernel.org/linux-
> iommu/20210511194726.gp1002...@nvidia.com/
> Signed-off-by: Jacob Pan 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 

> ---
>  include/linux/intel-iommu.h   |  3 +-
>  include/linux/intel-svm.h | 13 -
>  include/linux/iommu.h |  8 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  5 +-
>  drivers/dma/idxd/cdev.c   |  2 +-
>  drivers/dma/idxd/init.c   | 24 +---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  3 +-
>  drivers/iommu/intel/svm.c | 57 +--
>  drivers/iommu/iommu.c |  5 +-
>  drivers/misc/uacce/uacce.c|  2 +-
>  10 files changed, 26 insertions(+), 96 deletions(-)
> 
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 4f29139bbfc3..df23300cfa88 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -739,8 +739,7 @@ struct intel_iommu *device_to_iommu(struct device
> *dev, u8 *bus, u8 *devfn);
>  extern void intel_svm_check(struct intel_iommu *iommu);
>  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> *mm,
> -  void *drvdata);
> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> *mm);
>  void intel_svm_unbind(struct iommu_sva *handle);
>  u32 intel_svm_get_pasid(struct iommu_sva *handle);
>  int intel_svm_page_response(struct device *dev, struct iommu_fault_event
> *evt,
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index 207ef06ba3e1..f9a0d44f6fdb 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -13,17 +13,4 @@
>  #define PRQ_RING_MASK((0x1000 << PRQ_ORDER) - 0x20)
>  #define PRQ_DEPTH((0x1000 << PRQ_ORDER) >> 5)
> 
> -/*
> - * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be
> used only
> - * for access to kernel addresses. No IOTLB flushes are automatically done
> - * for kernel mappings; it is valid only for access to the kernel's static
> - * 1:1 mapping of physical memory — not to vmalloc or even module
> mappings.
> - * A future API addition may permit the use of such ranges, by means of an
> - * explicit IOTLB flush call (akin to the DMA API's unmap method).
> - *
> - * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> - * do such IOTLB flushes automatically.
> - */
> -#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
> -
>  #endif /* __INTEL_SVM_H__ */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index da423e87f248..0c358b7c583b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -243,8 +243,7 @@ struct iommu_ops {
>   int (*dev_enable_feat)(struct device *dev, enum
> iommu_dev_features f);
>   int (*dev_disable_feat)(struct device *dev, enum
> iommu_dev_features f);
> 
> - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct
> *mm,
> -   void *drvdata);
> + struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct
> *mm);
>   void (*sva_unbind)(struct iommu_sva *handle);
>   u32 (*sva_get_pasid)(struct iommu_sva *handle);
> 
> @@ -667,8 +666,7 @@ int iommu_dev_disable_feature(struct device *dev,
> enum iommu_dev_features f);
>  bool iommu_dev_feature_enabled(struct device *dev, enum
> iommu_dev_features f);
> 
>  struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> - struct mm_struct *mm,
> - void *drvdata);
> + struct mm_struct *mm);
>  void iommu_sva_unbind_device(struct iommu_sva *ha

RE: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device

2022-05-24 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, May 19, 2022 3:21 PM
> 
> Use this field to keep the number of supported PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver which supports PASID related features should set this
> field before enabling them on the devices.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 
> ---
>  include/linux/iommu.h   | 2 ++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
>  drivers/iommu/intel/dmar.c  | 4 
>  3 files changed, 7 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5e1afe169549..da423e87f248 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -318,12 +318,14 @@ struct iommu_domain_ops {
>   * @list: Used by the iommu-core to keep a list of registered iommus
>   * @ops: iommu-ops for talking to this iommu
>   * @dev: struct device for sysfs handling
> + * @pasids: number of supported PASIDs
>   */
>  struct iommu_device {
>   struct list_head list;
>   const struct iommu_ops *ops;
>   struct fwnode_handle *fwnode;
>   struct device *dev;
> + u32 pasids;

max_pasid or nr_pasids?

>  };
> 
>  /**
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 88817a3376ef..6e2cd082c670 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct
> arm_smmu_device *smmu)
>   /* SID/SSID sizes */
>   smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
>   smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
> + smmu->iommu.pasids = smmu->ssid_bits;
> 
>   /*
>* If the SMMU supports fewer bits than would fill a single L2 stream
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 4de960834a1b..1c3cf267934d 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1126,6 +1126,10 @@ static int alloc_iommu(struct dmar_drhd_unit
> *drhd)
> 
>   raw_spin_lock_init(&iommu->register_lock);
> 
> + /* Supports full 20-bit PASID in scalable mode. */
> + if (ecap_pasid(iommu->ecap))
> + iommu->iommu.pasids = 1UL << 20;
> +

The supported PASID width is reported by ecap_pss(). I don't think we should
assume 20 bits here.

>   /*
>* This is only for hotplug; at boot time intel_iommu_enabled won't
>* be set yet. When intel_iommu_init() runs, it registers the units
> --
> 2.25.1
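The point above can be sketched as deriving the PASID capacity from the extended capability register rather than hard-coding 20 bits. The bit layout used below (ECAP.PSS in bits 39:35, reporting the supported PASID width minus one) follows my reading of the VT-d spec and should be treated as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the PASID Size Supported field from a raw ECAP value. */
static inline uint32_t ecap_pss(uint64_t ecap)
{
	return (ecap >> 35) & 0x1f;
}

/* PSS reports the supported PASID width minus one. */
static inline uint32_t ecap_max_pasids(uint64_t ecap)
{
	return 1u << (ecap_pss(ecap) + 1);	/* e.g. PSS=19 -> 2^20 PASIDs */
}
```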



RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, May 23, 2022 3:55 PM
> 
> > From: Jacob Pan 
> > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> > iommu_domain *domain)
> > +{
> > +   struct iommu_domain *tdomain;
> > +   struct iommu_group *group;
> > +   unsigned long index;
> > +   ioasid_t pasid = INVALID_IOASID;
> > +
> > +   group = iommu_group_get(dev);
> > +   if (!group)
> > +   return pasid;
> > +
> > +   xa_for_each(&group->pasid_array, index, tdomain) {
> > +   if (domain == tdomain) {
> > +   pasid = index;
> > +   break;
> > +   }
> > +   }
> 
> Don't we need to acquire the group lock here?
> 
> Btw the intention of this function is a bit confusing. Patch01 already
> stores the pasid under the domain, hence it's redundant to get it
> indirectly from the xarray index. You could simply introduce a flag bit
> (e.g. dma_pasid_enabled) in device_domain_info and then directly
> use domain->dma_pasid once the flag is true.
> 

Just saw your discussion with Jason about v3. While it makes sense
not to specialize the DMA domain in the iommu driver, the use of this
function should be limited to cases where the call chain doesn't pass
down a pasid value, e.g. when doing cache invalidation for domain
map/unmap. If the upper interface already carries a pasid, e.g. in
detach_dev_pasid(), the iommu driver can simply verify that the
corresponding pasid xarray entry points to the specified domain instead
of using this function to loop over the xarray and then verify the
returned pasid (as done in patch03/04).

Thanks
Kevin


RE: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users

2022-05-23 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Thursday, May 19, 2022 2:21 AM
> 
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
> 
> For some modern devices, such as the Intel Data Streaming Accelerator, a PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, be it
> IOVA or physical address in case of pass-through.
> 
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/dma-iommu.c | 114
> ++
>  include/linux/dma-iommu.h |   3 +
>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 1ca85d37eeab..6ad7ba619ef0 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
>   phys_addr_t phys;
>  };
> 
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
>  enum iommu_dma_cookie_type {
>   IOMMU_DMA_IOVA_COOKIE,
>   IOMMU_DMA_MSI_COOKIE,
> @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct
> iommu_domain *domain)
>   domain->iova_cookie = NULL;
>  }
> 
> +/* Protect iommu_domain DMA PASID data */
> +static DEFINE_MUTEX(dma_pasid_lock);
> +/**
> + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> device's
> + * DMA domain.
> + * @dev: Device to be enabled
> + * @pasid: The returned kernel PASID to be used for DMA
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA, the PASID will point to the same IOVA page table.
> + *
> + * @return err code or 0 on success
> + */
> +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)

iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from
an API p.o.v.

> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret = 0;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
> +
> + /* Only support domain types that DMA API can be used */
> + if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> + dom->type == IOMMU_DOMAIN_BLOCKED) {
> + dev_warn(dev, "Invalid domain type %d", dom->type);
> + return -EPERM;
> + }

WARN_ON.

And probably we can just check whether the domain is the default domain here.

> +
> + mutex_lock(&dma_pasid_lock);
> + id = dom->dma_pasid;
> + if (!id) {
> + /*
> +  * First device to use PASID in its DMA domain, allocate
> +  * a single PASID per DMA domain is all we need, it is also
> +  * good for performance when it comes down to IOTLB flush.
> +  */
> + max = 1U << dev->iommu->pasid_bits;
> + if (!max) {
> + ret = -EINVAL;
> + goto done_unlock;
> + }
> +
> + id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> + if (id == INVALID_IOASID) {
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> +
> + dom->dma_pasid = id;
> + atomic_set(&dom->dma_pasid_users, 1);

This is always accessed with the lock held, hence no need for it to be atomic.

> + }
> +
> + ret = iommu_attach_device_pasid(dom, dev, id);
> + if (!ret) {
> + *pasid = id;
> + atomic_inc(&dom->dma_pasid_users);
> + goto done_unlock;
> + }
> +
> + if (atomic_dec_and_test(&dom->dma_pasid_users)) {
> + ioasid_free(id);
> + dom->dma_pasid = 0;
> + }
> +done_unlock:
> + mutex_unlock(&dma_pasid_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> +
> +/**
> + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> + * @dev: Device's PASID DMA to be disabled
> + *
> + * It is the device driver's responsibility to ensure no more incoming DMA
> + * requests with the kernel PASID before calling this function. IOMMU driver
> + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
> + * drained.
> + *
> + */
> +void iommu_detach_dma_pasid(struct device *dev)
> +{
> + struct iommu_domain *dom;
> + ioasid_t pasid;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (WARN_ON(!dom || !dom->ops || !dom->ops->detach_dev_pasid))
> + return;
> +
> + /* Only support D
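The review point about the counter can be sketched as follows: dma_pasid_users is only ever touched with dma_pasid_lock held, so a plain int protected by the mutex suffices, with no atomic_t needed. Names mirror the patch, but the lock stand-in and the allocator below are simplified for illustration.

```c
#include <assert.h>

/* Stand-in for dma_pasid_lock; asserts catch nested acquisition. */
static int lock_held;
static void pasid_lock(void)   { assert(!lock_held); lock_held = 1; }
static void pasid_unlock(void) { lock_held = 0; }

static unsigned int next_pasid = 1;	/* stand-in for ioasid_alloc() */

struct dma_domain {
	unsigned int dma_pasid;		/* 0 = not allocated yet */
	int dma_pasid_users;		/* protected by dma_pasid_lock */
};

static unsigned int attach_dma_pasid(struct dma_domain *dom)
{
	unsigned int id;

	pasid_lock();
	if (!dom->dma_pasid)		/* first user allocates the PASID */
		dom->dma_pasid = next_pasid++;
	dom->dma_pasid_users++;		/* plain increment, lock held */
	id = dom->dma_pasid;
	pasid_unlock();
	return id;
}

static void detach_dma_pasid(struct dma_domain *dom)
{
	pasid_lock();
	if (--dom->dma_pasid_users == 0)
		dom->dma_pasid = 0;	/* stand-in for ioasid_free() */
	pasid_unlock();
}
```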

RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Thursday, May 19, 2022 2:21 AM
> 
> IOMMU group maintains a PASID array which stores the associated IOMMU
> domains. This patch introduces a helper function to do domain to PASID
> look up. It will be used by TLB flush and device-PASID attach verification.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/iommu.c | 22 ++
>  include/linux/iommu.h |  6 +-
>  2 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 00d0262a1fe9..22f44833db64 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3199,3 +3199,25 @@ struct iommu_domain
> *iommu_get_domain_for_iopf(struct device *dev,
> 
>   return domain;
>  }
> +
> +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain)
> +{
> + struct iommu_domain *tdomain;
> + struct iommu_group *group;
> + unsigned long index;
> + ioasid_t pasid = INVALID_IOASID;
> +
> + group = iommu_group_get(dev);
> + if (!group)
> + return pasid;
> +
> + xa_for_each(&group->pasid_array, index, tdomain) {
> + if (domain == tdomain) {
> + pasid = index;
> + break;
> + }
> + }

Don't we need to acquire the group lock here?

Btw the intention of this function is a bit confusing. Patch01 already
stores the pasid under the domain, hence it's redundant to get it
indirectly from the xarray index. You could simply introduce a flag bit
(e.g. dma_pasid_enabled) in device_domain_info and then directly
use domain->dma_pasid once the flag is true.

> + iommu_group_put(group);
> +
> + return pasid;
> +}
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 36ad007084cc..c0440a4be699 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct
> iommu_domain *domain,
>  struct device *dev, ioasid_t pasid);
>  struct iommu_domain *
>  iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid);
> -
> +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain);
>  #else /* CONFIG_IOMMU_API */
> 
>  struct iommu_ops {};
> @@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev,
> ioasid_t pasid)
>  {
>   return NULL;
>  }
> +static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain)
> +{
> + return INVALID_IOASID;
> +}
>  #endif /* CONFIG_IOMMU_API */
> 
>  #ifdef CONFIG_IOMMU_SVA
> --
> 2.25.1
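The locking question above can be sketched like this: walk the group's pasid array with the group mutex held so entries can't be attached or detached under our feet. The fixed-size array and lock flag are illustrative stand-ins for the kernel's xarray and group->mutex.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PASIDS	8
#define INVALID_IOASID	((unsigned int)-1)

/* Simplified model: index = pasid, value = attached domain (or NULL). */
struct iommu_group {
	int mutex_held;			/* stand-in for group->mutex */
	void *pasid_array[MAX_PASIDS];
};

static unsigned int get_pasid_from_domain(struct iommu_group *group,
					  void *domain)
{
	unsigned int pasid = INVALID_IOASID;
	unsigned int i;

	group->mutex_held = 1;		/* mutex_lock(&group->mutex) */
	for (i = 0; i < MAX_PASIDS; i++) {
		if (group->pasid_array[i] == domain) {
			pasid = i;
			break;
		}
	}
	group->mutex_held = 0;		/* mutex_unlock(&group->mutex) */

	return pasid;
}
```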



RE: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting

2022-05-22 Thread Tian, Kevin
> From: Steve Wahl
> Sent: Thursday, May 19, 2022 3:58 AM
> 
> On Fri, May 13, 2022 at 10:09:46AM +0800, Baolu Lu wrote:
> > On 2022/5/13 07:12, Steve Wahl wrote:
> > > On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
> > > > To support up to 64 sockets with 10 DMAR units each (640), make the
> > > > value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
> > > > CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when
> MAXSMP is
> > > > set.
> > > >
> > > > If the available hardware exceeds DMAR_UNITS_SUPPORTED
> (previously set
> > > > to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> > > > allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> > > > remapping doesn't support X2APIC mode x2apic disabled"; and the
> system
> > > > fails to boot properly.
> > > >
> > > > Signed-off-by: Steve Wahl 
> > >
> > > I've received a report from the kernel test robot ,
> > > that this patch causes an error (shown below) when
> > > CONFIG_IOMMU_SUPPORT is not set.
> > >
> > > In my opinion, this is because include/linux/dmar.h and
> > > include/linux/intel-iommu are being #included when they are not really
> > > being used.
> > >
> > > I've tried placing the contents of intel-iommu.h within an #ifdef
> > > CONFIG_INTEL_IOMMU, and that fixes the problem.
> > >
> > > Two questions:
> > >
> > > A) Is this the desired approach to to the fix?
> >
> > Most part of include/linux/intel-iommu.h is private to Intel IOMMU
> > driver. They should be put in a header like drivers/iommu/intel
> > /iommu.h. Eventually, we should remove include/linux/intel-iommu.h
> > and device drivers interact with iommu subsystem through the IOMMU
> > kAPIs.
> >
> > Best regards,
> > baolu
> 
> Baolu's recent patch to move intel-iommu.h private still allows my
> [PATCH v2] to apply with no changes, and gets rid of the compile
> errors when CONFIG_IOMMU_SUPPORT is not set, so the kernel test robot
> should be happy now.
> 
> Is there another step I should do to bring this patch back into focus?
> 

This looks good to me.

Reviewed-by: Kevin Tian 

