Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-10 Thread David Gibson
On Tue, May 10, 2022 at 04:00:09PM -0300, Jason Gunthorpe wrote:
> On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > On Mon, May 09, 2022 at 11:00:41AM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 09, 2022 at 04:01:52PM +1000, David Gibson wrote:
> > > 
> > > > > The default iommu_domain that the iommu driver creates will be used
> > > > > here, it is up to the iommu driver to choose something reasonable for
> > > > > use by applications like DPDK. ie PPC should probably pick its biggest
> > > > > x86-like aperture.
> > > > 
> > > > So, using the big aperture means a very high base IOVA
> > > > (1<<59)... which means that it won't work at all if you want to attach
> > > > any devices that aren't capable of 64-bit DMA.
> > > 
> > > I'd expect to include the 32 bit window too..
> > 
> > I'm not entirely sure what you mean.  Are you working on the
> > assumption that we've extended to allowing multiple apertures, so we'd
> > default to advertising both a small/low aperture and a large/high
> > aperture?
> 
> Yes

Ok, that works assuming we can advertise multiple windows.

> > > No, this just makes it fragile in the other direction because now
> > > userspace has to know what platform specific things to ask for *or it
> > > doesn't work at all*. This is not a improvement for the DPDK cases.
> > 
> > Um.. no.  The idea is that userspace requests *what it needs*, not
> > anything platform specific.  In the case of DPDK that would be nothing
> > more than the (minimum) aperture size.  Nothing platform specific
> > about that.
> 
> Except a 32 bit platform can only maybe do a < 4G aperture, a 64 bit
> platform can do more, but it varies how much more, etc.
> 
> There is no constant value DPDK could stuff in this request, unless it
> needs a really small amount of IOVA, like 1G or something.

Well, my assumption was that DPDK always wanted an IOVA window to
cover its hugepage buffer space.  So not "constant" exactly, but a
value it will know at start up time.  But I think we cover that more
closely below.

> > > It isn't like there is some hard coded value we can put into DPDK that
> > > will work on every platform. So kernel must pick for DPDK, IMHO. I
> > > don't see any feasible alternative.
> > 
> > Yes, hence *optionally specified* base address only.
> 
> Okay, so imagine we've already done this and DPDK is not optionally
> specifying anything :)
> 
> The structs can be extended so we can add this as an input to creation
> when a driver can implement it.
> 
> > > The ppc specific driver would be on the generic side of qemu in its
> > > viommu support framework. There is lots of host driver optimization
> > > possible here with knowledge of the underlying host iommu HW. It
> > > should not be connected to the qemu target.
> > 
> > Thinking through this...
> > 
> > So, I guess we could have basically the same logic I'm suggesting be
> > in the qemu backend iommu driver instead.  So the target side (machine
> > type, strictly speaking) would request of the host side the apertures
> > it needs, and the host side driver would see if it can do that, based
> > on both specific knowledge of that driver and the query reponses.
> 
> Yes, this is what I'm thinking
> 
> > ppc on x86 should work with that.. at least if the x86 aperture is
> > large enough to reach up to ppc's high window.  I guess we'd have the
> > option here of using either the generic host driver or the
> > x86-specific driver.  The latter would mean qemu maintaining an
> > x86-format shadow of the io pagetables; mildly tedious, but doable.
> 
> The appeal of having userspace page tables is performance, so it is
> tedious to shadow, but it should run faster.

I doubt the difference is meaningful in the context of an emulated
guest, though.

> > So... is there any way of backing out of this gracefully.  We could
> > detach the device, but in the meantime ongoing DMA maps from
> > previous devices might have failed.  
> 
> This sounds like a good use case for qemu to communicate ranges - but
> as I mentioned before Alex said qemu didn't know the ranges..

Yeah, I'm a bit baffled by that, and I don't know the context.  Note
that there are at least two different very different users of the host
IOMMU backends in: one is for emulation of guest DMA (with or without
a vIOMMU).  In that case the details of the guest platform should let
qemu know the ranges.  There's also a VFIO based NVME backend; that
one's much more like a "normal" userspace driver, where it doesn't
care about the address ranges (because they're not guest visible).

> > We could pre-attach the new device to a new IOAS and check the
> > apertures there - but when we move it to the final IOAS is it
> > guaranteed that the apertures will be (at least) the intersection of
> > the old and new apertures, or is that just the probable outcome. 
> 
> Should be guarenteed

Ok; that would need to be documented.

> > Ok.. you convinced me.  As long as we have some way to handle the
> 

RE: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-10 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, May 11, 2022 10:32 AM
> 
> On 2022/5/10 22:02, Jason Gunthorpe wrote:
> > On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote:
> >
> >> This adds a pair of common domain ops for this purpose and adds
> helpers
> >> to attach/detach a domain to/from a {device, PASID}.
> >
> > I wonder if this should not have a detach op - after discussing with
> > Robin we can see that detach_dev is not used in updated
> > drivers. Instead attach_dev acts as 'set_domain'
> >
> > So, it would be more symmetrical if attaching a blocking_domain to the
> > PASID was the way to 'detach'.
> >
> > This could be made straightforward by following the sketch I showed to
> > have a static, global blocing_domain and providing a pointer to it in
> > struct iommu_ops
> >
> > Then 'detach pasid' is:
> >
> > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev,
> pasid);
> >
> > And we move away from the notion of 'detach' and in the direction that
> > everything continuously has a domain set. PASID would logically
> > default to blocking_domain, though we wouldn't track this anywhere.
> 
> I am not sure whether we still need to keep the blocking domain concept
> when we are entering the new PASID world. Please allow me to wait and
> listen to more opinions.
> 

I'm with Jason on this direction. In concept after a PASID is detached it's
essentially blocked. Implementation-wise it doesn't prevent the iommu
driver from marking the PASID entry as non-present as doing in this
series instead of actually pointing to the empty page table of the block
domain. But api-wise it does make the entire semantics more consistent.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-10 Thread Tian, Kevin
> From: Steve Wahl 
> Sent: Wednesday, May 11, 2022 3:07 AM
> 
> On Tue, May 10, 2022 at 01:16:26AM +, Tian, Kevin wrote:
> > > From: Steve Wahl 
> > > Sent: Friday, May 6, 2022 11:26 PM
> > >
> > > On Fri, May 06, 2022 at 08:12:11AM +, Tian, Kevin wrote:
> > > > > From: David Woodhouse 
> > > > > Sent: Friday, May 6, 2022 3:17 PM
> > > > >
> > > > > On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > > > > > > From: Baolu Lu 
> > > > > > >
> > > > > > > > --- a/include/linux/dmar.h
> > > > > > > > +++ b/include/linux/dmar.h
> > > > > > > > @@ -19,7 +19,7 @@
> > > > > > > >   struct acpi_dmar_header;
> > > > > > > >
> > > > > > > >   #ifdefCONFIG_X86
> > > > > > > > -# define   DMAR_UNITS_SUPPORTEDMAX_IO_APICS
> > > > > > > > +# define   DMAR_UNITS_SUPPORTED640
> > > > > > > >   #else
> > > > > > > >   # define  DMAR_UNITS_SUPPORTED64
> > > > > > > >   #endif
> > > > > >
> > > > > > ... is it necessary to permanently do 10x increase which wastes
> memory
> > > > > > on most platforms which won't have such need.
> > > > >
> > > > > I was just looking at that. It mostly adds about 3½ KiB to each struct
> > > > > dmar_domain.
> > > > >
> > > > > I think the only actual static array is the dmar_seq_ids bitmap which
> > > > > grows to 640 *bits* which is fairly negligible, and the main growth is
> > > > > that it adds about 3½ KiB to each struct dmar_domain for the
> > > > > iommu_refcnt[] and iommu_did[] arrays.
> > > >
> > > > Thanks for the quick experiment! though the added material is
> > > > negligible it's cleaner to me if having a way to configure it as
> > > > discussed below.
> > > >
> > > > >
> > > > > > Does it make more sense to have a configurable approach similar to
> > > > > > CONFIG_NR_CPUS? or even better can we just replace those static
> > > > > > arrays with dynamic allocation so removing this restriction
> > > > > > completely?
> > > > >
> > > > > Hotplug makes that fun, but I suppose you only need to grow the
> array
> > > > > in a given struct dmar_domain if you actually add a device to it 
> > > > > that's
> > > > > behind a newly added IOMMU. I don't know if the complexity of
> making it
> > > > > fully dynamic is worth it though. We could make it a config option,
> > > > > and/or a command line option (perhaps automatically derived from
> > > > > CONFIG_NR_CPUS).
> > > >
> > > > either config option or command line option is OK to me. Probably
> > > > the former is simpler given no need to dynamically expand the
> > > > static array. btw though deriving from CONFIG_NR_CPUS could work
> > > > in this case it is unclear why tying the two together is necessary in
> > > > concept, e.g. is there guarantee that the number of IOMMUs must
> > > > be smaller than the number of CPUs in a platform?
> > > >
> > > > >
> > > > > If it wasn't for hotplug, I think we'd know the right number by the
> > > > > time we actually need it anyway, wouldn't we? Can we have a
> heuristic
> > > > > for how many DMAR units are likely to be hotplugged? Is it as simple
> as
> > > > > the ratio of present to not-yet-present CPUs in MADT?
> > > >
> > > > Probably. But I don't have enough knowledge on DMAR hotplug to
> > > > judge (e.g. whether it's strictly tied to CPU hotplug and if yes whether
> > > > there could be multiple IOMMUs hotplugged together with a CPU
> > > > socket)...
> > > >
> > > > Thanks
> > > > Kevin
> > >
> > > Would anyone be more comfortable if we only increase the limit where
> > > MAXSMP is set?
> > >
> > > i.e.
> > >
> > > #if defined(CONFIG_X86) && defined(CONFIG_MAXSMP)
> > > # define  DMAR_UNITS_SUPPORTED640
> > > #elif defined(CONFIG_X86)
> > > # define  DMAR_UNITS_SUPPORTEDMAX_IO_APICS
> > > #else
> > > # define  DMAR_UNITS_SUPPORTED64
> > > #endif
> > >
> > > Thank you all for your time looking at this.
> > >
> >
> > This works for your own configuration but it's unclear whether other
> > MAXSMP platforms have the exact same requirements (different
> > number of sockets, different ratio of #iommus/#sockets, etc.). In any
> > case since we are at it having a generic way to extend it makes more
> > sense to me.
> 
> So, to be clear, what you would like to see would be Kconfig entries
> to create a config option, say "NR_DMARS", set up so the default is:
> 
>  MAXSMP?  640

usually we do 2's power thus 1024 is more reasonable. If people do
care about the exact memory footprint they can always manually
change it.

>  X86_64?  128
>  X86_32?  64
>  other64
> 
> And DMAR_UNITS_SUPPORTED gets removed, and everywhere it was used
> we
> use CONFIG_NR_DMARS in its place?

Let's keep DMAR_UNITS_SUPPORTED and just redefine it to be
CONFIG_NR_DMARS for less changes.

> 
> I can give that a shot but wanted to confirm this is what you'd want
> first.
> 
> Thanks,
> 
> --> Steve
> 
> --
> Steve Wahl, Hewlett Packard Enterprise
___
iommu mailing list

RE: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-10 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, May 11, 2022 3:00 AM
> 
> On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > Ok... here's a revised version of my proposal which I think addresses
> > your concerns and simplfies things.
> >
> > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> >   will probably need matching changes)
> >
> > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> >   is chosen by the kernel within the aperture(s).  This is closer to
> >   how mmap() operates, and DPDK and similar shouldn't care about
> >   having specific IOVAs, even at the individual mapping level.
> >
> > - IOAS_MAP gets an IOMAP_FIXED flag, analagous to mmap()'s MAP_FIXED,
> >   for when you really do want to control the IOVA (qemu, maybe some
> >   special userspace driver cases)
> 
> We already did both of these, the flag is called
> IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then kernel will
> select the IOVA internally.
> 
> > - ATTACH will fail if the new device would shrink the aperture to
> >   exclude any already established mappings (I assume this is already
> >   the case)
> 
> Yes
> 
> > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> >   PROT_NONE mmap().  It reserves that IOVA space, so other (non-FIXED)
> >   MAPs won't use it, but doesn't actually put anything into the IO
> >   pagetables.
> > - Like a regular mapping, ATTACHes that are incompatible with an
> >   IOMAP_RESERVEed region will fail
> > - An IOMAP_RESERVEed area can be overmapped with an IOMAP_FIXED
> >   mapping
> 
> Yeah, this seems OK, I'm thinking a new API might make sense because
> you don't really want mmap replacement semantics but a permanent
> record of what IOVA must always be valid.
> 
> IOMMU_IOA_REQUIRE_IOVA perhaps, similar signature to
> IOMMUFD_CMD_IOAS_IOVA_RANGES:
> 
> struct iommu_ioas_require_iova {
> __u32 size;
> __u32 ioas_id;
> __u32 num_iovas;
> __u32 __reserved;
> struct iommu_required_iovas {
> __aligned_u64 start;
> __aligned_u64 last;
> } required_iovas[];
> };

As a permanent record do we want to enforce that once the required
range list is set all FIXED and non-FIXED allocations must be within the
list of ranges?

If yes we can take the end of the last range as the max size of the iova
address space to optimize the page table layout.

otherwise we may need another dedicated hint for that optimization.

> 
> > So, for DPDK the sequence would be:
> >
> > 1. Create IOAS
> > 2. ATTACH devices
> > 3. IOAS_MAP some stuff
> > 4. Do DMA with the IOVAs that IOAS_MAP returned
> >
> > (Note, not even any need for QUERY in simple cases)
> 
> Yes, this is done already
> 
> > For (unoptimized) qemu it would be:
> >
> > 1. Create IOAS
> > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of
> the
> >guest platform
> > 3. ATTACH devices (this will fail if they're not compatible with the
> >reserved IOVA regions)
> > 4. Boot the guest

I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
vIOMMUs regular mappings are required before booting the guest and
reservation might be done but not mandatory (at least not what current
Qemu vfio can afford as it simply replays valid ranges in the CPU address
space).

> >
> >   (on guest map/invalidate) -> IOAS_MAP(IOMAP_FIXED) to overmap part
> of
> >the reserved regions
> >   (on dev hotplug) -> ATTACH (which might fail, if it conflicts with the
> >   reserved regions)
> >   (on vIOMMU reconfiguration) -> UNMAP/MAP reserved regions as
> >  necessary (which might fail)
> 
> OK, I will take care of it
> 
> Thanks,
> Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-10 Thread Tian, Kevin
> From: Jason Gunthorpe
> Sent: Monday, May 9, 2022 10:01 PM
> 
> On Mon, May 09, 2022 at 04:01:52PM +1000, David Gibson wrote:
> 
> > Which is why I'm suggesting that the base address be an optional
> > request.  DPDK *will* care about the size of the range, so it just
> > requests that and gets told a base address.
> 
> We've talked about a size of IOVA address space before, strictly as a
> hint, to possible optimize page table layout, or something, and I'm
> fine with that idea. But - we have no driver implementation today, so
> I'm not sure what we can really do with this right now..
> 
> Kevin could Intel consume a hint on IOVA space and optimize the number
> of IO page table levels?
> 

It could, but not implemented now.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-10 Thread Baolu Lu

On 2022/5/10 22:02, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote:


This adds a pair of common domain ops for this purpose and adds helpers
to attach/detach a domain to/from a {device, PASID}.


I wonder if this should not have a detach op - after discussing with
Robin we can see that detach_dev is not used in updated
drivers. Instead attach_dev acts as 'set_domain'

So, it would be more symmetrical if attaching a blocking_domain to the
PASID was the way to 'detach'.

This could be made straightforward by following the sketch I showed to
have a static, global blocing_domain and providing a pointer to it in
struct iommu_ops

Then 'detach pasid' is:

iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid);

And we move away from the notion of 'detach' and in the direction that
everything continuously has a domain set. PASID would logically
default to blocking_domain, though we wouldn't track this anywhere.


I am not sure whether we still need to keep the blocking domain concept
when we are entering the new PASID world. Please allow me to wait and
listen to more opinions.

Best regards,
baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-10 Thread Baolu Lu

On 2022/5/10 22:34, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:28PM +0800, Lu Baolu wrote:


  int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..afc63fce6107 100644
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
master->stall_enabled = true;
  
+	dev->iommu->pasid_bits = master->ssid_bits;

return >iommu;
  
  err_free_master:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..99643f897f26 100644
+++ b/drivers/iommu/intel/iommu.c
@@ -4624,8 +4624,11 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
  
-if (features >= 0)

+   if (features >= 0) {
info->pasid_supported = features | 1;
+   dev->iommu->pasid_bits =
+   fls(pci_max_pasids(pdev)) - 1;
+   }


It is not very nice that both the iommu drivers have to duplicate the
code to read the pasid capability out of the PCI device.

IMHO it would make more sense for the iommu layer to report the
capability of its own HW block only, and for the core code to figure
out the master's limitation using a bus-specific approach.


Fair enough. The iommu hardware capability could be reported in

/**
 * struct iommu_device - IOMMU core representation of one IOMMU hardware
 *   instance
 * @list: Used by the iommu-core to keep a list of registered iommus
 * @ops: iommu-ops for talking to this iommu
 * @dev: struct device for sysfs handling
 */
struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
struct device *dev;
};

I haven't checked ARM code yet, but it works for x86 as far as I can
see.



It is also unfortunate that the enable/disable pasid is inside the
iommu driver as well - ideally the PCI driver itself would do this
when it knows it wants to use PASIDs.

The ordering interaction with ATS makes this look quite annoying
though. :(

I'm also not convinced individual IOMMU drivers should be forcing ATS
on, there are performance and functional implications here. Using ATS
or not is possibly best left as an administrator policy controlled by
the core code. Again we seem to have some mess.


Agreed with you. This has already been in my task list. I will start to
solve it after the iommufd tasks.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-05-10 Thread liuqi (BA) via iommu



Hi James,

On 2022/5/10 18:14, James Clark wrote:



On 07/04/2022 13:58, Yicong Yang wrote:

From: Qi Liu 


[...]

  struct auxtrace_record
  *auxtrace_record__init(struct evlist *evlist, int *err)
  {
@@ -57,8 +112,12 @@ struct auxtrace_record
struct evsel *evsel;
bool found_etm = false;
struct perf_pmu *found_spe = NULL;
+   struct perf_pmu *found_ptt = NULL;
struct perf_pmu **arm_spe_pmus = NULL;
+   struct perf_pmu **hisi_ptt_pmus = NULL;
+
int nr_spes = 0;
+   int nr_ptts = 0;
int i = 0;
  
  	if (!evlist)

@@ -66,13 +125,14 @@ struct auxtrace_record
  
  	cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME);

arm_spe_pmus = find_all_arm_spe_pmus(_spes, err);
+   hisi_ptt_pmus = find_all_hisi_ptt_pmus(_ptts, err);
  
  	evlist__for_each_entry(evlist, evsel) {

if (cs_etm_pmu &&
evsel->core.attr.type == cs_etm_pmu->type)
found_etm = true;
  
-		if (!nr_spes || found_spe)

+   if ((!nr_spes || found_spe) && (!nr_ptts || found_ptt))
continue;
  
  		for (i = 0; i < nr_spes; i++) {

@@ -81,11 +141,18 @@ struct auxtrace_record
break;
}
}
+
+   for (i = 0; i < nr_ptts; i++) {
+   if (evsel->core.attr.type == hisi_ptt_pmus[i]->type) {
+   found_ptt = hisi_ptt_pmus[i];
+   break;
+   }
+   }
}
free(arm_spe_pmus);
  
-	if (found_etm && found_spe) {

-   pr_err("Concurrent ARM Coresight ETM and SPE operation not currently 
supported\n");
+   if (found_etm && found_spe && found_ptt) {
+   pr_err("Concurrent ARM Coresight ETM ,SPE and HiSilicon PCIe Trace 
operation not currently supported\n");


Hi Yicong,

Is that actually a limitation? I don't see why they couldn't work concurrently.


As Leo said, the logic here should be like this:

int auxtrace_event_cnt = 0;
if (found_etm)
auxtrace_event_cnt++;
if (found_spe)
auxtrace_event_cnt++;
if (found_ptt)
auxtrace_event_cnt++;

if (auxtrace_event_cnt > 1) {
pr_err("Concurrent AUX trace operation isn't supported: 
found etm %d spe %d ptt %d\n",

   found_etm, found_spe, found_ptt);
*err = -EOPNOTSUPP;
return NULL;
}

which means perf doesn't allow more than one auxtrace event recording at 
the same time.






*err = -EOPNOTSUPP;
return NULL;
}
@@ -96,6 +163,9 @@ struct auxtrace_record
  #if defined(__aarch64__)
if (found_spe)
return arm_spe_recording_init(err, found_spe);
+
+   if (found_ptt)
+   return hisi_ptt_recording_init(err, found_ptt);
  #endif
  
  	/*


[...]

+
+static int hisi_ptt_recording_options(struct auxtrace_record *itr,
+ struct evlist *evlist,
+ struct record_opts *opts)
+{
+   struct hisi_ptt_recording *pttr =
+   container_of(itr, struct hisi_ptt_recording, itr);
+   struct perf_pmu *hisi_ptt_pmu = pttr->hisi_ptt_pmu;
+   struct perf_cpu_map *cpus = evlist->core.cpus;
+   struct evsel *evsel, *hisi_ptt_evsel = NULL;
+   struct evsel *tracking_evsel;
+   int err;
+
+   pttr->evlist = evlist;
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel->core.attr.type == hisi_ptt_pmu->type) {
+   if (hisi_ptt_evsel) {
+   pr_err("There may be only one " HISI_PTT_PMU_NAME "x 
event\n");
+   return -EINVAL;
+   }
+   evsel->core.attr.freq = 0;
+   evsel->core.attr.sample_period = 1;
+   hisi_ptt_evsel = evsel;
+   opts->full_auxtrace = true;
+   }
+   }
+
+   err = hisi_ptt_set_auxtrace_mmap_page(opts);
+   if (err)
+   return err;
+   /*
+* To obtain the auxtrace buffer file descriptor, the auxtrace event
+* must come first.
+*/
+   evlist__to_front(evlist, hisi_ptt_evsel);
+
+   if (!perf_cpu_map__empty(cpus)) {
+   evsel__set_sample_bit(hisi_ptt_evsel, TIME);
+   evsel__set_sample_bit(hisi_ptt_evsel, CPU);
+   }


Similar to Leo's comment: CPU isn't required if it's uncore,
and if TIME is useful then add it regardless of whether the
event is opened per-cpu or on a task.


got it, will fix this next time.


+
+   /* Add dummy event to keep tracking */
+   err = parse_events(evlist, "dummy:u", NULL);
+   if (err)
+   return err;
+
+   tracking_evsel = evlist__last(evlist);
+ 

RE: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-10 Thread Tian, Kevin
> From: Joao Martins 
> Sent: Tuesday, May 10, 2022 7:51 PM
> 
> On 5/10/22 02:38, Tian, Kevin wrote:
> >> From: Jason Gunthorpe 
> >> Sent: Friday, May 6, 2022 7:46 PM
> >>
> >> On Fri, May 06, 2022 at 03:51:40AM +, Tian, Kevin wrote:
>  From: Jason Gunthorpe 
>  Sent: Thursday, May 5, 2022 10:08 PM
> 
>  On Thu, May 05, 2022 at 07:40:37AM +, Tian, Kevin wrote:
> 
> > In concept this is an iommu property instead of a domain property.
> 
>  Not really, domains shouldn't be changing behaviors once they are
>  created. If a domain supports dirty tracking and I attach a new device
>  then it still must support dirty tracking.
> >>>
> >>> That sort of suggests that userspace should specify whether a domain
> >>> supports dirty tracking when it's created. But how does userspace
> >>> know that it should create the domain in this way in the first place?
> >>> live migration is triggered on demand and it may not happen in the
> >>> lifetime of a VM.
> >>
> >> The best you could do is to look at the devices being plugged in at VM
> >> startup, and if they all support live migration then request dirty
> >> tracking, otherwise don't.
> >
> > Yes, this is how a device capability can help.
> >
> >>
> >> However, tt costs nothing to have dirty tracking as long as all iommus
> >> support it in the system - which seems to be the normal case today.
> >>
> >> We should just always turn it on at this point.
> >
> > Then still need a way to report " all iommus support it in the system"
> > to userspace since many old systems don't support it at all. If we all
> > agree that a device capability flag would be helpful on this front (like
> > you also said below), probably can start building the initial skeleton
> > with that in mind?
> >
> 
> This would capture device-specific and maybe iommu-instance features, but
> there's some tiny bit odd semantic here. There's nothing that
> depends on the device to support any of this, but rather the IOMMU instance
> that sits
> below the device which is independent of device-own capabilities e.g. PRI on
> the other
> hand would be a perfect fit for a device capability (?), but dirty tracking
> conveying over a device capability would be a convenience rather than an
> exact
> hw representation.

it is sort of getting certain iommu capability for a given device, as how
the iommu kAPI is moving toward.

> 
> Thinking out loud if we are going as a device/iommu capability [to see if this
> matches
> what people have or not in mind]: we would add dirty-tracking feature bit via
> the existent
> kAPI for iommu device features (e.g. IOMMU_DEV_FEAT_AD) and on
> iommufd we would maybe add
> an IOMMUFD_CMD_DEV_GET_IOMMU_FEATURES ioctl which would have an
> u64 dev_id as input (from
> the returned vfio-pci BIND_IOMMUFD @out_dev_id) and u64 features as an
> output bitmap of
> synthetic feature bits, having IOMMUFD_FEATURE_AD the only one we query
> (and
> IOMMUFD_FEATURE_{SVA,IOPF} as potentially future candidates). Qemu
> would then at start of
> day would check if /all devices/ support it and it would then still do the 
> blind
> set
> tracking, but bail out preemptively if any of device-iommu don't support
> dirty-tracking. I
> don't think we have any case today for having to deal with different IOMMU
> instances that
> have different features.

This heterogeneity already exists today. On Intel platform not all IOMMUs
support force snooping. I believe ARM has similar situation which is why
Robin is refactoring bus-oriented iommu_capable() etc. to device-oriented.

I'm not aware of such heterogeneity particularly for dirty tracking today. But
who knows it won't happen in the future? I just feel that aligning iommufd
uAPI to iommu kAPI for capability reporting might be more future proof here.

> 
> Either that or as discussed in the beginning perhaps add an iommufd (or
> iommufd hwpt one)
> ioctl  call (e.g.IOMMUFD_CMD_CAP) via a input value (e.g. subop
> IOMMU_FEATURES) which
> would gives us a structure of things (e.g. for the IOMMU_FEATURES subop
> the common
> featureset bitmap in all iommu instances). This would give the 'all iommus
> support it in
> the system'. Albeit the device one might have more concrete longevity if
> there's further
> plans aside from dirty tracking.
> 
> >>
> >>> and if the user always creates domain to allow dirty tracking by default,
> >>> how does it know a failed attach is due to missing dirty tracking support
> >>> by the IOMMU and then creates another domain which disables dirty
> >>> tracking and retry-attach again?
> >>
> >> The automatic logic is complicated for sure, if you had a device flag
> >> it would have to figure it out that way
> >>
> >
> > Yes. That is the model in my mind.
> >
> > Thanks
> > Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-10 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, May 10, 2022 9:47 PM
> 
> On Tue, May 10, 2022 at 01:38:26AM +, Tian, Kevin wrote:
> 
> > > However, tt costs nothing to have dirty tracking as long as all iommus
> > > support it in the system - which seems to be the normal case today.
> > >
> > > We should just always turn it on at this point.
> >
> > Then still need a way to report " all iommus support it in the system"
> > to userspace since many old systems don't support it at all.
> 
> Userspace can query the iommu_domain directly, or 'try and fail' to
> turn on tracking.
> 
> A device capability flag is useless without a control knob to request
> a domain is created with tracking, and we don't have that, or a reason
> to add that.
> 

I'm getting confused on your last comment. A capability flag has to
accompany with a control knob which iiuc is what you advocated
in earlier discussion i.e. specifying the tracking property when creating
the domain. In this case the flag assists the userspace in deciding
whether to set the property.

Not sure whether we argued pass each other but here is another
attempt.

In general I saw three options here:

a) 'try and fail' when creating the domain. It succeeds only when
all iommus support tracking;

b) capability reported on iommu domain. The capability is reported true
only when all iommus support tracking. This allows domain property
to be set after domain is created. But there is no much gain of doing
so when comparing to a).

c) capability reported on device. future compatible for heterogenous
platform. domain property is specified at domain creation and domains
can have different properties based on tracking capability of attached
devices.

I'm inclined to c) as it is more aligned to Robin's cleanup effort on
iommu_capable() and iommu_present() in the iommu layer which
moves away from global manner to per-device style. Along with 
that direction I guess we want to discourage adding more APIs
assuming 'all iommus supporting certain capability' thing?

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/4] iommu: Add PASID support for DMA mapping API users

2022-05-10 Thread Jacob Pan
Hi Jason,

On Tue, 10 May 2022 20:28:04 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 02:07:02PM -0700, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> >  drivers/iommu/dma-iommu.c | 107 ++
> >  include/linux/dma-iommu.h |   3 ++
> >  include/linux/iommu.h |   2 +
> >  3 files changed, 112 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1ca85d37eeab..5984f3129fa2 100644
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
> > phys_addr_t phys;
> >  };
> >  
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >  enum iommu_dma_cookie_type {
> > IOMMU_DMA_IOVA_COOKIE,
> > IOMMU_DMA_MSI_COOKIE,
> > @@ -370,6 +372,111 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> >  }
> >  
> > +/**
> > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> > device's
> > + * DMA domain.
> > + * @dev: Device to be enabled
> > + * @pasid: The returned kernel PASID to be used for DMA
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA, the PASID will point to the same IOVA page table.
> > + *
> > + * @return err code or 0 on success
> > + */
> > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret = 0;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +
> > +   /* Only support domain types that DMA API can be used */
> > +   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> > +   dom->type == IOMMU_DOMAIN_BLOCKED) {
> > +   dev_warn(dev, "Invalid domain type %d", dom->type);  
> 
> This should be a WARN_ON
> 
will do, thanks

> > +   return -EPERM;
> > +   }
> > +
> > +   id = dom->pasid;
> > +   if (!id) {
> > +   /*
> > +* First device to use PASID in its DMA domain,
> > allocate
> > +* a single PASID per DMA domain is all we need, it is
> > also
> > +* good for performance when it comes down to IOTLB
> > flush.
> > +*/
> > +   max = 1U << dev->iommu->pasid_bits;
> > +   if (!max)
> > +   return -EINVAL;
> > +
> > +   id = ioasid_alloc(_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID)
> > +   return -ENOMEM;
> > +
> > +   dom->pasid = id;
> > +   atomic_set(>pasid_users, 1);  
> 
> All of this needs proper locking.
> 
good catch, will add a mutex for domain updates, detach as well.

> > +   }
> > +
> > +   ret = dom->ops->attach_dev_pasid(dom, dev, id);
> > +   if (!ret) {
> > +   *pasid = id;
> > +   atomic_inc(>pasid_users);
> > +   return 0;
> > +   }
> > +
> > +   if (atomic_dec_and_test(>pasid_users)) {
> > +   ioasid_free(id);
> > +   dom->pasid = 0;
> > +   }
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> > +
> > +/**
> > + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> > + * @dev:   Device's PASID DMA to be disabled
> > + *
> > + * It is the device driver's responsibility to ensure no more incoming
> > DMA
> > + * requests with the kernel PASID before calling this function. IOMMU
> > driver
> > + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared
> > and
> > + * drained.
> > + *
> > + */
> > +void iommu_detach_dma_pasid(struct device *dev)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t pasid;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->detach_dev_pasid) {
> > +   dev_warn(dev, "No ops for detaching PASID %u", pasid);
> > +   return;
> > +   }
> > +   /* Only support DMA API managed domain type */
> > +   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> > +   

Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-10 Thread Jacob Pan
Hi Jason,

On Tue, 10 May 2022 20:21:21 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 02:07:01PM -0700, Jacob Pan wrote:
> > +static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
> > +   struct device *dev,
> > +   ioasid_t pasid)
> > +{
> > +   struct device_domain_info *info = dev_iommu_priv_get(dev);
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct intel_iommu *iommu = info->iommu;
> > +   unsigned long flags;
> > +   int ret = 0;
> > +
> > +   if (!sm_supported(iommu) || !info)
> > +   return -ENODEV;
> > +
> > +   spin_lock_irqsave(_domain_lock, flags);
> > +   /*
> > +* If the same device already has a PASID attached, just
> > return.
> > +* DMA layer will return the PASID value to the caller.
> > +*/
> > +   if (pasid != PASID_RID2PASID && info->pasid) {  
> 
> Why check for PASID == 0 like this? Shouldn't pasid == 0 be rejected
> as an invalid argument?
Right, I was planning on reuse the attach function for RIDPASID as clean
up, but didn't include here. Will fix.

> 
> > +   if (info->pasid == pasid)
> > +   ret = 0;  
> 
> Doesn't this need to check that the current domain is the requested
> domain as well? How can this happen anyhow - isn't it an error to
> double attach?
> 
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 5af24befc9f1..55845a8c4f4d 100644
> > +++ b/include/linux/intel-iommu.h
> > @@ -627,6 +627,7 @@ struct device_domain_info {
> > struct intel_iommu *iommu; /* IOMMU used by this device */
> > struct dmar_domain *domain; /* pointer to domain */
> > struct pasid_table *pasid_table; /* pasid table */
> > +   ioasid_t pasid; /* DMA request with PASID */  
> 
> And this seems wrong - the DMA API is not the only user of
> attach_dev_pasid, so there should not be any global pasid for the
> device.
> 
True but the attach_dev_pasid() op is domain type specific. i.e. DMA API
has its own attach_dev_pasid which is different than sva domain
attach_dev_pasid().
device_domain_info is only used by DMA API.

> I suspect this should be a counter of # of pasid domains attached so
> that the special flush logic triggers
> 
This field is only used for devTLB, so it is per domain-device. struct
device_domain_info is allocated per device-domain as well. Sorry, I might
have totally missed your point.

> And rely on the core code to worry about assigning only one domain per
> pasid - this should really be a 'set' function.
> 
Yes, in this set the core code (in dma-iommu.c) only assign one PASID per
DMA domain type.

Are you suggesting the dma-iommu API should be called
iommu_set_dma_pasid instead of iommu_attach_dma_pasid?

Thanks a lot for the quick review!

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/4] iommu: Add PASID support for DMA mapping API users

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:07:02PM -0700, Jacob Pan wrote:
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
> 
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, let it
> be IOVA or physical address in case of pass-through.
> 
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
> 
> Signed-off-by: Jacob Pan 
>  drivers/iommu/dma-iommu.c | 107 ++
>  include/linux/dma-iommu.h |   3 ++
>  include/linux/iommu.h |   2 +
>  3 files changed, 112 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 1ca85d37eeab..5984f3129fa2 100644
> +++ b/drivers/iommu/dma-iommu.c
> @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
>   phys_addr_t phys;
>  };
>  
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
>  enum iommu_dma_cookie_type {
>   IOMMU_DMA_IOVA_COOKIE,
>   IOMMU_DMA_MSI_COOKIE,
> @@ -370,6 +372,111 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>   domain->iova_cookie = NULL;
>  }
>  
> +/**
> + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the 
> device's
> + * DMA domain.
> + * @dev: Device to be enabled
> + * @pasid: The returned kernel PASID to be used for DMA
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA, the PASID will point to the same IOVA page table.
> + *
> + * @return err code or 0 on success
> + */
> +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret = 0;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
> +
> + /* Only support domain types that DMA API can be used */
> + if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> + dom->type == IOMMU_DOMAIN_BLOCKED) {
> + dev_warn(dev, "Invalid domain type %d", dom->type);

This should be a WARN_ON

> + return -EPERM;
> + }
> +
> + id = dom->pasid;
> + if (!id) {
> + /*
> +  * First device to use PASID in its DMA domain, allocate
> +  * a single PASID per DMA domain is all we need, it is also
> +  * good for performance when it comes down to IOTLB flush.
> +  */
> + max = 1U << dev->iommu->pasid_bits;
> + if (!max)
> + return -EINVAL;
> +
> + id = ioasid_alloc(_dma_pasid, 1, max, dev);
> + if (id == INVALID_IOASID)
> + return -ENOMEM;
> +
> + dom->pasid = id;
> + atomic_set(>pasid_users, 1);

All of this needs proper locking.

> + }
> +
> + ret = dom->ops->attach_dev_pasid(dom, dev, id);
> + if (!ret) {
> + *pasid = id;
> + atomic_inc(>pasid_users);
> + return 0;
> + }
> +
> + if (atomic_dec_and_test(>pasid_users)) {
> + ioasid_free(id);
> + dom->pasid = 0;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> +
> +/**
> + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> + * @dev: Device's PASID DMA to be disabled
> + *
> + * It is the device driver's responsibility to ensure no more incoming DMA
> + * requests with the kernel PASID before calling this function. IOMMU driver
> + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
> + * drained.
> + *
> + */
> +void iommu_detach_dma_pasid(struct device *dev)
> +{
> + struct iommu_domain *dom;
> + ioasid_t pasid;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->detach_dev_pasid) {
> + dev_warn(dev, "No ops for detaching PASID %u", pasid);
> + return;
> + }
> + /* Only support DMA API managed domain type */
> + if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> + dom->type == IOMMU_DOMAIN_BLOCKED) {
> + dev_err(dev, "Invalid domain type %d to detach DMA PASID %u\n",
> +  dom->type, pasid);
> + return;
> + }
> + pasid = dom->pasid;
> + if (!pasid) {
> + dev_err(dev, "No DMA PASID attached\n");
> + return;
> + }

All WARN_ON's 

Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:07:01PM -0700, Jacob Pan wrote:
> +static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
> + struct device *dev,
> + ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu = info->iommu;
> + unsigned long flags;
> + int ret = 0;
> +
> + if (!sm_supported(iommu) || !info)
> + return -ENODEV;
> +
> + spin_lock_irqsave(_domain_lock, flags);
> + /*
> +  * If the same device already has a PASID attached, just return.
> +  * DMA layer will return the PASID value to the caller.
> +  */
> + if (pasid != PASID_RID2PASID && info->pasid) {

Why check for PASID == 0 like this? Shouldn't pasid == 0 be rejected
as an invalid argument?

> + if (info->pasid == pasid)
> + ret = 0;

Doesn't this need to check that the current domain is the requested
domain as well? How can this happen anyhow - isn't it an error to
double attach?

> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 5af24befc9f1..55845a8c4f4d 100644
> +++ b/include/linux/intel-iommu.h
> @@ -627,6 +627,7 @@ struct device_domain_info {
>   struct intel_iommu *iommu; /* IOMMU used by this device */
>   struct dmar_domain *domain; /* pointer to domain */
>   struct pasid_table *pasid_table; /* pasid table */
> + ioasid_t pasid; /* DMA request with PASID */

And this seems wrong - the DMA API is not the only user of
attach_dev_pasid, so there should not be any global pasid for the
device.

I suspect this should be a counter of # of pasid domains attached so
that the special flush logic triggers

And rely on the core code to worry about assigning only one domain per
pasid - this should really be a 'set' function.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2022-05-10 Thread Jacob Pan
The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in SVA lib. The binding between a kernel PASID and kernel
mapping has many flaws. See discussions in the link below.

This patch enables in-kernel DMA by switching from SVA lib to the
standard DMA mapping APIs. Since both DMA requests with and without
PASIDs are mapped identically, there is no change to how DMA APIs are
used after the kernel PASID is enabled.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/idxd.h  |  1 -
 drivers/dma/idxd/init.c  | 34 +-
 drivers/dma/idxd/sysfs.c |  7 ---
 3 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index ccbefd0be617..190b08bd7c08 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -277,7 +277,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
 
-   struct iommu_sva *sva;
unsigned int pasid;
 
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e1b5d1e4a949..e2e1c0eae6d6 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -466,36 +467,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
+   u32 pasid;
+   int ret;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(>pdev->dev, NULL, );
-   if (IS_ERR(sva)) {
-   dev_warn(>pdev->dev,
-"iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
-
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
-   return -ENODEV;
+   ret = iommu_attach_dma_pasid(>pdev->dev, );
+   if (ret) {
+   dev_err(>pdev->dev, "No DMA PASID %d\n", ret);
+   return ret;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(>pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
-   idxd->sva = NULL;
+   iommu_detach_dma_pasid(>pdev->dev);
 }
 
 static int idxd_probe(struct idxd_device *idxd)
@@ -527,10 +514,7 @@ static int idxd_probe(struct idxd_device *idxd)
else
set_bit(IDXD_FLAG_PASID_ENABLED, >flags);
}
-   } else if (!sva) {
-   dev_warn(dev, "User forced SVA off via module param.\n");
}
-
idxd_read_caps(idxd);
idxd_read_table_offsets(idxd);
 
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index dfd549685c46..a48928973bd4 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -839,13 +839,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
 
-   /*
-* This is temporarily placed here until we have SVM support for
-* dmaengine.
-*/
-   if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
-   return -EOPNOTSUPP;
-
memset(wq->name, 0, WQ_NAME_SIZE + 1);
strncpy(wq->name, buf, WQ_NAME_SIZE);
strreplace(wq->name, '\n', '\0');
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/4] iommu: Add PASID support for DMA mapping API users

2022-05-10 Thread Jacob Pan
DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per device/RID basis which is not PASID-aware.

Some modern devices such as Intel Data Streaming Accelerator, PASID is
required for certain work submissions. To allow such devices use DMA
mapping API, we need the following functionalities:
1. Provide device a way to retrieve a PASID for work submission within
the kernel
2. Enable the kernel PASID on the IOMMU for the device
3. Attach the kernel PASID to the device's default DMA domain, let it
be IOVA or physical address in case of pass-through.

This patch introduces a driver facing API that enables DMA API
PASID usage. Once enabled, device drivers can continue to use DMA APIs as
is. There is no difference in dma_handle between without PASID and with
PASID.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 107 ++
 include/linux/dma-iommu.h |   3 ++
 include/linux/iommu.h |   2 +
 3 files changed, 112 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1ca85d37eeab..5984f3129fa2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
 };
 
+static DECLARE_IOASID_SET(iommu_dma_pasid);
+
 enum iommu_dma_cookie_type {
IOMMU_DMA_IOVA_COOKIE,
IOMMU_DMA_MSI_COOKIE,
@@ -370,6 +372,111 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
 }
 
+/**
+ * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the device's
+ * DMA domain.
+ * @dev: Device to be enabled
+ * @pasid: The returned kernel PASID to be used for DMA
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA, the PASID will point to the same IOVA page table.
+ *
+ * @return err code or 0 on success
+ */
+int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
+{
+   struct iommu_domain *dom;
+   ioasid_t id, max;
+   int ret = 0;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
+   return -ENODEV;
+
+   /* Only support domain types that DMA API can be used */
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED) {
+   dev_warn(dev, "Invalid domain type %d", dom->type);
+   return -EPERM;
+   }
+
+   id = dom->pasid;
+   if (!id) {
+   /*
+* First device to use PASID in its DMA domain, allocate
+* a single PASID per DMA domain is all we need, it is also
+* good for performance when it comes down to IOTLB flush.
+*/
+   max = 1U << dev->iommu->pasid_bits;
+   if (!max)
+   return -EINVAL;
+
+   id = ioasid_alloc(_dma_pasid, 1, max, dev);
+   if (id == INVALID_IOASID)
+   return -ENOMEM;
+
+   dom->pasid = id;
+   atomic_set(>pasid_users, 1);
+   }
+
+   ret = dom->ops->attach_dev_pasid(dom, dev, id);
+   if (!ret) {
+   *pasid = id;
+   atomic_inc(>pasid_users);
+   return 0;
+   }
+
+   if (atomic_dec_and_test(>pasid_users)) {
+   ioasid_free(id);
+   dom->pasid = 0;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(iommu_attach_dma_pasid);
+
+/**
+ * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
+ * @dev:   Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ *
+ */
+void iommu_detach_dma_pasid(struct device *dev)
+{
+   struct iommu_domain *dom;
+   ioasid_t pasid;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->detach_dev_pasid) {
+   dev_warn(dev, "No ops for detaching PASID %u", pasid);
+   return;
+   }
+   /* Only support DMA API managed domain type */
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED) {
+   dev_err(dev, "Invalid domain type %d to detach DMA PASID %u\n",
+dom->type, pasid);
+   return;
+   }
+
+   pasid = dom->pasid;
+   if (!pasid) {
+   dev_err(dev, "No DMA PASID attached\n");
+   return;
+   }
+   dom->ops->detach_dev_pasid(dom, dev, pasid);
+   if (atomic_dec_and_test(>pasid_users)) {
+   ioasid_free(pasid);
+   dom->pasid = 0;
+   }
+}
+EXPORT_SYMBOL(iommu_detach_dma_pasid);
+
 /**
  * 

[PATCH v3 4/4] iommu/vt-d: Delete unused SVM flag

2022-05-10 Thread Jacob Pan
Supervisor PASID for SVA/SVM is no longer supported, delete the unused
flag.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c |  2 +-
 include/linux/intel-svm.h | 13 -
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 38c33cde177e..98ec77415770 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -750,7 +750,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * to unbind the mm while any page faults are 
outstanding.
 */
svm = pasid_private_find(req->pasid);
-   if (IS_ERR_OR_NULL(svm) || (svm->flags & 
SVM_FLAG_SUPERVISOR_MODE))
+   if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
 
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index b3b125b332aa..6835a665c195 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -13,17 +13,4 @@
 #define PRQ_RING_MASK  ((0x1000 << PRQ_ORDER) - 0x20)
 #define PRQ_DEPTH  ((0x1000 << PRQ_ORDER) >> 5)
 
-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE   BIT(0)
-
 #endif /* __INTEL_SVM_H__ */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 0/4] Enable PASID for DMA API users

2022-05-10 Thread Jacob Pan
Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
require PASID in DMA requests to be operational. Specifically, the work
submissions with ENQCMD on shared work queues require PASIDs. The use cases
include both user DMA with shared virtual addressing (SVA) and in-kernel
DMA similar to legacy DMA w/o PASID. Here we address the latter.

DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per device or Requester ID(RID) basis which is not
PASID-aware. To leverage DMA API for devices relies on PASIDs, this
patchset introduces the following APIs

1. A driver facing API that enables DMA API PASID usage:
iommu_enable_pasid_dma(struct device *dev, ioasid_t );

2. An IOMMU op that allows attaching device-domain-PASID generically (will
be used beyond DMA API PASID support)

Once PASID DMA is enabled and attached to the appropriate IOMMU domain,
device drivers can continue to use DMA APIs as-is. There is no difference
in terms of mapping in dma_handle between without PASID and with PASID.
The DMA mapping performed by IOMMU will be identical for both requests, let
it be IOVA or PA in case of pass-through.

In addition, this set converts DSA driver in-kernel DMA with PASID from SVA
lib to DMA API. There have been security and functional issues with the
kernel SVA approach:
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
The highlights are as the following:
 - The lack of IOTLB synchronization upon kernel page table updates.
   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
 - Other than slight more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use cases yet
where DMA engines need kernel virtual addresses for in-kernel DMA.

Subsequently, cleanup is done around the usage of sva_bind_device() for
in-kernel DMA. Removing special casing code in VT-d driver and tightening
SVA lib API.

This work and idea behind it is a collaboration with many people, many
thanks to Baolu Lu, Jason Gunthorpe, Dave Jiang, and others.


ChangeLog:
v3
- Rebased on "Baolu's SVA and IOPF refactoring" series v5.

(https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v5)
This version is significantly simplified by leveraging IOMMU domain
ops, attach_dev_pasid() op is implemented differently on a DMA domain
than on a SVA domain.
We currently have no need to support multiple PASIDs per DMA domain.
(https://lore.kernel.org/lkml/20220315142216.gv11...@nvidia.com/).
Removed PASID-device list from V2, a PASID field is introduced to
struct iommu_domain instead. It is intended for DMA requests with
PASID by all devices attached to the domain.

v2
- Do not reserve a special PASID for DMA API usage. Use IOASID
  allocation instead.
- Introduced a generic device-pasid-domain attachment IOMMU op.
  Replaced the DMA API only IOMMU op.
- Removed supervisor SVA support in VT-d
- Removed unused sva_bind_device parameters
- Use IOMMU specific data instead of struct device to store PASID
  info


Jacob Pan (4):
  iommu/vt-d: Implement domain ops for attach_dev_pasid
  iommu: Add PASID support for DMA mapping API users
  dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
  iommu/vt-d: Delete unused SVM flag

 drivers/dma/idxd/idxd.h |   1 -
 drivers/dma/idxd/init.c |  34 +++-
 drivers/dma/idxd/sysfs.c|   7 ---
 drivers/iommu/dma-iommu.c   | 107 
 drivers/iommu/intel/iommu.c |  81 ++-
 drivers/iommu/intel/svm.c   |   2 +-
 include/linux/dma-iommu.h   |   3 +
 include/linux/intel-iommu.h |   1 +
 include/linux/intel-svm.h   |  13 -
 include/linux/iommu.h   |   2 +
 10 files changed, 202 insertions(+), 49 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-10 Thread Jacob Pan
On VT-d platforms with scalable mode enabled, devices issue DMA requests
with PASID need to attach PASIDs to given IOMMU domains. The attach
operation involves the following:
- Programming the PASID into the device's PASID table
- Tracking device domain and the PASID relationship
- Managing IOTLB and device TLB invalidations

This patch add attach_dev_pasid functions to the default domain ops which
is used by DMA and identity domain types. It could be extended to support
other domain types whenever necessary.

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 81 -
 include/linux/intel-iommu.h |  1 +
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a51b96fa9b3a..5408418f4f4b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1562,6 +1562,10 @@ static void __iommu_flush_dev_iotlb(struct 
device_domain_info *info,
 
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
+   if (info->pasid) {
+   qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+info->pasid, qdep, addr, mask);
+   }
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
   qdep, addr, mask);
 }
@@ -1591,6 +1595,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
u16 did = domain->iommu_did[iommu->seq_id];
+   struct iommu_domain *iommu_domain = >domain;
 
BUG_ON(pages == 0);
 
@@ -1599,6 +1604,9 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
 
if (domain_use_first_level(domain)) {
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+   /* flush additional kernel DMA PASIDs attached */
+   if (iommu_domain->pasid)
+   qi_flush_piotlb(iommu, did, iommu_domain->pasid, addr, 
pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;
 
@@ -4265,10 +4273,13 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
domain = info->domain;
 
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
-   if (dev_is_pci(info->dev) && sm_supported(iommu))
+   if (dev_is_pci(info->dev) && sm_supported(iommu)) {
intel_pasid_tear_down_entry(iommu, info->dev,
PASID_RID2PASID, false);
-
+   if (info->pasid)
+   intel_pasid_tear_down_entry(iommu, info->dev,
+   info->pasid, false);
+   }
iommu_disable_dev_iotlb(info);
domain_context_clear(info);
intel_pasid_free_table(info->dev);
@@ -4912,6 +4923,70 @@ static void intel_iommu_iotlb_sync_map(struct 
iommu_domain *domain,
}
 }
 
+static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
+   struct device *dev,
+   ioasid_t pasid)
+{
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct intel_iommu *iommu = info->iommu;
+   unsigned long flags;
+   int ret = 0;
+
+   if (!sm_supported(iommu) || !info)
+   return -ENODEV;
+
+   spin_lock_irqsave(_domain_lock, flags);
+   /*
+* If the same device already has a PASID attached, just return.
+* DMA layer will return the PASID value to the caller.
+*/
+   if (pasid != PASID_RID2PASID && info->pasid) {
+   if (info->pasid == pasid)
+   ret = 0;
+   else {
+   dev_warn(dev, "Cannot attach PASID %u, %u already 
attached\n",
+pasid, info->pasid);
+   ret = -EBUSY;
+   }
+   goto out_unlock_domain;
+   }
+
+   spin_lock(>lock);
+   if (hw_pass_through && domain_type_is_si(dmar_domain))
+   ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
+dev, pasid);
+   else if (domain_use_first_level(dmar_domain))
+   ret = domain_setup_first_level(iommu, dmar_domain,
+  dev, pasid);
+   else
+   ret = intel_pasid_setup_second_level(iommu, dmar_domain,
+dev, pasid);
+
+   spin_unlock(>lock);
+out_unlock_domain:
+   spin_unlock_irqrestore(_domain_lock, flags);
+   if (!ret)
+   info->pasid = pasid;
+
+   return ret;
+}
+
+static void 

Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-10 Thread Steve Wahl
On Tue, May 10, 2022 at 01:16:26AM +, Tian, Kevin wrote:
> > From: Steve Wahl 
> > Sent: Friday, May 6, 2022 11:26 PM
> > 
> > On Fri, May 06, 2022 at 08:12:11AM +, Tian, Kevin wrote:
> > > > From: David Woodhouse 
> > > > Sent: Friday, May 6, 2022 3:17 PM
> > > >
> > > > On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > > > > > From: Baolu Lu 
> > > > > >
> > > > > > > --- a/include/linux/dmar.h
> > > > > > > +++ b/include/linux/dmar.h
> > > > > > > @@ -19,7 +19,7 @@
> > > > > > >   struct acpi_dmar_header;
> > > > > > >
> > > > > > >   #ifdef  CONFIG_X86
> > > > > > > -# define DMAR_UNITS_SUPPORTEDMAX_IO_APICS
> > > > > > > +# define DMAR_UNITS_SUPPORTED640
> > > > > > >   #else
> > > > > > >   # defineDMAR_UNITS_SUPPORTED64
> > > > > > >   #endif
> > > > >
> > > > > ... is it necessary to permanently do 10x increase which wastes memory
> > > > > on most platforms which won't have such need.
> > > >
> > > > I was just looking at that. It mostly adds about 3½ KiB to each struct
> > > > dmar_domain.
> > > >
> > > > I think the only actual static array is the dmar_seq_ids bitmap which
> > > > grows to 640 *bits* which is fairly negligible, and the main growth is
> > > > that it adds about 3½ KiB to each struct dmar_domain for the
> > > > iommu_refcnt[] and iommu_did[] arrays.
> > >
> > > Thanks for the quick experiment! though the added material is
> > > negligible it's cleaner to me if having a way to configure it as
> > > discussed below.
> > >
> > > >
> > > > > Does it make more sense to have a configurable approach similar to
> > > > > CONFIG_NR_CPUS? or even better can we just replace those static
> > > > > arrays with dynamic allocation so removing this restriction
> > > > > completely?
> > > >
> > > > Hotplug makes that fun, but I suppose you only need to grow the array
> > > > in a given struct dmar_domain if you actually add a device to it that's
> > > > behind a newly added IOMMU. I don't know if the complexity of making it
> > > > fully dynamic is worth it though. We could make it a config option,
> > > > and/or a command line option (perhaps automatically derived from
> > > > CONFIG_NR_CPUS).
> > >
> > > either config option or command line option is OK to me. Probably
> > > the former is simpler given no need to dynamically expand the
> > > static array. btw though deriving from CONFIG_NR_CPUS could work
> > > in this case it is unclear why tying the two together is necessary in
> > > concept, e.g. is there guarantee that the number of IOMMUs must
> > > be smaller than the number of CPUs in a platform?
> > >
> > > >
> > > > If it wasn't for hotplug, I think we'd know the right number by the
> > > > time we actually need it anyway, wouldn't we? Can we have a heuristic
> > > > for how many DMAR units are likely to be hotplugged? Is it as simple as
> > > > the ratio of present to not-yet-present CPUs in MADT?
> > >
> > > Probably. But I don't have enough knowledge on DMAR hotplug to
> > > judge (e.g. whether it's strictly tied to CPU hotplug and if yes whether
> > > there could be multiple IOMMUs hotplugged together with a CPU
> > > socket)...
> > >
> > > Thanks
> > > Kevin
> > 
> > Would anyone be more comfortable if we only increase the limit where
> > MAXSMP is set?
> > 
> > i.e.
> > 
> > #if defined(CONFIG_X86) && defined(CONFIG_MAXSMP)
> > # defineDMAR_UNITS_SUPPORTED640
> > #elif defined(CONFIG_X86)
> > # defineDMAR_UNITS_SUPPORTEDMAX_IO_APICS
> > #else
> > # defineDMAR_UNITS_SUPPORTED64
> > #endif
> > 
> > Thank you all for your time looking at this.
> > 
> 
> This works for your own configuration but it's unclear whether other
> MAXSMP platforms have the exact same requirements (different
> number of sockets, different ratio of #iommus/#sockets, etc.). In any
> case since we are at it having a generic way to extend it makes more
> sense to me.

So, to be clear, what you would like to see would be Kconfig entries
to create a config option, say "NR_DMARS", set up so the default is:

 MAXSMP?  640
 X86_64?  128
 X86_32?  64
 other64

And DMAR_UNITS_SUPPORTED gets removed, and everywhere it was used we
use CONFIG_NR_DMARS in its place?

I can give that a shot but wanted to confirm this is what you'd want
first.

Thanks,

--> Steve

-- 
Steve Wahl, Hewlett Packard Enterprise
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> On Mon, May 09, 2022 at 11:00:41AM -0300, Jason Gunthorpe wrote:
> > On Mon, May 09, 2022 at 04:01:52PM +1000, David Gibson wrote:
> > 
> > > > The default iommu_domain that the iommu driver creates will be used
> > > > here, it is up to the iommu driver to choose something reasonable for
> > > > use by applications like DPDK. ie PPC should probably pick its biggest
> > > > x86-like aperture.
> > > 
> > > So, using the big aperture means a very high base IOVA
> > > (1<<59)... which means that it won't work at all if you want to attach
> > > any devices that aren't capable of 64-bit DMA.
> > 
> > I'd expect to include the 32 bit window too..
> 
> I'm not entirely sure what you mean.  Are you working on the
> assumption that we've extended to allowing multiple apertures, so we'd
> default to advertising both a small/low aperture and a large/high
> aperture?

Yes

> > No, this just makes it fragile in the other direction because now
> > userspace has to know what platform specific things to ask for *or it
> > doesn't work at all*. This is not a improvement for the DPDK cases.
> 
> Um.. no.  The idea is that userspace requests *what it needs*, not
> anything platform specific.  In the case of DPDK that would be nothing
> more than the (minimum) aperture size.  Nothing platform specific
> about that.

Except a 32 bit platform can only maybe do a < 4G aperture, a 64 bit
platform can do more, but it varies how much more, etc.

There is no constant value DPDK could stuff in this request, unless it
needs a really small amount of IOVA, like 1G or something.

> > It isn't like there is some hard coded value we can put into DPDK that
> > will work on every platform. So kernel must pick for DPDK, IMHO. I
> > don't see any feasible alternative.
> 
> Yes, hence *optionally specified* base address only.

Okay, so imagine we've already done this and DPDK is not optionally
specifying anything :)

The structs can be extended so we can add this as an input to creation
when a driver can implement it.

> > The ppc specific driver would be on the generic side of qemu in its
> > viommu support framework. There is lots of host driver optimization
> > possible here with knowledge of the underlying host iommu HW. It
> > should not be connected to the qemu target.
> 
> Thinking through this...
> 
> So, I guess we could have basically the same logic I'm suggesting be
> in the qemu backend iommu driver instead.  So the target side (machine
> type, strictly speaking) would request of the host side the apertures
> it needs, and the host side driver would see if it can do that, based
> on both specific knowledge of that driver and the query reponses.

Yes, this is what I'm thinking

> ppc on x86 should work with that.. at least if the x86 aperture is
> large enough to reach up to ppc's high window.  I guess we'd have the
> option here of using either the generic host driver or the
> x86-specific driver.  The latter would mean qemu maintaining an
> x86-format shadow of the io pagetables; mildly tedious, but doable.

The appeal of having userspace page tables is performance, so it is
tedious to shadow, but it should run faster.

> So... is there any way of backing out of this gracefully.  We could
> detach the device, but in the meantime ongoing DMA maps from
> previous devices might have failed.  

This sounds like a good use case for qemu to communicate ranges - but
as I mentioned before Alex said qemu didn't know the ranges..

> We could pre-attach the new device to a new IOAS and check the
> apertures there - but when we move it to the final IOAS is it
> guaranteed that the apertures will be (at least) the intersection of
> the old and new apertures, or is that just the probable outcome. 

Should be guarenteed

> Ok.. you convinced me.  As long as we have some way to handle the
> device hotplug case, we can work with this.

I like the communicate ranges for hotplug, so long as we can actually
implement it in qemu - I'm a bit unclear on that honestly.

> Ok, I see.  That can certainly be done.  I was really hoping we could
> have a working, though non-optimized, implementation using just the
> generic interface.

Oh, sure that should largely work as well too, this is just an
additional direction people may find interesting and helps explain why
qemu should have an iommu layer inside.

> "holes" versus "windows".  We can choose either one; I think "windows"
> rather than "holes" makes more sense, but it doesn't really matter.

Yes, I picked windows aka ranges for the uAPI - we translate the holes
from the groups into windows and intersect them with the apertures.

> > > Another approach would be to give the required apertures / pagesizes
> > > in the initial creation of the domain/IOAS.  In that case they would
> > > be static for the IOAS, as well as the underlying iommu_domains: any
> > > ATTACH which would be incompatible would fail.
> > 
> > This is the device-specific 

Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-10 Thread Nicolin Chen via iommu
On Tue, May 10, 2022 at 01:55:24PM -0300, Jason Gunthorpe wrote:
> This control causes the ARM SMMU drivers to choose a stage 2
> implementation for the IO pagetable (vs the stage 1 usual default),
> however this choice has no visible impact to the VFIO user. Further qemu
> never implemented this and no other userspace user is known.
> 
> The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
> new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
> SMMU translation services to the guest operating system" however the rest
> of the API to set the guest table pointer for the stage 1 was never
> completed, or at least never upstreamed, rendering this part useless dead
> code.
> 
> Since the current patches to enable nested translation, aka userspace page
> tables, rely on iommufd and will not use the enable_nesting()
> iommu_domain_op, remove this infrastructure. However, don't cut too deep
> into the SMMU drivers for now expecting the iommufd work to pick it up -
> we still need to create S2 IO page tables.
> 
> Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
> enable_nesting iommu_domain_op.
> 
> Just in-case there is some userspace using this continue to treat
> requesting it as a NOP, but do not advertise support any more.
> 
> Signed-off-by: Jason Gunthorpe 

Sanity-tested with qemu-system-aarch64 using "iommu=nested-smmuv3"
(unmerged feature) and "iommu=none" strings on top of vfio next.

Tested-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-10 Thread Michael Kelley (LINUX) via iommu
From: Robin Murphy  Sent: Tuesday, May 10, 2022 9:34 AM
> 
> On 2022-05-10 15:21, Tianyu Lan wrote:
> > From: Tianyu Lan 
> >
> > swiotlb_find_slots() skips slots according to io tlb aligned mask
> > calculated from min aligned mask and original physical address
> > offset. This affects max mapping size. The mapping size can't
> > achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
> > non-zero. This will cause system boot up failure in Hyper-V
> > Isolation VM where swiotlb force is enabled. Scsi layer use return
> > value of dma_max_mapping_size() to set max segment size and it
> > finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
> > sets min align mask to 4k - 1. Scsi layer may pass 256k length of
> > request buffer with 0~4k offset and Hyper-V storage driver can't
> > get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
> > find 256k length bounce buffer with offset. Make swiotlb_max_mapping
> > _size() take min align mask into account.
> 
> Hmm, this seems a bit pessimistic - the offset can vary per mapping, so
> it feels to me like it should really be the caller's responsibility to
> account for it if they're already involved enough to care about both
> constraints. But I'm not sure how practical that would be.

Tianyu and I discussed this prior to his submitting the patch.
Presumably dma_max_mapping_size() exists so that the higher
level blk-mq code can limit the size of I/O requests to something
that will "fit" in the swiotlb when bounce buffering is enabled.
Unfortunately, the current code is just giving the wrong answer
when the offset is non-zero.  The offset would be less than
PAGE_SIZE, so the impact would be dma_max_mapping_size()
returning 252 Kbytes instead of 256 Kbytes, but only for devices
where dma min align mask is set.  And any I/O sizes less than
252 Kbytes are unaffected even when dma min align mask is set. 
Net, the impact would be only in a fairly rare edge case.

Even on ARM64 with a 64K page size, the Hyper-V storage driver
is setting the dma min align mask to only 4K (which is correct because
the Hyper-V host uses a 4K page size even if the guest is using
something larger), so again the limit becomes 252 Kbytes instead
of 256 Kbytes, and any impact is rare.

As you mentioned, how else would a caller handle this situation?

Michael

> 
> Robin.
> 
> > Signed-off-by: Tianyu Lan 
> > ---
> >   kernel/dma/swiotlb.c | 13 -
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 73a41cec9e38..0d6684ca7eab 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -743,7 +743,18 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t
> paddr, size_t size,
> >
> >   size_t swiotlb_max_mapping_size(struct device *dev)
> >   {
> > -   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
> > +   int min_align_mask = dma_get_min_align_mask(dev);
> > +   int min_align = 0;
> > +
> > +   /*
> > +* swiotlb_find_slots() skips slots according to
> > +* min align mask. This affects max mapping size.
> > +* Take it into acount here.
> > +*/
> > +   if (min_align_mask)
> > +   min_align = roundup(min_align_mask, IO_TLB_SIZE);
> > +
> > +   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE - min_align;
> >   }
> >
> >   bool is_swiotlb_active(struct device *dev)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 06:52:06PM +0100, Robin Murphy wrote:
> On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote:
> > This control causes the ARM SMMU drivers to choose a stage 2
> > implementation for the IO pagetable (vs the stage 1 usual default),
> > however this choice has no visible impact to the VFIO user. Further qemu
> > never implemented this and no other userspace user is known.
> > 
> > The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
> > new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
> > SMMU translation services to the guest operating system" however the rest
> > of the API to set the guest table pointer for the stage 1 was never
> > completed, or at least never upstreamed, rendering this part useless dead
> > code.
> > 
> > Since the current patches to enable nested translation, aka userspace page
> > tables, rely on iommufd and will not use the enable_nesting()
> > iommu_domain_op, remove this infrastructure. However, don't cut too deep
> > into the SMMU drivers for now expecting the iommufd work to pick it up -
> > we still need to create S2 IO page tables.
> > 
> > Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
> > enable_nesting iommu_domain_op.
> > 
> > Just in-case there is some userspace using this continue to treat
> > requesting it as a NOP, but do not advertise support any more.
> 
> I assume the nested translation/guest SVA patches that Eric and Vivek were
> working on pre-IOMMUFD made use of this, and given that they got quite far
> along, I wouldn't be too surprised if some eager cloud vendors might have
> even deployed something based on the patches off the list. 

With upstream there is no way to make use of this flag, if someone is
using it they have other out of tree kernel, vfio, kvm and qemu
patches to make it all work.

You can see how much is still needed in Eric's tree:

https://github.com/eauger/linux/commits/v5.15-rc7-nested-v16

> I can't help feeling a little wary about removing this until IOMMUFD
> can actually offer a functional replacement - is it in the way of
> anything upcoming?

>From an upstream perspective if someone has a patched kernel to
complete the feature, then they can patch this part in as well, we
should not carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some
point, I'd rather just get it out of the way.

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-10 Thread Robin Murphy

On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote:

This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no visible impact to the VFIO user. Further qemu
never implemented this and no other userspace user is known.

The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 was never
completed, or at least never upstreamed, rendering this part useless dead
code.

Since the current patches to enable nested translation, aka userspace page
tables, rely on iommufd and will not use the enable_nesting()
iommu_domain_op, remove this infrastructure. However, don't cut too deep
into the SMMU drivers for now expecting the iommufd work to pick it up -
we still need to create S2 IO page tables.

Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
enable_nesting iommu_domain_op.

Just in-case there is some userspace using this continue to treat
requesting it as a NOP, but do not advertise support any more.


I assume the nested translation/guest SVA patches that Eric and Vivek 
were working on pre-IOMMUFD made use of this, and given that they got 
quite far along, I wouldn't be too surprised if some eager cloud vendors 
might have even deployed something based on the patches off the list. I 
can't help feeling a little wary about removing this until IOMMUFD can 
actually offer a functional replacement - is it in the way of anything 
upcoming?


Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 16 
  drivers/iommu/iommu.c   | 10 --
  drivers/vfio/vfio_iommu_type1.c | 12 +---
  include/linux/iommu.h   |  3 ---
  include/uapi/linux/vfio.h   |  2 +-
  6 files changed, 2 insertions(+), 57 deletions(-)

It would probably make sense for this to go through the VFIO tree with Robin's
ack for the SMMU changes.

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8fd1..b901e8973bb4ea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2724,21 +2724,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)

-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-   int ret = 0;
-
-   mutex_lock(_domain->init_mutex);
-   if (smmu_domain->smmu)
-   ret = -EPERM;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   mutex_unlock(_domain->init_mutex);
-
-   return ret;
-}
-
  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
  {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2865,7 +2850,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .enable_nesting = arm_smmu_enable_nesting,
.free   = arm_smmu_domain_free,
}
  };
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc13..239e6f6585b48d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1507,21 +1507,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)

-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-   int ret = 0;
-
-   mutex_lock(_domain->init_mutex);
-   if (smmu_domain->smmu)
-   ret = -EPERM;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   mutex_unlock(_domain->init_mutex);
-
-   return ret;
-}
-
  static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks)
  {
@@ -1600,7 +1585,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .enable_nesting = arm_smmu_enable_nesting,
.set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.free   = arm_smmu_domain_free,
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 

[PATCH] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-10 Thread Mikulas Patocka
We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 

---
 kernel/dma/debug.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/dma/debug.c
===
--- linux-2.6.orig/kernel/dma/debug.c   2022-03-30 14:39:00.0 +0200
+++ linux-2.6/kernel/dma/debug.c2022-05-09 16:32:07.0 +0200
@@ -448,7 +448,7 @@ void debug_dma_dump_mappings(struct devi
  * other hand, consumes a single dma_debug_entry, but inserts 'nents'
  * entries into the tree.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-10 Thread Jason Gunthorpe via iommu
This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no visible impact to the VFIO user. Further qemu
never implemented this and no other userspace user is known.

The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 was never
completed, or at least never upstreamed, rendering this part useless dead
code.

Since the current patches to enable nested translation, aka userspace page
tables, rely on iommufd and will not use the enable_nesting()
iommu_domain_op, remove this infrastructure. However, don't cut too deep
into the SMMU drivers for now expecting the iommufd work to pick it up -
we still need to create S2 IO page tables.

Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
enable_nesting iommu_domain_op.

Just in-case there is some userspace using this continue to treat
requesting it as a NOP, but do not advertise support any more.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 16 
 drivers/iommu/iommu.c   | 10 --
 drivers/vfio/vfio_iommu_type1.c | 12 +---
 include/linux/iommu.h   |  3 ---
 include/uapi/linux/vfio.h   |  2 +-
 6 files changed, 2 insertions(+), 57 deletions(-)

It would probably make sense for this to go through the VFIO tree with Robin's
ack for the SMMU changes.

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8fd1..b901e8973bb4ea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2724,21 +2724,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
 }
 
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-   int ret = 0;
-
-   mutex_lock(_domain->init_mutex);
-   if (smmu_domain->smmu)
-   ret = -EPERM;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   mutex_unlock(_domain->init_mutex);
-
-   return ret;
-}
-
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2865,7 +2850,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .enable_nesting = arm_smmu_enable_nesting,
.free   = arm_smmu_domain_free,
}
 };
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc13..239e6f6585b48d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1507,21 +1507,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
 }
 
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-   int ret = 0;
-
-   mutex_lock(_domain->init_mutex);
-   if (smmu_domain->smmu)
-   ret = -EPERM;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   mutex_unlock(_domain->init_mutex);
-
-   return ret;
-}
-
 static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks)
 {
@@ -1600,7 +1585,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .enable_nesting = arm_smmu_enable_nesting,
.set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.free   = arm_smmu_domain_free,
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 857d4c2fd1a206..f33c0d569a5d03 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2561,16 +2561,6 @@ static int __init iommu_init(void)
 }
 core_initcall(iommu_init);
 
-int iommu_enable_nesting(struct iommu_domain *domain)
-{
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-   if (!domain->ops->enable_nesting)
-   return -EINVAL;
-   return domain->ops->enable_nesting(domain);
-}
-EXPORT_SYMBOL_GPL(iommu_enable_nesting);
-
 int 

Re: [PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-10 Thread Robin Murphy

On 2022-05-10 15:21, Tianyu Lan wrote:

From: Tianyu Lan 

swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.


Hmm, this seems a bit pessimistic - the offset can vary per mapping, so 
it feels to me like it should really be the caller's responsibility to 
account for it if they're already involved enough to care about both 
constraints. But I'm not sure how practical that would be.


Robin.


Signed-off-by: Tianyu Lan 
---
  kernel/dma/swiotlb.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..0d6684ca7eab 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -743,7 +743,18 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
paddr, size_t size,
  
  size_t swiotlb_max_mapping_size(struct device *dev)

  {
-   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
+   int min_align_mask = dma_get_min_align_mask(dev);
+   int min_align = 0;
+
+   /*
+* swiotlb_find_slots() skips slots according to
+* min align mask. This affects max mapping size.
+* Take it into acount here.
+*/
+   if (min_align_mask)
+   min_align = roundup(min_align_mask, IO_TLB_SIZE);
+
+   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE - min_align;
  }
  
  bool is_swiotlb_active(struct device *dev)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[GIT PULL] iommu/arm-smmu: Updates for 5.19

2022-05-10 Thread Will Deacon
Hi Joerg,

Please pull these Arm SMMU updates for 5.19. The bulk of it is just adding
new device-tree compatible strings for the existing drivers, but there
are some non-critical fixes in here as well. Please see the tag for more
details.

I used the previous fixes pull as a base for this so that you shouldn't
run into any conflicts with upstream.

Cheers,

Will

--->8

The following changes since commit 4a25f2ea0e030b2fc852c4059a50181bfc5b2f57:

  iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu (2022-04-22 
11:21:30 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
tags/arm-smmu-updates

for you to fetch changes up to 628bf55b620497a105f4963ee8fb84769f7e6bb4:

  iommu/arm-smmu: Force identity domains for legacy binding (2022-05-10 
12:01:31 +0100)


Arm SMMU updates for 5.19

- Add new Qualcomm device-tree compatible strings

- Add new Nvidia device-tree compatible string for Tegra234

- Fix UAF in SMMUv3 shared virtual addressing code

- Force identity-mapped domains for users of ye olde SMMU legacy binding

- Minor cleanups


Bjorn Andersson (2):
  dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
  iommu/arm-smmu-qcom: Add SC8280XP support

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3-sva: Fix mm use-after-free

Robin Murphy (1):
  iommu/arm-smmu: Force identity domains for legacy binding

Rohit Agarwal (1):
  dt-bindings: arm-smmu: Add binding for SDX65 SMMU

Thierry Reding (3):
  dt-bindings: arm-smmu: Document nvidia,memory-controller property
  dt-bindings: arm-smmu: Add compatible for Tegra234 SOC
  iommu/arm-smmu: Support Tegra234 SMMU

Yang Yingliang (2):
  iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()
  iommu/arm-smmu-v3: check return value after calling 
platform_get_resource()

 .../devicetree/bindings/iommu/arm,smmu.yaml| 25 --
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c| 13 +--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c|  2 ++
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c  |  8 ---
 6 files changed, 44 insertions(+), 8 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH AUTOSEL 5.15 03/19] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-05-10 Thread Sasha Levin
From: Ashish Mhetre 

[ Upstream commit 4a25f2ea0e030b2fc852c4059a50181bfc5b2f57 ]

Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
entries to not be invalidated correctly. The problem is that the walk
cache index generated for IOVA is not same across translation and
invalidation requests. This is leading to page faults when PMD entry is
released during unmap and populated with new PTE table during subsequent
map request. Disabling large page mappings avoids the release of PMD
entry and avoid translations seeing stale PMD entry in walk cache.
Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
Tegra234 devices. This is recommended fix from Tegra hardware design
team.

Acked-by: Robin Murphy 
Reviewed-by: Krishna Reddy 
Co-developed-by: Pritesh Raithatha 
Signed-off-by: Pritesh Raithatha 
Signed-off-by: Ashish Mhetre 
Link: https://lore.kernel.org/r/20220421081504.24678-1-amhe...@nvidia.com
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
index 01e9b50b10a1..87bf522b9d2e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
@@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct 
arm_smmu_device *smmu, struct devi
dev_name(dev), err);
 }
 
+static int nvidia_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg,
+   struct device *dev)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct device_node *np = smmu->dev->of_node;
+
+   /*
+* Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
+* entries to not be invalidated correctly. The problem is that the walk
+* cache index generated for IOVA is not same across translation and
+* invalidation requests. This is leading to page faults when PMD entry
+* is released during unmap and populated with new PTE table during
+* subsequent map request. Disabling large page mappings avoids the
+* release of PMD entry and avoid translations seeing stale PMD entry in
+* walk cache.
+* Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
+* Tegra234.
+*/
+   if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
+   of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
+   smmu->pgsize_bitmap = PAGE_SIZE;
+   pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
+   }
+
+   return 0;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -268,10 +296,12 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.global_fault = nvidia_smmu_global_fault,
.context_fault = nvidia_smmu_context_fault,
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 static const struct arm_smmu_impl nvidia_smmu_single_impl = {
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH AUTOSEL 5.17 04/21] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-05-10 Thread Sasha Levin
From: Ashish Mhetre 

[ Upstream commit 4a25f2ea0e030b2fc852c4059a50181bfc5b2f57 ]

Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
entries to not be invalidated correctly. The problem is that the walk
cache index generated for IOVA is not same across translation and
invalidation requests. This is leading to page faults when PMD entry is
released during unmap and populated with new PTE table during subsequent
map request. Disabling large page mappings avoids the release of PMD
entry and avoid translations seeing stale PMD entry in walk cache.
Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
Tegra234 devices. This is recommended fix from Tegra hardware design
team.

Acked-by: Robin Murphy 
Reviewed-by: Krishna Reddy 
Co-developed-by: Pritesh Raithatha 
Signed-off-by: Pritesh Raithatha 
Signed-off-by: Ashish Mhetre 
Link: https://lore.kernel.org/r/20220421081504.24678-1-amhe...@nvidia.com
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
index 01e9b50b10a1..87bf522b9d2e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
@@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct 
arm_smmu_device *smmu, struct devi
dev_name(dev), err);
 }
 
+static int nvidia_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg,
+   struct device *dev)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct device_node *np = smmu->dev->of_node;
+
+   /*
+* Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
+* entries to not be invalidated correctly. The problem is that the walk
+* cache index generated for IOVA is not same across translation and
+* invalidation requests. This is leading to page faults when PMD entry
+* is released during unmap and populated with new PTE table during
+* subsequent map request. Disabling large page mappings avoids the
+* release of PMD entry and avoid translations seeing stale PMD entry in
+* walk cache.
+* Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
+* Tegra234.
+*/
+   if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
+   of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
+   smmu->pgsize_bitmap = PAGE_SIZE;
+   pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
+   }
+
+   return 0;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -268,10 +296,12 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.global_fault = nvidia_smmu_global_fault,
.context_fault = nvidia_smmu_context_fault,
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 static const struct arm_smmu_impl nvidia_smmu_single_impl = {
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote:

> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to mm_users
> + * @drvdata: opaque data pointer to pass to bind callback
> + *
> + * Create a bond between device and address space, allowing the device to 
> access
> + * the mm using the returned PASID. If a bond already exists between @device 
> and
> + * @mm, it is returned and an additional reference is taken. Caller must call
> + * iommu_sva_unbind_device() to release each reference.
> + *
> + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
> + * initialize the required SVA features.
> + *
> + * On error, returns an ERR_PTR value.
> + */
> +struct iommu_sva *
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void 
> *drvdata)
> +{
> + int ret = -EINVAL;
> + struct iommu_sva *handle;
> + struct iommu_domain *domain;
> +
> + /*
> +  * TODO: Remove the drvdata parameter after kernel PASID support is
> +  * enabled for the idxd driver.
> +  */
> + if (drvdata)
> + return ERR_PTR(-EOPNOTSUPP);

Why is this being left behind? Clean up the callers too please.

> + /* Allocate mm->pasid if necessary. */
> + ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + mutex_lock(_sva_lock);
> + /* Search for an existing bond. */
> + handle = xa_load(>iommu->sva_bonds, mm->pasid);
> + if (handle) {
> + refcount_inc(>users);
> + goto out_success;
> + }

How can there be an existing bond?

dev->iommu is per-device

The device_group_immutable_singleton() insists on a single device
group

Basically 'sva_bonds' is the same thing as the group->pasid_array.

Assuming we leave room for multi-device groups this logic should just
be

group = iommu_group_get(dev);
if (!group)
return -ENODEV;

mutex_lock(>mutex);
domain = xa_load(>pasid_array, mm->pasid);
if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
domain = iommu_sva_alloc_domain(dev, mm);

?

And stick the refcount in the sva_domain

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 05/12] iommu/vt-d: Remove SVM_FLAG_SUPERVISOR_MODE support

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:17:31PM +0800, Lu Baolu wrote:
> The current kernel DMA with PASID support is based on the SVA with a flag
> SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
> space to a PASID of the device. The device driver programs the device with
> kernel virtual address (KVA) for DMA access. There have been security and
> functional issues with this approach:
> 
> - The lack of IOTLB synchronization upon kernel page table updates.
>   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
> - Other than slight more protection, using kernel virtual address (KVA)
>   has little advantage over physical address. There are also no use
>   cases yet where DMA engines need kernel virtual addresses for in-kernel
>   DMA.
> 
> This removes SVM_FLAG_SUPERVISOR_MODE support in the Intel IOMMU driver.
> The device driver is suggested to handle kernel DMA with PASID through
> the kernel DMA APIs.
> 
> Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
> Signed-off-by: Jacob Pan 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/svm.c | 53 +--
>  1 file changed, 12 insertions(+), 41 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:17:28PM +0800, Lu Baolu wrote:

>  int iommu_device_register(struct iommu_device *iommu,
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 627a3ed5ee8f..afc63fce6107 100644
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2681,6 +2681,8 @@ static struct iommu_device 
> *arm_smmu_probe_device(struct device *dev)
>   smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
>   master->stall_enabled = true;
>  
> + dev->iommu->pasid_bits = master->ssid_bits;
>   return >iommu;
>  
>  err_free_master:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2990f80c5e08..99643f897f26 100644
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4624,8 +4624,11 @@ static struct iommu_device 
> *intel_iommu_probe_device(struct device *dev)
>   if (pasid_supported(iommu)) {
>   int features = pci_pasid_features(pdev);
>  
> - if (features >= 0)
> + if (features >= 0) {
>   info->pasid_supported = features | 1;
> + dev->iommu->pasid_bits =
> + fls(pci_max_pasids(pdev)) - 1;
> + }

It is not very nice that both the iommu drivers have to duplicate the
code to read the pasid capability out of the PCI device.

IMHO it would make more sense for the iommu layer to report the
capability of its own HW block only, and for the core code to figure
out the master's limitation using a bus-specific approach.

It is also unfortunate that the enable/disable pasid is inside the
iommu driver as well - ideally the PCI driver itself would do this
when it knows it wants to use PASIDs.

The ordering interaction with ATS makes this look quite annoying
though. :(

I'm also not convinced individual IOMMU drivers should be forcing ATS
on, there are performance and functional implications here. Using ATS
or not is possibly best left as an administrator policy controlled by
the core code. Again we seem to have some mess.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-10 Thread Tianyu Lan
From: Tianyu Lan 

swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.

Signed-off-by: Tianyu Lan 
---
 kernel/dma/swiotlb.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..0d6684ca7eab 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -743,7 +743,18 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
paddr, size_t size,
 
 size_t swiotlb_max_mapping_size(struct device *dev)
 {
-   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
+   int min_align_mask = dma_get_min_align_mask(dev);
+   int min_align = 0;
+
+   /*
+* swiotlb_find_slots() skips slots according to
+* min align mask. This affects max mapping size.
+* Take it into acount here.
+*/
+   if (min_align_mask)
+   min_align = roundup(min_align_mask, IO_TLB_SIZE);
+
+   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE - min_align;
 }
 
 bool is_swiotlb_active(struct device *dev)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote:

> This adds a pair of common domain ops for this purpose and adds helpers
> to attach/detach a domain to/from a {device, PASID}. 

I wonder if this should not have a detach op - after discussing with
Robin we can see that detach_dev is not used in updated
drivers. Instead attach_dev acts as 'set_domain'

So, it would be more symmetrical if attaching a blocking_domain to the
PASID was the way to 'detach'.

This could be made straightforward by following the sketch I showed to
have a static, global blocing_domain and providing a pointer to it in
struct iommu_ops

Then 'detach pasid' is:

iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid);

And we move away from the notion of 'detach' and in the direction that
everything continuously has a domain set. PASID would logically
default to blocking_domain, though we wouldn't track this anywhere.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-10 Thread James Clark



On 10/05/2022 12:18, Yicong Yang wrote:
> On 2022/5/10 17:46, James Clark wrote:
>>
>>
>> On 07/04/2022 13:58, Yicong Yang wrote:
>>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
>>> Endpoint(RCiEP) device, providing the capability to dynamically monitor and
>>> tune the PCIe traffic, and trace the TLP headers.
>>>
>>> Add the driver for the device to enable the trace function. Register PMU
>>> device of PTT trace, then users can use trace through perf command. The
>>> driver makes use of perf AUX trace and support following events to
>>> configure the trace:
>>>
>>> - filter: select Root port or Endpoint to trace
>>> - type: select the type of traced TLP headers
>>> - direction: select the direction of traced TLP headers
>>> - format: select the data format of the traced TLP headers
>>>
>>> This patch adds the driver part of PTT trace. The perf command support of
>>> PTT trace is added in the following patch.
>>>
>>> Signed-off-by: Yicong Yang 
>>> Reviewed-by: Jonathan Cameron 
>>> ---
>>>  drivers/Makefile |   1 +
>>>  drivers/hwtracing/Kconfig|   2 +
>>>  drivers/hwtracing/ptt/Kconfig|  12 +
>>>  drivers/hwtracing/ptt/Makefile   |   2 +
>>>  drivers/hwtracing/ptt/hisi_ptt.c | 874 +++
>>>  drivers/hwtracing/ptt/hisi_ptt.h | 166 ++
>>>  6 files changed, 1057 insertions(+)
>>>  create mode 100644 drivers/hwtracing/ptt/Kconfig
>>>  create mode 100644 drivers/hwtracing/ptt/Makefile
>>>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>>>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
>>>
>>> diff --git a/drivers/Makefile b/drivers/Makefile
>>> index 020780b6b4d2..662d50599467 100644
>>> --- a/drivers/Makefile
>>> +++ b/drivers/Makefile
>>> @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4)  += thunderbolt/
>>>  obj-$(CONFIG_CORESIGHT)+= hwtracing/coresight/
>>>  obj-y  += hwtracing/intel_th/
>>>  obj-$(CONFIG_STM)  += hwtracing/stm/
>>> +obj-$(CONFIG_HISI_PTT) += hwtracing/ptt/
>>>  obj-$(CONFIG_ANDROID)  += android/
>>>  obj-$(CONFIG_NVMEM)+= nvmem/
>>>  obj-$(CONFIG_FPGA) += fpga/
>>> diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
>>> index 13085835a636..911ee977103c 100644
>>> --- a/drivers/hwtracing/Kconfig
>>> +++ b/drivers/hwtracing/Kconfig
>>> @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
>>>  
>>>  source "drivers/hwtracing/intel_th/Kconfig"
>>>  
>>> +source "drivers/hwtracing/ptt/Kconfig"
>>> +
>>>  endmenu
>>> diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
>>> new file mode 100644
>>> index ..8902a6f27563
>>> --- /dev/null
>>> +++ b/drivers/hwtracing/ptt/Kconfig
>>> @@ -0,0 +1,12 @@
>>> +# SPDX-License-Identifier: GPL-2.0-only
>>> +config HISI_PTT
>>> +   tristate "HiSilicon PCIe Tune and Trace Device"
>>> +   depends on ARM64 || (COMPILE_TEST && 64BIT)
>>> +   depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS
>>> +   help
>>> + HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
>>> + device, and it provides support for PCIe traffic tuning and
>>> + tracing TLP headers to the memory.
>>> +
>>> + This driver can also be built as a module. If so, the module
>>> + will be called hisi_ptt.
>>> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
>>> new file mode 100644
>>> index ..908c09a98161
>>> --- /dev/null
>>> +++ b/drivers/hwtracing/ptt/Makefile
>>> @@ -0,0 +1,2 @@
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
>>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
>>> b/drivers/hwtracing/ptt/hisi_ptt.c
>>> new file mode 100644
>>> index ..242b41870380
>>> --- /dev/null
>>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>>> @@ -0,0 +1,874 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Driver for HiSilicon PCIe tune and trace device
>>> + *
>>> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
>>> + * Author: Yicong Yang 
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include "hisi_ptt.h"
>>> +
>>> +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
>>> +{
>>> +   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
>>> +   return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
>>> +
>>> +   return PCI_DEVID(pdev->bus->number, pdev->devfn);
>>> +}
>>> +
>>> +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +   u32 val;
>>> +
>>> +   return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
>>> + val, val & HISI_PTT_TRACE_IDLE,
>>> + HISI_PTT_WAIT_POLL_INTERVAL_US,
>>> + 

Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-05-10 Thread James Clark



On 07/04/2022 13:58, Yicong Yang wrote:
> From: Qi Liu 
> 
> 'perf record' and 'perf report --dump-raw-trace' supported in this
> patch.
> 
> Example usage:
> 
> Output will contain raw PTT data and its textual representation, such
> as:
> 
> 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40  offset: 0
> ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
> .
> . ... HISI PTT data: size 4194304 bytes
> .  : 00 00 00 00 Prefix
> .  0004: 08 20 00 60 Header DW0
> .  0008: ff 02 00 01 Header DW1
> .  000c: 20 08 00 00 Header DW2
> .  0010: 10 e7 44 ab Header DW3
> .  0014: 2a a8 1e 01 Time
> .  0020: 00 00 00 00 Prefix
> .  0024: 01 00 00 60 Header DW0
> .  0028: 0f 1e 00 01 Header DW1
> .  002c: 04 00 00 00 Header DW2
> .  0030: 40 00 81 02 Header DW3
> .  0034: ee 02 00 00 Time
> 
> 
> Signed-off-by: Qi Liu 
> Signed-off-by: Yicong Yang 
> ---
>  tools/perf/arch/arm/util/auxtrace.c   |  76 +-
>  tools/perf/arch/arm/util/pmu.c|   3 +
>  tools/perf/arch/arm64/util/Build  |   2 +-
>  tools/perf/arch/arm64/util/hisi_ptt.c | 195 
>  tools/perf/util/Build |   2 +
>  tools/perf/util/auxtrace.c|   4 +
>  tools/perf/util/auxtrace.h|   1 +
>  tools/perf/util/hisi-ptt-decoder/Build|   1 +
>  .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c   | 170 ++
>  .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h   |  28 +++
>  tools/perf/util/hisi_ptt.c| 218 ++
>  tools/perf/util/hisi_ptt.h|  28 +++
>  12 files changed, 724 insertions(+), 4 deletions(-)
>  create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
>  create mode 100644 tools/perf/util/hisi_ptt.c
>  create mode 100644 tools/perf/util/hisi_ptt.h
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c 
> b/tools/perf/arch/arm/util/auxtrace.c
> index 5fc6a2a3dbc5..393f5757c039 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -4,9 +4,11 @@
>   * Author: Mathieu Poirier 
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../../../util/auxtrace.h"
>  #include "../../../util/debug.h"
> @@ -14,6 +16,7 @@
>  #include "../../../util/pmu.h"
>  #include "cs-etm.h"
>  #include "arm-spe.h"
> +#include "hisi_ptt.h"
>  
>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
>  {
> @@ -50,6 +53,58 @@ static struct perf_pmu **find_all_arm_spe_pmus(int 
> *nr_spes, int *err)
>   return arm_spe_pmus;
>  }
>  
> +static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
> +{
> + const char *sysfs = sysfs__mountpoint();
> + struct perf_pmu **hisi_ptt_pmus = NULL;
> + struct dirent *dent;
> + char path[PATH_MAX];
> + DIR *dir = NULL;
> + int idx = 0;
> +
> + snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH, sysfs);
> + dir = opendir(path);
> + if (!dir) {
> + pr_err("can't read directory '%s'\n", EVENT_SOURCE_DEVICE_PATH);
> + *err = -EINVAL;
> + goto out;
> + }
> +
> + while ((dent = readdir(dir))) {
> + if (strstr(dent->d_name, HISI_PTT_PMU_NAME))
> + (*nr_ptts)++;
> + }
> +
> + if (!(*nr_ptts))
> + goto out;
> +
> + hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
> + if (!hisi_ptt_pmus) {
> + pr_err("hisi_ptt alloc failed\n");
> + *err = -ENOMEM;
> + goto out;
> + }
> +
> + rewinddir(dir);
> + while ((dent = readdir(dir))) {
> + if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < 
> (*nr_ptts)) {
> + hisi_ptt_pmus[idx] = perf_pmu__find(dent->d_name);
> + if (hisi_ptt_pmus[idx]) {
> + pr_debug2("%s %d: hisi_ptt_pmu %d type %d name 
> %s\n",
> + __func__, __LINE__, idx,
> + hisi_ptt_pmus[idx]->type,
> + hisi_ptt_pmus[idx]->name);
> + idx++;
> + }
> +
> + }
> + }
> +
> +out:
> + closedir(dir);
> + return hisi_ptt_pmus;
> +}
> +
>  struct auxtrace_record
>  

Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-10 Thread James Clark



On 07/04/2022 13:58, Yicong Yang wrote:
> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
> Endpoint(RCiEP) device, providing the capability to dynamically monitor and
> tune the PCIe traffic, and trace the TLP headers.
> 
> Add the driver for the device to enable the trace function. Register PMU
> device of PTT trace, then users can use trace through perf command. The
> driver makes use of perf AUX trace and support following events to
> configure the trace:
> 
> - filter: select Root port or Endpoint to trace
> - type: select the type of traced TLP headers
> - direction: select the direction of traced TLP headers
> - format: select the data format of the traced TLP headers
> 
> This patch adds the driver part of PTT trace. The perf command support of
> PTT trace is added in the following patch.
> 
> Signed-off-by: Yicong Yang 
> Reviewed-by: Jonathan Cameron 
> ---
>  drivers/Makefile |   1 +
>  drivers/hwtracing/Kconfig|   2 +
>  drivers/hwtracing/ptt/Kconfig|  12 +
>  drivers/hwtracing/ptt/Makefile   |   2 +
>  drivers/hwtracing/ptt/hisi_ptt.c | 874 +++
>  drivers/hwtracing/ptt/hisi_ptt.h | 166 ++
>  6 files changed, 1057 insertions(+)
>  create mode 100644 drivers/hwtracing/ptt/Kconfig
>  create mode 100644 drivers/hwtracing/ptt/Makefile
>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
> 
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 020780b6b4d2..662d50599467 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4)+= thunderbolt/
>  obj-$(CONFIG_CORESIGHT)  += hwtracing/coresight/
>  obj-y+= hwtracing/intel_th/
>  obj-$(CONFIG_STM)+= hwtracing/stm/
> +obj-$(CONFIG_HISI_PTT)   += hwtracing/ptt/
>  obj-$(CONFIG_ANDROID)+= android/
>  obj-$(CONFIG_NVMEM)  += nvmem/
>  obj-$(CONFIG_FPGA)   += fpga/
> diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
> index 13085835a636..911ee977103c 100644
> --- a/drivers/hwtracing/Kconfig
> +++ b/drivers/hwtracing/Kconfig
> @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
>  
>  source "drivers/hwtracing/intel_th/Kconfig"
>  
> +source "drivers/hwtracing/ptt/Kconfig"
> +
>  endmenu
> diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
> new file mode 100644
> index ..8902a6f27563
> --- /dev/null
> +++ b/drivers/hwtracing/ptt/Kconfig
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config HISI_PTT
> + tristate "HiSilicon PCIe Tune and Trace Device"
> + depends on ARM64 || (COMPILE_TEST && 64BIT)
> + depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS
> + help
> +   HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
> +   device, and it provides support for PCIe traffic tuning and
> +   tracing TLP headers to the memory.
> +
> +   This driver can also be built as a module. If so, the module
> +   will be called hisi_ptt.
> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
> new file mode 100644
> index ..908c09a98161
> --- /dev/null
> +++ b/drivers/hwtracing/ptt/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
> b/drivers/hwtracing/ptt/hisi_ptt.c
> new file mode 100644
> index ..242b41870380
> --- /dev/null
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> @@ -0,0 +1,874 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Driver for HiSilicon PCIe tune and trace device
> + *
> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
> + * Author: Yicong Yang 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "hisi_ptt.h"
> +
> +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
> +{
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
> + return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
> +
> + return PCI_DEVID(pdev->bus->number, pdev->devfn);
> +}
> +
> +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
> +{
> + u32 val;
> +
> + return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
> +   val, val & HISI_PTT_TRACE_IDLE,
> +   HISI_PTT_WAIT_POLL_INTERVAL_US,
> +   HISI_PTT_WAIT_TRACE_TIMEOUT_US);
> +}
> +
> +static bool hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt)
> +{
> + u32 val;
> +
> + return !readl_poll_timeout_atomic(hisi_ptt->iobase + 
> HISI_PTT_TRACE_WR_STS,
> +   val, !val, 
> HISI_PTT_RESET_POLL_INTERVAL_US,

Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-10 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 01:38:26AM +, Tian, Kevin wrote:

> > However, tt costs nothing to have dirty tracking as long as all iommus
> > support it in the system - which seems to be the normal case today.
> > 
> > We should just always turn it on at this point.
> 
> Then still need a way to report " all iommus support it in the system"
> to userspace since many old systems don't support it at all. 

Userspace can query the iommu_domain directly, or 'try and fail' to
turn on tracking.

A device capability flag is useless without a control knob to request
a domain is created with tracking, and we don't have that, or a reason
to add that.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-10 Thread Yicong Yang via iommu
On 2022/5/10 20:54, James Clark wrote:
> 
> 
> On 10/05/2022 12:18, Yicong Yang wrote:
>> On 2022/5/10 17:46, James Clark wrote:
>>>
>>>
>>> On 07/04/2022 13:58, Yicong Yang wrote:
 HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
 Endpoint(RCiEP) device, providing the capability to dynamically monitor and
 tune the PCIe traffic, and trace the TLP headers.

 Add the driver for the device to enable the trace function. Register PMU
 device of PTT trace, then users can use trace through perf command. The
 driver makes use of perf AUX trace and support following events to
 configure the trace:

 - filter: select Root port or Endpoint to trace
 - type: select the type of traced TLP headers
 - direction: select the direction of traced TLP headers
 - format: select the data format of the traced TLP headers

 This patch adds the driver part of PTT trace. The perf command support of
 PTT trace is added in the following patch.

 Signed-off-by: Yicong Yang 
 Reviewed-by: Jonathan Cameron 
 ---
  drivers/Makefile |   1 +
  drivers/hwtracing/Kconfig|   2 +
  drivers/hwtracing/ptt/Kconfig|  12 +
  drivers/hwtracing/ptt/Makefile   |   2 +
  drivers/hwtracing/ptt/hisi_ptt.c | 874 +++
  drivers/hwtracing/ptt/hisi_ptt.h | 166 ++
  6 files changed, 1057 insertions(+)
  create mode 100644 drivers/hwtracing/ptt/Kconfig
  create mode 100644 drivers/hwtracing/ptt/Makefile
  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h

 diff --git a/drivers/Makefile b/drivers/Makefile
 index 020780b6b4d2..662d50599467 100644
 --- a/drivers/Makefile
 +++ b/drivers/Makefile
 @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4) += thunderbolt/
  obj-$(CONFIG_CORESIGHT)   += hwtracing/coresight/
  obj-y += hwtracing/intel_th/
  obj-$(CONFIG_STM) += hwtracing/stm/
 +obj-$(CONFIG_HISI_PTT)+= hwtracing/ptt/
  obj-$(CONFIG_ANDROID) += android/
  obj-$(CONFIG_NVMEM)   += nvmem/
  obj-$(CONFIG_FPGA)+= fpga/
 diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
 index 13085835a636..911ee977103c 100644
 --- a/drivers/hwtracing/Kconfig
 +++ b/drivers/hwtracing/Kconfig
 @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
  
  source "drivers/hwtracing/intel_th/Kconfig"
  
 +source "drivers/hwtracing/ptt/Kconfig"
 +
  endmenu
 diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
 new file mode 100644
 index ..8902a6f27563
 --- /dev/null
 +++ b/drivers/hwtracing/ptt/Kconfig
 @@ -0,0 +1,12 @@
 +# SPDX-License-Identifier: GPL-2.0-only
 +config HISI_PTT
 +  tristate "HiSilicon PCIe Tune and Trace Device"
 +  depends on ARM64 || (COMPILE_TEST && 64BIT)
 +  depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS
 +  help
 +HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
 +device, and it provides support for PCIe traffic tuning and
 +tracing TLP headers to the memory.
 +
 +This driver can also be built as a module. If so, the module
 +will be called hisi_ptt.
 diff --git a/drivers/hwtracing/ptt/Makefile 
 b/drivers/hwtracing/ptt/Makefile
 new file mode 100644
 index ..908c09a98161
 --- /dev/null
 +++ b/drivers/hwtracing/ptt/Makefile
 @@ -0,0 +1,2 @@
 +# SPDX-License-Identifier: GPL-2.0
 +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
 diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
 b/drivers/hwtracing/ptt/hisi_ptt.c
 new file mode 100644
 index ..242b41870380
 --- /dev/null
 +++ b/drivers/hwtracing/ptt/hisi_ptt.c
 @@ -0,0 +1,874 @@
 +// SPDX-License-Identifier: GPL-2.0
 +/*
 + * Driver for HiSilicon PCIe tune and trace device
 + *
 + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
 + * Author: Yicong Yang 
 + */
 +
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +
 +#include "hisi_ptt.h"
 +
 +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
 +{
 +  if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
 +  return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
 +
 +  return PCI_DEVID(pdev->bus->number, pdev->devfn);
 +}
 +
 +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
 +{
 +  u32 val;
 +
 +  return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
 +

Re: [PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity

2022-05-10 Thread Yicong Yang via iommu
On 2022/5/10 19:23, Will Deacon wrote:
> On Thu, Apr 07, 2022 at 08:58:35PM +0800, Yicong Yang wrote:
>> The DMA operations of HiSilicon PTT device can only work properly with
>> identical mappings. So add a quirk for the device to force the domain
>> as passthrough.
>>
>> Signed-off-by: Yicong Yang 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>>  1 file changed, 16 insertions(+)
> 
> I still don't like this being part of the SMMU driver, but given that
> (a) Robin doesn't seem to agree with the objection and (b) you've been
> refreshing the patch series:
> 
> Acked-by: Will Deacon 
> 
> If you do respin, then:
> 
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 627a3ed5ee8f..5ec15ae2a9b1 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2839,6 +2839,21 @@ static int arm_smmu_dev_disable_feature(struct device 
>> *dev,
>>  }
>>  }
> 
> It might be worth adding a brief comment here to explain what this device is
> and why it needs an identity mapping.
> 
>> +#define IS_HISI_PTT_DEVICE(pdev)((pdev)->vendor == PCI_VENDOR_ID_HUAWEI 
>> && \
>> + (pdev)->device == 0xa12e)
> 

Will add a brief comment here in next version.

Thanks,
Yicong
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET-based hardlockup detector relies on the TSC to determine if an
> observed NMI interrupt was originated by HPET timer. Hence, this detector
> can no longer be used with an unstable TSC.
> 
> In such case, permanently stop the HPET-based hardlockup detector and
> start the perf-based detector.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Suggested-by: Thomas Gleixner 
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Relocated the delcaration of hardlockup_detector_switch_to_perf() to
>x86/nmi.h It does not depend on HPET.
>  * Removed function stub. The shim hardlockup detector is always for x86.
> 
> Changes since v4:
>  * Added a stub version of hardlockup_detector_switch_to_perf() for
>!CONFIG_HPET_TIMER. (lkp)
>  * Reconfigure the whole lockup detector instead of unconditionally
>starting the perf-based hardlockup detector.
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/include/asm/nmi.h | 6 ++
>  arch/x86/kernel/tsc.c  | 2 ++
>  arch/x86/kernel/watchdog_hld.c | 6 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index 4a0d5b562c91..47752ff67d8b 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -63,4 +63,10 @@ void stop_nmi(void);
>  void restart_nmi(void);
>  void local_touch_nmi(void);
>  
> +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR
> +void hardlockup_detector_switch_to_perf(void);
> +#else
> +static inline void hardlockup_detector_switch_to_perf(void) { }
> +#endif
> +
>  #endif /* _ASM_X86_NMI_H */
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cc1843044d88..74772ffc79d1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1176,6 +1176,8 @@ void mark_tsc_unstable(char *reason)
>  
>   clocksource_mark_unstable(_tsc_early);
>   clocksource_mark_unstable(_tsc);
> +
> + hardlockup_detector_switch_to_perf();
>  }
>  
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
> index ef11f0af4ef5..7940977c6312 100644
> --- a/arch/x86/kernel/watchdog_hld.c
> +++ b/arch/x86/kernel/watchdog_hld.c
> @@ -83,3 +83,9 @@ void watchdog_nmi_start(void)
>   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
>   hardlockup_detector_hpet_start();
>  }
> +
> +void hardlockup_detector_switch_to_perf(void)
> +{
> + detector_type = X86_HARDLOCKUP_DETECTOR_PERF;

Another possible problem along the same lines here,
isn't your watchdog still running at this point? And
it uses detector_type in the switch.

> + lockup_detector_reconfigure();

Actually the detector_type switch is used in some
functions called by lockup_detector_reconfigure()
e.g., watchdog_nmi_stop, so this seems buggy even
without concurrent watchdog.

Is this switching a good idea in general? The admin
has asked for non-standard option because they want
more PMU counterss available and now it eats a
counter potentially causing a problem rather than
detecting one.

I would rather just disable with a warning if it were
up to me. If you *really* wanted to be fancy then
allow admin to re-enable via proc maybe.

Thanks,
Nick

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-10 Thread Joao Martins
On 5/10/22 02:38, Tian, Kevin wrote:
>> From: Jason Gunthorpe 
>> Sent: Friday, May 6, 2022 7:46 PM
>>
>> On Fri, May 06, 2022 at 03:51:40AM +, Tian, Kevin wrote:
 From: Jason Gunthorpe 
 Sent: Thursday, May 5, 2022 10:08 PM

 On Thu, May 05, 2022 at 07:40:37AM +, Tian, Kevin wrote:

> In concept this is an iommu property instead of a domain property.

 Not really, domains shouldn't be changing behaviors once they are
 created. If a domain supports dirty tracking and I attach a new device
 then it still must support dirty tracking.
>>>
>>> That sort of suggests that userspace should specify whether a domain
>>> supports dirty tracking when it's created. But how does userspace
>>> know that it should create the domain in this way in the first place?
>>> live migration is triggered on demand and it may not happen in the
>>> lifetime of a VM.
>>
>> The best you could do is to look at the devices being plugged in at VM
>> startup, and if they all support live migration then request dirty
>> tracking, otherwise don't.
> 
> Yes, this is how a device capability can help.
> 
>>
>> However, tt costs nothing to have dirty tracking as long as all iommus
>> support it in the system - which seems to be the normal case today.
>>
>> We should just always turn it on at this point.
> 
> Then still need a way to report " all iommus support it in the system"
> to userspace since many old systems don't support it at all. If we all
> agree that a device capability flag would be helpful on this front (like
> you also said below), probably can start building the initial skeleton
> with that in mind?
> 

This would capture device-specific and maybe iommu-instance features, but
there's some tiny bit odd semantic here. There's nothing that
depends on the device to support any of this, but rather the IOMMU instance 
that sits
below the device which is independent of device-own capabilities e.g. PRI on 
the other
hand would be a perfect fit for a device capability (?), but dirty tracking
conveying over a device capability would be a convenience rather than an exact
hw representation.

Thinking out loud if we are going as a device/iommu capability [to see if this 
matches
what people have or not in mind]: we would add dirty-tracking feature bit via 
the existent
kAPI for iommu device features (e.g. IOMMU_DEV_FEAT_AD) and on iommufd we would 
maybe add
an IOMMUFD_CMD_DEV_GET_IOMMU_FEATURES ioctl which would have an u64 dev_id as 
input (from
the returned vfio-pci BIND_IOMMUFD @out_dev_id) and u64 features as an output 
bitmap of
synthetic feature bits, having IOMMUFD_FEATURE_AD the only one we query (and
IOMMUFD_FEATURE_{SVA,IOPF} as potentially future candidates). Qemu would then 
at start of
day would check if /all devices/ support it and it would then still do the 
blind set
tracking, but bail out preemptively if any of device-iommu don't support 
dirty-tracking. I
don't think we have any case today for having to deal with different IOMMU 
instances that
have different features.

Either that or as discussed in the beginning perhaps add an iommufd (or iommufd 
hwpt one)
ioctl  call (e.g.IOMMUFD_CMD_CAP) via a input value (e.g. subop IOMMU_FEATURES) 
which
would gives us a structure of things (e.g. for the IOMMU_FEATURES subop the 
common
featureset bitmap in all iommu instances). This would give the 'all iommus 
support it in
the system'. Albeit the device one might have more concrete longevity if 
there's further
plans aside from dirty tracking.

>>
>>> and if the user always creates domain to allow dirty tracking by default,
>>> how does it know a failed attach is due to missing dirty tracking support
>>> by the IOMMU and then creates another domain which disables dirty
>>> tracking and retry-attach again?
>>
>> The automatic logic is complicated for sure, if you had a device flag
>> it would have to figure it out that way
>>
> 
> Yes. That is the model in my mind.
> 
> Thanks
> Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Thomas Gleixner
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
>> +/*
>> + * If in use, the HPET hardlockup detector relies on tsc_khz.
>> + * Reconfigure it to make use of the refined tsc_khz.
>> + */
>> +lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter but in principle it's
> a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped [actually that
> looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
>
> So you want to disable HPET watchdog if it was enabled, then update
> wherever you're using tsc_khz, then re-enable.

The real question is whether making this refined tsc_khz value
immediately effective matters at all. IMO, it does not because up to
that point the watchdog was happily using the coarse calibrated value
and the whole use TSC to assess whether the HPET fired mechanism is just
a guestimate anyway. So what's the point of trying to guess 'more
correct'.

Thanks,

tglx

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: Force identity domains for legacy binding

2022-05-10 Thread Will Deacon
On Tue, 10 May 2022 09:38:58 +0100, Robin Murphy wrote:
> When using the legacy "mmu-masters" DT binding, we reject DMA domains
> since we have no guarantee of driver probe order and thus can't rely on
> client drivers getting the correct DMA ops. However, we can do better
> than fall back to the old no-default-domain behaviour now, by forcing an
> identity default domain instead. This also means that detaching from a
> VFIO domain can actually work - that looks to have been broken for over
> 6 years, so clearly isn't something that legacy binding users care
> about, but we may as well make the driver code make sense anyway.
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: Force identity domains for legacy binding
  https://git.kernel.org/will/c/628bf55b6204

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity

2022-05-10 Thread Will Deacon
On Thu, Apr 07, 2022 at 08:58:35PM +0800, Yicong Yang wrote:
> The DMA operations of HiSilicon PTT device can only work properly with
> identical mappings. So add a quirk for the device to force the domain
> as passthrough.
> 
> Signed-off-by: Yicong Yang 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>  1 file changed, 16 insertions(+)

I still don't like this being part of the SMMU driver, but given that
(a) Robin doesn't seem to agree with the objection and (b) you've been
refreshing the patch series:

Acked-by: Will Deacon 

If you do respin, then:

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 627a3ed5ee8f..5ec15ae2a9b1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2839,6 +2839,21 @@ static int arm_smmu_dev_disable_feature(struct device 
> *dev,
>   }
>  }

It might be worth adding a brief comment here to explain what this device is
and why it needs an identity mapping.

> +#define IS_HISI_PTT_DEVICE(pdev) ((pdev)->vendor == PCI_VENDOR_ID_HUAWEI 
> && \
> +  (pdev)->device == 0xa12e)

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-10 Thread Yicong Yang via iommu
On 2022/5/10 17:46, James Clark wrote:
> 
> 
> On 07/04/2022 13:58, Yicong Yang wrote:
>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
>> Endpoint(RCiEP) device, providing the capability to dynamically monitor and
>> tune the PCIe traffic, and trace the TLP headers.
>>
>> Add the driver for the device to enable the trace function. Register PMU
>> device of PTT trace, then users can use trace through perf command. The
>> driver makes use of perf AUX trace and support following events to
>> configure the trace:
>>
>> - filter: select Root port or Endpoint to trace
>> - type: select the type of traced TLP headers
>> - direction: select the direction of traced TLP headers
>> - format: select the data format of the traced TLP headers
>>
>> This patch adds the driver part of PTT trace. The perf command support of
>> PTT trace is added in the following patch.
>>
>> Signed-off-by: Yicong Yang 
>> Reviewed-by: Jonathan Cameron 
>> ---
>>  drivers/Makefile |   1 +
>>  drivers/hwtracing/Kconfig|   2 +
>>  drivers/hwtracing/ptt/Kconfig|  12 +
>>  drivers/hwtracing/ptt/Makefile   |   2 +
>>  drivers/hwtracing/ptt/hisi_ptt.c | 874 +++
>>  drivers/hwtracing/ptt/hisi_ptt.h | 166 ++
>>  6 files changed, 1057 insertions(+)
>>  create mode 100644 drivers/hwtracing/ptt/Kconfig
>>  create mode 100644 drivers/hwtracing/ptt/Makefile
>>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>>  create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
>>
>> diff --git a/drivers/Makefile b/drivers/Makefile
>> index 020780b6b4d2..662d50599467 100644
>> --- a/drivers/Makefile
>> +++ b/drivers/Makefile
>> @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4)   += thunderbolt/
>>  obj-$(CONFIG_CORESIGHT) += hwtracing/coresight/
>>  obj-y   += hwtracing/intel_th/
>>  obj-$(CONFIG_STM)   += hwtracing/stm/
>> +obj-$(CONFIG_HISI_PTT)  += hwtracing/ptt/
>>  obj-$(CONFIG_ANDROID)   += android/
>>  obj-$(CONFIG_NVMEM) += nvmem/
>>  obj-$(CONFIG_FPGA)  += fpga/
>> diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
>> index 13085835a636..911ee977103c 100644
>> --- a/drivers/hwtracing/Kconfig
>> +++ b/drivers/hwtracing/Kconfig
>> @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
>>  
>>  source "drivers/hwtracing/intel_th/Kconfig"
>>  
>> +source "drivers/hwtracing/ptt/Kconfig"
>> +
>>  endmenu
>> diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
>> new file mode 100644
>> index ..8902a6f27563
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/Kconfig
>> @@ -0,0 +1,12 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config HISI_PTT
>> +tristate "HiSilicon PCIe Tune and Trace Device"
>> +depends on ARM64 || (COMPILE_TEST && 64BIT)
>> +depends on PCI && HAS_DMA && HAS_IOMEM && PERF_EVENTS
>> +help
>> +  HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
>> +  device, and it provides support for PCIe traffic tuning and
>> +  tracing TLP headers to the memory.
>> +
>> +  This driver can also be built as a module. If so, the module
>> +  will be called hisi_ptt.
>> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
>> new file mode 100644
>> index ..908c09a98161
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/Makefile
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
>> b/drivers/hwtracing/ptt/hisi_ptt.c
>> new file mode 100644
>> index ..242b41870380
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>> @@ -0,0 +1,874 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Driver for HiSilicon PCIe tune and trace device
>> + *
>> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
>> + * Author: Yicong Yang 
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "hisi_ptt.h"
>> +
>> +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
>> +{
>> +if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
>> +return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
>> +
>> +return PCI_DEVID(pdev->bus->number, pdev->devfn);
>> +}
>> +
>> +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
>> +{
>> +u32 val;
>> +
>> +return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
>> +  val, val & HISI_PTT_TRACE_IDLE,
>> +  HISI_PTT_WAIT_POLL_INTERVAL_US,
>> +  HISI_PTT_WAIT_TRACE_TIMEOUT_US);
>> +}
>> +
>> +static bool hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt)
>> +{
>> +u32 val;
>> +
>> +return 

Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET hardlockup detector relies on tsc_khz to estimate the value of
> that the TSC will have when its HPET channel fires. A refined tsc_khz
> helps to estimate better the expected TSC value.
> 
> Using the early value of tsc_khz may lead to a large error in the expected
> TSC value. Restarting the NMI watchdog detector has the effect of kicking
> its HPET channel and make use of the refined tsc_khz.
> 
> When the HPET hardlockup is not in use, restarting the NMI watchdog is
> a noop.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4
>  * N/A
> 
> Changes since v3
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/kernel/tsc.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cafacb2e58cc..cc1843044d88 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + /*
> +  * If in use, the HPET hardlockup detector relies on tsc_khz.
> +  * Reconfigure it to make use of the refined tsc_khz.
> +  */
> + lockup_detector_reconfigure();

I don't know if the API is conceptually good.

You change something that the lockup detector is currently using, 
*while* the detector is running asynchronously, and then reconfigure
it. What happens in the window? If this code is only used for small
adjustments maybe it does not really matter but in principle it's
a bad API to export.

lockup_detector_reconfigure as an internal API is okay because it
reconfigures things while the watchdog is stopped [actually that
looks untrue for soft dog which uses watchdog_thresh in
is_softlockup(), but that should be fixed].

You're the arch so you're allowed to stop the watchdog and configure
it, e.g., hardlockup_detector_perf_stop() is called in arch/.

So you want to disable HPET watchdog if it was enabled, then update
wherever you're using tsc_khz, then re-enable.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> Prepare hardlockup_panic_setup() to handle a comma-separated list of
> options. Thus, it can continue parsing its own command-line options while
> ignoring parameters that are relevant only to specific implementations of
> the hardlockup detector. Such implementations may use an early_param to
> parse their own options.

It can't really handle comma separated list though, until the next
patch. nmi_watchdog=panic,0 does not make sense, so you lost error
handling of that.

And is it kosher to double handle options like this? I'm sure it
happens but it's ugly.

Would you consider just add a new option for x86 and avoid changing
this? Less code and patches.

Thanks,
Nick

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Corrected typo in commit message. (Tony)
> 
> Changes since v4:
>  * None
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * None
> ---
>  kernel/watchdog.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 9166220457bc..6443841a755f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -73,13 +73,13 @@ void __init hardlockup_detector_disable(void)
>  
>  static int __init hardlockup_panic_setup(char *str)
>  {
> - if (!strncmp(str, "panic", 5))
> + if (parse_option_str(str, "panic"))
>   hardlockup_panic = 1;
> - else if (!strncmp(str, "nopanic", 7))
> + else if (parse_option_str(str, "nopanic"))
>   hardlockup_panic = 0;
> - else if (!strncmp(str, "0", 1))
> + else if (parse_option_str(str, "0"))
>   nmi_watchdog_user_enabled = 0;
> - else if (!strncmp(str, "1", 1))
> + else if (parse_option_str(str, "1"))
>   nmi_watchdog_user_enabled = 1;
>   return 1;
>  }
> -- 
> 2.17.1
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> Certain implementations of the hardlockup detector require support for
> Inter-Processor Interrupt shorthands. On x86, support for these can only
> be determined after all the possible CPUs have booted once (in
> smp_init()). Other architectures may not need such check.
> 
> lockup_detector_init() only performs the initializations of data
> structures of the lockup detector. Hence, there are no dependencies on
> smp_init().

I think this is the only real thing which affects other watchdog types?

Not sure if it's a big problem, the secondary CPUs coming up won't
have their watchdog active until quite late, and the primary could
implement its own timeout in __cpu_up for secondary coming up, and
IPI it to get traces if necessary which is probably more robust.

Acked-by: Nicholas Piggin 

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Andrew Morton 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4:
>  * N/A
> 
> Changes since v3:
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  init/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/init/main.c b/init/main.c
> index 98182c3c2c4b..62c52c9e4c2b 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1600,9 +1600,11 @@ static noinline void __init kernel_init_freeable(void)
>  
>   rcu_init_tasks_generic();
>   do_pre_smp_initcalls();
> - lockup_detector_init();
>  
>   smp_init();
> +
> + lockup_detector_init();
> +
>   sched_init_smp();
>  
>   padata_init();
> -- 
> 2.17.1
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: Force identity domains for legacy binding

2022-05-10 Thread Robin Murphy
When using the legacy "mmu-masters" DT binding, we reject DMA domains
since we have no guarantee of driver probe order and thus can't rely on
client drivers getting the correct DMA ops. However, we can do better
than fall back to the old no-default-domain behaviour now, by forcing an
identity default domain instead. This also means that detaching from a
VFIO domain can actually work - that looks to have been broken for over
6 years, so clearly isn't something that legacy binding users care
about, but we may as well make the driver code make sense anyway.

Suggested-by: Jason Gunthorpe 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc..e7ec2b74a28d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1574,6 +1574,9 @@ static int arm_smmu_def_domain_type(struct device *dev)
struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
const struct arm_smmu_impl *impl = cfg->smmu->impl;
 
+   if (using_legacy_binding)
+   return IOMMU_DOMAIN_IDENTITY;
+
if (impl && impl->def_domain_type)
return impl->def_domain_type(dev);
 
-- 
2.35.3.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node

2022-05-10 Thread Robin Murphy

On 2022-05-10 08:23, Shameerali Kolothum Thodi wrote:

Hi Joerg/Robin,

I think this series is now ready to be merged. Could you please let
me know if there is anything missing.


Fine by me - these patches have had enough review and testing now that 
even if anything else did come up, I think it would be better done as 
follow-up work on the merged code.


Cheers,
Robin.



Thanks,
Shameer


-Original Message-
From: Guohanjun (Hanjun Guo)
Sent: 05 May 2022 02:24
To: Shameerali Kolothum Thodi ;
linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
iommu@lists.linux-foundation.org
Cc: Linuxarm ; lorenzo.pieral...@arm.com;
j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
; steven.pr...@arm.com;
sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com;
laurentiu.tu...@nxp.com; h...@infradead.org
Subject: Re: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node

On 2022/5/4 0:33, Shameer Kolothum wrote:

Hi

v11 --> v12
-Minor fix in patch #4 to address the issue reported by the kernel test

robot.

-Added R-by tags by Christoph(patch #1) and Lorenzo(patch #4).
-Added T-by from Steve to all relevant patches. Many thanks!.

Please note, this series has a dependency on the ACPICA header patch
here[1].


Tested on a Kunpeng920 server machine with SMMUv3, the 3408iMR RAID
controller card works as expected,

Tested-by: Hanjun Guo 

Thanks
Hanjun

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node

2022-05-10 Thread Shameerali Kolothum Thodi via iommu
Hi Joerg/Robin,

I think this series is now ready to be merged. Could you please let
me know if there is anything missing.

Thanks,
Shameer

> -Original Message-
> From: Guohanjun (Hanjun Guo)
> Sent: 05 May 2022 02:24
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
> ; steven.pr...@arm.com;
> sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com;
> laurentiu.tu...@nxp.com; h...@infradead.org
> Subject: Re: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node
> 
> On 2022/5/4 0:33, Shameer Kolothum wrote:
> > Hi
> >
> > v11 --> v12
> >-Minor fix in patch #4 to address the issue reported by the kernel test
> robot.
> >-Added R-by tags by Christoph(patch #1) and Lorenzo(patch #4).
> >-Added T-by from Steve to all relevant patches. Many thanks!.
> >
> > Please note, this series has a dependency on the ACPICA header patch
> > here[1].
> 
> Tested on a Kunpeng920 server machine with SMMUv3, the 3408iMR RAID
> controller card works as expected,
> 
> Tested-by: Hanjun Guo 
> 
> Thanks
> Hanjun
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-10 Thread David Gibson
On Mon, May 09, 2022 at 11:00:41AM -0300, Jason Gunthorpe wrote:
> On Mon, May 09, 2022 at 04:01:52PM +1000, David Gibson wrote:
> 
> > > The default iommu_domain that the iommu driver creates will be used
> > > here, it is up to the iommu driver to choose something reasonable for
> > > use by applications like DPDK. ie PPC should probably pick its biggest
> > > x86-like aperture.
> > 
> > So, using the big aperture means a very high base IOVA
> > (1<<59)... which means that it won't work at all if you want to attach
> > any devices that aren't capable of 64-bit DMA.
> 
> I'd expect to include the 32 bit window too..

I'm not entirely sure what you mean.  Are you working on the
assumption that we've extended to allowing multiple apertures, so we'd
default to advertising both a small/low aperture and a large/high
aperture?

> > Using the maximum possible window size would mean we either
> > potentially waste a lot of kernel memory on pagetables, or we use
> > unnecessarily large number of levels to the pagetable.
> 
> All drivers have this issue to one degree or another. We seem to be
> ignoring it - in any case this is a micro optimization, not a
> functional need?

Ok, fair point.

> > More generally, the problem with the interface advertising limitations
> > and it being up to userspace to work out if those are ok or not is
> > that it's fragile.  It's pretty plausible that some future IOMMU model
> > will have some new kind of limitation that can't be expressed in the
> > query structure we invented now.
> 
> The basic API is very simple - the driver needs to provide ranges of
> IOVA and map/unmap - I don't think we have a future problem here we
> need to try and guess and solve today.

Well.. maybe.  My experience of encountering hardware doing weird-arse
stuff makes me less sanguine.

> Even PPC fits this just fine, the open question for DPDK is more
> around optimization, not functional.
> 
> > But if userspace requests the capabilities it wants, and the kernel
> > acks or nacks that, we can support the new host IOMMU with existing
> > software just fine.
> 
> No, this just makes it fragile in the other direction because now
> userspace has to know what platform specific things to ask for *or it
> doesn't work at all*. This is not a improvement for the DPDK cases.

Um.. no.  The idea is that userspace requests *what it needs*, not
anything platform specific.  In the case of DPDK that would be nothing
more than the (minimum) aperture size.  Nothing platform specific
about that.

> Kernel decides, using all the kernel knowledge it has and tells the
> application what it can do - this is the basic simplified interface.
> 
> > > The iommu-driver-specific struct is the "advanced" interface and
> > > allows a user-space IOMMU driver to tightly control the HW with full
> > > HW specific knowledge. This is where all the weird stuff that is not
> > > general should go.
> > 
> > Right, but forcing anything more complicated than "give me some IOVA
> > region" to go through the advanced interface means that qemu (or any
> > hypervisor where the guest platform need not identically match the
> > host) has to have n^2 complexity to match each guest IOMMU model to
> > each host IOMMU model.
> 
> I wouldn't say n^2, but yes, qemu needs to have a userspace driver for
> the platform IOMMU, and yes it needs this to reach optimal
> behavior. We already know this is a hard requirement for using nesting
> as acceleration, I don't see why apertures are so different.

For one thing, because we only care about optimal behaviour on the
host ~= guest KVM case.  That means it's not n^2, just (roughly) one
host driver for each matching guest driver.  I'm considering the
general X on Y case - we don't need to optimize it, but it would be
nice for it to work without considering every combination separately.

> > Errr.. how do you figure?  On ppc the ranges and pagesizes are
> > definitely negotiable.  I'm not really familiar with other models, but
> > anything which allows *any* variations in the pagetable structure will
> > effectively have at least some negotiable properties.
> 
> As above, if you ask for the wrong thing then you don't get
> anything. If DPDK asks for something that works on ARM like 0 -> 4G
> then PPC and x86 will always fail. How is this improving anything to
> require applications to carefully ask for exactly the right platform
> specific ranges?

Hm, looks like I didn't sufficiently emphasize that the base address
would be optional for userspace to supply.  So userspace would request
a range *size* only, unless it needs a specific IOVA base address.  It
only requests the latter if it actually needs it - so failing in that
case is correct.  (Qemu, with or without an vIOMMU is the obvious case
for that, though I could also imagine it for a specialized driver for
some broken device which has weird limitations on what IOVA addresses
it can generate on the bus).

> It isn't like there is some hard coded value we 

[PATCH v6 12/12] iommu: Rename iommu-sva-lib.{c,h}

2022-05-10 Thread Lu Baolu
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
for SVA implementation in iommu core.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
 drivers/iommu/{iommu-sva-lib.h => iommu-sva.h}  | 0
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 drivers/iommu/intel/iommu.c | 2 +-
 drivers/iommu/intel/svm.c   | 2 +-
 drivers/iommu/io-pgfault.c  | 2 +-
 drivers/iommu/{iommu-sva-lib.c => iommu-sva.c}  | 2 +-
 drivers/iommu/Makefile  | 2 +-
 8 files changed, 7 insertions(+), 7 deletions(-)
 rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (100%)
 rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva.h
similarity index 100%
rename from drivers/iommu/iommu-sva-lib.h
rename to drivers/iommu/iommu-sva.h
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 56644e553c42..265b125d7dc4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -9,7 +9,7 @@
 #include 
 
 #include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
 #include "../../io-pgtable-arm.h"
 
 struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6a10fa181827..de3b6fbf8766 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -31,7 +31,7 @@
 #include 
 
 #include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
 
 static bool disable_bypass = true;
 module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a5728f743c6d..1c2c92b657c7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,7 @@
 #include 
 
 #include "../irq_remapping.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
 #include "pasid.h"
 #include "cap_audit.h"
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ca83ebc708a8..44331db060e4 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -25,7 +25,7 @@
 
 #include "pasid.h"
 #include "perf.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 static void intel_svm_drain_prq(struct device *dev, u32 pasid);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 9efe5259402b..2a8604013b7e 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
 
 /**
  * struct iopf_queue - IO Page Fault queue
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva.c
similarity index 99%
rename from drivers/iommu/iommu-sva-lib.c
rename to drivers/iommu/iommu-sva.c
index ea12504a9e12..1791ac1e3d34 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
 
 static DEFINE_MUTEX(iommu_sva_lock);
 static DECLARE_IOASID_SET(iommu_sva_pasid);
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..c1763476162b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 11/12] iommu: Per-domain I/O page fault handling

2022-05-10 Thread Lu Baolu
Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This makes the I/O page fault handling framework possible to serve more
usage scenarios as long as they have an IOMMU domain and install a page
fault handler in it. Some unused functions are also removed to avoid
dead code.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu-sva-lib.h |  2 --
 drivers/iommu/io-pgfault.c| 64 ---
 drivers/iommu/iommu-sva-lib.c | 20 ---
 3 files changed, 7 insertions(+), 79 deletions(-)

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 3420654c6e2f..74ce2e76321b 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-struct mm_struct *iommu_sva_find(ioasid_t pasid);
-
 /* I/O Page fault */
 struct device;
 struct iommu_fault;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..9efe5259402b 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct 
iopf_fault *iopf,
return iommu_page_response(dev, );
 }
 
-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
-   vm_fault_t ret;
-   struct mm_struct *mm;
-   struct vm_area_struct *vma;
-   unsigned int access_flags = 0;
-   unsigned int fault_flags = FAULT_FLAG_REMOTE;
-   struct iommu_fault_page_request *prm = >fault.prm;
-   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
-   if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
-   return status;
-
-   mm = iommu_sva_find(prm->pasid);
-   if (IS_ERR_OR_NULL(mm))
-   return status;
-
-   mmap_read_lock(mm);
-
-   vma = find_extend_vma(mm, prm->addr);
-   if (!vma)
-   /* Unmapped area */
-   goto out_put_mm;
-
-   if (prm->perm & IOMMU_FAULT_PERM_READ)
-   access_flags |= VM_READ;
-
-   if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
-   access_flags |= VM_WRITE;
-   fault_flags |= FAULT_FLAG_WRITE;
-   }
-
-   if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
-   access_flags |= VM_EXEC;
-   fault_flags |= FAULT_FLAG_INSTRUCTION;
-   }
-
-   if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
-   fault_flags |= FAULT_FLAG_USER;
-
-   if (access_flags & ~vma->vm_flags)
-   /* Access fault */
-   goto out_put_mm;
-
-   ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
-   status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
-   IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
-   mmap_read_unlock(mm);
-   mmput(mm);
-
-   return status;
-}
-
 static void iopf_handle_group(struct work_struct *work)
 {
struct iopf_group *group;
+   struct iommu_domain *domain;
struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
 
group = container_of(work, struct iopf_group, work);
+   domain = iommu_get_domain_for_iopf(group->dev,
+  group->last_fault.fault.prm.pasid);
+   if (!domain || !domain->iopf_handler)
+   status = IOMMU_PAGE_RESP_INVALID;
 
list_for_each_entry_safe(iopf, next, >faults, list) {
/*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
 * faults in the group if there is an error.
 */
if (status == IOMMU_PAGE_RESP_SUCCESS)
-   status = iopf_handle_single(iopf);
+   status = domain->iopf_handler(>fault,
+ domain->fault_data);
 
if (!(iopf->fault.prm.flags &
  IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 32c836e4a60e..ea12504a9e12 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -52,26 +52,6 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm,
return ret;
 }
 
-/* ioasid_find getter() requires a void * argument */
-static bool __mmget_not_zero(void *mm)
-{
-   return mmget_not_zero(mm);
-}
-
-/**
- * iommu_sva_find() - Find mm associated to the given PASID
- * @pasid: Process Address Space ID assigned to the mm
- *
- * On success a reference to the mm is taken, and must be released with 
mmput().
- *
- * Returns the mm corresponding to this PASID, or an error if not found.
- */
-struct mm_struct *iommu_sva_find(ioasid_t pasid)
-{
-   return ioasid_find(_sva_pasid, pasid, __mmget_not_zero);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_find);
-
 /*
  * I/O page fault handler for 

[PATCH v6 10/12] iommu: Prepare IOMMU domain for IOPF

2022-05-10 Thread Lu Baolu
This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework could route a page fault to the domain and
call the fault handler from it.

Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data
as a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain could install handler and its
private parameter so that the page fault could be further routed and
handled.

A new helper iommu_get_domain_for_iopf() which retrieves attached domain
for a {device, PASID} pair is added. It will be used by the page fault
handling framework which knows {device, PASID} reported from the iommu
driver. We have a guarantee that the SVA domain doesn't go away during
IOPF handling, because unbind() waits for pending faults with
iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
to synchronize life cycle of the iommu domains between the unbind() and
the interrupt threads.

This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h | 12 +++
 drivers/iommu/io-pgfault.c|  7 
 drivers/iommu/iommu-sva-lib.c | 65 +++
 drivers/iommu/iommu.c | 27 +++
 4 files changed, 111 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 392b8adc3495..9405034e3013 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -103,6 +103,9 @@ struct iommu_domain {
 #ifdef CONFIG_IOMMU_SVA
struct mm_struct *mm;
 #endif
+   enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
+ void *data);
+   void *fault_data;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -687,6 +690,9 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
  struct device *dev, ioasid_t pasid);
 void iommu_detach_device_pasid(struct iommu_domain *domain,
   struct device *dev, ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1056,6 +1062,12 @@ static inline void iommu_detach_device_pasid(struct 
iommu_domain *domain,
 struct device *dev, ioasid_t pasid)
 {
 }
+
+static inline struct iommu_domain *
+iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid)
+{
+   return NULL;
+}
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_SVA
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..aee9e033012f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
  * request completes, outstanding faults will have been dealt with by the time
  * the PASID is freed.
  *
+ * Any valid page fault will be eventually routed to an iommu domain and the
+ * page fault handler installed there will get called. The users of this
+ * handling framework should guarantee that the iommu domain could only be
+ * freed after the device has stopped generating page faults (or the iommu
+ * hardware has been set to block the page faults) and the pending page faults
+ * have been flushed.
+ *
  * Return: 0 on success and <0 on error.
  */
 int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index ef6ed87d04ba..32c836e4a60e 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -72,6 +72,69 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
 
+/*
+ * I/O page fault handler for SVA
+ *
+ * Copied from io-pgfault.c with mmget_not_zero() added before
+ * mmap_read_lock().
+ */
+static enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+   vm_fault_t ret;
+   struct mm_struct *mm;
+   struct vm_area_struct *vma;
+   unsigned int access_flags = 0;
+   struct iommu_domain *domain = data;
+   unsigned int fault_flags = FAULT_FLAG_REMOTE;
+   struct iommu_fault_page_request *prm = >prm;
+   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+   if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
+   return status;
+
+   mm = domain->mm;
+   if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
+   return status;
+
+   mmap_read_lock(mm);
+
+   vma = find_extend_vma(mm, prm->addr);
+   if (!vma)
+   /* Unmapped area */
+   goto out_put_mm;
+
+   if (prm->perm & IOMMU_FAULT_PERM_READ)
+   access_flags |= 

[PATCH v6 09/12] iommu: Remove SVA related callbacks from iommu ops

2022-05-10 Thread Lu Baolu
These ops'es have been replaced with the dev_attach/detach_pasid domain
ops'es. There's no need for them anymore. Remove them to avoid dead
code.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
 include/linux/intel-iommu.h   |  4 --
 include/linux/iommu.h |  8 ---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 17 ---
 drivers/iommu/iommu-sva-lib.h |  1 -
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 41 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 --
 drivers/iommu/intel/iommu.c   |  3 --
 drivers/iommu/intel/svm.c | 49 ---
 drivers/iommu/iommu-sva-lib.c |  4 +-
 9 files changed, 2 insertions(+), 128 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2397c2007cda..4a5cc796f917 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,10 +738,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 
*bus, u8 *devfn);
 extern void intel_svm_check(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
-void *drvdata);
-void intel_svm_unbind(struct iommu_sva *handle);
-u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
 struct iommu_domain *intel_svm_domain_alloc(void);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5a3ef4d58b1f..392b8adc3495 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -215,9 +215,6 @@ struct iommu_iotlb_gather {
  * @dev_has/enable/disable_feat: per device entries to check/enable/disable
  *   iommu specific features.
  * @dev_feat_enabled: check enabled feature
- * @sva_bind: Bind process address space to device
- * @sva_unbind: Unbind process address space from device
- * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
  * @def_domain_type: device default domain type, return value:
  * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
@@ -251,11 +248,6 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
 
-   struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
-   void (*sva_unbind)(struct iommu_sva *handle);
-   u32 (*sva_get_pasid)(struct iommu_sva *handle);
-
int (*page_response)(struct device *dev,
 struct iommu_fault_event *evt,
 struct iommu_page_response *msg);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index e077f21e2528..15dd4c7e6d3a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,10 +754,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
-void arm_smmu_sva_unbind(struct iommu_sva *handle);
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
 struct iommu_domain *arm_smmu_sva_domain_alloc(void);
 #else /* CONFIG_ARM_SMMU_V3_SVA */
@@ -791,19 +787,6 @@ static inline bool arm_smmu_master_iopf_supported(struct 
arm_smmu_master *master
return false;
 }
 
-static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
-{
-   return ERR_PTR(-ENODEV);
-}
-
-static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
-
-static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
-   return IOMMU_PASID_INVALID;
-}
-
 static inline void arm_smmu_sva_notifier_synchronize(void) {}
 
 static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..3420654c6e2f 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -8,7 +8,6 @@
 #include 
 #include 
 
-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
 
 /* I/O Page fault */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 

[PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-10 Thread Lu Baolu
The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in iommu_ops vector anymore as we can achieve
this through the generic attach/detach_dev_pasid domain ops.

This refactors the IOMMU SVA interfaces implementation by using the
attach/detach_pasid_dev ops and align them with the concept of the
iommu domain. Put the new SVA code in the sva related file in order
to make it self-contained.

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h |  44 ++-
 drivers/iommu/iommu-sva-lib.c | 145 ++
 drivers/iommu/iommu.c |  92 -
 3 files changed, 168 insertions(+), 113 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2921e634491e..5a3ef4d58b1f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -684,12 +684,6 @@ int iommu_dev_enable_feature(struct device *dev, enum 
iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
 bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
 
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
-   struct mm_struct *mm,
-   void *drvdata);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
-
 int iommu_device_use_default_domain(struct device *dev);
 void iommu_device_unuse_default_domain(struct device *dev);
 
@@ -1031,21 +1025,6 @@ iommu_dev_disable_feature(struct device *dev, enum 
iommu_dev_features feat)
return -ENODEV;
 }
 
-static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
-{
-   return NULL;
-}
-
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
-}
-
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
-   return IOMMU_PASID_INVALID;
-}
-
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
return NULL;
@@ -1087,6 +1066,29 @@ static inline void iommu_detach_device_pasid(struct 
iommu_domain *domain,
 }
 #endif /* CONFIG_IOMMU_API */
 
+#ifdef CONFIG_IOMMU_SVA
+struct iommu_sva *iommu_sva_bind_device(struct device *dev,
+   struct mm_struct *mm,
+   void *drvdata);
+void iommu_sva_unbind_device(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+#else /* CONFIG_IOMMU_SVA */
+static inline struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   return NULL;
+}
+
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+}
+
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+   return IOMMU_PASID_INVALID;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
 /**
  * iommu_map_sgtable - Map the given buffer to the IOMMU domain
  * @domain:The IOMMU domain to perform the mapping
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..e7301514f286 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -3,6 +3,8 @@
  * Helpers for IOMMU drivers implementing SVA
  */
 #include 
+#include 
+#include 
 #include 
 
 #include "iommu-sva-lib.h"
@@ -69,3 +71,146 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(_sva_pasid, pasid, __mmget_not_zero);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/*
+ * IOMMU SVA driver-oriented interfaces
+ */
+static struct iommu_domain *
+iommu_sva_alloc_domain(struct device *dev, struct mm_struct *mm)
+{
+   struct bus_type *bus = dev->bus;
+   struct iommu_domain *domain;
+
+   if (!bus || !bus->iommu_ops)
+   return NULL;
+
+   domain = bus->iommu_ops->domain_alloc(IOMMU_DOMAIN_SVA);
+   if (!domain)
+   return NULL;
+
+   mmgrab(mm);
+   domain->mm = mm;
+   domain->type = IOMMU_DOMAIN_SVA;
+
+   return domain;
+}
+
+static void iommu_sva_free_domain(struct iommu_domain *domain)
+{
+   mmdrop(domain->mm);
+   iommu_domain_free(domain);
+}
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ * @drvdata: opaque data pointer to pass to bind callback
+ *
+ * Create a bond between device and address space, allowing the device to 
access
+ * the mm using the returned PASID. If a bond already exists between @device 
and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct 

[PATCH v6 07/12] arm-smmu-v3/sva: Add SVA domain support

2022-05-10 Thread Lu Baolu
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  6 ++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 68 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 +
 3 files changed, 77 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..e077f21e2528 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -759,6 +759,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, 
struct mm_struct *mm,
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
+struct iommu_domain *arm_smmu_sva_domain_alloc(void);
 #else /* CONFIG_ARM_SMMU_V3_SVA */
 static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 {
@@ -804,5 +805,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva 
*handle)
 }
 
 static inline void arm_smmu_sva_notifier_synchronize(void) {}
+
+static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+   return NULL;
+}
 #endif /* CONFIG_ARM_SMMU_V3_SVA */
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index c623dae1e115..9d176e836d6b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -541,3 +541,71 @@ void arm_smmu_sva_notifier_synchronize(void)
 */
mmu_notifier_synchronize();
 }
+
+static int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+struct device *dev, ioasid_t id)
+{
+   int ret = 0;
+   struct mm_struct *mm;
+   struct iommu_sva *handle;
+
+   if (domain->type != IOMMU_DOMAIN_SVA)
+   return -EINVAL;
+
+   mm = domain->mm;
+   if (WARN_ON(!mm))
+   return -ENODEV;
+
+   mutex_lock(_lock);
+   handle = __arm_smmu_sva_bind(dev, mm);
+   if (IS_ERR(handle))
+   ret = PTR_ERR(handle);
+   mutex_unlock(_lock);
+
+   return ret;
+}
+
+static void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+   struct mm_struct *mm = domain->mm;
+   struct arm_smmu_bond *bond = NULL, *t;
+   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+   mutex_lock(_lock);
+   list_for_each_entry(t, >bonds, list) {
+   if (t->mm == mm) {
+   bond = t;
+   break;
+   }
+   }
+
+   if (!WARN_ON(!bond) && refcount_dec_and_test(>refs)) {
+   list_del(>list);
+   arm_smmu_mmu_notifier_put(bond->smmu_mn);
+   kfree(bond);
+   }
+   mutex_unlock(_lock);
+}
+
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+   kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+   .attach_dev_pasid   = arm_smmu_sva_attach_dev_pasid,
+   .detach_dev_pasid   = arm_smmu_sva_detach_dev_pasid,
+   .free   = arm_smmu_sva_domain_free,
+};
+
+struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+   struct iommu_domain *domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (domain)
+   domain->ops = _smmu_sva_domain_ops;
+
+   return domain;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index afc63fce6107..9daf3de7e539 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1999,6 +1999,9 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
 {
struct arm_smmu_domain *smmu_domain;
 
+   if (type == IOMMU_DOMAIN_SVA)
+   return arm_smmu_sva_domain_alloc();
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 05/12] iommu/vt-d: Remove SVM_FLAG_SUPERVISOR_MODE support

2022-05-10 Thread Lu Baolu
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:

- The lack of IOTLB synchronization upon kernel page table updates.
  (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slight more protection, using kernel virtual address (KVA)
  has little advantage over physical address. There are also no use
  cases yet where DMA engines need kernel virtual addresses for in-kernel
  DMA.

This removes SVM_FLAG_SUPERVISOR_MODE support in the Intel IOMMU driver.
The device driver is suggested to handle kernel DMA with PASID through
the kernel DMA APIs.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 53 +--
 1 file changed, 12 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7ee37d996e15..574aa33a 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned 
int pasid,
return 0;
 }
 
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
-unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
 {
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct 
mm_struct *mm,
 
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
-  struct mm_struct *mm,
-  unsigned int flags)
+  struct mm_struct *mm)
 {
struct device_domain_info *info = dev_iommu_priv_get(dev);
unsigned long iflags, sflags;
@@ -341,22 +339,18 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
 
svm->pasid = mm->pasid;
svm->mm = mm;
-   svm->flags = flags;
INIT_LIST_HEAD_RCU(>devs);
 
-   if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
-   svm->notifier.ops = _mmuops;
-   ret = mmu_notifier_register(>notifier, mm);
-   if (ret) {
-   kfree(svm);
-   return ERR_PTR(ret);
-   }
+   svm->notifier.ops = _mmuops;
+   ret = mmu_notifier_register(>notifier, mm);
+   if (ret) {
+   kfree(svm);
+   return ERR_PTR(ret);
}
 
ret = pasid_private_add(svm->pasid, svm);
if (ret) {
-   if (svm->notifier.ops)
-   mmu_notifier_unregister(>notifier, mm);
+   mmu_notifier_unregister(>notifier, mm);
kfree(svm);
return ERR_PTR(ret);
}
@@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
}
 
/* Setup the pasid table: */
-   sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
-   PASID_FLAG_SUPERVISOR_MODE : 0;
-   sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+   sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
spin_lock_irqsave(>lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
FLPT_DEFAULT_DID, sflags);
@@ -410,8 +402,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
kfree(sdev);
 free_svm:
if (list_empty(>devs)) {
-   if (svm->notifier.ops)
-   mmu_notifier_unregister(>notifier, mm);
+   mmu_notifier_unregister(>notifier, mm);
pasid_private_remove(mm->pasid);
kfree(svm);
}
@@ -821,37 +812,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-   unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
 
-   if (drvdata)
-   flags = *(unsigned int *)drvdata;
-
-   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
-   if (!ecap_srs(iommu->ecap)) {
-   dev_err(dev, "%s: Supervisor PASID not 

[PATCH v6 06/12] iommu/vt-d: Add SVA domain support

2022-05-10 Thread Lu Baolu
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h |  5 
 drivers/iommu/intel/iommu.c |  2 ++
 drivers/iommu/intel/svm.c   | 48 +
 3 files changed, 55 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 72e5d7900e71..2397c2007cda 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -744,6 +744,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
 u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+struct iommu_domain *intel_svm_domain_alloc(void);
 
 struct intel_svm_dev {
struct list_head list;
@@ -769,6 +770,10 @@ struct intel_svm {
 };
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
+static inline struct iommu_domain *intel_svm_domain_alloc(void)
+{
+   return NULL;
+}
 #endif
 
 #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 99643f897f26..10b1e9dcbd98 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4343,6 +4343,8 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return _domain->domain;
+   case IOMMU_DOMAIN_SVA:
+   return intel_svm_domain_alloc();
default:
return NULL;
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 574aa33a..641ab0491ada 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -931,3 +931,51 @@ int intel_svm_page_response(struct device *dev,
mutex_unlock(_mutex);
return ret;
 }
+
+static int intel_svm_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct intel_iommu *iommu = info->iommu;
+   struct mm_struct *mm = domain->mm;
+   struct iommu_sva *sva;
+   int ret = 0;
+
+   mutex_lock(_mutex);
+   sva = intel_svm_bind_mm(iommu, dev, mm);
+   if (IS_ERR(sva))
+   ret = PTR_ERR(sva);
+   mutex_unlock(_mutex);
+
+   return ret;
+}
+
+static void intel_svm_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid)
+{
+   mutex_lock(_mutex);
+   intel_svm_unbind_mm(dev, pasid);
+   mutex_unlock(_mutex);
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+   kfree(domain);
+}
+
+static const struct iommu_domain_ops intel_svm_domain_ops = {
+   .attach_dev_pasid   = intel_svm_attach_dev_pasid,
+   .detach_dev_pasid   = intel_svm_detach_dev_pasid,
+   .free   = intel_svm_domain_free,
+};
+
+struct iommu_domain *intel_svm_domain_alloc(void)
+{
+   struct iommu_domain *domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (domain)
+   domain->ops = _svm_domain_ops;
+
+   return domain;
+}
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 04/12] iommu/sva: Basic data structures for SVA

2022-05-10 Thread Lu Baolu
Use below data structures for SVA implementation in the IOMMU core:

- struct iommu_domain (IOMMU_DOMAIN_SVA type)
  Represent a hardware pagetable that the IOMMU hardware could use for
  SVA translation. Multiple iommu domains could be bound with an SVA mm
  and each grabs a mm_count of the mm in order to make sure mm could
  only be freed after all domains have been unbound. A new mm field is
  added to struct iommu_domain and a helper is added to retrieve mm from
  a domain pointer.

- struct iommu_sva (existing)
  Represent a bond relationship between an SVA ioas and an iommu domain.
  If a bond already exists, it's reused and a reference is taken.

- struct dev_iommu::sva_bonds
  A pasid-indexed xarray to track the bonds happened on the device.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h | 13 +
 drivers/iommu/iommu.c |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ab36244d4e94..2921e634491e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,9 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_PT  (1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
 
+#define __IOMMU_DOMAIN_SHARED  (1U << 4)  /* Page table shared from CPU  */
+#define __IOMMU_DOMAIN_HOST_VA (1U << 5)  /* Host CPU virtual address */
+
 /*
  * This are the possible domain-types
  *
@@ -86,6 +89,8 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA_FQ(__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SHARED |\
+__IOMMU_DOMAIN_HOST_VA)
 
 struct iommu_domain {
unsigned type;
@@ -95,6 +100,9 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+#ifdef CONFIG_IOMMU_SVA
+   struct mm_struct *mm;
+#endif
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -380,6 +388,9 @@ struct dev_iommu {
struct iommu_device *iommu_dev;
void*priv;
unsigned intpasid_bits;
+#ifdef CONFIG_IOMMU_SVA
+   struct xarray   sva_bonds;
+#endif
 };
 
 int iommu_device_register(struct iommu_device *iommu,
@@ -629,6 +640,8 @@ struct iommu_fwspec {
  */
 struct iommu_sva {
struct device   *dev;
+   struct iommu_domain *domain;
+   refcount_t  users;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 16e8db2d86fc..1abff5fc9554 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -202,6 +202,9 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
return NULL;
 
mutex_init(>lock);
+#ifdef CONFIG_IOMMU_SVA
+   xa_init(>sva_bonds);
+#endif
dev->iommu = param;
return param;
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 01/12] dmaengine: idxd: Separate user and kernel pasid enabling

2022-05-10 Thread Lu Baolu
From: Dave Jiang 

The idxd driver always gated the pasid enabling under a single knob and
this assumption is incorrect. The pasid used for kernel operation can be
independently toggled and has no dependency on the user pasid (and vice
versa). Split the two so they are independent "enabled" flags.

Cc: Vinod Koul 
Signed-off-by: Dave Jiang 
Signed-off-by: Jacob Pan 
Link: 
https://lore.kernel.org/linux-iommu/20220315050713.2000518-10-jacob.jun@linux.intel.com/
Signed-off-by: Lu Baolu 
---
 drivers/dma/idxd/idxd.h |  6 ++
 drivers/dma/idxd/cdev.c |  4 ++--
 drivers/dma/idxd/init.c | 30 ++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index da72eb15f610..ccbefd0be617 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -239,6 +239,7 @@ enum idxd_device_flag {
IDXD_FLAG_CONFIGURABLE = 0,
IDXD_FLAG_CMD_RUNNING,
IDXD_FLAG_PASID_ENABLED,
+   IDXD_FLAG_USER_PASID_ENABLED,
 };
 
 struct idxd_dma_dev {
@@ -469,6 +470,11 @@ static inline bool device_pasid_enabled(struct idxd_device 
*idxd)
return test_bit(IDXD_FLAG_PASID_ENABLED, >flags);
 }
 
+static inline bool device_user_pasid_enabled(struct idxd_device *idxd)
+{
+   return test_bit(IDXD_FLAG_USER_PASID_ENABLED, >flags);
+}
+
 static inline bool device_swq_supported(struct idxd_device *idxd)
 {
return (support_enqcmd && device_pasid_enabled(idxd));
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index b9b2b4a4124e..7df996deffbe 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -99,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
ctx->wq = wq;
filp->private_data = ctx;
 
-   if (device_pasid_enabled(idxd)) {
+   if (device_user_pasid_enabled(idxd)) {
sva = iommu_sva_bind_device(dev, current->mm, NULL);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
@@ -152,7 +152,7 @@ static int idxd_cdev_release(struct inode *node, struct 
file *filep)
if (wq_shared(wq)) {
idxd_device_drain_pasid(idxd, ctx->pasid);
} else {
-   if (device_pasid_enabled(idxd)) {
+   if (device_user_pasid_enabled(idxd)) {
/* The wq disable in the disable pasid function will 
drain the wq */
rc = idxd_wq_disable_pasid(wq);
if (rc < 0)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 993a5dcca24f..e1b5d1e4a949 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -513,16 +513,19 @@ static int idxd_probe(struct idxd_device *idxd)
 
if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
-   if (rc == 0) {
-   rc = idxd_enable_system_pasid(idxd);
-   if (rc < 0) {
-   iommu_dev_disable_feature(dev, 
IOMMU_DEV_FEAT_SVA);
-   dev_warn(dev, "Failed to enable PASID. No SVA 
support: %d\n", rc);
-   } else {
-   set_bit(IDXD_FLAG_PASID_ENABLED, >flags);
-   }
-   } else {
+   if (rc) {
+   /*
+* Do not bail here since legacy DMA is still
+* supported, both user and in-kernel DMA with
+* PASID rely on SVA feature.
+*/
dev_warn(dev, "Unable to turn on SVA feature.\n");
+   } else {
+   set_bit(IDXD_FLAG_USER_PASID_ENABLED, >flags);
+   if (idxd_enable_system_pasid(idxd))
+   dev_warn(dev, "No in-kernel DMA with PASID.\n");
+   else
+   set_bit(IDXD_FLAG_PASID_ENABLED, >flags);
}
} else if (!sva) {
dev_warn(dev, "User forced SVA off via module param.\n");
@@ -561,7 +564,8 @@ static int idxd_probe(struct idxd_device *idxd)
  err:
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
-   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+   if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
return rc;
 }
 
@@ -574,7 +578,8 @@ static void idxd_cleanup(struct idxd_device *idxd)
idxd_cleanup_internals(idxd);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
-   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+   if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
 }
 
 static int idxd_pci_probe(struct pci_dev 

[PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-10 Thread Lu Baolu
Use this field to save the pasid/ssid bits that a device is able to
support with its IOMMU hardware. It is a generic attribute of a device
and lifting it into the per-device dev_iommu struct makes it possible
to allocate a PASID for device without calls into the IOMMU drivers.
Any iommu driver which suports PASID related features should set this
field before features are enabled on the devices.

For initialization of this field in the VT-d driver, the
info->pasid_supported is only set for PCI devices. So the status is
that non-PCI SVA hasn't been supported yet. Setting this field only for
PCI devices has no functional change.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
 include/linux/iommu.h   | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
 drivers/iommu/intel/iommu.c | 5 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..b8ffaf2cb1d0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -373,6 +373,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void*priv;
+   unsigned intpasid_bits;
 };
 
 int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..afc63fce6107 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
master->stall_enabled = true;
 
+   dev->iommu->pasid_bits = master->ssid_bits;
+
return >iommu;
 
 err_free_master:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..99643f897f26 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4624,8 +4624,11 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
 
-   if (features >= 0)
+   if (features >= 0) {
info->pasid_supported = features | 1;
+   dev->iommu->pasid_bits =
+   fls(pci_max_pasids(pdev)) - 1;
+   }
}
 
if (info->ats_supported && ecap_prs(iommu->ecap) &&
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-10 Thread Lu Baolu
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but not limited):

 - SVA (Shared Virtual Address)
 - kernel DMA with PASID
 - hardware-assist mediated device

This adds a pair of common domain ops for this purpose and adds helpers
to attach/detach a domain to/from a {device, PASID}. Some buses, like
PCI, route packets without considering the PASID value. Thus a DMA target
address with PASID might be treated as P2P if the address falls into the
MMIO BAR of other devices in the group. To make things simple, these
interfaces only apply to devices belonging to the singleton groups, and
the singleton is immutable in fabric i.e. not affected by hotplug.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
 include/linux/iommu.h | 21 +
 drivers/iommu/iommu.c | 71 +++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b8ffaf2cb1d0..ab36244d4e94 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -263,6 +263,8 @@ struct iommu_ops {
  * struct iommu_domain_ops - domain specific operations
  * @attach_dev: attach an iommu domain to a device
  * @detach_dev: detach an iommu domain from a device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
  * @map: map a physically contiguous memory region to an iommu domain
  * @map_pages: map a physically contiguous set of pages of the same size to
  * an iommu domain.
@@ -283,6 +285,10 @@ struct iommu_ops {
 struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+   int (*attach_dev_pasid)(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
+   void (*detach_dev_pasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid);
 
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -678,6 +684,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, 
void *owner);
 void iommu_group_release_dma_owner(struct iommu_group *group);
 bool iommu_group_dma_owner_claimed(struct iommu_group *group);
 
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid);
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1051,6 +1061,17 @@ static inline bool iommu_group_dma_owner_claimed(struct 
iommu_group *group)
 {
return false;
 }
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid)
+{
+   return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid)
+{
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 29906bc16371..16e8db2d86fc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,6 +38,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+   struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -630,6 +631,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(>mutex);
INIT_LIST_HEAD(>devices);
INIT_LIST_HEAD(>entry);
+   xa_init(>pasid_array);
 
ret = ida_simple_get(_group_ida, 0, 0, GFP_KERNEL);
if (ret < 0) {
@@ -3190,3 +3192,72 @@ bool iommu_group_dma_owner_claimed(struct iommu_group 
*group)
return user;
 }
 EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+static bool device_group_immutable_singleton(struct device *dev)
+{
+   struct iommu_group *group = iommu_group_get(dev);
+   int count;
+
+   if (!group)
+   return false;
+
+   mutex_lock(>mutex);
+   count = iommu_group_device_count(group);
+   mutex_unlock(>mutex);
+   iommu_group_put(group);
+
+   if (count != 1)
+   return false;
+
+   /*
+* The PCI device could be considered to be fully isolated if all
+* devices on the path from the device to the host-PCI bridge are
+* protected from peer-to-peer DMA by ACS.
+*/
+   if (dev_is_pci(dev))
+   return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+   

[PATCH v6 00/12] iommu: SVA and IOPF refactoring

2022-05-10 Thread Lu Baolu
Hi folks,

The former part of this series refactors the IOMMU SVA code by assigning
an SVA type of iommu_domain to a shared virtual address and replacing
sva_bind/unbind iommu ops with attach/detach_dev_pasid domain ops.

The latter part changes the existing I/O page fault handling framework
from only serving SVA to a generic one. Any driver or component could
handle the I/O page faults for its domain in its own way by installing
an I/O page fault handler.

This series has been functionally tested on an x86 machine and compile
tested for other architectures.

This series is also available on github:
[2] https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v6

Please review and suggest.

Best regards,
baolu

Change log:
v6:
 - Refine the SVA basic data structures.
   Link: https://lore.kernel.org/linux-iommu/YnFv0ps0Ad8v+7uH@myrica/
 - Refine arm smmuv3 sva domain allocation.
 - Fix a possible lock issue.
   Link: https://lore.kernel.org/linux-iommu/YnFydE8j8l7Q4m+b@myrica/

v5:
 - 
https://lore.kernel.org/linux-iommu/20220502014842.991097-1-baolu...@linux.intel.com/
 - Address review comments from Jean-Philippe Brucker. Very appreciated!
 - Remove redundant pci aliases check in
   device_group_immutable_singleton().
 - Treat all buses exept PCI as static in immutable singleton check.
 - As the sva_bind/unbind() have already guaranteed sva domain free only
   after iopf_queue_flush_dev(), remove the unnecessary domain refcount.
 - Move domain get() out of the list iteration in iopf_handle_group().

v4:
 - 
https://lore.kernel.org/linux-iommu/20220421052121.3464100-1-baolu...@linux.intel.com/
 - Solve the overlap with another series and make this series
   self-contained.
 - No objection to the abstraction of data structure during v3 review.
   Hence remove the RFC subject prefix.
 - Refine the immutable singleton group code according to Kevin's
   comments.

v3:
 - 
https://lore.kernel.org/linux-iommu/20220410102443.294128-1-baolu...@linux.intel.com/
 - Rework iommu_group_singleton_lockdown() by adding a flag to the group
   that positively indicates the group can never have more than one
   member, even after hot plug.
 - Abstract the data structs used for iommu sva in a separated patches to
   make it easier for review.
 - I still keep the RFC prefix in this series as above two significant
   changes need at least another round review to be finalized.
 - Several misc refinements.

v2:
 - 
https://lore.kernel.org/linux-iommu/20220329053800.3049561-1-baolu...@linux.intel.com/
 - Add sva domain life cycle management to avoid race between unbind and
   page fault handling.
 - Use a single domain for each mm.
 - Return a single sva handler for the same binding.
 - Add a new helper to meet singleton group requirement.
 - Rework the SVA domain allocation for arm smmu v3 driver and move the
   pasid_bit initialization to device probe.
 - Drop the patch "iommu: Handle IO page faults directly".
 - Add mmget_not_zero(mm) in SVA page fault handler.

v1:
 - 
https://lore.kernel.org/linux-iommu/20220320064030.2936936-1-baolu...@linux.intel.com/
 - Initial post.

Dave Jiang (1):
  dmaengine: idxd: Separate user and kernel pasid enabling

Lu Baolu (11):
  iommu: Add pasid_bits field in struct dev_iommu
  iommu: Add attach/detach_dev_pasid domain ops
  iommu/sva: Basic data structures for SVA
  iommu/vt-d: Remove SVM_FLAG_SUPERVISOR_MODE support
  iommu/vt-d: Add SVA domain support
  arm-smmu-v3/sva: Add SVA domain support
  iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
  iommu: Remove SVA related callbacks from iommu ops
  iommu: Prepare IOMMU domain for IOPF
  iommu: Per-domain I/O page fault handling
  iommu: Rename iommu-sva-lib.{c,h}

 include/linux/intel-iommu.h   |   9 +-
 include/linux/iommu.h |  99 +--
 drivers/dma/idxd/idxd.h   |   6 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  19 +-
 .../iommu/{iommu-sva-lib.h => iommu-sva.h}|   3 -
 drivers/dma/idxd/cdev.c   |   4 +-
 drivers/dma/idxd/init.c   |  30 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 111 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  10 +-
 drivers/iommu/intel/iommu.c   |  12 +-
 drivers/iommu/intel/svm.c | 146 --
 drivers/iommu/io-pgfault.c|  73 +
 drivers/iommu/iommu-sva-lib.c |  71 -
 drivers/iommu/iommu-sva.c | 261 ++
 drivers/iommu/iommu.c | 193 +++--
 drivers/iommu/Makefile|   2 +-
 16 files changed, 623 insertions(+), 426 deletions(-)
 rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (91%)
 delete mode 100644 drivers/iommu/iommu-sva-lib.c
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org