Re: [PATCH v6 31/34] iommu/mediatek: Get the proper bankid for multi banks

2022-04-30 Thread Yong Wu via iommu
Hi Matthias,

Thanks very much for reviewing.

On Thu, 2022-04-28 at 16:14 +0200, Matthias Brugger wrote:
> 
> On 07/04/2022 09:57, Yong Wu wrote:
> > We preassign some ports in a special bank via the new defined
> > banks_portmsk. Put it in the plat_data means it is not expected to
> > be
> > adjusted dynamically.
> > 
> > If the iommu id in the iommu consumer's dtsi node is inside this
> > banks_portmsk, then we switch it to this special iommu bank, and
> > initialise the IOMMU bank HW.
> > 
> > Each a bank has the independent pgtable(4GB iova range). Each a
> > bank
> > is a independent iommu domain/group. Currently we don't separate
> > different
> > iova ranges inside a bank.
> > 
> > Signed-off-by: Yong Wu 
> > Reviewed-by: AngeloGioacchino Del Regno <
> > angelogioacchino.delre...@collabora.com>
> > ---
> >   drivers/iommu/mtk_iommu.c | 39
> > ---
> >   1 file changed, 36 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 0828cff97625..d42b3d35a36e 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c

[snip]

> > @@ -802,6 +828,7 @@ static struct iommu_group
> > *mtk_iommu_device_group(struct device *dev)
> > struct mtk_iommu_data *c_data = dev_iommu_priv_get(dev), *data;
> > struct list_head *hw_list = c_data->hw_list;
> > struct iommu_group *group;
> > +   unsigned int bankid, groupid;
> > int regionid;
> >   
> > data = mtk_iommu_get_frst_data(hw_list);
> > @@ -812,12 +839,18 @@ static struct iommu_group
> > *mtk_iommu_device_group(struct device *dev)
> > if (regionid < 0)
> > return ERR_PTR(regionid);
> >   
> > +   bankid = mtk_iommu_get_bank_id(dev, data->plat_data);
> 
> I think code readability would be improved if we add a new function
> like 
> mtk_iommu_get_id which call mtk_iommu_get_bankid and if necessary 
> mtk_iommu_get_regionid.

OK, I will define a new function, like mtk_iommu_get_group_id for the
readability.

> 
> > mutex_lock(>mutex);
> > -   group = data->m4u_group[regionid];
> > +   /*
> > +* If the bank function is enabled, each a bank is a iommu
> > group/domain.
> > +* otherwise, each a iova region is a iommu group/domain.
> 
> While at it:
> "If the bank function is enabled, each bank is a iommu group/domain.
> Otherwise, 
> each iova region is a iommu group/domain."

And move this comment into the new funtion.

Also of course, I will fix the other two comments and send v7.

Thanks.

> 
> Regards,
> Matthias
> 
> > +*/
> > +   groupid = bankid ? bankid : regionid;
> > +   group = data->m4u_group[groupid];
> > if (!group) {
> > group = iommu_group_alloc();
> > if (!IS_ERR(group))
> > -   data->m4u_group[regionid] = group;
> > +   data->m4u_group[groupid] = group;
> > } else {
> > iommu_group_ref_get(group);
> > }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

2022-04-30 Thread David Gibson
On Fri, Apr 29, 2022 at 09:54:42AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 04:00:14PM +1000, David Gibson wrote:
> > > But I don't have a use case in mind? The simplified things I know
> > > about want to attach their devices then allocate valid IOVA, they
> > > don't really have a notion about what IOVA regions they are willing to
> > > accept, or necessarily do hotplug.
> > 
> > The obvious use case is qemu (or whatever) emulating a vIOMMU.  The
> > emulation code knows the IOVA windows that are expected of the vIOMMU
> > (because that's a property of the emulated platform), and requests
> > them of the host IOMMU.  If the host can supply that, you're good
> > (this doesn't necessarily mean the host windows match exactly, just
> > that the requested windows fit within the host windows).  If not,
> > you report an error.  This can be done at any point when the host
> > windows might change - so try to attach a device that can't support
> > the requested windows, and it will fail.  Attaching a device which
> > shrinks the windows, but still fits the requested windows within, and
> > you're still good to go.
> 
> We were just talking about this in another area - Alex said that qemu
> doesn't know the IOVA ranges? Is there some vIOMMU cases where it does?

Uh.. what?  We certainly know (or, rather, choose) the IOVA ranges for
ppc.  That is to say we set up the default IOVA ranges at machine
construction (those defaults have changed with machine version a
couple of times).  If the guest uses dynamic DMA windows we then
update those ranges based on the hypercalls, but at any point we know
what the IOVA windows are supposed to be.  I don't really see how x86
or anything else could not know the IOVA ranges.  Who else *could* set
the ranges when implementing a vIOMMU in TCG mode?

For the non-vIOMMU case then IOVA==GPA, so everything qemu knows about
the GPA space it also knows about the IOVA space.  Which, come to
think of it, means memory hotplug also complicates things.

> Even if yes, qemu is able to manage this on its own - it doesn't use
> the kernel IOVA allocator, so there is not a strong reason to tell the
> kernel what the narrowed ranges are.

I don't follow.  The problem for the qemu case here is if you hotplug
a device which narrows down the range to something smaller than the
guest expects.  If qemu has told the kernel the ranges it needs, that
can just fail (which is the best you can do).  If the kernel adds the
device but narrows the ranges, then you may have just put the guest
into a situation where the vIOMMU cannot do what the guest expects it
to.  If qemu can only query the windows, not specify them then it
won't know that adding a particular device will conflict with its
guest side requirements until after it's already added.  That could
mess up concurrent guest initiated map operations for existing devices
in the same guest side domain, so I don't think reversing the hotplug
after the problem is detected is enough.

> > > That is one possibility, yes. qemu seems to be using this to establish
> > > a clone ioas of an existing operational one which is another usage
> > > model.
> > 
> > Right, for qemu (or other hypervisors) the obvious choice would be to
> > create a "staging" IOAS where IOVA == GPA, then COPY that into the various
> > emulated bus IOASes.  For a userspace driver situation, I'm guessing
> > you'd map your relevant memory pool into an IOAS, then COPY to the
> > IOAS you need for whatever specific devices you're using.
> 
> qemu seems simpler, it juggled multiple containers so it literally
> just copies when it instantiates a new container and does a map in
> multi-container.

I don't follow you.  Are you talking about the vIOMMU or non vIOMMU
case?  In the vIOMMU case the different containers can be for
different guest side iommu domains with different guest-IOVA spaces,
so you can't just copy from one to another.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-04-30 Thread Leo Yan
On Thu, Apr 07, 2022 at 08:58:39PM +0800, Yicong Yang via iommu wrote:
> From: Qi Liu 
> 
> 'perf record' and 'perf report --dump-raw-trace' supported in this
> patch.
> 
> Example usage:
> 
> Output will contain raw PTT data and its textual representation, such
> as:
> 
> 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40  offset: 0
> ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
> .
> . ... HISI PTT data: size 4194304 bytes
> .  : 00 00 00 00 Prefix
> .  0004: 08 20 00 60 Header DW0
> .  0008: ff 02 00 01 Header DW1
> .  000c: 20 08 00 00 Header DW2
> .  0010: 10 e7 44 ab Header DW3
> .  0014: 2a a8 1e 01 Time
> .  0020: 00 00 00 00 Prefix
> .  0024: 01 00 00 60 Header DW0
> .  0028: 0f 1e 00 01 Header DW1
> .  002c: 04 00 00 00 Header DW2
> .  0030: 40 00 81 02 Header DW3
> .  0034: ee 02 00 00 Time
> 
> 
> Signed-off-by: Qi Liu 
> Signed-off-by: Yicong Yang 
> ---
>  tools/perf/arch/arm/util/auxtrace.c   |  76 +-
>  tools/perf/arch/arm/util/pmu.c|   3 +
>  tools/perf/arch/arm64/util/Build  |   2 +-
>  tools/perf/arch/arm64/util/hisi_ptt.c | 195 
>  tools/perf/util/Build |   2 +
>  tools/perf/util/auxtrace.c|   4 +
>  tools/perf/util/auxtrace.h|   1 +
>  tools/perf/util/hisi-ptt-decoder/Build|   1 +
>  .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c   | 170 ++
>  .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h   |  28 +++
>  tools/perf/util/hisi_ptt.c| 218 ++
>  tools/perf/util/hisi_ptt.h|  28 +++

It's good to divide the big patch into smaller patches, e.g. one patch
is to add PTT auxtrace (so mainly for perf record), and the second
patch is to add PTT decoder for perf decoding.

>  12 files changed, 724 insertions(+), 4 deletions(-)
>  create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
>  create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
>  create mode 100644 tools/perf/util/hisi_ptt.c
>  create mode 100644 tools/perf/util/hisi_ptt.h
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c 
> b/tools/perf/arch/arm/util/auxtrace.c
> index 5fc6a2a3dbc5..393f5757c039 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -4,9 +4,11 @@
>   * Author: Mathieu Poirier 
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../../../util/auxtrace.h"
>  #include "../../../util/debug.h"
> @@ -14,6 +16,7 @@
>  #include "../../../util/pmu.h"
>  #include "cs-etm.h"
>  #include "arm-spe.h"
> +#include "hisi_ptt.h"
>  
>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
>  {
> @@ -50,6 +53,58 @@ static struct perf_pmu **find_all_arm_spe_pmus(int 
> *nr_spes, int *err)
>   return arm_spe_pmus;
>  }
>  
> +static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
> +{
> + const char *sysfs = sysfs__mountpoint();
> + struct perf_pmu **hisi_ptt_pmus = NULL;
> + struct dirent *dent;
> + char path[PATH_MAX];
> + DIR *dir = NULL;
> + int idx = 0;
> +
> + snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH, sysfs);
> + dir = opendir(path);
> + if (!dir) {
> + pr_err("can't read directory '%s'\n", EVENT_SOURCE_DEVICE_PATH);
> + *err = -EINVAL;
> + goto out;
> + }
> +
> + while ((dent = readdir(dir))) {
> + if (strstr(dent->d_name, HISI_PTT_PMU_NAME))
> + (*nr_ptts)++;
> + }
> +
> + if (!(*nr_ptts))
> + goto out;
> +
> + hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
> + if (!hisi_ptt_pmus) {
> + pr_err("hisi_ptt alloc failed\n");
> + *err = -ENOMEM;
> + goto out;
> + }
> +
> + rewinddir(dir);
> + while ((dent = readdir(dir))) {
> + if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < 
> (*nr_ptts)) {
> + hisi_ptt_pmus[idx] = perf_pmu__find(dent->d_name);
> + if (hisi_ptt_pmus[idx]) {
> + pr_debug2("%s %d: hisi_ptt_pmu %d type %d name 
> %s\n",
> + __func__, __LINE__, idx,
> + hisi_ptt_pmus[idx]->type,
> + 

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-30 Thread Baolu Lu

On 2022/4/30 06:19, Fenghua Yu wrote:

Hi, Jean and Baolu,

On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote:

On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote:

Hi, Baolu,

On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote:

On 2022/4/28 16:39, Jean-Philippe Brucker wrote:

The address space is what the OOM killer is after.  That gets refcounted
with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
page freeing done in __mmput()->exit_mmap().

Also, all the VMAs should be gone after exit_mmap().  So, even if
vma->vm_file was holding a reference to a device driver, that reference
should be gone by the time __mmdrop() is actually freeing the PASID.


I agree with all that. The concern was about tearing down the PASID in the
IOMMU and device from the release() MMU notifier, which would happen in
exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And
since the IOMMU drivers need to hold mm->mm_count anyway between bind()
and unbind(), I think Fenghua's fix works.


But I didn't find mmgrab()/mmdrop() get called in both arm and intel
IOMMU drivers.

$ git grep mmgrab drivers/iommu/
[no output]

Do we need to add these in a separated fix patch, or I missed anything
here?


On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and
sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop().


Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient
so I sent a separate fix that should go in 5.18 as well
https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/
The Arm driver still touches the arch mm context after mmu_notifier_put().
I don't think X86 has that problem.


I think so too. On X86, the mm is not used after mmu_notifier_unregister().

In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code
is right too because init_mm and its PASID cannot be dropped and
mmu_notifier_register()/mmu_notifier_unregister() are not called.

So I think no extra fix patch is needed on X86.


Thanks, Fenghua and Jean. It's clear to me now.

Jean, another quick question about the iommu_sva_bind_device()

/**
 * iommu_sva_bind_device() - Bind a process address space to a device
 * @dev: the device
 * @mm: the mm to bind, caller must hold a reference to it
 * @drvdata: opaque data pointer to pass to bind callback

This interface requires the caller to take a reference to mm. Which
reference should it take, mm->mm_count or mm->mm_users? It's better to
make it explicit in this comment.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 18/19] iommu/intel: Access/Dirty bit support for SL domains

2022-04-30 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct 
iommu_domain *domain,
}
  }
  
+static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,

+ bool enable)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   unsigned long flags;
+   int ret = -EINVAL;


if (domain_use_first_level(dmar_domain))
return -EOPNOTSUPP;


+
+   spin_lock_irqsave(_domain_lock, flags);
+   if (list_empty(_domain->devices)) {
+   spin_unlock_irqrestore(_domain_lock, flags);
+   return ret;
+   }


I agreed with Kevin's suggestion in his reply.


+
+   list_for_each_entry(info, _domain->devices, link) {
+   if (!info->dev || (info->domain != dmar_domain))
+   continue;


This check is redundant.


+
+   /* Dirty tracking is second-stage level SM only */
+   if ((info->domain && domain_use_first_level(info->domain)) ||
+   !ecap_slads(info->iommu->ecap) ||
+   !sm_supported(info->iommu) || !intel_iommu_sm) {
+   ret = -EOPNOTSUPP;
+   continue;


Perhaps break and return -EOPNOTSUPP directly here? We are not able to
support a mixed mode, right?


+   }
+
+   ret = intel_pasid_setup_dirty_tracking(info->iommu, 
info->domain,
+info->dev, PASID_RID2PASID,
+enable);
+   if (ret)
+   break;
+   }
+   spin_unlock_irqrestore(_domain_lock, flags);
+
+   /*
+* We need to flush context TLB and IOTLB with any cached translations
+* to force the incoming DMA requests for have its IOTLB entries tagged
+* with A/D bits
+*/
+   intel_flush_iotlb_all(domain);
+   return ret;
+}


Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu