Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-09 Thread Jacob Pan
Hi Jean-Philippe,

On Fri, 9 Apr 2021 12:22:21 +0200, Jean-Philippe Brucker
 wrote:

> On Thu, Apr 08, 2021 at 10:08:55AM -0700, Jacob Pan wrote:
> > The void* drvdata parameter isn't really used in iommu_sva_bind_device()
> > API,  
> 
> Right, it used to be a cookie passed to the device driver in the exit_mm()
> callback, but that went away with edcc40d2ab5f ("iommu: Remove
> iommu_sva_ops::mm_exit()")
> 
> > the current IDXD code "borrows" the drvdata for a VT-d private flag
> > for supervisor SVA usage.
> > 
> > Supervisor/Privileged mode request is a generic feature. It should be
> > promoted from the VT-d vendor driver to the generic code.
> > 
> > This patch replaces void* drvdata with a unsigned int flags parameter
> > and adjusts callers accordingly.  
> 
> Thanks for cleaning this up. Making flags unsigned long seems more common
> (I suggested int without thinking). But it doesn't matter much, we won't
> get to 32 flags.
> 
I was just thinking that unsigned int is 32 bits on both 32- and 64-bit machines.

> > 
> > Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
> > Suggested-by: Jean-Philippe Brucker 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/dma/idxd/cdev.c |  2 +-
> >  drivers/dma/idxd/init.c |  6 +++---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  2 +-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++--
> >  drivers/iommu/intel/Kconfig |  1 +
> >  drivers/iommu/intel/svm.c   | 18 ++
> >  drivers/iommu/iommu.c   |  9 ++---
> >  drivers/misc/uacce/uacce.c  |  2 +-
> >  include/linux/intel-iommu.h |  2 +-
> >  include/linux/intel-svm.h   | 17 ++---
> >  include/linux/iommu.h   | 19 ---
> >  11 files changed, 40 insertions(+), 42 deletions(-)
> > 
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 0db9b82..21ec82b 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
> > filp->private_data = ctx;
> >  
> > if (device_pasid_enabled(idxd)) {
> > -   sva = iommu_sva_bind_device(dev, current->mm, NULL);
> > +   sva = iommu_sva_bind_device(dev, current->mm, 0);
> > if (IS_ERR(sva)) {
> > rc = PTR_ERR(sva);
> > dev_err(dev, "pasid allocation failed: %d\n",
> > rc);
> > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> > index 085a0c3..cdc85f1 100644
> > --- a/drivers/dma/idxd/init.c
> > +++ b/drivers/dma/idxd/init.c
> > @@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev)
> >  static int idxd_enable_system_pasid(struct idxd_device *idxd)
> >  {
> > -   int flags;
> > +   unsigned int flags;
> > unsigned int pasid;
> > struct iommu_sva *sva;
> >  
> > -   flags = SVM_FLAG_SUPERVISOR_MODE;
> > +   flags = IOMMU_SVA_BIND_SUPERVISOR;
> >  
> > -   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> > +   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
> > if (IS_ERR(sva)) {
> > dev_warn(&idxd->pdev->dev,
> >  "iommu sva bind failed: %ld\n", PTR_ERR(sva));
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > index bb251ca..23e287e 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > @@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> > }
> >  
> >  struct iommu_sva *
> > -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> > +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
> 
> Could you add a check on flags:
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index bb251cab61f3..145ceb2fc5da 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> }
> 
>  struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
> {
>   struct iommu_sva *handle;
>   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> 
> + if (flags)
> + return ERR_PTR(-EINVAL);
> +
yes, will do.

>   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
>   return ERR_PTR(-EINVAL);
> 
> 
> 
> >  {
> > struct iommu_sva *handle;
> > struct iommu_domain *domain = 

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread David Coe

On 09/04/2021 09:58, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.

Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.


Thanks for explaining the 'context', Suravee.


Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
  }
  
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
+   u64 val;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
  
  	if (!iommu_feature(iommu, FEATURE_PC))

return;
  
  	amd_iommu_pc_present = true;
  
-	/* save the value to restore, if writable */

-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-   goto pc_false;
-
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
-   goto pc_false;
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-   goto pc_false;
-
pci_info(pdev, "IOMMU performance counters supported\n");
  
  	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);

@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
iommu->max_counters = (u8) ((val >> 7) & 0xf);
  
  	return;

-
-pc_false:
-   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
  }
  
  static ssize_t amd_iommu_show_cap(struct device *dev,




I'll test your revert + update IOMMU patch on my Ryzen 2400G and 4700U 
most likely over the weekend. Very interesting!


Please be aware that your original IOMMU patch has already reached the 
imminent release of Ubuntu 21.04 (Hirsute). I've taken the liberty of 
adding Alex Hung (lead kernel developer at Ubuntu) to the circulation list.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

2021-04-09 Thread Florian Fainelli



On 4/9/2021 12:32 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 08, 2021 at 08:13:07PM -0700, Florian Fainelli wrote:
>>
>>
>> On 3/24/2021 1:42 AM, Christoph Hellwig wrote:
>>> On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote:
 When SWIOTLB_NO_FORCE is used, there should really be no allocations of
 default_nslabs to occur since we are not going to use those slabs. If a
 platform was somehow setting swiotlb_no_force and a later call to
 swiotlb_init() was to be made we would still be proceeding with
 allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
 was set on the kernel command line we would have only allocated 2KB.

 This would be inconsistent and the point of initializing default_nslabs
 to 1, was intended to allocate the minimum amount of memory possible, so
 simply remove that minimal allocation period.

 Signed-off-by: Florian Fainelli 
>>>
>>> Looks good,
>>>
>>> Reviewed-by: Christoph Hellwig 
>>>
>>
>> Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch
>> if you are also happy with it?
> 
> It should be now visible?

Not seeing it here:

https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/log/?h=devel/for-linus-5.13
-- 
Florian


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:00 PM, Shuah Khan wrote:

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.

Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ 


Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---



Tested-by: Shuah Khan 



Revert + this patch - same as my test on Ryzen 5

On AMD Ryzen 7 4700G with Radeon Graphics

These look real odd to me. Let me know if I should look further.

sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, 
amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, 
amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, 
amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, 
amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, 
amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, 
amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10


 Performance counter stats for 'system wide':

17,761,952,514,865,374  amd_iommu_0/cmd_processed/ 
   (33.28%)
18,582,155,570,607,472   amd_iommu_0/cmd_processed_inv/ 
(33.32%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 
(33.36%)
5,056,087,645,262,255   amd_iommu_0/int_dte_hit/ 
 (33.40%)
32,831,106,446,308,888   amd_iommu_0/int_dte_mis/ 
  (33.44%)
13,461,819,655,591,296   amd_iommu_0/mem_dte_hit/ 
  (33.45%)
208,555,436,221,050,464   amd_iommu_0/mem_dte_mis/ 
   (33.47%)
196,824,154,635,609,888   amd_iommu_0/mem_iommu_tlb_pde_hit/ 
 (33.46%)
193,552,630,440,410,144   amd_iommu_0/mem_iommu_tlb_pde_mis/ 
 (33.45%)
176,936,647,809,098,368   amd_iommu_0/mem_iommu_tlb_pte_hit/ 
 (33.41%)
184,737,401,623,626,464   amd_iommu_0/mem_iommu_tlb_pte_mis/ 
 (33.37%)
 0   amd_iommu_0/mem_pass_excl/ 
 (33.33%)
 0   amd_iommu_0/mem_pass_pretrans/ 
 (33.30%)
 0   amd_iommu_0/mem_pass_untrans/ 
(33.28%)
 0   amd_iommu_0/mem_target_abort/ 
(33.27%)
245,383,212,924,004,288   amd_iommu_0/mem_trans_total/ 
   (33.27%)
 0   amd_iommu_0/page_tbl_read_gst/ 
 (33.28%)
262,267,045,917,967,264   amd_iommu_0/page_tbl_read_nst/ 
 (33.27%)
256,308,216,913,137,600   amd_iommu_0/page_tbl_read_tot/ 
 (33.28%)
 0   amd_iommu_0/smi_blk/ 
   (33.27%)
 0   amd_iommu_0/smi_recv/ 
   (33.27%)
 0   amd_iommu_0/tlb_inv/ 
   (33.27%)
 0   amd_iommu_0/vapic_int_guest/ 
   (33.26%)
38,913,544,420,579,888   amd_iommu_0/vapic_int_non_guest/ 
  (33.27%)


  10.003967760 seconds time elapsed

thanks,
-- Shuah


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.

Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---



Tested-by: Shuah Khan 

thanks,
-- Shuah

Results with this patch on AMD Ryzen 5 PRO 2400GE w/ Radeon Vega
Graphics

sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, 
amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, 
amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, 
amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, 
amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, 
amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, 
amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, 
amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, 
amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10


 Performance counter stats for 'system wide':

   156  amd_iommu_0/cmd_processed/ 
(33.30%)
80   amd_iommu_0/cmd_processed_inv/ 
 (33.38%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 
(33.40%)
 0   amd_iommu_0/int_dte_hit/ 
   (33.43%)
   325   amd_iommu_0/int_dte_mis/ 
   (33.44%)
 1,951   amd_iommu_0/mem_dte_hit/ 
   (33.45%)
 7,589   amd_iommu_0/mem_dte_mis/ 
   (33.49%)
   325   amd_iommu_0/mem_iommu_tlb_pde_hit/ 
 (33.45%)
 2,460   amd_iommu_0/mem_iommu_tlb_pde_mis/ 
 (33.41%)
 2,510   amd_iommu_0/mem_iommu_tlb_pte_hit/ 
 (33.38%)
 5,526   amd_iommu_0/mem_iommu_tlb_pte_mis/ 
 (33.33%)
 0   amd_iommu_0/mem_pass_excl/ 
 (33.29%)
 0   amd_iommu_0/mem_pass_pretrans/ 
 (33.28%)
 1,556   amd_iommu_0/mem_pass_untrans/ 
(33.27%)
 0   amd_iommu_0/mem_target_abort/ 
(33.26%)
 3,112   amd_iommu_0/mem_trans_total/ 
   (33.29%)
 0   amd_iommu_0/page_tbl_read_gst/ 
 (33.29%)
 1,813   amd_iommu_0/page_tbl_read_nst/ 
 (33.25%)
 2,242   amd_iommu_0/page_tbl_read_tot/ 
 (33.27%)
 0   amd_iommu_0/smi_blk/ 
   (33.29%)
 0   amd_iommu_0/smi_recv/ 
   (33.28%)
 0   amd_iommu_0/tlb_inv/ 
   (33.28%)
 0   amd_iommu_0/vapic_int_guest/ 
   (33.25%)
 0   amd_iommu_0/vapic_int_non_guest/ 
   (33.26%)


  10.003200316 seconds time elapsed



Re: [PATCH v2 1/3] iommu: io-pgtable: add DART pagetable format

2021-04-09 Thread Arnd Bergmann
On Fri, Apr 9, 2021 at 6:56 PM Sven Peter  wrote:
> On Wed, Apr 7, 2021, at 12:44, Will Deacon wrote:
> > On Sun, Mar 28, 2021 at 09:40:07AM +0200, Sven Peter wrote:
> >
> > > +   cfg->pgsize_bitmap &= SZ_16K;
> > > +   if (!cfg->pgsize_bitmap)
> > > +   return NULL;
> >
> > This is worrying (and again, I don't think this belongs here). How is this
> > thing supposed to work if the CPU is using 4k pages?
>
> This SoC is just full of fun surprises!
> I didn't even think about that case since I've always been using 16k pages so far.
>
> I've checked again and wasn't able to find any way to configure the pagesize
> of the IOMMU. There seem to be variants of this IP in older iPhones which
> support a 4k pagesize but to the best of my knowledge this is hard wired
> and not configurable in software.
>
> When booting with 4k pages I hit the BUG_ON in iova.c that ensures that the
> iommu pagesize has to be <= the cpu page size.
>
> I see two options here and I'm not sure I like either of them:
>
> 1) Just don't support 4k CPU pages together with IOMMU translations and only
>allow full bypass mode there.
>This would however mean that PCIe (i.e. ethernet, usb ports on the Mac
>mini) and possibly Thunderbolt support would not be possible since these
>devices don't seem to like iommu bypass mode at all.

It should be possible to do a fake bypass mode by just programming a
static page table for as much address space as you can, and then
use swiotlb to address any memory beyond that. This won't perform
well because it requires bounce buffers for any high memory, but it
can be a last resort if a dart instance cannot do normal bypass mode.

> 2) I've had a brief discussion on IRC with Arnd about this [1] and he pointed
>out that the dma_map_sg API doesn't make any guarantees about the returned
>iovas and that it might be possible to make this work at least for devices
>that go through the normal DMA API.
>
>I've then replaced the page size check with a WARN_ON in iova.c just to see
>what happens. At least normal devices that go through the DMA API seem to
>work with my configuration. iommu_dma_alloc took the iommu_dma_alloc_remap
>path which was called with the cpu page size but then used
>domain->pgsize_bitmap to increase that to 16k. So this kinda works out, but
>there are other functions in dma-iommu.c that I believe rely on the fact that
>the iommu can map single cpu pages. This feels very fragile right now and
>would probably require some rather invasive changes.

The other second-to-last resort here would be to duplicate the code from
the dma-iommu code and implement the dma-mapping API directly on
top of the dart hardware instead of the iommu layer. This would probably
be much faster than the swiotlb on top of a bypass or a linear map,
but it's a really awful abstraction that would require adding special cases
into a lot of generic code.

>Any driver that tries to use the iommu API directly could be trouble
>as well if they make similar assumptions.

I think pretty much all drivers using the iommu API directly already
depends on having a matching page size.  I don't see any way to use
e.g. PCI device assignment using vfio, or a GPU driver with per-process
contexts when the iotlb page size is larger than the CPU's.

>Is this something you would even want to support in the iommu subsytem
>and is it even possible to do this in a sane way?

I don't know how hard it is to do adjust the dma-iommu implementation
to allow this, but as long as we can work out the DT binding to support
both normal dma-iommu mode with 16KB pages and some kind of
passthrough mode (emulated or not) with 4KB pages, it can be left
as a possible optimization for later.

Arnd


Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

2021-04-09 Thread Konrad Rzeszutek Wilk
On Thu, Apr 08, 2021 at 08:13:07PM -0700, Florian Fainelli wrote:
> 
> 
> On 3/24/2021 1:42 AM, Christoph Hellwig wrote:
> > On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote:
> >> When SWIOTLB_NO_FORCE is used, there should really be no allocations of
> >> default_nslabs to occur since we are not going to use those slabs. If a
> >> platform was somehow setting swiotlb_no_force and a later call to
> >> swiotlb_init() was to be made we would still be proceeding with
> >> allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
> >> was set on the kernel command line we would have only allocated 2KB.
> >>
> >> This would be inconsistent and the point of initializing default_nslabs
> >> to 1, was intended to allocate the minimum amount of memory possible, so
> >> simply remove that minimal allocation period.
> >>
> >> Signed-off-by: Florian Fainelli 
> > 
> > Looks good,
> > 
> > Reviewed-by: Christoph Hellwig 
> > 
> 
> Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch
> if you are also happy with it?

It should be now visible?
> -- 
> Florian


Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-09 Thread Jacob Pan
Hi Lu,

On Fri, 9 Apr 2021 20:45:22 +0800, Lu Baolu 
wrote:

> > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t
> > max) +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
> >   {
> > int ret = 0;
> > ioasid_t pasid;
> > +   struct mm_struct *mm;
> >   
> > if (min == INVALID_IOASID || max == INVALID_IOASID ||
> > min == 0 || max < min)
> > return -EINVAL;
> >   
> > mutex_lock(&iommu_sva_lock);
> > +   mm = get_task_mm(current);  
> 
> How could we allocate a supervisor PASID through iommu_sva_alloc_pasid()
> if we always use current->mm here?
I don't think you can. But I guess the current callers of this function do
not need supervisor PASID.
In reply to Jean, I suggest we split this function into mm->pasid
assignment and keep using ioasid_alloc() directly, then supervisor PASID is
caller's bind choice.

Thanks,

Jacob


Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-09 Thread Jacob Pan
Hi Jean-Philippe,

On Fri, 9 Apr 2021 12:11:47 +0200, Jean-Philippe Brucker
 wrote:

> On Thu, Apr 08, 2021 at 10:08:56AM -0700, Jacob Pan wrote:
> > diff --git a/drivers/iommu/iommu-sva-lib.c
> > b/drivers/iommu/iommu-sva-lib.c index bd41405..bd99f6b 100644
> > --- a/drivers/iommu/iommu-sva-lib.c
> > +++ b/drivers/iommu/iommu-sva-lib.c
> > @@ -12,27 +12,33 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
> >  
> >  /**
> >   * iommu_sva_alloc_pasid - Allocate a PASID for the mm
> > - * @mm: the mm
> >   * @min: minimum PASID value (inclusive)
> >   * @max: maximum PASID value (inclusive)
> >   *
> > - * Try to allocate a PASID for this mm, or take a reference to the
> > existing one
> > - * provided it fits within the [@min, @max] range. On success the
> > PASID is
> > - * available in mm->pasid, and must be released with
> > iommu_sva_free_pasid().
> > + * Try to allocate a PASID for the current mm, or take a reference to
> > the
> > + * existing one provided it fits within the [@min, @max] range. On
> > success
> > + * the PASID is available in the current mm->pasid, and must be
> > released with
> > + * iommu_sva_free_pasid().
> >   * @min must be greater than 0, because 0 indicates an unused
> > mm->pasid. *
> >   * Returns 0 on success and < 0 on error.
> >   */
> > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t
> > max) +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
> >  {
> > int ret = 0;
> > ioasid_t pasid;
> > +   struct mm_struct *mm;
> >  
> > if (min == INVALID_IOASID || max == INVALID_IOASID ||
> > min == 0 || max < min)
> > return -EINVAL;
> >  
> > mutex_lock(&iommu_sva_lock);
> > +   mm = get_task_mm(current);
> > +   if (!mm) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }  
> 
> I still think it would be more elegant to keep the choice of context in
> iommu_sva_bind_device() and pass it down to leaf functions such as
> iommu_sva_alloc_pasid(). The patch is trying to solve two separate

I agree if iommu_sva_alloc_pasid() is a leaf function, but it is a public
function, e.g. called by smmu code:
/* Allocate a PASID for this mm if necessary */
ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1);
If we give mm as parameter, it will give callers the illusion that this
mm doesn't have to be current->mm.

Should we make it into a leaf function by splitting iommu_sva_alloc_pasid()
into two parts?
1. iommu_sva_assign_pasid() //a new leaf helper function does mm->pasid
assignment
2. ioasid_alloc()

in iommu_sva_bind_device(), we do:
1. handle = driver ops->sva_bind(dev, mm, flags);
2. pasid = sva_get_pasid(handle);
3. iommu_sva_assign_pasid(mm, pasid)

In vendor driver sva_bind(), it just use ioasid_alloc directly with custom
range. e.g. arm-smmu-v3-sva.c
- ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1);
+ ret = ioasid_alloc(&iommu_sva_pasid, 1, (1U << master->ssid_bits) - 1);
   
> problems:
> 
> * We don't have a use-case for binding the mm of a remote process (and
>   it's supposedly difficult for device drivers to do it securely). So OK,
>   we remove the mm argument from iommu_sva_bind_device() and use the
>   current mm. But the IOMMU driver isn't going to do get_task_mm(current)
>   every time it needs the mm being bound, it will take it from
>   iommu_sva_bind_device(). Likewise iommu_sva_alloc_pasid() shouldn't need
>   to bother with get_task_mm().
> 
> * cgroup accounting for IOASIDs needs to be on the current task. Removing
>   the mm parameter from iommu_sva_alloc_pasid() doesn't help with that.
>   Sure it indicates that iommu_sva_alloc_pasid() needs a specific task
>   context but that's only for cgroup purpose, and I'd rather pass the
>   cgroup down from iommu_sva_bind_device() anyway (but am fine with
>   keeping it within ioasid_alloc() for now). Plus it's an internal helper,
>   easy for us to check that the callers are doing the right thing.
> 
With the above split, we really just have one allocation function:
ioasid_alloc(), so it can manage current cgroup accounting within. Would
this work?

> > if (mm->pasid) {
> > if (mm->pasid >= min && mm->pasid <= max)
> > ioasid_get(mm->pasid);
> > @@ -45,22 +51,32 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm,
> > ioasid_t min, ioasid_t max) else
> > mm->pasid = pasid;
> > }
> > +   mmput(mm);
> > +out_unlock:
> > mutex_unlock(&iommu_sva_lock);
> > return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
> >  
> >  /**
> > - * iommu_sva_free_pasid - Release the mm's PASID
> > + * iommu_sva_free_pasid - Release the current mm's PASID
> >   * @mm: the mm
> >   *
> >   * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid()
> >   */
> > -void iommu_sva_free_pasid(struct mm_struct *mm)
> > +void iommu_sva_free_pasid(void)
> >  {
> > +   struct mm_struct *mm;
> > +
> > mutex_lock(&iommu_sva_lock);
> > +  

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 10:37 AM, Shuah Khan wrote:

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.


unnecessarily



Unfortunatly, there is no documentation of since which generation


Unfortunately,

Rephrase suggestion:
Unfortunately, it is unclear when the PMC HW bug was fixed.


of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume


Based


that the buggy platforms are less likely to be in used, and it should


in use


be relatively safe to remove this legacy logic.




Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ 


Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)

  return 0;
  }
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-    u8 fxn, u64 *value, bool is_write);
-
  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
+    u64 val;
  struct pci_dev *pdev = iommu->dev;
-    u64 val = 0xabcd, val2 = 0, save_reg = 0;


Why not leave this u64 val here? Having the pdev assignment as the
first line makes it easier to read/follow.


  if (!iommu_feature(iommu, FEATURE_PC))
  return;
  amd_iommu_pc_present = true;
-    /* save the value to restore, if writable */
-    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-    goto pc_false;
-
-    /* Check if the performance counters can be written to */
-    if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-    (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-    (val != val2))


Aha - this is going away anyway. Please ignore my comment on 1/2
about parenthesis around (val != val2) being unnecessary.


-    goto pc_false;
-
-    /* restore */
-    if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-    goto pc_false;
-
  pci_info(pdev, "IOMMU performance counters supported\n");
  val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)

  iommu->max_counters = (u8) ((val >> 7) & 0xf);
  return;
-
-pc_false:
-    pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-    amd_iommu_pc_present = false;
-    return;
  }
  static ssize_t amd_iommu_show_cap(struct device *dev,





thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

From: Paul Menzel 

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tries to address an issue, where PMC power-gating
causing the IOMMU PMC pre-init test to fail on certain desktop/mobile
platforms where the power-gating is normally enabled.

There have been several reports that the workaround still does not
guarantee to work, and can add up to 100 ms (in the worst case)
to the boot process on certain platforms such as the MSI B350M MORTAR
with AMD Ryzen 3 2200G.

Therefore, revert this commit as a prelude to removing the pre-init
test.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Signed-off-by: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
Note: I have revised the commit message to add more detail
   and remove unnecessary information.

  drivers/iommu/amd/init.c | 45 ++--
  1 file changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..648cdfd03074 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,7 +12,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
@@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
  static int amd_iommu_enable_interrupts(void);
  static int __init iommu_go_to_state(enum iommu_init_state state);
  static void init_device_table_dma(void);
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
  
  static bool amd_iommu_pre_enabled = true;
  
@@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table)

return 0;
  }
  
-static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)

+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
-   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
  
  	if (!iommu_feature(iommu, FEATURE_PC))

return;
@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
amd_iommu_pc_present = true;
  
  	/* save the value to restore, if writable */

-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
-   goto pc_false;
-
-   /*
-* Disable power gating by programing the performance counter
-* source to 20 (i.e. counts the reads and writes from/to IOMMU
-* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
-* which never get incremented during this init phase.
-* (Note: The event is also deprecated.)
-*/
-   val = 20;
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
goto pc_false;
  
  	/* Check if the performance counters can be written to */

-   val = 0xabcd;
-   for (retry = 5; retry; retry--) {
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
-   val2)
-   break;
-
-   /* Wait about 20 msec for power gating to disable and retry. */
-   msleep(20);
-   }
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
+   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
+   (val != val2))


Probably don't need parentheses around 'val != val2'


goto pc_false;
  
-	if (val != val2)

+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
goto pc_false;
  
  	pci_info(pdev, "IOMMU performance counters supported\n");




thanks,
-- Shuah


Re: [PATCH] iommu: dart: fix call_kern.cocci warnings

2021-04-09 Thread Sven Peter via iommu
On Sun, Apr 4, 2021, at 17:26, Julia Lawall wrote:
> From: kernel test robot 
> 
> Function apple_dart_attach_stream called on line 519 inside
> lock on line 509 but uses GFP_KERNEL

Thanks! Fixed for v3.


Best,


Sven


Re: [PATCH v2 1/3] iommu: io-pgtable: add DART pagetable format

2021-04-09 Thread Sven Peter via iommu



On Wed, Apr 7, 2021, at 12:44, Will Deacon wrote:
> On Sun, Mar 28, 2021 at 09:40:07AM +0200, Sven Peter wrote:
[...]
> >  
> > +static struct io_pgtable *
> > +apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> > +{
> > +   struct arm_lpae_io_pgtable *data;
> > +
> > +   if (cfg->ias > 36)
> > +   return NULL;
> > +   if (cfg->oas > 36)
> > +   return NULL;
> > +
> > +   if (!cfg->coherent_walk)
> > +   return NULL;
> 
> This all feels like IOMMU-specific limitations leaking into the page-table
> code here; it doesn't feel so unlikely that future implementations of this
> IP might have greater addressing capabilities, for example, and so I don't
> see why the page-table code needs to police this.

That's true, this really doesn't belong here.
I'll fix it for the next version and make sure to keep iommu-specific
limitations inside the driver itself.


> 
> > +   cfg->pgsize_bitmap &= SZ_16K;
> > +   if (!cfg->pgsize_bitmap)
> > +   return NULL;
> 
> This is worrying (and again, I don't think this belongs here). How is this
> thing supposed to work if the CPU is using 4k pages?

This SoC is just full of fun surprises!
I didn't even think about that case since I've always been using 16k pages so 
far.

I've checked again and wasn't able to find any way to configure the pagesize
of the IOMMU. There seem to be variants of this IP in older iPhones which
support a 4k pagesize but to the best of my knowledge this is hard wired
and not configurable in software.

When booting with 4k pages I hit the BUG_ON in iova.c that ensures that the
iommu pagesize has to be <= the cpu page size.

I see two options here and I'm not sure I like either of them:

1) Just don't support 4k CPU pages together with IOMMU translations and only
   allow full bypass mode there.
   This would however mean that PCIe (i.e. ethernet, usb ports on the Mac
   mini) and possibly Thunderbolt support would not be possible since these
   devices don't seem to like iommu bypass mode at all.

2) I've had a brief discussion on IRC with Arnd about this [1] and he pointed
   out that the dma_map_sg API doesn't make any guarantees about the returned
   iovas and that it might be possible to make this work at least for devices
   that go through the normal DMA API.

   I've then replaced the page size check with a WARN_ON in iova.c just to see
   what happens. At least normal devices that go through the DMA API seem to
   work with my configuration. iommu_dma_alloc took the iommu_dma_alloc_remap
   path which was called with the cpu page size but then used
   domain->pgsize_bitmap to increase that to 16k. So this kinda works out, but
   there are other functions in dma-iommu.c that I believe rely on the fact that
   the iommu can map single cpu pages. This feels very fragile right now and
   would probably require some rather invasive changes.

   Any driver that tries to use the iommu API directly could be trouble
   as well if they make similar assumptions.

   Is this something you would even want to support in the iommu subsytem
   and is it even possible to do this in a sane way?


Best,


Sven


[1] https://freenode.irclog.whitequark.org/asahi/2021-04-07#29609786;


Re: [PATCH v2 3/3] iommu: dart: Add DART iommu driver

2021-04-09 Thread Sven Peter via iommu



On Wed, Apr 7, 2021, at 12:42, Will Deacon wrote:
> On Sun, Mar 28, 2021 at 09:40:09AM +0200, Sven Peter wrote:
> > Apple's new SoCs use iommus for almost all peripherals. These Device
> > Address Resolution Tables must be setup before these peripherals can
> > act as DMA masters.
> > 
> > Signed-off-by: Sven Peter 
> > ---
> >  MAINTAINERS  |   1 +
> >  drivers/iommu/Kconfig|  14 +
> >  drivers/iommu/Makefile   |   1 +
> >  drivers/iommu/apple-dart-iommu.c | 858 +++
> >  4 files changed, 874 insertions(+)
> >  create mode 100644 drivers/iommu/apple-dart-iommu.c
> 
> [...]
> 
> > +/* must be called with held domain->lock */
> > +static int apple_dart_attach_stream(struct apple_dart_domain *domain,
> > +   struct apple_dart *dart, u32 sid)
> > +{
> > +   unsigned long flags;
> > +   struct apple_dart_stream *stream;
> > +   struct io_pgtable_cfg *pgtbl_cfg;
> > +   int ret;
> > +
> > +   list_for_each_entry(stream, >streams, stream_head) {
> > +   if (stream->dart == dart && stream->sid == sid) {
> > +   stream->num_devices++;
> > +   return 0;
> > +   }
> > +   }
> > +
> > +   spin_lock_irqsave(>lock, flags);
> > +
> > +   if (WARN_ON(dart->used_sids & BIT(sid))) {
> > +   ret = -EINVAL;
> > +   goto error;
> > +   }
> > +
> > +   stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> > +   if (!stream) {
> > +   ret = -ENOMEM;
> > +   goto error;
> > +   }
> 
> Just in case you missed it, a cocci bot noticed that you're using GFP_KERNEL
> to allocate while holding a spinlock here:
> 
> https://lore.kernel.org/r/alpine.DEB.2.22.394.2104041724340.2958@hadrien
> 

Thanks for the reminder!
I haven't replied yet because that one was found later when the bot picked up
a (slightly earlier) version that Marc was using to bring up pcie I believe.
I'll fix it for the next version.


Sven


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Shuah Khan

On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.

At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.


unnecessarily



Unfortunatly, there is no documentation of since which generation


Unfortunately,

Rephrase suggestion:
Unfortunately, it is unclear when the PMC HW bug was fixed.


of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume


Based


that the buggy platforms are less likely to be in used, and it should


in use


be relatively safe to remove this legacy logic.




Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/init.c | 24 +---
  1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
  }
  
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,

-   u8 fxn, u64 *value, bool is_write);
-
  static void init_iommu_perf_ctr(struct amd_iommu *iommu)
  {
+   u64 val;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
  
  	if (!iommu_feature(iommu, FEATURE_PC))

return;
  
  	amd_iommu_pc_present = true;
  
-	/* save the value to restore, if writable */

-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-   goto pc_false;
-
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
-   goto pc_false;
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-   goto pc_false;
-
pci_info(pdev, "IOMMU performance counters supported\n");
  
  	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);

@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
iommu->max_counters = (u8) ((val >> 7) & 0xf);
  
  	return;

-
-pc_false:
-   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
  }
  
  static ssize_t amd_iommu_show_cap(struct device *dev,




I will test this patch and the revert on my two AMD systems and update
you in a day or two.

thanks,
-- Shuah


Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-09 Thread David Hildenbrand

On 09.04.21 15:35, Arnd Bergmann wrote:

On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand  wrote:


Random drivers should not override a user configuration of core knobs
(e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
which depends on CMA, if possible; however, these drivers also have to
tolerate if DMA_CMA is not available/functioning, for example, if no CMA
area for DMA_CMA use has been setup via "cma=X". In the worst case, the
driver cannot do its job properly in some configurations.

For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
available") documents
 While this is no build dependency, etnaviv will only work correctly
 on most systems if CMA and DMA_CMA are enabled. Select both options
 if available to avoid users ending up with a non-working GPU due to
 a lacking kernel config.
So etnaviv really wants to have DMA_CMA, however, can deal with some cases
where it is not available.

Let's introduce WANT_DMA_CMA and use it in most cases where drivers
select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
of recursive dependency issues).

We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.

With this change, distributions can disable CONFIG_CMA or
CONFIG_DMA_CMA, without it silently getting enabled again by random
drivers. Also, we'll now automatically try to enable both, CONFIG_CMA
and CONFIG_DMA_CMA if they are unspecified and any driver is around that
selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER.

For example, if any driver selects WANT_DMA_CMA and we do a
"make olddefconfig":

1. With "# CONFIG_CMA is not set" and no specification of
"CONFIG_DMA_CMA"

-> CONFIG_DMA_CMA won't be part of .config

2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA

Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)

3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"

-> CONFIG_DMA_CMA will be removed from .config

Note: drivers/remoteproc seems to be special; commit c51e882cd711
("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
there is a real dependency to DMA_CMA for it to work; leave that dependency
in place and don't convert it to a soft dependency.


I don't think this dependency is fundamentally different from the others,
though davinci machines tend to have less memory than a lot of the
other machines, so it's more likely to fail without CMA.



I was also unsure - and Lucas had similar thoughts. If you want, I can 
send a v4 also taking care of this.


Thanks!


Regardless of this,

Reviewed-by: Arnd Bergmann 




--
Thanks,

David / dhildenb



Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-09 Thread Arnd Bergmann
On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand  wrote:
>
> Random drivers should not override a user configuration of core knobs
> (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
> which depends on CMA, if possible; however, these drivers also have to
> tolerate if DMA_CMA is not available/functioning, for example, if no CMA
> area for DMA_CMA use has been setup via "cma=X". In the worst case, the
> driver cannot do its job properly in some configurations.
>
> For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
> available") documents
> While this is no build dependency, etnaviv will only work correctly
> on most systems if CMA and DMA_CMA are enabled. Select both options
> if available to avoid users ending up with a non-working GPU due to
> a lacking kernel config.
> So etnaviv really wants to have DMA_CMA, however, can deal with some cases
> where it is not available.
>
> Let's introduce WANT_DMA_CMA and use it in most cases where drivers
> select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
> of recursive dependency issues).
>
> We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
> DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.
>
> With this change, distributions can disable CONFIG_CMA or
> CONFIG_DMA_CMA, without it silently getting enabled again by random
> drivers. Also, we'll now automatically try to enable both, CONFIG_CMA
> and CONFIG_DMA_CMA if they are unspecified and any driver is around that
> selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
> DRM_KMS_CMA_HELPER.
>
> For example, if any driver selects WANT_DMA_CMA and we do a
> "make olddefconfig":
>
> 1. With "# CONFIG_CMA is not set" and no specification of
>"CONFIG_DMA_CMA"
>
> -> CONFIG_DMA_CMA won't be part of .config
>
> 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA
>
> Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
> DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)
>
> 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"
>
> -> CONFIG_DMA_CMA will be removed from .config
>
> Note: drivers/remoteproc seems to be special; commit c51e882cd711
> ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
> there is a real dependency to DMA_CMA for it to work; leave that dependency
> in place and don't convert it to a soft dependency.

I don't think this dependency is fundamentally different from the others,
though davinci machines tend to have less memory than a lot of the
other machines, so it's more likely to fail without CMA.

Regardless of this,

Reviewed-by: Arnd Bergmann 


Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-09 Thread Dmitry Osipenko
On 08.04.2021 17:07, Dmitry Osipenko wrote:
>> Whatever happened to the idea of creating identity mappings based on the
>> obscure tegra_fb_mem (or whatever it was called) command-line option? Is
>> that command-line not universally passed to the kernel from bootloaders
>> that initialize display?
> This is still a good idea! The command-line isn't universally passed
> just because it's up to a user to override the cmdline and then we get a
> hang (a very slow boot in reality) on T30 since display client takes out
> the whole memory bus with the constant SMMU faults. For example I don't
> have that cmdline option in my current setups.
> 

Thinking a bit more about the identity.. have you consulted with the
memory h/w people about whether it's always safe to enable ASID in a
middle of DMA request?

Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-09 Thread Lu Baolu

Hi,

On 2021/4/9 1:08, Jacob Pan wrote:

  /**
   * iommu_sva_alloc_pasid - Allocate a PASID for the mm
- * @mm: the mm
   * @min: minimum PASID value (inclusive)
   * @max: maximum PASID value (inclusive)
   *
- * Try to allocate a PASID for this mm, or take a reference to the existing one
- * provided it fits within the [@min, @max] range. On success the PASID is
- * available in mm->pasid, and must be released with iommu_sva_free_pasid().
+ * Try to allocate a PASID for the current mm, or take a reference to the
+ * existing one provided it fits within the [@min, @max] range. On success
+ * the PASID is available in the current mm->pasid, and must be released with
+ * iommu_sva_free_pasid().
   * @min must be greater than 0, because 0 indicates an unused mm->pasid.
   *
   * Returns 0 on success and < 0 on error.
   */
-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
  {
int ret = 0;
ioasid_t pasid;
+   struct mm_struct *mm;
  
  	if (min == INVALID_IOASID || max == INVALID_IOASID ||

min == 0 || max < min)
return -EINVAL;
  
  	mutex_lock(&iommu_sva_lock);

+   mm = get_task_mm(current);


How could we allocate a supervisor PASID through iommu_sva_alloc_pasid()
if we always use current->mm here?


+   if (!mm) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
if (mm->pasid) {
if (mm->pasid >= min && mm->pasid <= max)
ioasid_get(mm->pasid);
@@ -45,22 +51,32 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
min, ioasid_t max)
else
mm->pasid = pasid;
}
+   mmput(mm);
+out_unlock:
	mutex_unlock(&iommu_sva_lock);
return ret;
  }
  EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);


Best regards,
baolu


Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-09 Thread Linus Walleij
On Fri, Apr 9, 2021 at 1:20 PM David Hildenbrand  wrote:

> Random drivers should not override a user configuration of core knobs
> (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
> which depends on CMA, if possible; however, these drivers also have to
> tolerate if DMA_CMA is not available/functioning, for example, if no CMA
> area for DMA_CMA use has been setup via "cma=X". In the worst case, the
> driver cannot do its job properly in some configurations.

Looks good to me. At least a lot better than what we have.
Reviewed-by: Linus Walleij 

> Let's see if this approach is better for soft dependencies (and if we
> actually have some hard dependencies in there). This is the follow-up
> of
>   https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com
>   https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com

You can just add these to the commit message with Link:
when applying so people can easily find the discussion from the
commit.

> I was wondering if it would make sense in some drivers to warn if either
> CONFIG_DMA_CMA is not available or if DRM_CMA has not been configured
> properly - just to give people a heads up that something might more likely
> go wrong; that would, however, be future work.

I think the frameworks  (DRM_*_CMA_HELPER)
should pr_info something about it so the individual drivers
don't have to sanity check their entire world.

Yours,
Linus Walleij


Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-09 Thread Lucas Stach
Am Freitag, dem 09.04.2021 um 13:20 +0200 schrieb David Hildenbrand:
> Random drivers should not override a user configuration of core knobs
> (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
> which depends on CMA, if possible; however, these drivers also have to
> tolerate if DMA_CMA is not available/functioning, for example, if no CMA
> area for DMA_CMA use has been setup via "cma=X". In the worst case, the
> driver cannot do its job properly in some configurations.
> 
> For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
> available") documents
>   While this is no build dependency, etnaviv will only work correctly
>   on most systems if CMA and DMA_CMA are enabled. Select both options
>   if available to avoid users ending up with a non-working GPU due to
>   a lacking kernel config.
> So etnaviv really wants to have DMA_CMA, however, can deal with some cases
> where it is not available.
> 
> Let's introduce WANT_DMA_CMA and use it in most cases where drivers
> select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
> of recursive dependency issues).
> 
> We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
> DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.
> 
> With this change, distributions can disable CONFIG_CMA or
> CONFIG_DMA_CMA, without it silently getting enabled again by random
> drivers. Also, we'll now automatically try to enable both, CONFIG_CMA
> and CONFIG_DMA_CMA if they are unspecified and any driver is around that
> selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
> DRM_KMS_CMA_HELPER.
> 
> For example, if any driver selects WANT_DMA_CMA and we do a
> "make olddefconfig":
> 
> 1. With "# CONFIG_CMA is not set" and no specification of
>    "CONFIG_DMA_CMA"
> 
> -> CONFIG_DMA_CMA won't be part of .config
> 
> 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA
> 
> Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
> DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)
> 
> 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"
> 
> -> CONFIG_DMA_CMA will be removed from .config
> 
> Note: drivers/remoteproc seems to be special; commit c51e882cd711
> ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
> there is a real dependency to DMA_CMA for it to work; leave that dependency
> in place and don't convert it to a soft dependency.

Hm, to me this sounds much like the reasoning for the etnaviv
dependency. There is no actual build dependency, as the allocations are
done through the DMA API, but for the allocations to succeed you most
likely want CMA to be enabled. But that's just an observation from the
outside, I have no real clue about the remoteproc drivers.

As far as the etnaviv changes are concerned:
Acked-by: Lucas Stach 

> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Joel Stanley 
> Cc: Andrew Jeffery 
> Cc: Lucas Stach 
> Cc: Russell King 
> Cc: Christian Gmeiner 
> Cc: Paul Cercueil 
> Cc: Linus Walleij 
> Cc: Christoph Hellwig 
> Cc: Marek Szyprowski 
> Cc: Robin Murphy 
> Cc: Andrew Morton 
> Cc: Mike Rapoport 
> Cc: Arnd Bergmann 
> Cc: Bartlomiej Zolnierkiewicz 
> Cc: Eric Anholt 
> Cc: Michal Simek 
> Cc: Masahiro Yamada 
> Cc: "Alexander A. Klimov" 
> Cc: Peter Collingbourne 
> Cc: Suman Anna 
> Cc: Jason Gunthorpe 
> Cc: dri-de...@lists.freedesktop.org
> Cc: linux-asp...@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: etna...@lists.freedesktop.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-fb...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Signed-off-by: David Hildenbrand 
> ---
> 
> Let's see if this approach is better for soft dependencies (and if we
> actually have some hard dependencies in there). This is the follow-up
> of
>   https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com
>   https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com
> 
> I was wondering if it would make sense in some drivers to warn if either
> CONFIG_DMA_CMA is not available or if DRM_CMA has not been configured
> properly - just to give people a heads up that something might more likely
> go wrong; that would, however, be future work.
> 
> v2 -> v3:
> - Don't use "imply" but instead use a new WANT_DMA_CMA and make the default
>   of CMA and DMA_CMA depend on it.
> - Also adjust ingenic, mcde, tve200; these sound like soft dependencies as
>   well (although DMA_CMA is really desired)
> 
> v1 -> v2:
> - Fix DRM_CMA -> DMA_CMA
> 
> ---
>  drivers/gpu/drm/Kconfig | 2 ++
>  drivers/gpu/drm/aspeed/Kconfig  | 2 --
>  drivers/gpu/drm/etnaviv/Kconfig | 3 +--
>  drivers/gpu/drm/ingenic/Kconfig | 1 -
>  drivers/gpu/drm/mcde/Kconfig| 1 -
>  drivers/gpu/drm/tve200/Kconfig  | 1 -
>  drivers/video/fbdev/Kconfig | 2 +-
>  kernel/dma/Kconfig  | 7 +++
>  mm/Kconfig  | 1 +
>  9 

Re: [PATCH] iommu: exynos: remove unneeded local variable initialization

2021-04-09 Thread Marek Szyprowski
On 08.04.2021 22:16, Krzysztof Kozlowski wrote:
> The initialization of 'fault_addr' local variable is not needed as it is
> shortly after overwritten.
>
> Addresses-Coverity: Unused value
> Signed-off-by: Krzysztof Kozlowski 

Acked-by: Marek Szyprowski 

> ---
>   drivers/iommu/exynos-iommu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index de324b4eedfe..8fa9a591fb96 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -407,7 +407,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
>   struct sysmmu_drvdata *data = dev_id;
>   const struct sysmmu_fault_info *finfo;
>   unsigned int i, n, itype;
> - sysmmu_iova_t fault_addr = -1;
> + sysmmu_iova_t fault_addr;
>   unsigned short reg_status, reg_clear;
>   int ret = -ENOSYS;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-09 Thread Lu Baolu

Hi Jacob,

On 2021/4/9 1:08, Jacob Pan wrote:

The void* drvdata parameter isn't really used in iommu_sva_bind_device()
API, the current IDXD code "borrows" the drvdata for a VT-d private flag
for supervisor SVA usage.

Supervisor/Privileged mode request is a generic feature. It should be
promoted from the VT-d vendor driver to the generic code.

This patch replaces void* drvdata with a unsigned int flags parameter
and adjusts callers accordingly.

Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
---
  drivers/dma/idxd/cdev.c |  2 +-
  drivers/dma/idxd/init.c |  6 +++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  2 +-
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++--
  drivers/iommu/intel/Kconfig |  1 +
  drivers/iommu/intel/svm.c   | 18 ++
  drivers/iommu/iommu.c   |  9 ++---
  drivers/misc/uacce/uacce.c  |  2 +-
  include/linux/intel-iommu.h |  2 +-
  include/linux/intel-svm.h   | 17 ++---
  include/linux/iommu.h   | 19 ---
  11 files changed, 40 insertions(+), 42 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 0db9b82..21ec82b 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
  
  	if (device_pasid_enabled(idxd)) {

-   sva = iommu_sva_bind_device(dev, current->mm, NULL);
+   sva = iommu_sva_bind_device(dev, current->mm, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 085a0c3..cdc85f1 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev)
  
  static int idxd_enable_system_pasid(struct idxd_device *idxd)

  {
-   int flags;
+   unsigned int flags;
unsigned int pasid;
struct iommu_sva *sva;
  
-	flags = SVM_FLAG_SUPERVISOR_MODE;

+   flags = IOMMU_SVA_BIND_SUPERVISOR;


With SVM_FLAG_SUPERVISOR_MODE removed, I guess

#include <linux/intel-svm.h>

in this file could be removed as well.

  
-	sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);

+   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
if (IS_ERR(sva)) {
		dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251ca..23e287e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
  }
  
  struct iommu_sva *

-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
  {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817..b971d4d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
  int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
  int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
  struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
+   unsigned int flags);
  void arm_smmu_sva_unbind(struct iommu_sva *handle);
  u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
  void arm_smmu_sva_notifier_synchronize(void);
@@ -742,7 +742,7 @@ static inline int arm_smmu_master_disable_sva(struct 
arm_smmu_master *master)
  }
  
  static inline struct iommu_sva *

-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
  {
return ERR_PTR(-ENODEV);
  }
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 28a3d15..5415052 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -41,6 +41,7 @@ config INTEL_IOMMU_SVM
select PCI_PRI
select MMU_NOTIFIER
select IOASID
+   select IOMMU_SVA_LIB


Why?


help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA 

[PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-09 Thread David Hildenbrand
Random drivers should not override a user configuration of core knobs
(e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
which depends on CMA, if possible; however, these drivers also have to
tolerate DMA_CMA being unavailable or non-functional, for example, if no
CMA area for DMA_CMA use has been set up via "cma=X". In the worst case,
the driver cannot do its job properly in some configurations.

For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
available") documents
While this is no build dependency, etnaviv will only work correctly
on most systems if CMA and DMA_CMA are enabled. Select both options
if available to avoid users ending up with a non-working GPU due to
a lacking kernel config.
So etnaviv really wants to have DMA_CMA; however, it can deal with some
cases where it is not available.

Let's introduce WANT_DMA_CMA and use it in most cases where drivers
select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
of recursive dependency issues).

We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.

With this change, distributions can disable CONFIG_CMA or
CONFIG_DMA_CMA without it silently getting enabled again by random
drivers. Also, we'll now automatically try to enable both CONFIG_CMA
and CONFIG_DMA_CMA if they are unspecified and any driver is around that
selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER.

For example, if any driver selects WANT_DMA_CMA and we do a
"make olddefconfig":

1. With "# CONFIG_CMA is not set" and no specification of
   "CONFIG_DMA_CMA"

-> CONFIG_DMA_CMA won't be part of .config

2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA

Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)

3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"

-> CONFIG_DMA_CMA will be removed from .config

Note: drivers/remoteproc seems to be special; commit c51e882cd711
("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
there is a real dependency on DMA_CMA for it to work; leave that dependency
in place and don't convert it to a soft dependency.
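
The soft-dependency mechanism described above can be sketched in Kconfig as
follows (the kernel/dma/Kconfig and mm/Kconfig hunks are not shown in this
excerpt, so the exact symbol placement is assumed from the description):

```
# kernel/dma/Kconfig (sketch)
config WANT_DMA_CMA
	bool
	help
	  Selected by drivers with a soft dependency on DMA_CMA.

config DMA_CMA
	bool "DMA Contiguous Memory Allocator"
	depends on HAVE_DMA_CONTIGUOUS && CMA
	default WANT_DMA_CMA

# mm/Kconfig (sketch)
config CMA
	bool "Contiguous Memory Allocator"
	default WANT_DMA_CMA
```

Because `default` only applies to otherwise-unspecified symbols, an explicit
"# CONFIG_CMA is not set" in .config still wins over any driver selecting
WANT_DMA_CMA.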

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Joel Stanley 
Cc: Andrew Jeffery 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: Paul Cercueil 
Cc: Linus Walleij 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Arnd Bergmann 
Cc: Bartlomiej Zolnierkiewicz 
Cc: Eric Anholt 
Cc: Michal Simek 
Cc: Masahiro Yamada 
Cc: "Alexander A. Klimov" 
Cc: Peter Collingbourne 
Cc: Suman Anna 
Cc: Jason Gunthorpe 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-asp...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: etna...@lists.freedesktop.org
Cc: linux-m...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: David Hildenbrand 
---

Let's see if this approach is better for soft dependencies (and if we
actually have some hard dependencies in there). This is the follow-up
of
  https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com
  https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com

I was wondering if it would make sense in some drivers to warn if either
CONFIG_DMA_CMA is not available or if DMA_CMA has not been configured
properly - just to give people a heads-up that something is more likely
to go wrong; that would, however, be future work.

v2 -> v3:
- Don't use "imply" but instead use a new WANT_DMA_CMA and make the default
  of CMA and DMA_CMA depend on it.
- Also adjust ingenic, mcde, tve200; these sound like soft dependencies as
  well (although DMA_CMA is really desired)

v1 -> v2:
- Fix DRM_CMA -> DMA_CMA

---
 drivers/gpu/drm/Kconfig | 2 ++
 drivers/gpu/drm/aspeed/Kconfig  | 2 --
 drivers/gpu/drm/etnaviv/Kconfig | 3 +--
 drivers/gpu/drm/ingenic/Kconfig | 1 -
 drivers/gpu/drm/mcde/Kconfig| 1 -
 drivers/gpu/drm/tve200/Kconfig  | 1 -
 drivers/video/fbdev/Kconfig | 2 +-
 kernel/dma/Kconfig  | 7 +++
 mm/Kconfig  | 1 +
 9 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 85b79a7fee63..6f9989adfa93 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -201,12 +201,14 @@ config DRM_TTM_HELPER
 config DRM_GEM_CMA_HELPER
bool
depends on DRM
+   select WANT_DMA_CMA
help
  Choose this if you need the GEM CMA helper functions
 
 config DRM_KMS_CMA_HELPER
bool
depends on DRM
+   select WANT_DMA_CMA
select DRM_GEM_CMA_HELPER
help
  Choose this if you need the KMS CMA helper functions
diff --git a/drivers/gpu/drm/aspeed/Kconfig b/drivers/gpu/drm/aspeed/Kconfig

Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 08, 2021 at 10:08:55AM -0700, Jacob Pan wrote:
> The void* drvdata parameter isn't really used in iommu_sva_bind_device()
> API,

Right, it used to be a cookie passed to the device driver in the exit_mm()
callback, but that went away with edcc40d2ab5f ("iommu: Remove
iommu_sva_ops::mm_exit()")

> the current IDXD code "borrows" the drvdata for a VT-d private flag
> for supervisor SVA usage.
> 
> Supervisor/Privileged mode request is a generic feature. It should be
> promoted from the VT-d vendor driver to the generic code.
> 
> This patch replaces void* drvdata with a unsigned int flags parameter
> and adjusts callers accordingly.

Thanks for cleaning this up. Making flags unsigned long seems more common
(I suggested int without thinking). But it doesn't matter much, we won't
get to 32 flags.

> 
> Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
> Suggested-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/dma/idxd/cdev.c |  2 +-
>  drivers/dma/idxd/init.c |  6 +++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  2 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++--
>  drivers/iommu/intel/Kconfig |  1 +
>  drivers/iommu/intel/svm.c   | 18 ++
>  drivers/iommu/iommu.c   |  9 ++---
>  drivers/misc/uacce/uacce.c  |  2 +-
>  include/linux/intel-iommu.h |  2 +-
>  include/linux/intel-svm.h   | 17 ++---
>  include/linux/iommu.h   | 19 ---
>  11 files changed, 40 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index 0db9b82..21ec82b 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct 
> file *filp)
>   filp->private_data = ctx;
>  
>   if (device_pasid_enabled(idxd)) {
> - sva = iommu_sva_bind_device(dev, current->mm, NULL);
> + sva = iommu_sva_bind_device(dev, current->mm, 0);
>   if (IS_ERR(sva)) {
>   rc = PTR_ERR(sva);
>   dev_err(dev, "pasid allocation failed: %d\n", rc);
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index 085a0c3..cdc85f1 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
> *pdev)
>  
>  static int idxd_enable_system_pasid(struct idxd_device *idxd)
>  {
> - int flags;
> + unsigned int flags;
>   unsigned int pasid;
>   struct iommu_sva *sva;
>  
> - flags = SVM_FLAG_SUPERVISOR_MODE;
> + flags = IOMMU_SVA_BIND_SUPERVISOR;
>  
> - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
>   if (IS_ERR(sva)) {
> 		dev_warn(&idxd->pdev->dev,
>"iommu sva bind failed: %ld\n", PTR_ERR(sva));
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index bb251ca..23e287e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
> *mm)
>  }
>  
>  struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int 
> flags)

Could you add a check on flags:

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..145ceb2fc5da 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
 }

 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);

+   if (flags)
+   return ERR_PTR(-EINVAL);
+
if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
return ERR_PTR(-EINVAL);



>  {
>   struct iommu_sva *handle;
>   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index f985817..b971d4d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
> 

Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 08, 2021 at 10:08:56AM -0700, Jacob Pan wrote:
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index bd41405..bd99f6b 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -12,27 +12,33 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
>  
>  /**
>   * iommu_sva_alloc_pasid - Allocate a PASID for the mm
> - * @mm: the mm
>   * @min: minimum PASID value (inclusive)
>   * @max: maximum PASID value (inclusive)
>   *
> - * Try to allocate a PASID for this mm, or take a reference to the existing 
> one
> - * provided it fits within the [@min, @max] range. On success the PASID is
> - * available in mm->pasid, and must be released with iommu_sva_free_pasid().
> + * Try to allocate a PASID for the current mm, or take a reference to the
> + * existing one provided it fits within the [@min, @max] range. On success
> + * the PASID is available in the current mm->pasid, and must be released with
> + * iommu_sva_free_pasid().
>   * @min must be greater than 0, because 0 indicates an unused mm->pasid.
>   *
>   * Returns 0 on success and < 0 on error.
>   */
> -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
>  {
>   int ret = 0;
>   ioasid_t pasid;
> + struct mm_struct *mm;
>  
>   if (min == INVALID_IOASID || max == INVALID_IOASID ||
>   min == 0 || max < min)
>   return -EINVAL;
>  
> 	mutex_lock(&iommu_sva_lock);
> + mm = get_task_mm(current);
> + if (!mm) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }

I still think it would be more elegant to keep the choice of context in
iommu_sva_bind_device() and pass it down to leaf functions such as
iommu_sva_alloc_pasid(). The patch is trying to solve two separate
problems:

* We don't have a use-case for binding the mm of a remote process (and
  it's supposedly difficult for device drivers to do it securely). So OK,
  we remove the mm argument from iommu_sva_bind_device() and use the
  current mm. But the IOMMU driver isn't going to do get_task_mm(current)
  every time it needs the mm being bound, it will take it from
  iommu_sva_bind_device(). Likewise iommu_sva_alloc_pasid() shouldn't need
  to bother with get_task_mm().

* cgroup accounting for IOASIDs needs to be on the current task. Removing
  the mm parameter from iommu_sva_alloc_pasid() doesn't help with that.
  Sure it indicates that iommu_sva_alloc_pasid() needs a specific task
  context but that's only for cgroup purpose, and I'd rather pass the
  cgroup down from iommu_sva_bind_device() anyway (but am fine with
  keeping it within ioasid_alloc() for now). Plus it's an internal helper,
  easy for us to check that the callers are doing the right thing.
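
Concretely, the shape being suggested is to resolve the mm once at the API
boundary and thread it down, instead of each leaf calling get_task_mm().
A rough sketch (not the actual patch; the ops callback name is illustrative):

```c
struct iommu_sva *
iommu_sva_bind_device(struct device *dev, unsigned int flags)
{
	struct iommu_sva *handle;
	struct mm_struct *mm = get_task_mm(current);

	if (!mm)
		return ERR_PTR(-EINVAL);
	/* leaf helpers such as iommu_sva_alloc_pasid() keep their mm argument */
	handle = dev->bus->iommu_ops->sva_bind(dev, mm, flags);
	mmput(mm);
	return handle;
}
```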

>   if (mm->pasid) {
>   if (mm->pasid >= min && mm->pasid <= max)
>   ioasid_get(mm->pasid);
> @@ -45,22 +51,32 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
> min, ioasid_t max)
>   else
>   mm->pasid = pasid;
>   }
> + mmput(mm);
> +out_unlock:
> 	mutex_unlock(&iommu_sva_lock);
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
>  
>  /**
> - * iommu_sva_free_pasid - Release the mm's PASID
> + * iommu_sva_free_pasid - Release the current mm's PASID
>   * @mm: the mm
>   *
>   * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid()
>   */
> -void iommu_sva_free_pasid(struct mm_struct *mm)
> +void iommu_sva_free_pasid(void)
>  {
> + struct mm_struct *mm;
> +
> 	mutex_lock(&iommu_sva_lock);
> + mm = get_task_mm(current);
> + if (!mm)
> + goto out_unlock;
> +

More importantly, could we at least dissociate free_pasid() from the
current process?  Otherwise drivers can't clean up from a workqueue (as
amdkfd does) or from an rcu callback. Given that iommu_sva_unbind_device()
takes the SVA handle owned by whomever did bind(), there shouldn't be any
security issue. For the cgroup problem, ioasid.c could internally keep
track of the cgroup used during allocation rather than assuming the
context of ioasid_put() is the same as that of ioasid_get().

>   if (ioasid_put(mm->pasid))
>   mm->pasid = 0;
> + mmput(mm);
> +out_unlock:
> 	mutex_unlock(&iommu_sva_lock);
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index b40990a..278b8b4 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -8,8 +8,8 @@
>  #include 
>  #include 
>  
> -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
> -void iommu_sva_free_pasid(struct mm_struct *mm);
> +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max);
> +void iommu_sva_free_pasid(void);
>  struct mm_struct *iommu_sva_find(ioasid_t pasid);
>  
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/drivers/iommu/iommu.c 

RE: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-09 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 09 April 2021 10:51
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org; de...@acpica.org
> Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun
> (Hanjun Guo) ; sami.muja...@arm.com;
> robin.mur...@arm.com; wanghuiqiang 
> Subject: Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
> 
> Hi Shameer,
> 
> On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> > RFC v1 --> v2:
> >  - Added a generic interface for IOMMU drivers to retrieve all the
> >    RMR info associated with a given IOMMU.
> >  - SMMUv3 driver gets the RMR list during probe() and installs
> >    bypass STEs for all the SIDs in the RMR list. This is to keep
> >    the ongoing traffic alive(if any) during SMMUv3 reset. This is
> >based on the suggestions received for v1 to take care of the
> >EFI framebuffer use case. Only sanity tested for now.
> >  - During the probe/attach device, SMMUv3 driver reserves any
> >    RMR region associated with the device such that there is a unity
> >    mapping for them in SMMU.
> > ---
> >
> > The series adds support to IORT RMR nodes specified in IORT
> > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> > ranges that are used by endpoints and require a unity mapping
> > in SMMU.
> >
> > We have faced issues with 3408iMR RAID controller cards which
> > fail to boot when SMMU is enabled. This is because these controllers
> > make use of host memory for various caching related purposes and when
> > SMMU is enabled the iMR firmware fails to access these memory regions
> > as there is no mapping for them. IORT RMR provides a way for UEFI to
> > describe and report these memory regions so that the kernel can make
> > a unity mapping for these in SMMU.
> >
> > RFC because Patch #1 is to update actbl2.h and should be done
> > through an acpica update. I have sent out a pull request[1] for that.
> 
> What is the state of this series? I wondered if I should consider using
> it for nested SMMU to avoid handling nested binding for MSI, as
> suggested by Jean. Are there any blockers?

There were a few issues with the revision E spec, and those are now sorted
out with the updated E.b.

The ACPICA pull request for E.b is now merged[1] and the Linux header
update patch[2] is also out there (I think it is now queued for 5.13).

I will soon respin this series.

Thanks,
Shameer

1. https://github.com/acpica/acpica/pull/682
2. 
https://lore.kernel.org/linux-acpi/20210406213028.718796-22-erik.kan...@intel.com/

> 
> Thanks
> 
> Eric
> >
> > Tests:
> >
> > With a UEFI, that reports the RMR for the dev,
> > 
> > [16F0h 5872   1] Type : 06
> > [16F1h 5873   2]   Length : 007C
> > [16F3h 5875   1] Revision : 00
> > [1038h 0056   2] Reserved : 
> > [1038h 0056   2]   Identifier : 
> > [16F8h 5880   4]Mapping Count : 0001
> > [16FCh 5884   4]   Mapping Offset : 0040
> >
> > [1700h 5888   4]Number of RMR Descriptors : 0002
> > [1704h 5892   4]RMR Descriptor Offset : 0018
> >
> > [1708h 5896   8]  Base Address of RMR : E640
> > [1710h 5904   8]Length of RMR : 0010
> > [1718h 5912   4] Reserved : 
> >
> > [171Ch 5916   8]  Base Address of RMR : 27B0
> > [1724h 5924   8]Length of RMR : 00C0
> > [172Ch 5932   4] Reserved : 
> >
> > [1730h 5936   4]   Input base : 
> > [1734h 5940   4] ID Count : 0001
> > [1738h 5944   4]  Output Base : 0003
> > [173Ch 5948   4] Output Reference : 0064
> > [1740h 5952   4]Flags (decoded below) : 0001
> >Single Mapping : 1
> > ...
> >
> > Without the series the RAID controller initialization fails as
> > below,
> >
> > ...
> > [   12.631117] megaraid_sas :03:00.0: FW supports sync
> cache: Yes
> > [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is
> called outbound_intr_mask:0x4009
> > [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED
> for SCSI host 0
> > [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to
> ready state
> > [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault
> code:0x1 subcode:0x0 func:megasas_transition_to_ready
> > [  106.695186] megaraid_sas :03:00.0: System Register set:
> > [  106.889787] megaraid_sas :03:00.0: Failed to transition controller to
> ready for scsi0.
> > [  106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw
> 6407
> > estuary:/$
> >
> > With the series, now the kernel has direct mapping for the dev as
> > below,
> >

Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-09 Thread Auger Eric
Hi Shameer,

On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> RFC v1 --> v2:
>  - Added a generic interface for IOMMU drivers to retrieve all the 
>    RMR info associated with a given IOMMU.
>  - SMMUv3 driver gets the RMR list during probe() and installs
>    bypass STEs for all the SIDs in the RMR list. This is to keep
>    the ongoing traffic alive(if any) during SMMUv3 reset. This is
>based on the suggestions received for v1 to take care of the
>EFI framebuffer use case. Only sanity tested for now.
>  - During the probe/attach device, SMMUv3 driver reserves any
>    RMR region associated with the device such that there is a unity
>    mapping for them in SMMU.
> ---    
> 
> The series adds support to IORT RMR nodes specified in IORT
> Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> ranges that are used by endpoints and require a unity mapping
> in SMMU.
> 
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these controllers
> make use of host memory for various caching related purposes and when
> SMMU is enabled the iMR firmware fails to access these memory regions
> as there is no mapping for them. IORT RMR provides a way for UEFI to
> describe and report these memory regions so that the kernel can make
> a unity mapping for these in SMMU.
> 
> RFC because Patch #1 is to update actbl2.h and should be done
> through an acpica update. I have sent out a pull request[1] for that.

What is the state of this series? I wondered if I should consider using
it for nested SMMU to avoid handling nested binding for MSI, as
suggested by Jean. Are there any blockers?

Thanks

Eric
> 
> Tests:
> 
> With a UEFI, that reports the RMR for the dev,
> 
> [16F0h 5872   1] Type : 06
> [16F1h 5873   2]   Length : 007C
> [16F3h 5875   1] Revision : 00
> [1038h 0056   2] Reserved : 
> [1038h 0056   2]   Identifier : 
> [16F8h 5880   4]Mapping Count : 0001
> [16FCh 5884   4]   Mapping Offset : 0040
> 
> [1700h 5888   4]Number of RMR Descriptors : 0002
> [1704h 5892   4]RMR Descriptor Offset : 0018
> 
> [1708h 5896   8]  Base Address of RMR : E640
> [1710h 5904   8]Length of RMR : 0010
> [1718h 5912   4] Reserved : 
> 
> [171Ch 5916   8]  Base Address of RMR : 27B0
> [1724h 5924   8]Length of RMR : 00C0
> [172Ch 5932   4] Reserved : 
> 
> [1730h 5936   4]   Input base : 
> [1734h 5940   4] ID Count : 0001
> [1738h 5944   4]  Output Base : 0003
> [173Ch 5948   4] Output Reference : 0064
> [1740h 5952   4]Flags (decoded below) : 0001
>Single Mapping : 1
> ...
> 
> Without the series the RAID controller initialization fails as
> below,
> 
> ...
> [   12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes 
>   
> [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for 
> SCSI host 0   
>   
> [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready 
> state 
> [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault 
> code:0x1 subcode:0x0 func:megasas_transition_to_ready 
>
> [  106.695186] megaraid_sas :03:00.0: System Register set:
>   
> [  106.889787] megaraid_sas :03:00.0: Failed to transition controller to 
> ready for scsi0.  
>  
> [  106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407
>   
> estuary:/$
> 
> With the series, now the kernel has direct mapping for the dev as
> below,
> 
> estuary:/$ cat /sys/kernel/iommu_groups/0/reserved_regions
>   
> 0x0800 0x080f msi 
>   
> 0x27b0 0x286f direct  
>   
> 0xe640 0xe64f direct  
>   
> estuary:/$
> 
> 
> [   12.254318] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   12.739089] megaraid_sas :03:00.0: FW provided supportMaxExtLDs: 0 
>  max_lds: 32  
> 
> [   12.746628] megaraid_sas :03:00.0: controller type   : 

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-09 Thread Kunkun Jiang

On 2021/4/9 16:31, Auger Eric wrote:

Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:

Hi Eric,

On 2021/4/8 20:30, Auger Eric wrote:

Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:

With nested stage support, soon we will need to invalidate
S1 contexts and ranges tagged with an unmanaged asid, this
latter being managed by the guest. So let's introduce 2 helpers
that allow to invalidate with externally managed ASIDs

Signed-off-by: Eric Auger 

---

v13 -> v14
- Actually send the NH_ASID command (reported by Xingang Wang)
---
    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
-
    1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5579ec4fccc8..4c19a1114de4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
arm_smmu_domain *smmu_domain, int ssid,
    }
      /* IO_PGTABLE API */
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
*smmu_domain,
+   int ext_asid)
    {
-    struct arm_smmu_domain *smmu_domain = cookie;
    struct arm_smmu_device *smmu = smmu_domain->smmu;
    struct arm_smmu_cmdq_ent cmd;
    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
*cookie)
     * insertion to guarantee those are observed before the TLBI.
Do be
     * careful, 007.
     */
-    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+    if (ext_asid >= 0) { /* guest stage 1 invalidation */
+    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
+    cmd.tlbi.asid    = ext_asid;
+    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
+    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+    arm_smmu_cmdq_issue_sync(smmu);
+    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
    arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
    } else {
    cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
@@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
*cookie)
    arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
    }
    +static void arm_smmu_tlb_inv_context(void *cookie)
+{
+    struct arm_smmu_domain *smmu_domain = cookie;
+
+    __arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
     unsigned long iova, size_t size,
     size_t granule,
@@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
arm_smmu_cmdq_ent *cmd,
    arm_smmu_cmdq_batch_submit(smmu, &cmds);
    }


Here is the part of code in __arm_smmu_tlb_inv_range():

  if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
  /* Get the leaf page size */
  tg = __ffs(smmu_domain->domain.pgsize_bitmap);

  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
  cmd->tlbi.tg = (tg - 10) / 2;

  /* Determine what level the granule is at */
  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));

  num_pages = size >> tg;
  }

When the pSMMU supports RIL, we get the leaf page size from
__ffs(smmu_domain->domain.pgsize_bitmap). In nested mode, that is
determined by the host PAGE_SIZE. If the host kernel and guest kernel use
different translation granules (e.g. host 16K, guest 4K),
__arm_smmu_tlb_inv_range() will issue an incorrect tlbi command.

Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/


In case RIL is not supported by the host, next version will use the
smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range

Thanks

Eric

I think they are different. In normal cases, when we want to invalidate
stage 1 TLB entries, we should use the granule size supported by the vSMMU
to build and issue a tlbi command, if the pSMMU supports RIL.

But the current __arm_smmu_tlb_inv_range() always uses the granule
size supported by the host.
(tg = __ffs(smmu_domain->domain.pgsize_bitmap);)

Let me explain more clearly.
Preconditions of this issue:
1. pSMMU supports RIL
2. host and guest use different translation granules (e.g. host 16K,
guest 4K)

this is not clear to me. See below.

Guest wants to invalidate 4K, so info->granule_size = 4K.
In __arm_smmu_tlb_inv_range(), if the pSMMU supports RIL and the host uses
16K, then tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0,
tlbi.num = -1. That is an incorrect tlbi command.

If the guest uses 4K granule, this means the pSMMU also supports 4K
granule. Otherwise the corresponding CD is invalid (TG0/TG1 field desc).
So in that case, isn't it valid to send a RIL invalidation with tg = 12?

Does "tg = 12" come from the smallest pSMMU supported page size?

[PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"

2021-04-09 Thread Suravee Suthikulpanit
From: Paul Menzel 

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tries to address an issue where PMC power-gating
causes the IOMMU PMC pre-init test to fail on certain desktop/mobile
platforms where power-gating is normally enabled.

There have been several reports that the workaround still is not
guaranteed to work, and can add up to 100 ms (in the worst case)
to the boot process on certain platforms such as the MSI B350M MORTAR
with AMD Ryzen 3 2200G.

Therefore, revert this commit as a prelude to removing the pre-init
test.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Signed-off-by: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
Note: I have revised the commit message to add more detail
  and remove unnecessary information.

 drivers/iommu/amd/init.c | 45 ++--
 1 file changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..648cdfd03074 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
 }
 
-static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
-   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
amd_iommu_pc_present = true;
 
/* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
-   goto pc_false;
-
-   /*
-* Disable power gating by programing the performance counter
-* source to 20 (i.e. counts the reads and writes from/to IOMMU
-* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
-* which never get incremented during this init phase.
-* (Note: The event is also deprecated.)
-*/
-   val = 20;
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
goto pc_false;
 
/* Check if the performance counters can be written to */
-   val = 0xabcd;
-   for (retry = 5; retry; retry--) {
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
-   val2)
-   break;
-
-   /* Wait about 20 msec for power gating to disable and retry. */
-   msleep(20);
-   }
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
+   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
+   (val != val2))
goto pc_false;
 
-   if (val != val2)
+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Suravee Suthikulpanit
In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was that reading the counters always returned zero.

At the time, the suggested workaround was to add test logic prior
to initializing the PMC feature, to check whether the counters can be
programmed and read back with the same value. This had been working fine
until more recent desktop/mobile platforms started enabling power gating
for the PMC, which prevents access to the counters. This results in the
PMC support being disabled unnecessarily.

Unfortunately, there is no documentation of which hardware generation
fixed the original PMC HW bug, although it was fixed soon after the PMC
was first introduced. Based on this, we assume that the buggy platforms
are unlikely to still be in use, and it should be relatively safe to
remove this legacy logic.

Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+   u64 val;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
 
amd_iommu_pc_present = true;
 
-   /* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-   goto pc_false;
-
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
-   goto pc_false;
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-   goto pc_false;
-
pci_info(pdev, "IOMMU performance counters supported\n");
 
val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
iommu->max_counters = (u8) ((val >> 7) & 0xf);
 
return;
-
-pc_false:
-   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,
-- 
2.17.1



[PATCH 0/2] iommu/amd: Revert and remove failing PMC test

2021-04-09 Thread Suravee Suthikulpanit
This has prevented the PMC from working on more recent desktop/mobile
platforms, where PMC power-gating is normally enabled. After consulting
with HW designers and the IOMMU maintainer, we have decided to remove
the legacy test altogether to avoid future PMC enabling issues.

Thanks to the community for helping to test, investigate, provide data
and report issues on several platforms in the field.

Regards,
Suravee 

Paul Menzel (1):
  Revert "iommu/amd: Fix performance counter initialization"

Suravee Suthikulpanit (1):
  iommu/amd: Remove performance counter pre-initialization test

 drivers/iommu/amd/init.c | 49 ++--
 1 file changed, 2 insertions(+), 47 deletions(-)

-- 
2.17.1



Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-09 Thread Auger Eric
Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/8 20:30, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
With nested stage support, soon we will need to invalidate
S1 contexts and ranges tagged with an unmanaged ASID, the
latter being managed by the guest. So let's introduce 2 helpers
that allow invalidation with externally managed ASIDs

 Signed-off-by: Eric Auger 

 ---

 v13 -> v14
 - Actually send the NH_ASID command (reported by Xingang Wang)
 ---
    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
 -
    1 file changed, 29 insertions(+), 9 deletions(-)

 diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 index 5579ec4fccc8..4c19a1114de4 100644
 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
 arm_smmu_domain *smmu_domain, int ssid,
    }
      /* IO_PGTABLE API */
 -static void arm_smmu_tlb_inv_context(void *cookie)
 +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
 *smmu_domain,
 +   int ext_asid)
    {
 -    struct arm_smmu_domain *smmu_domain = cookie;
    struct arm_smmu_device *smmu = smmu_domain->smmu;
    struct arm_smmu_cmdq_ent cmd;
    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
 *cookie)
     * insertion to guarantee those are observed before the TLBI.
 Do be
     * careful, 007.
     */
 -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
 +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
 +    cmd.tlbi.asid    = ext_asid;
 +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
 +    arm_smmu_cmdq_issue_cmd(smmu, );
 +    arm_smmu_cmdq_issue_sync(smmu);
 +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
    arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
    } else {
    cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
 @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
 *cookie)
    arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
    }
    +static void arm_smmu_tlb_inv_context(void *cookie)
 +{
 +    struct arm_smmu_domain *smmu_domain = cookie;
 +
 +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
 +}
 +
    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
     unsigned long iova, size_t size,
     size_t granule,
 @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
 arm_smmu_cmdq_ent *cmd,
    arm_smmu_cmdq_batch_submit(smmu, );
    }
    
>>> Here is the part of code in __arm_smmu_tlb_inv_range():
  if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
  /* Get the leaf page size */
  tg = __ffs(smmu_domain->domain.pgsize_bitmap);

  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
  cmd->tlbi.tg = (tg - 10) / 2;

  /* Determine what level the granule is at */
  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));

  num_pages = size >> tg;
  }
>>> When pSMMU supports RIL, we get the leaf page size by
>>> __ffs(smmu_domain->
>>> domain.pgsize_bitmap). In nested mode, it is determined by host
>>> PAGE_SIZE. If
>>> the host kernel and guest kernel has different translation granule (e.g.
>>> host 16K,
>>> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi
>>> command.
>>>
>>> Do you have any idea about this issue?
>> I think this is the same issue as the one reported by Chenxiang
>>
>> https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/
>>
>>
>> In case RIL is not supported by the host, next version will use the
>> smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
>>
>> Thanks
>>
>> Eric
> I think they are different. In normal cases, when we want to invalidate the
> cache of stage 1, we should use the granule size supported by vSMMU to
> implement and issue a tlbi command if pSMMU supports RIL.
> 
> But in the current __arm_smmu_tlb_inv_range(), it always uses the granule
> size supported by host.
> (tg = __ffs(smmu_domain->domain.pgsize_bitmap);)
> 
> Let me explain more clearly.
> Preconditions of this issue:
> 1. pSMMU supports RIL
> 2. host and guest use different translation granule (e.g. host 16K,
> guest 4K)
this is not clear to me. See below.
> 
> Guest wants to invalidate 4K, so info->granule_size =