Re: [PATCH v2] iommu/mediatek: fix NULL pointer dereference when printing dev_name

2022-04-25 Thread Yong Wu via iommu
On Mon, 2022-04-25 at 11:03 +0100, Robin Murphy wrote:
> On 2022-04-25 09:24, Miles Chen via iommu wrote:
> > When larbdev is NULL (in the case I hit, the node is incorrectly
> > set
> > iommus = < NUM>), it will cause device_link_add() fail and
> > kernel crashes when we try to print dev_name(larbdev).
> > 
> > Fix it by adding a NULL pointer check before
> > device_link_add/device_link_remove.
> > 
> > It should work for normal correct setting and avoid the crash
> > caused
> > by my incorrect setting.
> > 
> > Error log:
> > [   18.189042][  T301] Unable to handle kernel NULL pointer
> > dereference at virtual address 0050
> > [   18.190247][  T301] Mem abort info:
> > [   18.190255][  T301]   ESR = 0x9605
> > [   18.190263][  T301]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [   18.192142][  T301]   SET = 0, FnV = 0
> > [   18.192151][  T301]   EA = 0, S1PTW = 0
> > [   18.194710][  T301]   FSC = 0x05: level 1 translation fault
> > [   18.195424][  T301] Data abort info:
> > [   18.195888][  T301]   ISV = 0, ISS = 0x0005
> > [   18.196500][  T301]   CM = 0, WnR = 0
> > [   18.196977][  T301] user pgtable: 4k pages, 39-bit VAs,
> > pgdp=000104f9e000
> > [   18.197889][  T301] [0050] pgd=,
> > p4d=, pud=
> > [   18.199220][  T301] Internal error: Oops: 9605 [#1] PREEMPT
> > SMP
> > [   18.343152][  T301] Kernel Offset: 0x144408 from
> > 0xffc00800
> > [   18.343988][  T301] PHYS_OFFSET: 0x4000
> > [   18.344519][  T301] pstate: a045 (NzCv daif +PAN -UAO)
> > [   18.345213][  T301] pc : mtk_iommu_probe_device+0xf8/0x118
> > [mtk_iommu]
> > [   18.346050][  T301] lr : mtk_iommu_probe_device+0xd0/0x118
> > [mtk_iommu]
> > [   18.346884][  T301] sp : ffc00a5635e0
> > [   18.347392][  T301] x29: ffc00a5635e0 x28: ffd44a46c1d8
> > [   18.348156][  T301] x27: ff80c39a8000 x26: ffd44a80cc38
> > [   18.348917][  T301] x25:  x24: ffd44a80cc38
> > [   18.349677][  T301] x23: ffd44e4da4c6 x22: ffd44a80cc38
> > [   18.350438][  T301] x21: ff80cecd1880 x20: 
> > [   18.351198][  T301] x19: ff80c439f010 x18: ffc00a50d0c0
> > [   18.351959][  T301] x17:  x16: 0004
> > [   18.352719][  T301] x15: 0004 x14: ffd44eb5d420
> > [   18.353480][  T301] x13: 0ad2 x12: 0003
> > [   18.354241][  T301] x11: fad2 x10: c000fad2
> > [   18.355003][  T301] x9 : a0d288d8d7142d00 x8 : a0d288d8d7142d00
> > [   18.355763][  T301] x7 : ffd44c2bc640 x6 : 
> > [   18.356524][  T301] x5 : 0080 x4 : 0001
> > [   18.357284][  T301] x3 :  x2 : 0005
> > [   18.358045][  T301] x1 :  x0 : 
> > [   18.360208][  T301] Hardware name: MT6873 (DT)
> > [   18.360771][  T301] Call trace:
> > [   18.361168][  T301]  dump_backtrace+0xf8/0x1f0
> > [   18.361737][  T301]  dump_stack_lvl+0xa8/0x11c
> > [   18.362305][  T301]  dump_stack+0x1c/0x2c
> > [   18.362816][  T301]  mrdump_common_die+0x184/0x40c [mrdump]
> > [   18.363575][  T301]  ipanic_die+0x24/0x38 [mrdump]
> > [   18.364230][  T301]  atomic_notifier_call_chain+0x128/0x2b8
> > [   18.364937][  T301]  die+0x16c/0x568
> > [   18.365394][  T301]  __do_kernel_fault+0x1e8/0x214
> > [   18.365402][  T301]  do_page_fault+0xb8/0x678
> > [   18.366934][  T301]  do_translation_fault+0x48/0x64
> > [   18.368645][  T301]  do_mem_abort+0x68/0x148
> > [   18.368652][  T301]  el1_abort+0x40/0x64
> > [   18.368660][  T301]  el1h_64_sync_handler+0x54/0x88
> > [   18.368668][  T301]  el1h_64_sync+0x68/0x6c
> > [   18.368673][  T301]  mtk_iommu_probe_device+0xf8/0x118
> > [mtk_iommu]
> > [   18.369840][  T301]  __iommu_probe_device+0x12c/0x358
> > [   18.370880][  T301]  iommu_probe_device+0x3c/0x31c
> > [   18.372026][  T301]  of_iommu_configure+0x200/0x274
> > [   18.373587][  T301]  of_dma_configure_id+0x1b8/0x230
> > [   18.375200][  T301]  platform_dma_configure+0x24/0x3c
> > [   18.376456][  T301]  really_probe+0x110/0x504
> > [   18.376464][  T301]  __driver_probe_device+0xb4/0x188
> > [   18.376472][  T301]  driver_probe_device+0x5c/0x2b8
> > [   18.376481][  T301]  __driver_attach+0x338/0x42c
> > [   18.377992][  T301]  bus_add_driver+0x218/0x4c8
> > [   18.379389][  T301]  driver_register+0x84/0x17c
> > [   18.380580][  T301]  __platform_driver_register+0x28/0x38
> > ...
> > 
> > Reported-by: kernel test robot 
> > Fixes: 635319a4a744 ("media: iommu/mediatek: Add device_link
> > between the consumer and the larb devices")
> > Signed-off-by: Miles Chen 
> > 
> > ---
> > 
> > Change since v1
> > fix a build warning reported by kernel test robot
> > https://lore.kernel.org/lkml/202204231446.iykdz674-...@intel.com/
> > 
> > ---
> >   drivers/iommu/mtk_iommu.c| 13 -
> >   drivers/iommu/mtk_iommu_v1.c | 13 -
> > 

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Zhangfei Gao



On 2022/4/26 下午12:36, Fenghua Yu wrote:

On Tue, Apr 26, 2022 at 12:28:00PM +0800, Zhangfei Gao wrote:

Hi, Jean

On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote:

Hi Jacob,

On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:

Hi Jean-Philippe,

On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
 wrote:


On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:

On 4/25/22 06:53, Jean-Philippe Brucker wrote:

On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
wrote:

On 5.17
fops_release is called automatically, as well as
iommu_sva_unbind_device. On 5.18-rc1.
fops_release is not called, have to manually call close(fd)

Right that's weird

Looks it is caused by the fix patch, via mmget, which may add
refcount of fd.

Yes indirectly I think: when the process mmaps the queue,
mmap_region() takes a reference to the uacce fd. That reference is
released either by explicit close() or munmap(), or by exit_mmap()
(which is triggered by mmput()). Since there is an mm->fd dependency,
we cannot add a fd->mm dependency, so no mmget()/mmput() in
bind()/unbind().

I guess we should go back to refcounted PASIDs instead, to avoid
freeing them until unbind().

Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
nothing else simple we can do.

How does the IOMMU hardware know that all activity to a given PASID is
finished?  That activity should, today, be independent of an mm or a
fd's lifetime.

In the case of uacce, it's tied to the fd lifetime: opening an accelerator
queue calls iommu_sva_bind_device(), which sets up the PASID context in
the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
destroys the PASID context (after the device driver stopped all DMA for
this PASID).


For VT-d, it is essentially the same flow except managed by the individual
drivers such as DSA.
If free() happens before unbind(), we deactivate the PASIDs and suppress
faults from the device. When the unbind finally comes, we finalize the
PASID teardown. It seems we have a need for an intermediate state where
PASID is "pending free"?

Yes we do have that state, though I'm not sure we need to make it explicit
in the ioasid allocator.

Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
also part of Lu's rework [1].

Move mm_pasid_drop to __mmdrop looks workable.

The nginx works since ioasid is not freed when master exit until nginx stop.

The ioasid does not free immediately when fops_release->unbind finished.
Instead, __mmdrop happens a bit lazy,  which has no issue though
I passed 1 times exit without unbind test, the pasid allocation is ok.

Thanks


diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..60f417f69367 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm)
     mmu_notifier_subscriptions_destroy(mm);
     check_mm(mm);
     put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
     free_mm(mm);
  }
  EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm)
     }
     if (mm->binfmt)
     module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
     mmdrop(mm);
  }

Thank you very much, Zhangfei!

I just now sent out an identical patch. It works on X86 as well.

So seems the patch is the right fix.

Either you can send out the patch or I add your Signed-off-by? Either way
is OK for me.

Thanks Fenghua,
It does not matter. Have added tested-by.
I was in stress test for checking the pasid free, since it was freed lazily.

Thank all for the help, a bit nervous, since it is rc4 now.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Zhangfei Gao



On 2022/4/26 下午12:20, Fenghua Yu wrote:

Hi, Jean and Zhangfei,

On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote:

Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
also part of Lu's rework [1].

Is this a right fix for the issue? Could you please test it on ARM?
I don't have an ARM machine.

Thanks.

-Fenghua

 From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Fri, 15 Apr 2022 00:51:33 -0700
Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue

A PASID might be still used on ARM after it is freed in __mmput().

process:
open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
exit();
exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure

To avoid the use-after-free issue, free the PASID after no device uses it,
i.e. after all devices are unbound from the mm.

sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
__mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
in __mmdrop() guarantees the PASID is safely freed only after no device
is bound to the mm.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free 
it on mm exit")

Reported-by: Zhangfei Gao 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 

Thanks for the fix.

Tested-by: Zhangfei Gao 



---
  kernel/fork.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..35a3beff140b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
mmu_notifier_subscriptions_destroy(mm);
check_mm(mm);
put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
free_mm(mm);
  }
  EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
}
if (mm->binfmt)
module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
mmdrop(mm);
  }
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Fenghua Yu
On Tue, Apr 26, 2022 at 12:28:00PM +0800, Zhangfei Gao wrote:
> Hi, Jean
> 
> On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote:
> > Hi Jacob,
> > 
> > On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
> > > Hi Jean-Philippe,
> > > 
> > > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
> > >  wrote:
> > > 
> > > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> > > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> > > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > > > > wrote:
> > > > > > > > > On 5.17
> > > > > > > > > fops_release is called automatically, as well as
> > > > > > > > > iommu_sva_unbind_device. On 5.18-rc1.
> > > > > > > > > fops_release is not called, have to manually call close(fd)
> > > > > > > > Right that's weird
> > > > > > > Looks it is caused by the fix patch, via mmget, which may add
> > > > > > > refcount of fd.
> > > > > > Yes indirectly I think: when the process mmaps the queue,
> > > > > > mmap_region() takes a reference to the uacce fd. That reference is
> > > > > > released either by explicit close() or munmap(), or by exit_mmap()
> > > > > > (which is triggered by mmput()). Since there is an mm->fd 
> > > > > > dependency,
> > > > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in
> > > > > > bind()/unbind().
> > > > > > 
> > > > > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > > > > freeing them until unbind().
> > > > > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > > > > nothing else simple we can do.
> > > > > 
> > > > > How does the IOMMU hardware know that all activity to a given PASID is
> > > > > finished?  That activity should, today, be independent of an mm or a
> > > > > fd's lifetime.
> > > > In the case of uacce, it's tied to the fd lifetime: opening an 
> > > > accelerator
> > > > queue calls iommu_sva_bind_device(), which sets up the PASID context in
> > > > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> > > > destroys the PASID context (after the device driver stopped all DMA for
> > > > this PASID).
> > > > 
> > > For VT-d, it is essentially the same flow except managed by the individual
> > > drivers such as DSA.
> > > If free() happens before unbind(), we deactivate the PASIDs and suppress
> > > faults from the device. When the unbind finally comes, we finalize the
> > > PASID teardown. It seems we have a need for an intermediate state where
> > > PASID is "pending free"?
> > Yes we do have that state, though I'm not sure we need to make it explicit
> > in the ioasid allocator.
> > 
> > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> > also part of Lu's rework [1].
> 
> Move mm_pasid_drop to __mmdrop looks workable.
> 
> The nginx works since ioasid is not freed when master exit until nginx stop.
> 
> The ioasid does not free immediately when fops_release->unbind finished.
> Instead, __mmdrop happens a bit lazy,  which has no issue though
> I passed 1 times exit without unbind test, the pasid allocation is ok.
> 
> Thanks
> 
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9796897560ab..60f417f69367 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm)
>     mmu_notifier_subscriptions_destroy(mm);
>     check_mm(mm);
>     put_user_ns(mm->user_ns);
> +   mm_pasid_drop(mm);
>     free_mm(mm);
>  }
>  EXPORT_SYMBOL_GPL(__mmdrop);
> @@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm)
>     }
>     if (mm->binfmt)
>     module_put(mm->binfmt->module);
> -   mm_pasid_drop(mm);
>     mmdrop(mm);
>  }

Thank you very much, Zhangfei!

I just now sent out an identical patch. It works on X86 as well.

So seems the patch is the right fix.

Either you can send out the patch or I add your Signed-off-by? Either way
is OK for me.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Zhangfei Gao

Hi, Jean

On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote:

Hi Jacob,

On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:

Hi Jean-Philippe,

On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
 wrote:


On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:

On 4/25/22 06:53, Jean-Philippe Brucker wrote:

On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
wrote:

On 5.17
fops_release is called automatically, as well as
iommu_sva_unbind_device. On 5.18-rc1.
fops_release is not called, have to manually call close(fd)

Right that's weird

Looks it is caused by the fix patch, via mmget, which may add
refcount of fd.

Yes indirectly I think: when the process mmaps the queue,
mmap_region() takes a reference to the uacce fd. That reference is
released either by explicit close() or munmap(), or by exit_mmap()
(which is triggered by mmput()). Since there is an mm->fd dependency,
we cannot add a fd->mm dependency, so no mmget()/mmput() in
bind()/unbind().

I guess we should go back to refcounted PASIDs instead, to avoid
freeing them until unbind().

Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
nothing else simple we can do.

How does the IOMMU hardware know that all activity to a given PASID is
finished?  That activity should, today, be independent of an mm or a
fd's lifetime.

In the case of uacce, it's tied to the fd lifetime: opening an accelerator
queue calls iommu_sva_bind_device(), which sets up the PASID context in
the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
destroys the PASID context (after the device driver stopped all DMA for
this PASID).


For VT-d, it is essentially the same flow except managed by the individual
drivers such as DSA.
If free() happens before unbind(), we deactivate the PASIDs and suppress
faults from the device. When the unbind finally comes, we finalize the
PASID teardown. It seems we have a need for an intermediate state where
PASID is "pending free"?

Yes we do have that state, though I'm not sure we need to make it explicit
in the ioasid allocator.

Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
also part of Lu's rework [1].


Move mm_pasid_drop to __mmdrop looks workable.

The nginx works since ioasid is not freed when master exit until nginx stop.

The ioasid does not free immediately when fops_release->unbind finished.
Instead, __mmdrop happens a bit lazy,  which has no issue though
I passed 1 times exit without unbind test, the pasid allocation is ok.

Thanks


diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..60f417f69367 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm)
    mmu_notifier_subscriptions_destroy(mm);
    check_mm(mm);
    put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
    free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm)
    }
    if (mm->binfmt)
    module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
    mmdrop(mm);
 }



Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Fenghua Yu
Hi, Jean and Zhangfei,

On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote:
> Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> also part of Lu's rework [1].

Is this a right fix for the issue? Could you please test it on ARM?
I don't have an ARM machine.

Thanks.

-Fenghua

>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Fri, 15 Apr 2022 00:51:33 -0700
Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue

A PASID might be still used on ARM after it is freed in __mmput().

process:
open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
exit();
exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure

To avoid the use-after-free issue, free the PASID after no device uses it,
i.e. after all devices are unbound from the mm.

sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
__mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
in __mmdrop() guarantees the PASID is safely freed only after no device
is bound to the mm.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and 
free it on mm exit")

Reported-by: Zhangfei Gao 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..35a3beff140b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
mmu_notifier_subscriptions_destroy(mm);
check_mm(mm);
put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
}
if (mm->binfmt)
module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
mmdrop(mm);
 }
 
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/2] PCI/ACPI: Support Microsoft's "DmaProperty"

2022-04-25 Thread Rajat Jain via iommu
The "DmaProperty" is supported and currently documented and used by
Microsoft [link 1 below], to flag internal PCIe root ports that need
DMA protection [link 2 below]. We have discussed with them and reached
a common understanding that they shall change their MSDN documentation
to say that the same property can be used to protect any PCI device,
and not just internal PCIe root ports (since there is no point
introducing yet another property for arbitrary PCI devices). This helps
with security from internal devices that offer an attack surface for
DMA attacks (e.g. internal network devices).

Support DmaProperty to mark DMA from a PCI device as untrusted.

Link: [1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-internal-pcie-ports-accessible-to-users-and-requiring-dma-protection
Link: [2] 
https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt
Signed-off-by: Rajat Jain 
Reviewed-by: Mika Westerberg 
Acked-by: Rafael J. Wysocki 
---
v6: * Take care of Bjorn's comments:
   - Update the commit log
   - Rename to pci_dev_has_dma_property()
   - Use acpi_dev_get_property()
v5: * Reorder the patches in the series
v4: * Add the GUID. 
* Update the comment and commitlog.
v3: * Use Microsoft's documented property "DmaProperty"
* Resctrict to ACPI only

 drivers/acpi/property.c |  3 +++
 drivers/pci/pci-acpi.c  | 21 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index 12bbfe833609..bafe35c301ac 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -48,6 +48,9 @@ static const guid_t prp_guids[] = {
/* Storage device needs D3 GUID: 5025030f-842f-4ab4-a561-99a5189762d0 */
GUID_INIT(0x5025030f, 0x842f, 0x4ab4,
  0xa5, 0x61, 0x99, 0xa5, 0x18, 0x97, 0x62, 0xd0),
+   /* DmaProperty for PCI devices GUID: 
70d24161-6dd5-4c9e-8070-705531292865 */
+   GUID_INIT(0x70d24161, 0x6dd5, 0x4c9e,
+ 0x80, 0x70, 0x70, 0x55, 0x31, 0x29, 0x28, 0x65),
 };
 
 /* ACPI _DSD data subnodes GUID: dbb8e3e6-5886-4ba6-8795-1319f52a966b */
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 3ae435beaf0a..d7c6ba48486f 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1369,12 +1369,33 @@ static void pci_acpi_set_external_facing(struct pci_dev 
*dev)
dev->external_facing = 1;
 }
 
+static int pci_dev_has_dma_property(struct pci_dev *dev)
+{
+   struct acpi_device *adev;
+   const union acpi_object *obj;
+
+   adev = ACPI_COMPANION(>dev);
+   if (!adev)
+   return 0;
+
+   /*
+* Property also used by Microsoft Windows for same purpose,
+* (to implement DMA protection from a device, using the IOMMU).
+*/
+   if (!acpi_dev_get_property(adev, "DmaProperty", ACPI_TYPE_INTEGER,
+  ) && obj->integer.value == 1)
+   return 1;
+
+   return 0;
+}
+
 void pci_acpi_setup(struct device *dev, struct acpi_device *adev)
 {
struct pci_dev *pci_dev = to_pci_dev(dev);
 
pci_acpi_optimize_delay(pci_dev, adev->handle);
pci_acpi_set_external_facing(pci_dev);
+   pci_dev->untrusted |= pci_dev_has_dma_property(pci_dev);
pci_acpi_add_edr_notifier(pci_dev);
 
pci_acpi_add_pm_notifier(adev, pci_dev);
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 2/2] PCI: Rename pci_dev->untrusted to pci_dev->untrusted_dma

2022-04-25 Thread Rajat Jain via iommu
Rename the field to make it more clear, that the device can execute DMA
attacks on the system, and thus the system may need protection from
such attacks from this device.

No functional change intended.

Signed-off-by: Rajat Jain 
Reviewed-by: Mika Westerberg 
Acked-by: Rafael J. Wysocki 
---
v6: No change in this patch, rebased on top of changes in other patch.
v5: Use "untrusted_dma" as property name, based on feedback.
Reorder the patches in the series.
v4: Initial version, created based on comments on other patch

 drivers/iommu/dma-iommu.c   | 6 +++---
 drivers/iommu/intel/iommu.c | 2 +-
 drivers/iommu/iommu.c   | 2 +-
 drivers/pci/ats.c   | 2 +-
 drivers/pci/pci-acpi.c  | 2 +-
 drivers/pci/pci.c   | 2 +-
 drivers/pci/probe.c | 8 
 drivers/pci/quirks.c| 2 +-
 include/linux/pci.h | 5 +++--
 9 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..aeee4be7614d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -497,14 +497,14 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
 }
 
-static bool dev_is_untrusted(struct device *dev)
+static bool dev_has_untrusted_dma(struct device *dev)
 {
-   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted_dma;
 }
 
 static bool dev_use_swiotlb(struct device *dev)
 {
-   return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+   return IS_ENABLED(CONFIG_SWIOTLB) && dev_has_untrusted_dma(dev);
 }
 
 /**
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942..b88f47391140 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4843,7 +4843,7 @@ static bool intel_iommu_is_attach_deferred(struct device 
*dev)
  */
 static bool risky_device(struct pci_dev *pdev)
 {
-   if (pdev->untrusted) {
+   if (pdev->untrusted_dma) {
pci_info(pdev,
 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted 
PCI link\n",
 pdev->vendor, pdev->device);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..d8d3133e2947 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1525,7 +1525,7 @@ static int iommu_get_def_domain_type(struct device *dev)
 {
const struct iommu_ops *ops = dev_iommu_ops(dev);
 
-   if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)
+   if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted_dma)
return IOMMU_DOMAIN_DMA;
 
if (ops->def_domain_type)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c967ad6e2626..477c16ba9341 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -42,7 +42,7 @@ bool pci_ats_supported(struct pci_dev *dev)
if (!dev->ats_cap)
return false;
 
-   return (dev->untrusted == 0);
+   return (dev->untrusted_dma == 0);
 }
 EXPORT_SYMBOL_GPL(pci_ats_supported);
 
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index d7c6ba48486f..7c2784e7e954 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1395,7 +1395,7 @@ void pci_acpi_setup(struct device *dev, struct 
acpi_device *adev)
 
pci_acpi_optimize_delay(pci_dev, adev->handle);
pci_acpi_set_external_facing(pci_dev);
-   pci_dev->untrusted |= pci_dev_has_dma_property(pci_dev);
+   pci_dev->untrusted_dma |= pci_dev_has_dma_property(pci_dev);
pci_acpi_add_edr_notifier(pci_dev);
 
pci_acpi_add_pm_notifier(adev, pci_dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ecce435fb3f..1fb0eb8646c8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -958,7 +958,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
ctrl |= (cap & PCI_ACS_UF);
 
/* Enable Translation Blocking for external devices and noats */
-   if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
+   if (pci_ats_disabled() || dev->external_facing || dev->untrusted_dma)
ctrl |= (cap & PCI_ACS_TB);
 
pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 17a969942d37..d2a9b26fcede 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1587,7 +1587,7 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
dev->is_thunderbolt = 1;
 }
 
-static void set_pcie_untrusted(struct pci_dev *dev)
+static void pci_set_untrusted_dma(struct pci_dev *dev)
 {
struct pci_dev *parent;
 
@@ -1596,8 +1596,8 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 * untrusted as well.
 */
parent = pci_upstream_bridge(dev);
-   if (parent && (parent->untrusted || parent->external_facing))
-   dev->untrusted = true;
+   if (parent && (parent->untrusted_dma || 

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jacob Pan
Hi Jean-Philippe,

On Mon, 25 Apr 2022 17:13:02 +0100, Jean-Philippe Brucker
 wrote:

> Hi Jacob,
> 
> On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
> > Hi Jean-Philippe,
> > 
> > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
> >  wrote:
> >   
> > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:  
> > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > > > wrote:
> > > >  On 5.17
> > > >  fops_release is called automatically, as well as
> > > >  iommu_sva_unbind_device. On 5.18-rc1.
> > > >  fops_release is not called, have to manually call close(fd)
> > > > >>> Right that's weird
> > > > >> Looks it is caused by the fix patch, via mmget, which may add
> > > > >> refcount of fd.
> > > > > Yes indirectly I think: when the process mmaps the queue,
> > > > > mmap_region() takes a reference to the uacce fd. That reference is
> > > > > released either by explicit close() or munmap(), or by exit_mmap()
> > > > > (which is triggered by mmput()). Since there is an mm->fd
> > > > > dependency, we cannot add a fd->mm dependency, so no
> > > > > mmget()/mmput() in bind()/unbind().
> > > > > 
> > > > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > > > freeing them until unbind().
> > > > 
> > > > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > > > nothing else simple we can do.
> > > > 
> > > > How does the IOMMU hardware know that all activity to a given PASID
> > > > is finished?  That activity should, today, be independent of an mm
> > > > or a fd's lifetime.
> > > 
> > > In the case of uacce, it's tied to the fd lifetime: opening an
> > > accelerator queue calls iommu_sva_bind_device(), which sets up the
> > > PASID context in the IOMMU. Closing the queue calls
> > > iommu_sva_unbind_device() which destroys the PASID context (after the
> > > device driver stopped all DMA for this PASID).
> > >   
> > For VT-d, it is essentially the same flow except managed by the
> > individual drivers such as DSA.
> > If free() happens before unbind(), we deactivate the PASIDs and suppress
> > faults from the device. When the unbind finally comes, we finalize the
> > PASID teardown. It seems we have a need for an intermediate state where
> > PASID is "pending free"?  
> 
> Yes we do have that state, though I'm not sure we need to make it explicit
> in the ioasid allocator.
> 
IMHO, making it explicit would fail ioasid_get() on a "pending free" PASID.
Making free a one-way trip and prevent further complications.

> Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> also part of Lu's rework [1].
> 
Yes, I would agree. IIRC, Fenghua's early patch was doing pasid drop
in mmdrop. Maybe I missed something.

> Thanks,
> Jean
> 
> [1]
> https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jean-Philippe Brucker
On Mon, Apr 25, 2022 at 08:55:46AM -0700, Dave Hansen wrote:
> On 4/25/22 07:26, Jean-Philippe Brucker wrote:
> >>
> >> How does the IOMMU hardware know that all activity to a given PASID is
> >> finished?  That activity should, today, be independent of an mm or a
> >> fd's lifetime.
> > In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> > queue calls iommu_sva_bind_device(), which sets up the PASID context in
> > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> > destroys the PASID context (after the device driver stopped all DMA for
> > this PASID).
> 
> Could this PASID context destruction move from being "fd-based" to
> happening under mm_pasid_drop()?  Logically, it seems like that should
> work because mm_pasid_drop() happens after exit_mmap() where the VMAs
> (which hold references to 'struct file' via vma->vm_file) are torn down.

The problem is that we'd have to request the device driver to stop DMA
before we can destroy the context and free the PASID. We did consider
doing this in the release() MMU notifier, but there were concerns about
blocking mmput() for too long (for example
https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/
though I think there was a more recent discussion). We also need to drain
the PRI and fault queues to get rid of all references to that PASID.

At the moment we disable (but not destroy) the PASID context in release(),
so when the process gets killed pending DMA transactions are silently
ignored. Then the device driver informs us through unbind() that no DMA is
active anymore and we can finish cleaning up, then reuse the PASID.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jean-Philippe Brucker
Hi Jacob,

On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
> Hi Jean-Philippe,
> 
> On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
>  wrote:
> 
> > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> > > On 4/25/22 06:53, Jean-Philippe Brucker wrote:  
> > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > > wrote:  
> > >  On 5.17
> > >  fops_release is called automatically, as well as
> > >  iommu_sva_unbind_device. On 5.18-rc1.
> > >  fops_release is not called, have to manually call close(fd)  
> > > >>> Right that's weird  
> > > >> Looks it is caused by the fix patch, via mmget, which may add
> > > >> refcount of fd.  
> > > > Yes indirectly I think: when the process mmaps the queue,
> > > > mmap_region() takes a reference to the uacce fd. That reference is
> > > > released either by explicit close() or munmap(), or by exit_mmap()
> > > > (which is triggered by mmput()). Since there is an mm->fd dependency,
> > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in
> > > > bind()/unbind().
> > > > 
> > > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > > freeing them until unbind().  
> > > 
> > > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > > nothing else simple we can do.
> > > 
> > > How does the IOMMU hardware know that all activity to a given PASID is
> > > finished?  That activity should, today, be independent of an mm or a
> > > fd's lifetime.  
> > 
> > In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> > queue calls iommu_sva_bind_device(), which sets up the PASID context in
> > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> > destroys the PASID context (after the device driver stopped all DMA for
> > this PASID).
> > 
> For VT-d, it is essentially the same flow except managed by the individual
> drivers such as DSA.
> If free() happens before unbind(), we deactivate the PASIDs and suppress
> faults from the device. When the unbind finally comes, we finalize the
> PASID teardown. It seems we have a need for an intermediate state where
> PASID is "pending free"?

Yes we do have that state, though I'm not sure we need to make it explicit
in the ioasid allocator.

Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
also part of Lu's rework [1].

Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Dave Hansen
On 4/25/22 07:26, Jean-Philippe Brucker wrote:
>>
>> How does the IOMMU hardware know that all activity to a given PASID is
>> finished?  That activity should, today, be independent of an mm or a
>> fd's lifetime.
> In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> queue calls iommu_sva_bind_device(), which sets up the PASID context in
> the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> destroys the PASID context (after the device driver stopped all DMA for
> this PASID).

Could this PASID context destruction move from being "fd-based" to
happening under mm_pasid_drop()?  Logically, it seems like that should
work because mm_pasid_drop() happens after exit_mmap() where the VMAs
(which hold references to 'struct file' via vma->vm_file) are torn down.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jacob Pan
Hi Jean-Philippe,

On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
 wrote:

> On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> > On 4/25/22 06:53, Jean-Philippe Brucker wrote:  
> > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > wrote:  
> >  On 5.17
> >  fops_release is called automatically, as well as
> >  iommu_sva_unbind_device. On 5.18-rc1.
> >  fops_release is not called, have to manually call close(fd)  
> > >>> Right that's weird  
> > >> Looks it is caused by the fix patch, via mmget, which may add
> > >> refcount of fd.  
> > > Yes indirectly I think: when the process mmaps the queue,
> > > mmap_region() takes a reference to the uacce fd. That reference is
> > > released either by explicit close() or munmap(), or by exit_mmap()
> > > (which is triggered by mmput()). Since there is an mm->fd dependency,
> > > we cannot add a fd->mm dependency, so no mmget()/mmput() in
> > > bind()/unbind().
> > > 
> > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > freeing them until unbind().  
> > 
> > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > nothing else simple we can do.
> > 
> > How does the IOMMU hardware know that all activity to a given PASID is
> > finished?  That activity should, today, be independent of an mm or a
> > fd's lifetime.  
> 
> In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> queue calls iommu_sva_bind_device(), which sets up the PASID context in
> the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> destroys the PASID context (after the device driver stopped all DMA for
> this PASID).
> 
For VT-d, it is essentially the same flow except managed by the individual
drivers such as DSA.
If free() happens before unbind(), we deactivate the PASIDs and suppress
faults from the device. When the unbind finally comes, we finalize the
PASID teardown. It seems we have a need for an intermediate state where
PASID is "pending free"?

> Thanks,
> Jean
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jean-Philippe Brucker
On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote:
>  On 5.17
>  fops_release is called automatically, as well as iommu_sva_unbind_device.
>  On 5.18-rc1.
>  fops_release is not called, have to manually call close(fd)
> >>> Right that's weird
> >> Looks it is caused by the fix patch, via mmget, which may add refcount of
> >> fd.
> > Yes indirectly I think: when the process mmaps the queue, mmap_region()
> > takes a reference to the uacce fd. That reference is released either by
> > explicit close() or munmap(), or by exit_mmap() (which is triggered by
> > mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm
> > dependency, so no mmget()/mmput() in bind()/unbind().
> > 
> > I guess we should go back to refcounted PASIDs instead, to avoid freeing
> > them until unbind().
> 
> Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> nothing else simple we can do.
> 
> How does the IOMMU hardware know that all activity to a given PASID is
> finished?  That activity should, today, be independent of an mm or a
> fd's lifetime.

In the case of uacce, it's tied to the fd lifetime: opening an accelerator
queue calls iommu_sva_bind_device(), which sets up the PASID context in
the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
destroys the PASID context (after the device driver stopped all DMA for
this PASID).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Dave Hansen
On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote:
 On 5.17
 fops_release is called automatically, as well as iommu_sva_unbind_device.
 On 5.18-rc1.
 fops_release is not called, have to manually call close(fd)
>>> Right that's weird
>> Looks it is caused by the fix patch, via mmget, which may add refcount of
>> fd.
> Yes indirectly I think: when the process mmaps the queue, mmap_region()
> takes a reference to the uacce fd. That reference is released either by
> explicit close() or munmap(), or by exit_mmap() (which is triggered by
> mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm
> dependency, so no mmget()/mmput() in bind()/unbind().
> 
> I guess we should go back to refcounted PASIDs instead, to avoid freeing
> them until unbind().

Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
nothing else simple we can do.

How does the IOMMU hardware know that all activity to a given PASID is
finished?  That activity should, today, be independent of an mm or a
fd's lifetime.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/sva: Revert mm_pasid_drop and reusing refcount in pasid allocation

2022-04-25 Thread Zhangfei Gao
Commit 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID
allocation and free it on mm exit")

However the Arm code was written expecting the PASID to be freed on
unbind(), not mm exit.
The mm_pasid_drop in mmput free mm->pasid, which is only index,
but all related resources are still there.

This cause many strange issues:
For example, user driver is ended unexpectedly without calling unbind,
then mmput and mm_pasid_drop will free pasid.
Finally fops_release->unbind is called, error will happen
since mm->pasid = -1.

This also breaks nginx, where master process bind then fork child (daemon)
to manage all worker processes. The master process then exit and free ioasid
but all related resources are still there.
The worker process will alloc the same ioasid just freed, causing hardware
error.

Hardware reports:
[  152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error 
status=0x20] found
[  152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error 
status=0x40] found
[  152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] 
found
[  152.755340] hisi_sec2 :76:00.0: Controller resetting...
[  152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout!
[  152.768198] hisi_sec2 :76:00.0: Failed to dump sqc!
[  152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
[  152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start!
[  152.787468] hisi_sec2 :76:00.0: Failed to dump sqc!
[  152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
[  152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start!
[  152.806730] hisi_sec2 :76:00.0: Failed to dump sqc!
[  152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping!
[  152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start!
[  152.825992] hisi_sec2 :76:00.0: Failed to dump sqc!

Signed-off-by: Zhangfei Gao 
---
This patch partily revert Commit 701fac40384f ("iommu/sva: Assign a PASID to mm 
on PASID
allocation and free it on mm exit") since it can not revert directly now.
And use pasid_valid accordingly 


 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  5 +++-
 drivers/iommu/intel/svm.c   |  9 ++
 drivers/iommu/ioasid.c  | 39 ++---
 drivers/iommu/iommu-sva-lib.c   | 37 ---
 drivers/iommu/iommu-sva-lib.h   |  1 +
 include/linux/ioasid.h  | 12 ++--
 include/linux/sched/mm.h| 16 --
 kernel/fork.c   |  1 -
 8 files changed, 85 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 22ddd05..a737ba5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -340,12 +340,14 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm);
if (IS_ERR(bond->smmu_mn)) {
ret = PTR_ERR(bond->smmu_mn);
-   goto err_free_bond;
+   goto err_free_pasid;
}
 
list_add(>list, >bonds);
return >sva;
 
+err_free_pasid:
+   iommu_sva_free_pasid(mm);
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -375,6 +377,7 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
+   iommu_sva_free_pasid(bond->mm);
kfree(bond);
}
mutex_unlock(_lock);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a3876..241d095 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -322,6 +322,11 @@ static int intel_svm_alloc_pasid(struct device *dev, 
struct mm_struct *mm,
return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
 }
 
+static void intel_svm_free_pasid(struct mm_struct *mm)
+{
+   iommu_sva_free_pasid(mm);
+}
+
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
   struct mm_struct *mm,
@@ -465,6 +470,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
+   /* Drop a PASID reference and free it if no reference. */
+   intel_svm_free_pasid(mm);
}
 out:
return ret;
@@ -848,6 +855,8 @@ struct iommu_sva *intel_svm_bind(struct device *dev, struct 
mm_struct *mm, void
}
 
sva = intel_svm_bind_mm(iommu, dev, mm, flags);
+   if (IS_ERR_OR_NULL(sva))
+   intel_svm_free_pasid(mm);
mutex_unlock(_mutex);
 
   

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jean-Philippe Brucker
On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote:
> > > On 5.17
> > > fops_release is called automatically, as well as iommu_sva_unbind_device.
> > > On 5.18-rc1.
> > > fops_release is not called, have to manually call close(fd)
> > Right that's weird
> Looks it is caused by the fix patch, via mmget, which may add refcount of
> fd.

Yes indirectly I think: when the process mmaps the queue, mmap_region()
takes a reference to the uacce fd. That reference is released either by
explicit close() or munmap(), or by exit_mmap() (which is triggered by
mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm
dependency, so no mmget()/mmput() in bind()/unbind().

I guess we should go back to refcounted PASIDs instead, to avoid freeing
them until unbind().

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 3/4] thunderbolt: Make iommu_dma_protection more accurate

2022-04-25 Thread Robin Murphy
Between me trying to get rid of iommu_present() and Mario wanting to
support the AMD equivalent of DMAR_PLATFORM_OPT_IN, scrutiny has shown
that the iommu_dma_protection attribute is being far too optimistic.
Even if an IOMMU might be present for some PCI segment in the system,
that doesn't necessarily mean it provides translation for the device(s)
we care about. Furthermore, all that DMAR_PLATFORM_OPT_IN really does
is tell us that memory was protected before the kernel was loaded, and
prevent the user from disabling the intel-iommu driver entirely. While
that lets us assume kernel integrity, what matters for actual runtime
DMA protection is whether we trust individual devices, based on the
"external facing" property that we expect firmware to describe for
Thunderbolt ports.

It's proven challenging to determine the appropriate ports accurately
given the variety of possible topologies, so while still not getting a
perfect answer, by putting enough faith in firmware we can at least get
a good bit closer. If we can see that any device near a Thunderbolt NHI
has all the requisites for Kernel DMA Protection, chances are that it
*is* a relevant port, but moreover that implies that firmware is playing
the game overall, so we'll use that to assume that all Thunderbolt ports
should be correctly marked and thus will end up fully protected.

CC: Mario Limonciello 
Reviewed-by: Christoph Hellwig 
Acked-by: Mika Westerberg 
Signed-off-by: Robin Murphy 
---

v4: No change

 drivers/thunderbolt/domain.c | 12 +++---
 drivers/thunderbolt/nhi.c| 44 
 include/linux/thunderbolt.h  |  2 ++
 3 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/thunderbolt/domain.c b/drivers/thunderbolt/domain.c
index 7018d959f775..2889a214dadc 100644
--- a/drivers/thunderbolt/domain.c
+++ b/drivers/thunderbolt/domain.c
@@ -7,9 +7,7 @@
  */
 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -257,13 +255,9 @@ static ssize_t iommu_dma_protection_show(struct device 
*dev,
 struct device_attribute *attr,
 char *buf)
 {
-   /*
-* Kernel DMA protection is a feature where Thunderbolt security is
-* handled natively using IOMMU. It is enabled when IOMMU is
-* enabled and ACPI DMAR table has DMAR_PLATFORM_OPT_IN set.
-*/
-   return sprintf(buf, "%d\n",
-  iommu_present(_bus_type) && dmar_platform_optin());
+   struct tb *tb = container_of(dev, struct tb, dev);
+
+   return sysfs_emit(buf, "%d\n", tb->nhi->iommu_dma_protection);
 }
 static DEVICE_ATTR_RO(iommu_dma_protection);
 
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 4a582183f675..4bc87b0f003a 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -15,9 +15,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include "nhi.h"
 #include "nhi_regs.h"
@@ -1103,6 +1105,47 @@ static void nhi_check_quirks(struct tb_nhi *nhi)
nhi->quirks |= QUIRK_AUTO_CLEAR_INT;
 }
 
+static int nhi_check_iommu_pdev(struct pci_dev *pdev, void *data)
+{
+   if (!pdev->external_facing ||
+   !device_iommu_capable(>dev, IOMMU_CAP_PRE_BOOT_PROTECTION))
+   return 0;
+   *(bool *)data = true;
+   return 1; /* Stop walking */
+}
+
+static void nhi_check_iommu(struct tb_nhi *nhi)
+{
+   struct pci_bus *bus = nhi->pdev->bus;
+   bool port_ok = false;
+
+   /*
+* Ideally what we'd do here is grab every PCI device that
+* represents a tunnelling adapter for this NHI and check their
+* status directly, but unfortunately USB4 seems to make it
+* obnoxiously difficult to reliably make any correlation.
+*
+* So for now we'll have to bodge it... Hoping that the system
+* is at least sane enough that an adapter is in the same PCI
+* segment as its NHI, if we can find *something* on that segment
+* which meets the requirements for Kernel DMA Protection, we'll
+* take that to imply that firmware is aware and has (hopefully)
+* done the right thing in general. We need to know that the PCI
+* layer has seen the ExternalFacingPort property which will then
+* inform the IOMMU layer to enforce the complete "untrusted DMA"
+* flow, but also that the IOMMU driver itself can be trusted not
+* to have been subverted by a pre-boot DMA attack.
+*/
+   while (bus->parent)
+   bus = bus->parent;
+
+   pci_walk_bus(bus, nhi_check_iommu_pdev, _ok);
+
+   nhi->iommu_dma_protection = port_ok;
+   dev_dbg(>pdev->dev, "IOMMU DMA protection is %s\n",
+   str_enabled_disabled(port_ok));
+}
+
 static int nhi_init_msi(struct tb_nhi *nhi)
 {
struct pci_dev *pdev = nhi->pdev;
@@ -1220,6 +1263,7 @@ static 

[PATCH v4 4/4] iommu/amd: Indicate whether DMA remap support is enabled

2022-04-25 Thread Robin Murphy
From: Mario Limonciello 

Bit 1 of the IVFS IVInfo field indicates that IOMMU has been used for
pre-boot DMA protection.

Export this capability to allow other places in the kernel to be able to
check for it on AMD systems.

Link: https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
Reviewed-by: Christoph Hellwig 
Signed-off-by: Mario Limonciello 
Signed-off-by: Robin Murphy 
---

v4: No change

 drivers/iommu/amd/amd_iommu_types.h | 4 
 drivers/iommu/amd/init.c| 3 +++
 drivers/iommu/amd/iommu.c   | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 47108ed44fbb..72d0f5e2f651 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -407,6 +407,7 @@
 /* IOMMU IVINFO */
 #define IOMMU_IVINFO_OFFSET 36
 #define IOMMU_IVINFO_EFRSUP BIT(0)
+#define IOMMU_IVINFO_DMA_REMAP  BIT(1)
 
 /* IOMMU Feature Reporting Field (for IVHD type 10h */
 #define IOMMU_FEAT_GASUP_SHIFT 6
@@ -449,6 +450,9 @@ extern struct irq_remap_table **irq_lookup_table;
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
+/* IVRS indicates that pre-boot remapping was enabled */
+extern bool amdr_ivrs_remap_support;
+
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b4a798c7b347..0467918bf7fd 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -182,6 +182,7 @@ u32 amd_iommu_max_pasid __read_mostly = ~0;
 
 bool amd_iommu_v2_present __read_mostly;
 static bool amd_iommu_pc_present __read_mostly;
+bool amdr_ivrs_remap_support __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
@@ -326,6 +327,8 @@ static void __init early_iommu_features_init(struct 
amd_iommu *iommu,
 {
if (amd_iommu_ivinfo & IOMMU_IVINFO_EFRSUP)
iommu->features = h->efr_reg;
+   if (amd_iommu_ivinfo & IOMMU_IVINFO_DMA_REMAP)
+   amdr_ivrs_remap_support = true;
 }
 
 /* Access to l1 and l2 indexed register spaces */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..991f10ce350e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2162,6 +2162,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
return (irq_remapping_enabled == 1);
case IOMMU_CAP_NOEXEC:
return false;
+   case IOMMU_CAP_PRE_BOOT_PROTECTION:
+   return amdr_ivrs_remap_support;
default:
break;
}
-- 
2.35.3.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/4] iommu: Add capability for pre-boot DMA protection

2022-04-25 Thread Robin Murphy
VT-d's dmar_platform_optin() actually represents a combination of
properties fairly well standardised by Microsoft as "Pre-boot DMA
Protection" and "Kernel DMA Protection"[1]. As such, we can provide
interested consumers with an abstracted capability rather than
driver-specific interfaces that won't scale. We name it for the former
aspect since that's what external callers are most likely to be
interested in; the latter is for the IOMMU layer to handle itself.

[1] 
https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection

Suggested-by: Christoph Hellwig 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 
---

v4: No change

 drivers/iommu/intel/iommu.c | 2 ++
 include/linux/iommu.h   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942..0edf6084dc14 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4551,6 +4551,8 @@ static bool intel_iommu_capable(enum iommu_cap cap)
return domain_update_iommu_snooping(NULL);
if (cap == IOMMU_CAP_INTR_REMAP)
return irq_remapping_enabled == 1;
+   if (cap == IOMMU_CAP_PRE_BOOT_PROTECTION)
+   return dmar_platform_optin();
 
return false;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e26cf84e5d82..4123693ae319 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -107,6 +107,8 @@ enum iommu_cap {
   transactions */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
+   IOMMU_CAP_PRE_BOOT_PROTECTION,  /* Firmware says it used the IOMMU for
+  DMA protection and we should too */
 };
 
 /* These are the possible reserved region types */
-- 
2.35.3.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/4] iommu: Introduce device_iommu_capable()

2022-04-25 Thread Robin Murphy
iommu_capable() only really works for systems where all IOMMU instances
are completely homogeneous, and all devices are IOMMU-mapped. Implement
the new variant which will be able to give a more accurate answer for
whichever device the caller is actually interested in, and even more so
once all the external users have been converted and we can reliably pass
the device pointer through the internal driver interface too.

Signed-off-by: Robin Murphy 
---

v4: Hold off changing the internal callback interface for now

 drivers/iommu/iommu.c | 23 +++
 include/linux/iommu.h |  6 ++
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..780c11734979 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1906,6 +1906,29 @@ bool iommu_present(struct bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(iommu_present);
 
+/**
+ * device_iommu_capable() - check for a general IOMMU capability
+ * @dev: device to which the capability would be relevant, if available
+ * @cap: IOMMU capability
+ *
+ * Return: true if an IOMMU is present and supports the given capability
+ * for the given device, otherwise false.
+ */
+bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+   const struct iommu_ops *ops;
+
+   if (!dev->iommu || !dev->iommu->iommu_dev)
+   return false;
+
+   ops = dev_iommu_ops(dev);
+   if (!ops->capable)
+   return false;
+
+   return ops->capable(cap);
+}
+EXPORT_SYMBOL_GPL(device_iommu_capable);
+
 bool iommu_capable(struct bus_type *bus, enum iommu_cap cap)
 {
if (!bus->iommu_ops || !bus->iommu_ops->capable)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..e26cf84e5d82 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -417,6 +417,7 @@ static inline const struct iommu_ops *dev_iommu_ops(struct 
device *dev)
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
+extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
 extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
 extern struct iommu_group *iommu_group_get_by_id(int id);
@@ -689,6 +690,11 @@ static inline bool iommu_present(struct bus_type *bus)
return false;
 }
 
+static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+   return false;
+}
+
 static inline bool iommu_capable(struct bus_type *bus, enum iommu_cap cap)
 {
return false;
-- 
2.35.3.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/4] iommu, thunderbolt: Make iommu_dma_protection more accurate

2022-04-25 Thread Robin Murphy
Hi all, 

As promised, here's the really-actually-final version, cleaning up the
new interface in patch #1 to not introduce the new parameter before it's
ready, and rebased to make sure it correctly applies on -rc3.

Thanks,
Robin.


Mario Limonciello (1):
  iommu/amd: Indicate whether DMA remap support is enabled

Robin Murphy (3):
  iommu: Introduce device_iommu_capable()
  iommu: Add capability for pre-boot DMA protection
  thunderbolt: Make iommu_dma_protection more accurate

 drivers/iommu/amd/amd_iommu_types.h |  4 +++
 drivers/iommu/amd/init.c|  3 ++
 drivers/iommu/amd/iommu.c   |  2 ++
 drivers/iommu/intel/iommu.c |  2 ++
 drivers/iommu/iommu.c   | 23 +++
 drivers/thunderbolt/domain.c| 12 ++--
 drivers/thunderbolt/nhi.c   | 44 +
 include/linux/iommu.h   |  8 ++
 include/linux/thunderbolt.h |  2 ++
 9 files changed, 91 insertions(+), 9 deletions(-)

-- 
2.35.3.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 37/37] iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

2022-04-25 Thread Vasant Hegde via iommu
Rename 'device_id' as 'sbdf' and extend it to 32bit so that we can
pass PCI segment ID to ppr_notifier(). Also pass PCI segment ID to
pci_get_domain_bus_and_slot() instead of default value.

Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c   | 2 +-
 drivers/iommu/amd/iommu_v2.c| 9 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f2bbcb19e92c..a908f18a3632 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -477,7 +477,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
 struct amd_iommu_fault {
u64 address;/* IO virtual address of the fault*/
u32 pasid;  /* Address space identifier */
-   u16 device_id;  /* Originating PCI device id */
+   u32 sbdf;   /* Originating PCI device id */
u16 tag;/* PPR tag */
u16 flags;  /* Fault flags */
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 47946894aff3..5f48cddeaa29 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -701,7 +701,7 @@ static void iommu_handle_ppr_entry(struct amd_iommu *iommu, 
u64 *raw)
 
fault.address   = raw[1];
fault.pasid = PPR_PASID(raw[0]);
-   fault.device_id = PPR_DEVID(raw[0]);
+   fault.sbdf  = (iommu->pci_seg->id << 16) | PPR_DEVID(raw[0]);
fault.tag   = PPR_TAG(raw[0]);
fault.flags = PPR_FLAGS(raw[0]);
 
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index b186d6e0..631ded8168ff 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -518,15 +518,16 @@ static int ppr_notifier(struct notifier_block *nb, 
unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
-   u16 tag, devid;
+   u16 tag, devid, seg_id;
int ret;
 
iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish  = (iommu_fault->tag >> 9) & 1;
 
-   devid = iommu_fault->device_id;
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   seg_id = (iommu_fault->sbdf >> 16) & 0x;
+   devid = iommu_fault->sbdf & 0x;
+   pdev = pci_get_domain_bus_and_slot(seg_id, PCI_BUS_NUM(devid),
   devid & 0xff);
if (!pdev)
return -ENODEV;
@@ -540,7 +541,7 @@ static int ppr_notifier(struct notifier_block *nb, unsigned 
long e, void *data)
goto out;
}
 
-   dev_state = get_device_state(iommu_fault->device_id);
+   dev_state = get_device_state(iommu_fault->sbdf);
if (dev_state == NULL)
goto out;
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 36/37] iommu/amd: Update device_state structure to include PCI seg ID

2022-04-25 Thread Vasant Hegde via iommu
Rename struct device_state.devid variable to struct device_state.sbdf
and extend it to 32-bit to include the 16-bit PCI segment ID via
the helper function get_pci_sbdf_id().

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu_v2.c | 58 +++-
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index e56b137ceabd..b186d6e0 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -51,7 +51,7 @@ struct pasid_state {
 
 struct device_state {
struct list_head list;
-   u16 devid;
+   u32 sbdf;
atomic_t count;
struct pci_dev *pdev;
struct pasid_state **states;
@@ -83,35 +83,25 @@ static struct workqueue_struct *iommu_wq;
 
 static void free_pasid_states(struct device_state *dev_state);
 
-static u16 device_id(struct pci_dev *pdev)
-{
-   u16 devid;
-
-   devid = pdev->bus->number;
-   devid = (devid << 8) | pdev->devfn;
-
-   return devid;
-}
-
-static struct device_state *__get_device_state(u16 devid)
+static struct device_state *__get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
 
list_for_each_entry(dev_state, _list, list) {
-   if (dev_state->devid == devid)
+   if (dev_state->sbdf == sbdf)
return dev_state;
}
 
return NULL;
 }
 
-static struct device_state *get_device_state(u16 devid)
+static struct device_state *get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
unsigned long flags;
 
spin_lock_irqsave(_lock, flags);
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state != NULL)
atomic_inc(_state->count);
spin_unlock_irqrestore(_lock, flags);
@@ -609,7 +599,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
struct pasid_state *pasid_state;
struct device_state *dev_state;
struct mm_struct *mm;
-   u16 devid;
+   u32 sbdf;
int ret;
 
might_sleep();
@@ -617,8 +607,8 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
if (!amd_iommu_v2_supported())
return -ENODEV;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf  = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
 
if (dev_state == NULL)
return -EINVAL;
@@ -692,15 +682,15 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 
pasid)
 {
struct pasid_state *pasid_state;
struct device_state *dev_state;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
if (dev_state == NULL)
return;
 
@@ -742,7 +732,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
struct iommu_group *group;
unsigned long flags;
int ret, tmp;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
@@ -759,7 +749,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
if (pasids <= 0 || pasids > (PASID_MASK + 1))
return -EINVAL;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
dev_state = kzalloc(sizeof(*dev_state), GFP_KERNEL);
if (dev_state == NULL)
@@ -768,7 +758,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
spin_lock_init(_state->lock);
init_waitqueue_head(_state->wq);
dev_state->pdev  = pdev;
-   dev_state->devid = devid;
+   dev_state->sbdf = sbdf;
 
tmp = pasids;
for (dev_state->pasid_levels = 0; (tmp - 1) & ~0x1ff; tmp >>= 9)
@@ -806,7 +796,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
spin_lock_irqsave(_lock, flags);
 
-   if (__get_device_state(devid) != NULL) {
+   if (__get_device_state(sbdf) != NULL) {
spin_unlock_irqrestore(_lock, flags);
ret = -EBUSY;
goto out_free_domain;
@@ -838,16 +828,16 @@ void amd_iommu_free_device(struct pci_dev *pdev)
 {
struct device_state *dev_state;
unsigned long flags;
-   u16 devid;
+   u32 sbdf;
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
spin_lock_irqsave(_lock, flags);
 
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state == NULL) {
spin_unlock_irqrestore(_lock, flags);
return;
@@ -867,18 +857,18 @@ int 

[PATCH v2 35/37] iommu/amd: Print PCI segment ID in error log messages

2022-04-25 Thread Vasant Hegde via iommu
Print pci segment ID along with bdf. Useful for debugging.

Co-developed-by: Suravee Suthikulpaint 
Signed-off-by: Suravee Suthikulpaint 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/init.c  | 10 +-
 drivers/iommu/amd/iommu.c | 36 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ba0ef8192a2f..24814ec3dca8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,11 +1850,11 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
h = (struct ivhd_header *)p;
if (*p == amd_iommu_target_ivhd_type) {
 
-   DUMP_printk("device: %02x:%02x.%01x cap: %04x "
-   "seg: %d flags: %01x info %04x\n",
-   PCI_BUS_NUM(h->devid), PCI_SLOT(h->devid),
-   PCI_FUNC(h->devid), h->cap_ptr,
-   h->pci_seg, h->flags, h->info);
+   DUMP_printk("device: %04x:%02x:%02x.%01x cap: %04x "
+   "flags: %01x info %04x\n",
+   h->pci_seg, PCI_BUS_NUM(h->devid),
+   PCI_SLOT(h->devid), PCI_FUNC(h->devid),
+   h->cap_ptr, h->flags, h->info);
DUMP_printk("   mmio-addr: %016llx\n",
h->mmio_phys);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 46236fb05a1f..47946894aff3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -496,8 +496,8 @@ static void amd_iommu_report_rmp_hw_error(struct amd_iommu 
*iommu, volatile u32
vmg_tag, spa, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, spa, flags);
}
 
@@ -529,8 +529,8 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu 
*iommu, volatile u32 *ev
vmg_tag, gpa, flags_rmp, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, gpa, flags_rmp, flags);
}
 
@@ -576,8 +576,8 @@ static void amd_iommu_report_page_fault(struct amd_iommu 
*iommu,
domain_id, address, flags);
}
} else {
-   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}
 
@@ -620,20 +620,20 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
 
switch (type) {
case EVENT_TYPE_ILL_DEV:
-   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
-   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
+   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%04x:%02x:%02x.%x "
"address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 

[PATCH v2 34/37] iommu/amd: Add PCI segment support for ivrs_ioapic, ivrs_hpet, ivrs_acpihid commands

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

By default, PCI segment is zero and can be omitted. To support system
with non-zero PCI segment ID, modify the parsing functions to allow
PCI segment ID.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 .../admin-guide/kernel-parameters.txt | 34 +++
 drivers/iommu/amd/init.c  | 41 ---
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..cc8f0c82ff55 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2208,23 +2208,39 @@
 
ivrs_ioapic [HW,X86-64]
Provide an override to the IOAPIC-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map IOAPIC-ID decimal 10 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map IOAPIC-ID decimal 10 to PCI device 00:14.0
+ write the parameter as:
ivrs_ioapic[10]=00:14.0
+   * To map IOAPIC-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_ioapic[10]=0001:00:14.0
 
ivrs_hpet   [HW,X86-64]
Provide an override to the HPET-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map HPET-ID decimal 0 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map HPET-ID decimal 0 to PCI device 00:14.0
+ write the parameter as:
ivrs_hpet[0]=00:14.0
+   * To map HPET-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_ioapic[10]=0001:00:14.0
 
ivrs_acpihid[HW,X86-64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map UART-HID:UID AMD0020:0 to
-   PCI device 00:14.5 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+
+   For example, to map UART-HID:UID AMD0020:0 to
+   PCI segment 0x1 and PCI device ID 00:14.5,
+   write the parameter as:
+   ivrs_acpihid[0001:00:14.5]=AMD0020:0
+
+   By default, PCI segment is 0, and can be omitted.
+   For example, PCI device 00:14.5 write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0
 
js= [HW,JOY] Analog joystick
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ccc0208d4b69..ba0ef8192a2f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3288,15 +3288,17 @@ static int __init parse_amd_iommu_options(char *str)
 
 static int __init parse_ivrs_ioapic(char *str)
 {
-   unsigned int bus, dev, fn;
+   u32 seg = 0, bus, dev, fn;
int ret, id, i;
u16 devid;
 
ret = sscanf(str, "[%d]=%x:%x.%x", , , , );
-
if (ret != 4) {
-   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
-   return 1;
+   ret = sscanf(str, "[%d]=%x:%x:%x.%x", , , , , 
);
+   if (ret != 5) {
+   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
+   return 1;
+   }
}
 
if (early_ioapic_map_size == EARLY_MAP_SIZE) {
@@ -3305,7 +3307,8 @@ static int __init parse_ivrs_ioapic(char *str)
return 1;
}
 
-   devid = ((bus & 0xff) << 8) | ((dev & 0x1f) << 3) | (fn & 0x7);
+   devid = ((seg & 0x) << 16) | ((bus & 0xff) << 8) |
+   ((dev & 0x1f) << 3) | (fn & 0x7);
 
cmdline_maps= true;
i   = early_ioapic_map_size++;
@@ -3318,15 +3321,17 @@ static int __init parse_ivrs_ioapic(char *str)
 
 static int __init parse_ivrs_hpet(char *str)
 {
-   unsigned int bus, dev, fn;
+   u32 seg = 0, bus, dev, fn;
int ret, id, i;
u16 devid;
 
ret = sscanf(str, "[%d]=%x:%x.%x", , , , );
-
if (ret != 4) {
- 

[PATCH v2 33/37] iommu/amd: Specify PCI segment ID when getting pci device

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Upcoming AMD systems can have multiple PCI segments. Hence pass PCI
segment ID to pci_get_domain_bus_and_slot() instead of '0'.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c  |  6 --
 drivers/iommu/amd/iommu.c | 19 ++-
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 4a9f424eb4b4..ccc0208d4b69 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1961,7 +1961,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int cap_ptr = iommu->cap_ptr;
int ret;
 
-   iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
+   iommu->dev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+PCI_BUS_NUM(iommu->devid),
 iommu->devid & 0xff);
if (!iommu->dev)
return -ENODEV;
@@ -2024,7 +2025,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int i, j;
 
iommu->root_pdev =
-   pci_get_domain_bus_and_slot(0, iommu->dev->bus->number,
+   pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+   iommu->dev->bus->number,
PCI_DEVFN(0, 0));
 
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1e375d469280..46236fb05a1f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,7 +473,7 @@ static void dump_command(unsigned long phys_addr)
pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
-static void amd_iommu_report_rmp_hw_error(volatile u32 *event)
+static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile 
u32 *event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, vmg_tag, flags;
@@ -485,7 +485,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
flags   = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
spa = ((u64)event[3] << 32) | (event[2] & 0xFFF8);
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
@@ -505,7 +505,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
pci_dev_put(pdev);
 }
 
-static void amd_iommu_report_rmp_fault(volatile u32 *event)
+static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 
*event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, flags_rmp, vmg_tag, flags;
@@ -518,7 +518,7 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
gpa   = ((u64)event[3] << 32) | event[2];
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
@@ -544,13 +544,14 @@ static void amd_iommu_report_rmp_fault(volatile u32 
*event)
 #define IS_WRITE_REQUEST(flags)\
((flags) & EVENT_FLAG_RW)
 
-static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
+static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
+   u16 devid, u16 domain_id,
u64 address, int flags)
 {
struct iommu_dev_data *dev_data = NULL;
struct pci_dev *pdev;
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
@@ -613,7 +614,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
}
 
if (type == EVENT_TYPE_IO_FAULT) {
-   amd_iommu_report_page_fault(devid, pasid, address, flags);
+   amd_iommu_report_page_fault(iommu, devid, pasid, address, 
flags);
return;
}
 
@@ -654,10 +655,10 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
pasid, address, flags);
break;
case EVENT_TYPE_RMP_FAULT:
-   amd_iommu_report_rmp_fault(event);
+   amd_iommu_report_rmp_fault(iommu, event);
break;
case EVENT_TYPE_RMP_HW_ERR:
-   amd_iommu_report_rmp_hw_error(event);
+   

[PATCH v2 32/37] iommu/amd: Include PCI segment ID when initialize IOMMU

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Extend current device ID variables to 32-bit to include the 16-bit
segment ID when parsing device information from IVRS table to initialize
each IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  2 +-
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/init.c| 56 +++--
 drivers/iommu/amd/quirks.c  |  4 +--
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 4dad1b442409..9be5ad746d47 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -125,7 +125,7 @@ static inline int get_pci_sbdf_id(struct pci_dev *pdev)
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
-extern int __init add_special_device(u8 type, u8 id, u16 *devid,
+extern int __init add_special_device(u8 type, u8 id, u32 *devid,
 bool cmd_line);
 
 #ifdef CONFIG_DMI
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 1109961e1042..f2bbcb19e92c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -734,8 +734,8 @@ struct acpihid_map_entry {
struct list_head list;
u8 uid[ACPIHID_UID_LEN];
u8 hid[ACPIHID_HID_LEN];
-   u16 devid;
-   u16 root_devid;
+   u32 devid;
+   u32 root_devid;
bool cmd_line;
struct iommu_group *group;
 };
@@ -743,7 +743,7 @@ struct acpihid_map_entry {
 struct devid_map {
struct list_head list;
u8 id;
-   u16 devid;
+   u32 devid;
bool cmd_line;
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 093304d16c85..4a9f424eb4b4 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1147,7 +1147,7 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
amd_iommu_set_rlookup_table(iommu, devid);
 }
 
-int __init add_special_device(u8 type, u8 id, u16 *devid, bool cmd_line)
+int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line)
 {
struct devid_map *entry;
struct list_head *list;
@@ -1184,7 +1184,7 @@ int __init add_special_device(u8 type, u8 id, u16 *devid, 
bool cmd_line)
return 0;
 }
 
-static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u16 *devid,
+static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u32 *devid,
  bool cmd_line)
 {
struct acpihid_map_entry *entry;
@@ -1263,7 +1263,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 {
u8 *p = (u8 *)h;
u8 *end = p, flags = 0;
-   u16 devid = 0, devid_start = 0, devid_to = 0;
+   u16 devid = 0, devid_start = 0, devid_to = 0, seg_id;
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
@@ -1299,6 +1299,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 
while (p < end) {
e = (struct ivhd_entry *)p;
+   seg_id = pci_seg->id;
+
switch (e->type) {
case IVHD_DEV_ALL:
 
@@ -1309,9 +1311,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_SELECT:
 
-   DUMP_printk("  DEV_SELECT\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_SELECT\t\t\t devid: 
%04x:%02x:%02x.%x "
"flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1322,8 +1324,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
case IVHD_DEV_SELECT_RANGE_START:
 
DUMP_printk("  DEV_SELECT_RANGE_START\t "
-   "devid: %02x:%02x.%x flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   "devid: %04x:%02x:%02x.%x flags: %02x\n",
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1335,9 +1337,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_ALIAS:
 
-   DUMP_printk("  DEV_ALIAS\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_ALIAS\t\t\t devid: %04x:%02x:%02x.%x 
"
"flags: 

[PATCH v2 31/37] iommu/amd: Introduce get_device_sbdf_id() helper function

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Current get_device_id() only provide 16-bit PCI device ID (i.e. BDF).
With multiple PCI segment support, we need to extend the helper function
to include PCI segment ID.

So, introduce a new helper function get_device_sbdf_id() to replace
the current get_pci_device_id().

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu.h |  7 ++
 drivers/iommu/amd/iommu.c | 40 +--
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 64c954e168d7..4dad1b442409 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -115,6 +115,13 @@ void amd_iommu_domain_clr_pt_root(struct protection_domain 
*domain)
amd_iommu_domain_set_pt_root(domain, 0);
 }
 
+static inline int get_pci_sbdf_id(struct pci_dev *pdev)
+{
+   int seg = pci_domain_nr(pdev->bus);
+   u16 devid = pci_dev_id(pdev);
+
+   return ((seg << 16) | (devid & 0x));
+}
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 40415e477853..1e375d469280 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -92,13 +92,6 @@ static void detach_device(struct device *dev);
  *
  /
 
-static inline u16 get_pci_device_id(struct device *dev)
-{
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   return pci_dev_id(pdev);
-}
-
 static inline int get_acpihid_device_id(struct device *dev,
struct acpihid_map_entry **entry)
 {
@@ -119,16 +112,16 @@ static inline int get_acpihid_device_id(struct device 
*dev,
return -EINVAL;
 }
 
-static inline int get_device_id(struct device *dev)
+static inline int get_device_sbdf_id(struct device *dev)
 {
-   int devid;
+   int sbdf;
 
if (dev_is_pci(dev))
-   devid = get_pci_device_id(dev);
+   sbdf = get_pci_sbdf_id(to_pci_dev(dev));
else
-   devid = get_acpihid_device_id(dev, NULL);
+   sbdf = get_acpihid_device_id(dev, NULL);
 
-   return devid;
+   return sbdf;
 }
 
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
@@ -182,9 +175,11 @@ static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 
devid)
 static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
 {
u16 seg = get_device_segment(dev);
-   u16 devid = get_device_id(dev);
+   int devid = get_device_sbdf_id(dev);
 
-   return __rlookup_amd_iommu(seg, devid);
+   if (devid < 0)
+   return NULL;
+   return __rlookup_amd_iommu(seg, (devid & 0x));
 }
 
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
@@ -365,9 +360,10 @@ static bool check_device(struct device *dev)
if (!dev)
return false;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return false;
+   devid &= 0x;
 
iommu = rlookup_amd_iommu(dev);
if (!iommu)
@@ -375,7 +371,7 @@ static bool check_device(struct device *dev)
 
/* Out of our scope? */
pci_seg = iommu->pci_seg;
-   if ((devid & 0x) > pci_seg->last_bdf)
+   if (devid > pci_seg->last_bdf)
return false;
 
return true;
@@ -389,10 +385,11 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
if (dev_iommu_priv_get(dev))
return 0;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return devid;
 
+   devid &= 0x;
dev_data = find_dev_data(iommu, devid);
if (!dev_data)
return -ENOMEM;
@@ -422,10 +419,11 @@ static void iommu_ignore_device(struct amd_iommu *iommu, 
struct device *dev)
struct dev_table_entry *dev_table = get_dev_table(iommu);
int devid;
 
-   devid = (get_device_id(dev)) & 0x;
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return;
 
+   devid &= 0x;
pci_seg->rlookup_table[devid] = NULL;
memset(_table[devid], 0, sizeof(struct dev_table_entry));
 
@@ -2265,9 +2263,11 @@ static void amd_iommu_get_resv_regions(struct device 
*dev,
struct amd_iommu_pci_seg *pci_seg;
int devid;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return;
+   devid &= 0x;
+
iommu = rlookup_amd_iommu(dev);
if (!iommu)
return;
@@ -3154,7 +3154,7 @@ static int get_devid(struct irq_alloc_info *info)
return get_hpet_devid(info->devid);
case 

[PATCH v2 30/37] iommu/amd: Flush upto last_bdf only

2022-04-25 Thread Vasant Hegde via iommu
Fix amd_iommu_flush_dte_all() and amd_iommu_flush_tlb_all() to flush
upto last_bdf only.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5976038d48a3..40415e477853 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1191,8 +1191,9 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= 0x; ++devid)
+   for (devid = 0; devid <= last_bdf; ++devid)
iommu_flush_dte(iommu, devid);
 
iommu_completion_wait(iommu);
@@ -1205,8 +1206,9 @@ static void amd_iommu_flush_dte_all(struct amd_iommu 
*iommu)
 static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
u32 dom_id;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (dom_id = 0; dom_id <= 0x; ++dom_id) {
+   for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
struct iommu_cmd cmd;
build_inv_iommu_pages(, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
  dom_id, 1);
@@ -1249,8 +1251,9 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= MAX_DEV_TABLE_ENTRIES; devid++)
+   for (devid = 0; devid <= last_bdf; devid++)
iommu_flush_irt(iommu, devid);
 
iommu_completion_wait(iommu);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 29/37] iommu/amd: Remove global amd_iommu_last_bdf

2022-04-25 Thread Vasant Hegde via iommu
Replace it with per PCI segment last_bdf variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ---
 drivers/iommu/amd/init.c| 35 ++---
 drivers/iommu/amd/iommu.c   | 10 ++---
 3 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 0aa170014b85..1109961e1042 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -829,9 +829,6 @@ struct unity_map_entry {
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
-/* largest PCI device id we expect translation requests for */
-extern u16 amd_iommu_last_bdf;
-
 /* allocation bitmap for domain ids */
 extern unsigned long *amd_iommu_pd_alloc_bitmap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b3905b1c4bc9..093304d16c85 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -161,9 +161,6 @@ static bool amd_iommu_disabled __initdata;
 static bool amd_iommu_force_enable __initdata;
 static int amd_iommu_target_ivhd_type;
 
-u16 amd_iommu_last_bdf;/* largest PCI device id we have
-  to handle */
-
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
@@ -245,16 +242,10 @@ static void init_translation_status(struct amd_iommu 
*iommu)
iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
 }
 
-static inline void update_last_devid(u16 devid)
-{
-   if (devid > amd_iommu_last_bdf)
-   amd_iommu_last_bdf = devid;
-}
-
-static inline unsigned long tbl_size(int entry_size)
+static inline unsigned long tbl_size(int entry_size, int last_bdf)
 {
unsigned shift = PAGE_SHIFT +
-get_order(((int)amd_iommu_last_bdf + 1) * entry_size);
+get_order((last_bdf + 1) * entry_size);
 
return 1UL << shift;
 }
@@ -538,7 +529,6 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
switch (dev->type) {
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
-   update_last_devid(0x);
return 0x;
break;
case IVHD_DEV_SELECT:
@@ -546,7 +536,6 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_ALIAS:
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
-   update_last_devid(dev->devid);
if (dev->devid > last_devid)
last_devid = dev->devid;
break;
@@ -688,7 +677,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
/*
 * let all alias entries point to itself
 */
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   for (i = 0; i <= pci_seg->last_bdf; ++i)
pci_seg->alias_table[i] = i;
 
return 0;
@@ -1054,7 +1043,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
return false;
}
 
-   for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+   for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
@@ -1315,7 +1304,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 
DUMP_printk("  DEV_ALL\t\t\tflags: %02x\n", e->flags);
 
-   for (dev_i = 0; dev_i <= amd_iommu_last_bdf; ++dev_i)
+   for (dev_i = 0; dev_i <= pci_seg->last_bdf; ++dev_i)
set_dev_entry_from_acpi(iommu, dev_i, e->flags, 
0);
break;
case IVHD_DEV_SELECT:
@@ -1560,9 +1549,9 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
-   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
-   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE, last_bdf);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE, 
last_bdf);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE, 
last_bdf);
 
pci_seg->id = 

[PATCH v2 28/37] iommu/amd: Remove global amd_iommu_alias_table

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This is replaced by the per PCI segment alias table.
Also remove alias_table_size variable.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 --
 drivers/iommu/amd/init.c| 24 
 2 files changed, 30 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index dc76ee2c3ea5..0aa170014b85 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -826,12 +826,6 @@ struct unity_map_entry {
  * Data structures for device handling
  */
 
-/*
- * Alias table to find requestor ids to device ids. Not locked because only
- * read on runtime.
- */
-extern u16 *amd_iommu_alias_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index dd667dfb4355..b3905b1c4bc9 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -185,21 +185,12 @@ static bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
-/*
- * The alias table is a driver specific data structure which contains the
- * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
- * More than one device can share the same requestor id.
- */
-u16 *amd_iommu_alias_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
  */
 unsigned long *amd_iommu_pd_alloc_bitmap;
 
-static u32 alias_table_size;   /* size of the alias table */
-
 enum iommu_init_state {
IOMMU_START_STATE,
IOMMU_IVRS_DETECTED,
@@ -2791,10 +2782,6 @@ static void __init free_iommu_resources(void)
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
-   free_pages((unsigned long)amd_iommu_alias_table,
-  get_order(alias_table_size));
-   amd_iommu_alias_table = NULL;
-
free_iommu_all();
free_pci_segment();
 }
@@ -2923,20 +2910,9 @@ static int __init early_amd_iommu_init(void)
amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
-   alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
 
-   /*
-* Alias table - map PCI Bus/Dev/Func to Bus/Dev/Func the
-* IOMMU see for that device
-*/
-   amd_iommu_alias_table = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(alias_table_size));
-   if (amd_iommu_alias_table == NULL)
-   goto out;
-
amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(MAX_DOMAIN_ID/8));
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 27/37] iommu/amd: Remove global amd_iommu_dev_table

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Replace global amd_iommu_dev_table with per PCI segment device table.
Also remove "dev_table_size".

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 --
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +---
 3 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 334206381f84..dc76ee2c3ea5 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -826,12 +826,6 @@ struct unity_map_entry {
  * Data structures for device handling
  */
 
-/*
- * Device table used by hardware. Read and write accesses by software are
- * locked with the amd_iommu_pd_table lock.
- */
-extern struct dev_table_entry *amd_iommu_dev_table;
-
 /*
  * Alias table to find requestor ids to device ids. Not locked because only
  * read on runtime.
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b2ddf407e967..dd667dfb4355 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -185,14 +185,6 @@ static bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
-/*
- * Pointer to the device table which is shared by all AMD IOMMUs
- * it is indexed by the PCI device id or the HT unit id and contains
- * information about the domain the device belongs to as well as the
- * page table root pointer.
- */
-struct dev_table_entry *amd_iommu_dev_table;
-
 /*
  * The alias table is a driver specific data structure which contains the
  * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
@@ -206,7 +198,6 @@ u16 *amd_iommu_alias_table;
  */
 unsigned long *amd_iommu_pd_alloc_bitmap;
 
-static u32 dev_table_size; /* size of the device table */
 static u32 alias_table_size;   /* size of the alias table */
 
 enum iommu_init_state {
@@ -402,10 +393,11 @@ static void iommu_set_device_table(struct amd_iommu 
*iommu)
 {
u64 entry;
u32 dev_table_size = iommu->pci_seg->dev_table_size;
+   void *dev_table = (void *)get_dev_table(iommu);
 
BUG_ON(iommu->mmio_base == NULL);
 
-   entry = iommu_virt_to_phys(amd_iommu_dev_table);
+   entry = iommu_virt_to_phys(dev_table);
entry |= (dev_table_size >> 12) - 1;
memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
, sizeof(entry));
@@ -1148,12 +1140,6 @@ void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, 
u16 devid)
set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
 }
 
-/* Writes the specific IOMMU for a device into the rlookup table */
-static void __init set_iommu_for_device(struct amd_iommu *iommu, u16 devid)
-{
-   iommu->pci_seg->rlookup_table[devid] = iommu;
-}
-
 /*
  * This function takes the device specific flags read from the ACPI
  * table and sets up the device table entry with that information
@@ -1178,7 +1164,7 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
 
amd_iommu_apply_erratum_63(iommu, devid);
 
-   set_iommu_for_device(iommu, devid);
+   amd_iommu_set_rlookup_table(iommu, devid);
 }
 
 int __init add_special_device(u8 type, u8 id, u16 *devid, bool cmd_line)
@@ -2809,10 +2795,6 @@ static void __init free_iommu_resources(void)
   get_order(alias_table_size));
amd_iommu_alias_table = NULL;
 
-   free_pages((unsigned long)amd_iommu_dev_table,
-  get_order(dev_table_size));
-   amd_iommu_dev_table = NULL;
-
free_iommu_all();
free_pci_segment();
 }
@@ -2941,16 +2923,10 @@ static int __init early_amd_iommu_init(void)
amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
-   dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
-   amd_iommu_dev_table = (void *)__get_free_pages(
- GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
- get_order(dev_table_size));
-   if (amd_iommu_dev_table == NULL)
-   goto out;
 
/*
 * Alias table - map PCI Bus/Dev/Func to Bus/Dev/Func the
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 079b38501b3b..476217f2890d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -230,6 +230,7 @@ static struct iommu_dev_data *search_dev_data(struct 
amd_iommu *iommu, u16 devid
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
struct amd_iommu *iommu;
+   struct dev_table_entry *dev_table;
u16 devid = pci_dev_id(pdev);
 
if (devid == 

[PATCH v2 26/37] iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

To include a pointer to per PCI segment device table.

Also include struct amd_iommu as one of the function parameter to
amd_iommu_apply_erratum_63() since it is needed when setting up DTE.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/init.c  | 59 +++
 drivers/iommu/amd/iommu.c |  2 +-
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2947239700ce..64c954e168d7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -13,7 +13,7 @@
 
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
-extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
 extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index dba1e03e0cd2..b2ddf407e967 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -988,22 +988,37 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
 }
 
 /* sets a specific bit in the device table entry. */
-static void set_dev_entry_bit(u16 devid, u8 bit)
+static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
+   u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   amd_iommu_dev_table[devid].data[i] |= (1UL << _bit);
+   dev_table[devid].data[i] |= (1UL << _bit);
 }
 
-static int get_dev_entry_bit(u16 devid, u8 bit)
+static void set_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __set_dev_entry_bit(dev_table, devid, bit);
+}
+
+static int __get_dev_entry_bit(struct dev_table_entry *dev_table,
+  u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   return (amd_iommu_dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
+   return (dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
 }
 
+static int get_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __get_dev_entry_bit(dev_table, devid, bit);
+}
 
 static bool __copy_device_table(struct amd_iommu *iommu)
 {
@@ -1122,15 +1137,15 @@ static bool copy_device_table(void)
return true;
 }
 
-void amd_iommu_apply_erratum_63(u16 devid)
+void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
 {
int sysmgt;
 
-   sysmgt = get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1) |
-(get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2) << 1);
+   sysmgt = get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1) |
+(get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2) << 1);
 
if (sysmgt == 0x01)
-   set_dev_entry_bit(devid, DEV_ENTRY_IW);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
 }
 
 /* Writes the specific IOMMU for a device into the rlookup table */
@@ -1147,21 +1162,21 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
   u16 devid, u32 flags, u32 ext_flags)
 {
if (flags & ACPI_DEVFLAG_INITPASS)
-   set_dev_entry_bit(devid, DEV_ENTRY_INIT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
if (flags & ACPI_DEVFLAG_EXTINT)
-   set_dev_entry_bit(devid, DEV_ENTRY_EINT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
if (flags & ACPI_DEVFLAG_NMI)
-   set_dev_entry_bit(devid, DEV_ENTRY_NMI_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
if (flags & ACPI_DEVFLAG_SYSMGT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
if (flags & ACPI_DEVFLAG_SYSMGT2)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
if (flags & ACPI_DEVFLAG_LINT0)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT0_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
if (flags & ACPI_DEVFLAG_LINT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT1_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
 
-   amd_iommu_apply_erratum_63(devid);
+   amd_iommu_apply_erratum_63(iommu, devid);
 
set_iommu_for_device(iommu, devid);
 }
@@ -2519,8 +2534,8 @@ static void init_device_table_dma(struct 

[PATCH v2 25/37] iommu/amd: Update (un)init_device_table_dma()

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Include struct amd_iommu_pci_seg as a function parameter since
we need to access per PCI segment device table.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 70eb6338b45d..dba1e03e0cd2 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -238,7 +238,7 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
-static void init_device_table_dma(void);
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -2115,6 +2115,7 @@ static void print_iommu_info(void)
 static int __init amd_iommu_init_pci(void)
 {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
for_each_iommu(iommu) {
@@ -2145,7 +2146,8 @@ static int __init amd_iommu_init_pci(void)
goto out;
}
 
-   init_device_table_dma();
+   for_each_pci_segment(pci_seg)
+   init_device_table_dma(pci_seg);
 
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
@@ -2508,9 +2510,13 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 /*
  * Init the device table to not allow DMA access for devices
  */
-static void init_device_table_dma(void)
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
@@ -2518,13 +2524,17 @@ static void init_device_table_dma(void)
}
 }
 
-static void __init uninit_device_table_dma(void)
+static void __init uninit_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   amd_iommu_dev_table[devid].data[0] = 0ULL;
-   amd_iommu_dev_table[devid].data[1] = 0ULL;
+   dev_table[devid].data[0] = 0ULL;
+   dev_table[devid].data[1] = 0ULL;
}
 }
 
@@ -3117,8 +3127,11 @@ static int __init state_next(void)
free_iommu_resources();
} else {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg)
+   uninit_device_table_dma(pci_seg);
 
-   uninit_device_table_dma();
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
}
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 24/37] iommu/amd: Update set_dte_irq_entry

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5a2a4a08da2f..6773f218512c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2729,18 +2729,20 @@ EXPORT_SYMBOL(amd_iommu_device_info);
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
-static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
+static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
+ struct irq_remap_table *table)
 {
u64 dte;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
-   dte = amd_iommu_dev_table[devid].data[2];
+   dte = dev_table[devid].data[2];
dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
dte |= iommu_virt_to_phys(table->table);
dte |= DTE_IRQ_REMAP_INTCTL;
dte |= DTE_INTTABLEN;
dte |= DTE_IRQ_REMAP_ENABLE;
 
-   amd_iommu_dev_table[devid].data[2] = dte;
+   dev_table[devid].data[2] = dte;
 }
 
 static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 
devid)
@@ -2791,7 +2793,7 @@ static void set_remap_table_entry(struct amd_iommu 
*iommu, u16 devid,
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
pci_seg->irq_lookup_table[devid] = table;
-   set_dte_irq_entry(devid, table);
+   set_dte_irq_entry(iommu, devid, table);
iommu_flush_dte(iommu, devid);
 }
 
@@ -2807,8 +2809,7 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
 
pci_seg = iommu->pci_seg;
pci_seg->irq_lookup_table[alias] = table;
-   set_dte_irq_entry(alias, table);
-
+   set_dte_irq_entry(iommu, alias, table);
iommu_flush_dte(pci_seg->rlookup_table[alias], alias);
 
return 0;
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 23/37] iommu/amd: Update dump_dte_entry

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f2424a72100b..5a2a4a08da2f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -451,13 +451,13 @@ static void amd_iommu_uninit_device(struct device *dev)
  *
  /
 
-static void dump_dte_entry(u16 devid)
+static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
int i;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
for (i = 0; i < 4; ++i)
-   pr_err("DTE[%d]: %016llx\n", i,
-   amd_iommu_dev_table[devid].data[i]);
+   pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
 }
 
 static void dump_command(unsigned long phys_addr)
@@ -618,7 +618,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
-   dump_dte_entry(devid);
+   dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 22/37] iommu/amd: Update iommu_ignore_device

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 689d3c355d73..f2424a72100b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -413,15 +413,15 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 {
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
int devid;
 
-   devid = get_device_id(dev);
+   devid = (get_device_id(dev)) & 0x;
if (devid < 0)
return;
 
-
pci_seg->rlookup_table[devid] = NULL;
-   memset(_iommu_dev_table[devid], 0, sizeof(struct dev_table_entry));
+   memset(_table[devid], 0, sizeof(struct dev_table_entry));
 
setup_aliases(iommu, dev);
 }
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 21/37] iommu/amd: Update set_dte_entry and clear_dte_entry

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment data structures instead of global data
structures.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0d6230e493c8..689d3c355d73 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1537,6 +1537,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
u64 pte_root = 0;
u64 flags = 0;
u32 old_domid;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
if (domain->iop.mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->iop.root);
@@ -1545,7 +1546,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
-   flags = amd_iommu_dev_table[devid].data[1];
+   flags = dev_table[devid].data[1];
 
if (ats)
flags |= DTE_FLAG_IOTLB;
@@ -1584,9 +1585,9 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
flags &= ~DEV_DOMID_MASK;
flags |= domain->id;
 
-   old_domid = amd_iommu_dev_table[devid].data[1] & DEV_DOMID_MASK;
-   amd_iommu_dev_table[devid].data[1]  = flags;
-   amd_iommu_dev_table[devid].data[0]  = pte_root;
+   old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
+   dev_table[devid].data[1]  = flags;
+   dev_table[devid].data[0]  = pte_root;
 
/*
 * A kdump kernel might be replacing a domain ID that was copied from
@@ -1598,11 +1599,13 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
}
 }
 
-static void clear_dte_entry(u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
/* remove entry from the device table seen by the hardware */
-   amd_iommu_dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
-   amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
+   dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
+   dev_table[devid].data[1] &= DTE_FLAG_MASK;
 
amd_iommu_apply_erratum_63(devid);
 }
@@ -1646,7 +1649,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
/* Update data structures */
dev_data->domain = NULL;
list_del(_data->list);
-   clear_dte_entry(dev_data->devid);
+   clear_dte_entry(iommu, dev_data->devid);
clone_aliases(iommu, dev_data->dev);
 
/* Flush the DTE entry */
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 20/37] iommu/amd: Convert to use per PCI segment rlookup_table

2022-04-25 Thread Vasant Hegde via iommu
Then, remove the global amd_iommu_rlookup_table and rlookup_table_size.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  5 -
 drivers/iommu/amd/init.c| 23 ++-
 drivers/iommu/amd/iommu.c   | 19 +--
 3 files changed, 11 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 990272a470aa..334206381f84 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -838,11 +838,6 @@ extern struct dev_table_entry *amd_iommu_dev_table;
  */
 extern u16 *amd_iommu_alias_table;
 
-/*
- * Reverse lookup table to find the IOMMU which translates a specific device.
- */
-extern struct amd_iommu **amd_iommu_rlookup_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 29ed687bc43f..70eb6338b45d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -200,12 +200,6 @@ struct dev_table_entry *amd_iommu_dev_table;
  */
 u16 *amd_iommu_alias_table;
 
-/*
- * The rlookup table is used to find the IOMMU which is responsible
- * for a specific device. It is also indexed by the PCI device id.
- */
-struct amd_iommu **amd_iommu_rlookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -214,7 +208,6 @@ unsigned long *amd_iommu_pd_alloc_bitmap;
 
 static u32 dev_table_size; /* size of the device table */
 static u32 alias_table_size;   /* size of the alias table */
-static u32 rlookup_table_size; /* size if the rlookup table */
 
 enum iommu_init_state {
IOMMU_START_STATE,
@@ -1143,7 +1136,7 @@ void amd_iommu_apply_erratum_63(u16 devid)
 /* Writes the specific IOMMU for a device into the rlookup table */
 static void __init set_iommu_for_device(struct amd_iommu *iommu, u16 devid)
 {
-   amd_iommu_rlookup_table[devid] = iommu;
+   iommu->pci_seg->rlookup_table[devid] = iommu;
 }
 
 /*
@@ -1825,7 +1818,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h,
 * Make sure IOMMU is not considered to translate itself. The IVRS
 * table tells us so, but this is a lie!
 */
-   amd_iommu_rlookup_table[iommu->devid] = NULL;
+   pci_seg->rlookup_table[iommu->devid] = NULL;
 
return 0;
 }
@@ -2783,10 +2776,6 @@ static void __init free_iommu_resources(void)
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
-   free_pages((unsigned long)amd_iommu_rlookup_table,
-  get_order(rlookup_table_size));
-   amd_iommu_rlookup_table = NULL;
-
free_pages((unsigned long)amd_iommu_alias_table,
   get_order(alias_table_size));
amd_iommu_alias_table = NULL;
@@ -2925,7 +2914,6 @@ static int __init early_amd_iommu_init(void)
 
dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-   rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
@@ -2944,13 +2932,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_alias_table == NULL)
goto out;
 
-   /* IOMMU rlookup table - find the IOMMU for a specific device */
-   amd_iommu_rlookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   if (amd_iommu_rlookup_table == NULL)
-   goto out;
-
amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(MAX_DOMAIN_ID/8));
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 770a7ba558cf..0d6230e493c8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -287,10 +287,9 @@ static void setup_aliases(struct amd_iommu *iommu, struct 
device *dev)
clone_aliases(iommu, dev);
 }
 
-static struct iommu_dev_data *find_dev_data(u16 devid)
+static struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
dev_data = search_dev_data(iommu, devid);
 
@@ -388,7 +387,7 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
if (devid < 0)
return devid;
 
-   dev_data = find_dev_data(devid);
+   dev_data = find_dev_data(iommu, devid);
if (!dev_data)
return -ENOMEM;
 
@@ -403,9 +402,6 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct 

[PATCH v2 19/37] iommu/amd: Update alloc_irq_table and alloc_irq_index

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass amd_iommu structure as one of the parameter to these functions
as its needed to retrieve variable tables inside these functions.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ecace06d61cb..770a7ba558cf 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2812,21 +2812,17 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
return 0;
 }
 
-static struct irq_remap_table *alloc_irq_table(u16 devid, struct pci_dev *pdev)
+static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
+  u16 devid, struct pci_dev *pdev)
 {
struct irq_remap_table *table = NULL;
struct irq_remap_table *new_table = NULL;
struct amd_iommu_pci_seg *pci_seg;
-   struct amd_iommu *iommu;
unsigned long flags;
u16 alias;
 
spin_lock_irqsave(_table_lock, flags);
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (!iommu)
-   goto out_unlock;
-
pci_seg = iommu->pci_seg;
table = pci_seg->irq_lookup_table[devid];
if (table)
@@ -2882,18 +2878,14 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
return table;
 }
 
-static int alloc_irq_index(u16 devid, int count, bool align,
-  struct pci_dev *pdev)
+static int alloc_irq_index(struct amd_iommu *iommu, u16 devid, int count,
+  bool align, struct pci_dev *pdev)
 {
struct irq_remap_table *table;
int index, c, alignment = 1;
unsigned long flags;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-   if (!iommu)
-   return -ENODEV;
 
-   table = alloc_irq_table(devid, pdev);
+   table = alloc_irq_table(iommu, devid, pdev);
if (!table)
return -ENODEV;
 
@@ -3265,7 +3257,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
 
-   table = alloc_irq_table(devid, NULL);
+   table = alloc_irq_table(iommu, devid, NULL);
if (table) {
if (!table->min_index) {
/*
@@ -3285,10 +3277,10 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-   index = alloc_irq_index(devid, nr_irqs, align,
+   index = alloc_irq_index(iommu, devid, nr_irqs, align,
msi_desc_to_pci_dev(info->desc));
} else {
-   index = alloc_irq_index(devid, nr_irqs, false, NULL);
+   index = alloc_irq_index(iommu, devid, nr_irqs, false, NULL);
}
 
if (index < 0) {
@@ -3414,8 +3406,8 @@ static int irq_remapping_select(struct irq_domain *d, 
struct irq_fwspec *fwspec,
 
if (devid < 0)
return 0;
+   iommu = __rlookup_amd_iommu((devid >> 16), (devid & 0x));
 
-   iommu = amd_iommu_rlookup_table[devid];
return iommu && iommu->ir_domain == d;
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 18/37] iommu/amd: Update amd_irte_ops functions

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass amd_iommu structure as one of the parameter to amd_irte_ops functions
since its needed to activate/deactivate the iommu.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/iommu.c   | 51 -
 2 files changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 0ef9ecb8d3fc..990272a470aa 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -999,9 +999,9 @@ struct amd_ir_data {
 
 struct amd_irte_ops {
void (*prepare)(void *, u32, bool, u8, u32, int);
-   void (*activate)(void *, u16, u16);
-   void (*deactivate)(void *, u16, u16);
-   void (*set_affinity)(void *, u16, u16, u8, u32);
+   void (*activate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*deactivate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*set_affinity)(struct amd_iommu *iommu, void *, u16, u16, u8, 
u32);
void *(*get)(struct irq_remap_table *, int);
void (*set_allocated)(struct irq_remap_table *, int);
bool (*is_allocated)(struct irq_remap_table *, int);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b34d0851866b..ecace06d61cb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2932,19 +2932,14 @@ static int alloc_irq_index(u16 devid, int count, bool 
align,
return index;
 }
 
-static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
- struct amd_ir_data *data)
+static int modify_irte_ga(struct amd_iommu *iommu, u16 devid, int index,
+ struct irte_ga *irte, struct amd_ir_data *data)
 {
bool ret;
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
struct irte_ga *entry;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -2976,16 +2971,12 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
return 0;
 }
 
-static int modify_irte(u16 devid, int index, union irte *irte)
+static int modify_irte(struct amd_iommu *iommu,
+  u16 devid, int index, union irte *irte)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -3047,49 +3038,49 @@ static void irte_ga_prepare(void *entry,
irte->lo.fields_remap.valid   = 1;
 }
 
-static void irte_activate(void *entry, u16 devid, u16 index)
+static void irte_activate(struct amd_iommu *iommu, void *entry, u16 devid, u16 
index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 1;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_activate(void *entry, u16 devid, u16 index)
+static void irte_ga_activate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 1;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_deactivate(void *entry, u16 devid, u16 index)
+static void irte_deactivate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 0;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_deactivate(void *entry, u16 devid, u16 index)
+static void irte_ga_deactivate(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 0;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_set_affinity(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index,
  u8 vector, u32 dest_apicid)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.vector = vector;
irte->fields.destination = dest_apicid;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_ga_set_affinity(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index,
 u8 vector, u32 

[PATCH v2 17/37] iommu/amd: Introduce struct amd_ir_data.iommu

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Add a pointer to struct amd_iommu to amd_ir_data structure, which
can be used to correlate interrupt remapping data to a per-PCI-segment
interrupt remapping table.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c   | 34 +
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index badf49d2371c..0ef9ecb8d3fc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -981,6 +981,7 @@ struct irq_2_irte {
 
 struct amd_ir_data {
u32 cached_ga_tag;
+   struct amd_iommu *iommu;
struct irq_2_irte irq_2_irte;
struct msi_msg msi_entry;
void *entry;/* Pointer to union irte or struct irte_ga */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c6b8a1f95b55..b34d0851866b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3000,16 +3000,11 @@ static int modify_irte(u16 devid, int index, union irte 
*irte)
return 0;
 }
 
-static void free_irte(u16 devid, int index)
+static void free_irte(struct amd_iommu *iommu, u16 devid, int index)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return;
-
table = get_irq_table(iommu, devid);
if (!table)
return;
@@ -3193,7 +3188,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data 
*data,
   int devid, int index, int sub_handle)
 {
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (!iommu)
return;
@@ -3334,6 +3329,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
goto out_free_data;
}
 
+   data->iommu = iommu;
irq_data->hwirq = (devid << 16) + i;
irq_data->chip_data = data;
irq_data->chip = _ir_chip;
@@ -3350,7 +3346,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
kfree(irq_data->chip_data);
}
for (i = 0; i < nr_irqs; i++)
-   free_irte(devid, index + i);
+   free_irte(iommu, devid, index + i);
 out_free_parent:
irq_domain_free_irqs_common(domain, virq, nr_irqs);
return ret;
@@ -3369,7 +3365,7 @@ static void irq_remapping_free(struct irq_domain *domain, 
unsigned int virq,
if (irq_data && irq_data->chip_data) {
data = irq_data->chip_data;
irte_info = >irq_2_irte;
-   free_irte(irte_info->devid, irte_info->index);
+   free_irte(data->iommu, irte_info->devid, 
irte_info->index);
kfree(data->entry);
kfree(data);
}
@@ -3387,7 +3383,7 @@ static int irq_remapping_activate(struct irq_domain 
*domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
struct irq_cfg *cfg = irqd_cfg(irq_data);
 
if (!iommu)
@@ -3404,7 +3400,7 @@ static void irq_remapping_deactivate(struct irq_domain 
*domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (iommu)
iommu->irte_ops->deactivate(data->entry, irte_info->devid,
@@ -3500,12 +3496,16 @@ EXPORT_SYMBOL(amd_iommu_deactivate_guest_mode);
 static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 {
int ret;
-   struct amd_iommu *iommu;
struct amd_iommu_pi_data *pi_data = vcpu_info;
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
struct irq_2_irte *irte_info = _data->irq_2_irte;
-   struct iommu_dev_data *dev_data = search_dev_data(NULL, 
irte_info->devid);
+   struct iommu_dev_data *dev_data;
+
+   if (ir_data->iommu == NULL)
+   return -EINVAL;
+
+   dev_data = search_dev_data(ir_data->iommu, irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
@@ -3527,10 +3527,6 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
pi_data->is_guest_mode = false;
 

[PATCH v2 16/37] iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

To allow IOMMU rlookup using both PCI segment and device ID.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 445f583795c3..c6b8a1f95b55 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3244,8 +3244,9 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
struct irq_alloc_info *info = arg;
struct irq_data *irq_data;
struct amd_ir_data *data = NULL;
+   struct amd_iommu *iommu;
struct irq_cfg *cfg;
-   int i, ret, devid;
+   int i, ret, devid, seg, sbdf;
int index;
 
if (!info)
@@ -3261,8 +3262,14 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
-   devid = get_devid(info);
-   if (devid < 0)
+   sbdf = get_devid(info);
+   if (sbdf < 0)
+   return -EINVAL;
+
+   seg = sbdf >> 16;
+   devid = sbdf & 0x;
+   iommu = __rlookup_amd_iommu(seg, devid);
+   if (!iommu)
return -EINVAL;
 
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
@@ -3271,7 +3278,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
 
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
 
table = alloc_irq_table(devid, NULL);
if (table) {
@@ -3281,7 +3287,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
 * interrupts.
 */
table->min_index = 32;
-   iommu = amd_iommu_rlookup_table[devid];
for (i = 0; i < 32; ++i)
iommu->irte_ops->set_allocated(table, 
i);
}
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 15/37] iommu/amd: Convert to use rlookup_amd_iommu helper function

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Use rlookup_amd_iommu() helper function which will give per PCI
segment rlookup_table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a105ccacce91..445f583795c3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -229,13 +229,17 @@ static struct iommu_dev_data *search_dev_data(struct 
amd_iommu *iommu, u16 devid
 
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
+   struct amd_iommu *iommu;
u16 devid = pci_dev_id(pdev);
 
if (devid == alias)
return 0;
 
-   amd_iommu_rlookup_table[alias] =
-   amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(>dev);
+   if (!iommu)
+   return 0;
+
+   amd_iommu_set_rlookup_table(iommu, alias);
memcpy(amd_iommu_dev_table[alias].data,
   amd_iommu_dev_table[devid].data,
   sizeof(amd_iommu_dev_table[alias].data));
@@ -366,7 +370,7 @@ static bool check_device(struct device *dev)
if (devid > amd_iommu_last_bdf)
return false;
 
-   if (amd_iommu_rlookup_table[devid] == NULL)
+   if (rlookup_amd_iommu(dev) == NULL)
return false;
 
return true;
@@ -1270,7 +1274,9 @@ static int device_flush_iotlb(struct iommu_dev_data 
*dev_data,
int qdep;
 
qdep = dev_data->ats.qdep;
-   iommu= amd_iommu_rlookup_table[dev_data->devid];
+   iommu= rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
build_inv_iotlb_pages(, dev_data->devid, qdep, address, size);
 
@@ -1295,7 +1301,9 @@ static int device_flush_dte(struct iommu_dev_data 
*dev_data)
u16 alias;
int ret;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
if (dev_is_pci(dev_data->dev))
pdev = to_pci_dev(dev_data->dev);
@@ -1525,8 +1533,8 @@ static void free_gcr3_table(struct protection_domain 
*domain)
free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void set_dte_entry(u16 devid, struct protection_domain *domain,
- bool ats, bool ppr)
+static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+ struct protection_domain *domain, bool ats, bool ppr)
 {
u64 pte_root = 0;
u64 flags = 0;
@@ -1545,8 +1553,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
flags |= DTE_FLAG_IOTLB;
 
if (ppr) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
if (iommu_feature(iommu, FEATURE_EPHSUP))
pte_root |= 1ULL << DEV_ENTRY_PPR;
}
@@ -1590,8 +1596,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 * entries for the old domain ID that is being overwritten
 */
if (old_domid) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
 }
@@ -1611,7 +1615,9 @@ static void do_attach(struct iommu_dev_data *dev_data,
struct amd_iommu *iommu;
bool ats;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
ats   = dev_data->ats.enabled;
 
/* Update data structures */
@@ -1623,7 +1629,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
domain->dev_cnt += 1;
 
/* Update device table */
-   set_dte_entry(dev_data->devid, domain,
+   set_dte_entry(iommu, dev_data->devid, domain,
  ats, dev_data->iommu_v2);
clone_aliases(iommu, dev_data->dev);
 
@@ -1635,7 +1641,9 @@ static void do_detach(struct iommu_dev_data *dev_data)
struct protection_domain *domain = dev_data->domain;
struct amd_iommu *iommu;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
 
/* Update data structures */
dev_data->domain = NULL;
@@ -1813,13 +1821,14 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 {
struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
-   int ret, devid;
+   int ret;
 
if (!check_device(dev))
return ERR_PTR(-ENODEV);
 
-   devid = get_device_id(dev);
-   iommu = amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(dev);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
if 

[PATCH v2 14/37] iommu/amd: Convert to use per PCI segment irq_lookup_table

2022-04-25 Thread Vasant Hegde via iommu
Then, remove the global irq_lookup_table.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 --
 drivers/iommu/amd/init.c| 19 ---
 drivers/iommu/amd/iommu.c   | 36 ++---
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 6f1900fa86d2..badf49d2371c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -444,8 +444,6 @@ struct irq_remap_table {
u32 *table;
 };
 
-extern struct irq_remap_table **irq_lookup_table;
-
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1688532dffb8..29ed687bc43f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -206,12 +206,6 @@ u16 *amd_iommu_alias_table;
  */
 struct amd_iommu **amd_iommu_rlookup_table;
 
-/*
- * This table is used to find the irq remapping table for a given device id
- * quickly.
- */
-struct irq_remap_table **irq_lookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -2786,11 +2780,6 @@ static struct syscore_ops amd_iommu_syscore_ops = {
 
 static void __init free_iommu_resources(void)
 {
-   kmemleak_free(irq_lookup_table);
-   free_pages((unsigned long)irq_lookup_table,
-  get_order(rlookup_table_size));
-   irq_lookup_table = NULL;
-
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
@@ -3011,14 +3000,6 @@ static int __init early_amd_iommu_init(void)
if (alloc_irq_lookup_table(pci_seg))
goto out;
}
-
-   irq_lookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   kmemleak_alloc(irq_lookup_table, rlookup_table_size,
-  1, GFP_KERNEL);
-   if (!irq_lookup_table)
-   goto out;
}
 
ret = init_memory_definitions(ivrs_base);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0f500b1a3885..a105ccacce91 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2730,16 +2730,18 @@ static void set_dte_irq_entry(u16 devid, struct 
irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid)
+static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 
devid)
 {
struct irq_remap_table *table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
  "%s: no iommu for devid %x\n", __func__, devid))
return NULL;
 
-   table = irq_lookup_table[devid];
-   if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+   table = pci_seg->irq_lookup_table[devid];
+   if (WARN_ONCE(!table, "%s: no table for devid %x:%x\n",
+ __func__, pci_seg->id, devid))
return NULL;
 
return table;
@@ -2772,7 +2774,9 @@ static struct irq_remap_table *__alloc_irq_table(void)
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
  struct irq_remap_table *table)
 {
-   irq_lookup_table[devid] = table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->irq_lookup_table[devid] = table;
set_dte_irq_entry(devid, table);
iommu_flush_dte(iommu, devid);
 }
@@ -2781,8 +2785,14 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
   void *data)
 {
struct irq_remap_table *table = data;
+   struct amd_iommu_pci_seg *pci_seg;
+   struct amd_iommu *iommu = rlookup_amd_iommu(>dev);
 
-   irq_lookup_table[alias] = table;
+   if (!iommu)
+   return -EINVAL;
+
+   pci_seg = iommu->pci_seg;
+   pci_seg->irq_lookup_table[alias] = table;
set_dte_irq_entry(alias, table);
 
iommu_flush_dte(amd_iommu_rlookup_table[alias], alias);
@@ -2806,12 +2816,12 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
goto out_unlock;
 
pci_seg = iommu->pci_seg;
-   table = irq_lookup_table[devid];
+   table = pci_seg->irq_lookup_table[devid];
if (table)
goto out_unlock;
 
alias = pci_seg->alias_table[devid];
-   table = irq_lookup_table[alias];
+   table = pci_seg->irq_lookup_table[alias];
if (table) {
set_remap_table_entry(iommu, devid, table);
  

[PATCH v2 13/37] iommu/amd: Introduce per PCI segment rlookup table size

2022-04-25 Thread Vasant Hegde via iommu
It will replace global "rlookup_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 11 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 4bed64ad2068..6f1900fa86d2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -553,6 +553,9 @@ struct amd_iommu_pci_seg {
/* Size of the alias table */
u32 alias_table_size;
 
+   /* Size of the rlookup table */
+   u32 rlookup_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d4e4f49066f8..1688532dffb8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -671,7 +671,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
pci_seg->rlookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
+   
get_order(pci_seg->rlookup_table_size));
if (pci_seg->rlookup_table == NULL)
return -ENOMEM;
 
@@ -681,7 +681,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->rlookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->rlookup_table = NULL;
 }
 
@@ -689,9 +689,9 @@ static inline int __init alloc_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_se
 {
pci_seg->irq_lookup_table = (void *)__get_free_pages(
 GFP_KERNEL | __GFP_ZERO,
-get_order(rlookup_table_size));
+
get_order(pci_seg->rlookup_table_size));
kmemleak_alloc(pci_seg->irq_lookup_table,
-  rlookup_table_size, 1, GFP_KERNEL);
+  pci_seg->rlookup_table_size, 1, GFP_KERNEL);
if (pci_seg->irq_lookup_table == NULL)
return -ENOMEM;
 
@@ -702,7 +702,7 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
kmemleak_free(pci_seg->irq_lookup_table);
free_pages((unsigned long)pci_seg->irq_lookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->irq_lookup_table = NULL;
 }
 
@@ -1583,6 +1583,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 12/37] iommu/amd: Introduce per PCI segment alias table size

2022-04-25 Thread Vasant Hegde via iommu
It will replace global "alias_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 3 +++
 drivers/iommu/amd/init.c| 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index aa666d0723ba..4bed64ad2068 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -550,6 +550,9 @@ struct amd_iommu_pci_seg {
/* Size of the device table */
u32 dev_table_size;
 
+   /* Size of the alias table */
+   u32 alias_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f8da686182b5..d4e4f49066f8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -711,7 +711,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
int i;
 
pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
-   
get_order(alias_table_size));
+   get_order(pci_seg->alias_table_size));
if (!pci_seg->alias_table)
return -ENOMEM;
 
@@ -727,7 +727,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->alias_table,
-  get_order(alias_table_size));
+  get_order(pci_seg->alias_table_size));
pci_seg->alias_table = NULL;
 }
 
@@ -1582,6 +1582,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 11/37] iommu/amd: Introduce per PCI segment device table size

2022-04-25 Thread Vasant Hegde via iommu
With multiple pci segment support, number of BDF supported by each
segment may differ. Hence introduce per segment device table size
which depends on last_bdf. This will replace global
"device_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 18 ++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index e39e7db54e69..aa666d0723ba 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -547,6 +547,9 @@ struct amd_iommu_pci_seg {
/* Largest PCI device id we expect translation requests for */
u16 last_bdf;
 
+   /* Size of the device table */
+   u32 dev_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 71f39551a83a..f8da686182b5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -414,6 +414,7 @@ static void iommu_set_cwwb_range(struct amd_iommu *iommu)
 static void iommu_set_device_table(struct amd_iommu *iommu)
 {
u64 entry;
+   u32 dev_table_size = iommu->pci_seg->dev_table_size;
 
BUG_ON(iommu->mmio_base == NULL);
 
@@ -651,7 +652,7 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table, u16 pci_
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | 
GFP_DMA32,
- 
get_order(dev_table_size));
+ 
get_order(pci_seg->dev_table_size));
if (!pci_seg->dev_table)
return -ENOMEM;
 
@@ -661,7 +662,7 @@ static inline int __init alloc_dev_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->dev_table,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
pci_seg->dev_table = NULL;
 }
 
@@ -1034,7 +1035,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
entry = (((u64) hi) << 32) + lo;
 
old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
+   if (old_devtb_size != pci_seg->dev_table_size) {
pr_err("The device table size of IOMMU:%d is not expected!\n",
iommu->index);
return false;
@@ -1053,15 +1054,15 @@ static bool __copy_device_table(struct amd_iommu *iommu)
}
old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && 
is_kdump_kernel())
? (__force void *)ioremap_encrypted(old_devtb_phys,
-   dev_table_size)
-   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   pci_seg->dev_table_size)
+   : memremap(old_devtb_phys, pci_seg->dev_table_size, 
MEMREMAP_WB);
 
if (!old_devtb)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
+   
get_order(pci_seg->dev_table_size));
if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
@@ -1580,6 +1581,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
@@ -2675,7 +2677,7 @@ static void early_enable_iommus(void)
for_each_pci_segment(pci_seg) {
if (pci_seg->old_dev_tbl_cpy != NULL) {
free_pages((unsigned 
long)pci_seg->old_dev_tbl_cpy,
-   get_order(dev_table_size));
+   
get_order(pci_seg->dev_table_size));
pci_seg->old_dev_tbl_cpy = NULL;
}
}
@@ -2689,7 +2691,7 @@ static void early_enable_iommus(void)
 
for_each_pci_segment(pci_seg) {
free_pages((unsigned long)pci_seg->dev_table,
-  get_order(dev_table_size));
+  

[PATCH v2 10/37] iommu/amd: Introduce per PCI segment last_bdf

2022-04-25 Thread Vasant Hegde via iommu
Current code uses global "amd_iommu_last_bdf" to track the last bdf
supported by the system. This value is used for various memory
allocation, device data flushing, etc.

Introduce per PCI segment last_bdf which will be used to track last bdf
supported by the given PCI segment and use this value for all per
segment memory allocations. Eventually it will replace global
"amd_iommu_last_bdf".

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ++
 drivers/iommu/amd/init.c| 68 ++---
 2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index c4c9c35e2bf7..e39e7db54e69 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -544,6 +544,9 @@ struct amd_iommu_pci_seg {
/* PCI segment number */
u16 id;
 
+   /* Largest PCI device id we expect translation requests for */
+   u16 last_bdf;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d613e20ea013..71f39551a83a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -550,6 +550,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 {
u8 *p = (void *)h, *end = (void *)h;
struct ivhd_entry *dev;
+   int last_devid = -EINVAL;
 
u32 ivhd_size = get_ivhd_header_size(h);
 
@@ -567,6 +568,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
update_last_devid(0x);
+   return 0x;
break;
case IVHD_DEV_SELECT:
case IVHD_DEV_RANGE_END:
@@ -574,6 +576,8 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
update_last_devid(dev->devid);
+   if (dev->devid > last_devid)
+   last_devid = dev->devid;
break;
default:
break;
@@ -583,7 +587,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 
WARN_ON(p != end);
 
-   return 0;
+   return last_devid;
 }
 
 static int __init check_ivrs_checksum(struct acpi_table_header *table)
@@ -607,27 +611,31 @@ static int __init check_ivrs_checksum(struct 
acpi_table_header *table)
  * id which we need to handle. This is the first of three functions which parse
  * the ACPI table. So we check the checksum here.
  */
-static int __init find_last_devid_acpi(struct acpi_table_header *table)
+static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 
pci_seg)
 {
u8 *p = (u8 *)table, *end = (u8 *)table;
struct ivhd_header *h;
+   int last_devid, last_bdf = 0;
 
p += IVRS_HEADER_LENGTH;
 
end += table->length;
while (p < end) {
h = (struct ivhd_header *)p;
-   if (h->type == amd_iommu_target_ivhd_type) {
-   int ret = find_last_devid_from_ivhd(h);
-
-   if (ret)
-   return ret;
+   if (h->pci_seg == pci_seg &&
+   h->type == amd_iommu_target_ivhd_type) {
+   last_devid = find_last_devid_from_ivhd(h);
+
+   if (last_devid < 0)
+   return -EINVAL;
+   if (last_devid > last_bdf)
+   last_bdf = last_devid;
}
p += h->length;
}
WARN_ON(p != end);
 
-   return 0;
+   return last_bdf;
 }
 
 /
@@ -1551,14 +1559,28 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 }
 
 /* Allocate PCI segment data structure */
-static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
+ struct acpi_table_header *ivrs_base)
 {
struct amd_iommu_pci_seg *pci_seg;
+   int last_bdf;
+
+   /*
+* First parse ACPI tables to find the largest Bus/Dev/Func we need to
+* handle in this PCI segment. Upon this information the shared data
+* structures for the PCI segments in the system will be allocated.
+*/
+   last_bdf = find_last_devid_acpi(ivrs_base, id);
+   if (last_bdf < 0)
+   return NULL;
 
pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
if (pci_seg == NULL)
 

[PATCH v2 09/37] iommu/amd: Introduce per PCI segment unity map list

2022-04-25 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments. In order to support
multiple PCI segments IVMD table in IVRS structure is enhanced to
include pci segment id. Update ivmd_header structure to include "pci_seg".

Also introduce per PCI segment unity map list. It will replace global
amd_iommu_unity_map list.

Note that we have used "reserved" field in IVMD table to include "pci_seg
id" which was set to zero. It will take care of backward compatibility
(new kernel will work fine on older systems).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 13 +++--
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +++-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f9776f188e36..c4c9c35e2bf7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -579,6 +579,13 @@ struct amd_iommu_pci_seg {
 * More than one device can share the same requestor id.
 */
u16 *alias_table;
+
+   /*
+* A list of required unity mappings we find in ACPI. It is not locked
+* because as runtime it is only read. It is created at ACPI table
+* parsing time.
+*/
+   struct list_head unity_map;
 };
 
 /*
@@ -805,12 +812,6 @@ struct unity_map_entry {
int prot;
 };
 
-/*
- * List of all unity mappings. It is not locked because as runtime it is only
- * read. It is created at ACPI table parsing time.
- */
-extern struct list_head amd_iommu_unity_map;
-
 /*
  * Data structures for device handling
  */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fe31de6e764c..d613e20ea013 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -142,7 +142,8 @@ struct ivmd_header {
u16 length;
u16 devid;
u16 aux;
-   u64 resv;
+   u16 pci_seg;
+   u8  resv[6];
u64 range_start;
u64 range_length;
 } __attribute__((packed));
@@ -162,8 +163,6 @@ static int amd_iommu_target_ivhd_type;
 
 u16 amd_iommu_last_bdf;/* largest PCI device id we have
   to handle */
-LIST_HEAD(amd_iommu_unity_map);/* a list of required unity 
mappings
-  we find in ACPI */
 
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
@@ -1562,6 +1561,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
+   INIT_LIST_HEAD(_seg->unity_map);
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
@@ -2397,10 +2397,13 @@ static int iommu_init_irq(struct amd_iommu *iommu)
 static void __init free_unity_maps(void)
 {
struct unity_map_entry *entry, *next;
+   struct amd_iommu_pci_seg *p, *pci_seg;
 
-   list_for_each_entry_safe(entry, next, _iommu_unity_map, list) {
-   list_del(>list);
-   kfree(entry);
+   for_each_pci_segment_safe(pci_seg, p) {
+   list_for_each_entry_safe(entry, next, _seg->unity_map, 
list) {
+   list_del(>list);
+   kfree(entry);
+   }
}
 }
 
@@ -2408,8 +2411,13 @@ static void __init free_unity_maps(void)
 static int __init init_unity_map_range(struct ivmd_header *m)
 {
struct unity_map_entry *e = NULL;
+   struct amd_iommu_pci_seg *pci_seg;
char *s;
 
+   pci_seg = get_pci_segment(m->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (e == NULL)
return -ENOMEM;
@@ -2447,14 +2455,16 @@ static int __init init_unity_map_range(struct 
ivmd_header *m)
if (m->flags & IVMD_FLAG_EXCL_RANGE)
e->prot = (IVMD_FLAG_IW | IVMD_FLAG_IR) >> 1;
 
-   DUMP_printk("%s devid_start: %02x:%02x.%x devid_end: %02x:%02x.%x"
-   " range_start: %016llx range_end: %016llx flags: %x\n", s,
+   DUMP_printk("%s devid_start: %04x:%02x:%02x.%x devid_end: "
+   "%04x:%02x:%02x.%x range_start: %016llx range_end: %016llx"
+   " flags: %x\n", s, m->pci_seg,
PCI_BUS_NUM(e->devid_start), PCI_SLOT(e->devid_start),
-   PCI_FUNC(e->devid_start), PCI_BUS_NUM(e->devid_end),
+   PCI_FUNC(e->devid_start), m->pci_seg,
+   PCI_BUS_NUM(e->devid_end),
PCI_SLOT(e->devid_end), PCI_FUNC(e->devid_end),
e->address_start, e->address_end, m->flags);
 
-   list_add_tail(>list, _iommu_unity_map);
+   

[PATCH v2 08/37] iommu/amd: Introduce per PCI segment alias_table

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global alias table (amd_iommu_alias_table).

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  7 +
 drivers/iommu/amd/init.c| 41 ++---
 drivers/iommu/amd/iommu.c   | 41 ++---
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 330bb346207a..f9776f188e36 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -572,6 +572,13 @@ struct amd_iommu_pci_seg {
 * will be copied to. It's only be used in kdump kernel.
 */
struct dev_table_entry *old_dev_tbl_cpy;
+
+   /*
+* The alias table is a driver specific data structure which contains 
the
+* mappings of the PCI device ids to the actual requestor ids on the 
IOMMU.
+* More than one device can share the same requestor id.
+*/
+   u16 *alias_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index af413738da01..fe31de6e764c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -698,6 +698,31 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->irq_lookup_table = NULL;
 }
 
+static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   int i;
+
+   pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
+   
get_order(alias_table_size));
+   if (!pci_seg->alias_table)
+   return -ENOMEM;
+
+   /*
+* let all alias entries point to itself
+*/
+   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   pci_seg->alias_table[i] = i;
+
+   return 0;
+}
+
+static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->alias_table,
+  get_order(alias_table_size));
+   pci_seg->alias_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1266,6 +1291,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
u32 ivhd_size;
int ret;
 
@@ -1347,7 +1373,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid_to = e->ext >> 8;
set_dev_entry_from_acpi(iommu, devid   , e->flags, 0);
set_dev_entry_from_acpi(iommu, devid_to, e->flags, 0);
-   amd_iommu_alias_table[devid] = devid_to;
+   pci_seg->alias_table[devid] = devid_to;
break;
case IVHD_DEV_ALIAS_RANGE:
 
@@ -1405,7 +1431,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid = e->devid;
for (dev_i = devid_start; dev_i <= devid; ++dev_i) {
if (alias) {
-   amd_iommu_alias_table[dev_i] = devid_to;
+   pci_seg->alias_table[dev_i] = devid_to;
set_dev_entry_from_acpi(iommu,
devid_to, flags, ext_flags);
}
@@ -1540,6 +1566,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_alias_table(pci_seg))
+   return NULL;
if (alloc_rlookup_table(pci_seg))
return NULL;
 
@@ -1566,6 +1594,7 @@ static void __init free_pci_segment(void)
list_del(_seg->list);
free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
+   free_alias_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
@@ -2838,7 +2867,7 @@ static void __init ivinfo_init(void *ivrs)
 static int __init early_amd_iommu_init(void)
 {
struct acpi_table_header *ivrs_base;
-   int i, remap_cache_sz, ret;
+   int remap_cache_sz, ret;
acpi_status status;
 
if (!amd_iommu_detected)
@@ -2909,12 +2938,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto out;
 
-   /*
-* let all alias entries point to itself
-*/
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
-   amd_iommu_alias_table[i] = i;
-
/*
 * never allocate domain 0 because its used as the non-allocated and

[PATCH v2 07/37] iommu/amd: Introduce per PCI segment old_dev_tbl_cpy

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

It will remove global old_dev_tbl_cpy. Also update copy_device_table()
copy device table for all PCI segments.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |   6 ++
 drivers/iommu/amd/init.c| 109 
 2 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 7bf35e3a1ed6..330bb346207a 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -566,6 +566,12 @@ struct amd_iommu_pci_seg {
 * device id quickly.
 */
struct irq_remap_table **irq_lookup_table;
+
+   /*
+* Pointer to a device table which the content of old device table
+* will be copied to. It's only be used in kdump kernel.
+*/
+   struct dev_table_entry *old_dev_tbl_cpy;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 27785a558d9c..af413738da01 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -193,11 +193,6 @@ bool amd_iommu_force_isolation __read_mostly;
  * page table root pointer.
  */
 struct dev_table_entry *amd_iommu_dev_table;
-/*
- * Pointer to a device table which the content of old device table
- * will be copied to. It's only be used in kdump kernel.
- */
-static struct dev_table_entry *old_dev_tbl_cpy;
 
 /*
  * The alias table is a driver specific data structure which contains the
@@ -990,39 +985,27 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 }
 
 
-static bool copy_device_table(void)
+static bool __copy_device_table(struct amd_iommu *iommu)
 {
-   u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
+   u64 int_ctl, int_tab_len, entry = 0;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
-   struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;
u64 tmp;
 
-   if (!amd_iommu_pre_enabled)
-   return false;
-
-   pr_warn("Translation is already enabled - trying to copy translation 
structures\n");
-   for_each_iommu(iommu) {
-   /* All IOMMUs should use the same device table with the same 
size */
-   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
-   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
-   entry = (((u64) hi) << 32) + lo;
-   if (last_entry && last_entry != entry) {
-   pr_err("IOMMU:%d should use the same dev table as 
others!\n",
-   iommu->index);
-   return false;
-   }
-   last_entry = entry;
+   /* Each IOMMU use separate device table with the same size */
+   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+   entry = (((u64) hi) << 32) + lo;
 
-   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
-   pr_err("The device table size of IOMMU:%d is not 
expected!\n",
-   iommu->index);
-   return false;
-   }
+   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+   if (old_devtb_size != dev_table_size) {
+   pr_err("The device table size of IOMMU:%d is not expected!\n",
+   iommu->index);
+   return false;
}
 
/*
@@ -1045,31 +1028,31 @@ static bool copy_device_table(void)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
-   old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
-   if (old_dev_tbl_cpy == NULL) {
+   pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+   get_order(dev_table_size));
+   if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
return false;
}
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   old_dev_tbl_cpy[devid] = old_devtb[devid];
+   pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
 
if (dte_v && dom_id) {
-   old_dev_tbl_cpy[devid].data[0] = 
old_devtb[devid].data[0];
-   old_dev_tbl_cpy[devid].data[1] = 
old_devtb[devid].data[1];
+   

[PATCH v2 06/37] iommu/amd: Introduce per PCI segment dev_data_list

2022-04-25 Thread Vasant Hegde via iommu
This will replace global dev_data_list.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c|  1 +
 drivers/iommu/amd/iommu.c   | 21 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index d507c96598a7..7bf35e3a1ed6 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -538,6 +538,9 @@ struct protection_domain {
 struct amd_iommu_pci_seg {
struct list_head list;
 
+   /* List of all available dev_data structures */
+   struct llist_head dev_data_list;
+
/* PCI segment number */
u16 id;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 48db6c3164aa..27785a558d9c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1525,6 +1525,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
return NULL;
 
pci_seg->id = id;
+   init_llist_head(_seg->dev_data_list);
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d9cf5b3187b5..63728e34e044 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -62,9 +62,6 @@
 
 static DEFINE_SPINLOCK(pd_bitmap_lock);
 
-/* List of all available dev_data structures */
-static LLIST_HEAD(dev_data_list);
-
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
 LIST_HEAD(acpihid_map);
@@ -195,9 +192,10 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
-static struct iommu_dev_data *alloc_dev_data(u16 devid)
+static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 
devid)
 {
struct iommu_dev_data *dev_data;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
if (!dev_data)
@@ -207,19 +205,20 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
dev_data->devid = devid;
ratelimit_default_init(_data->rs);
 
-   llist_add(_data->dev_data_list, _data_list);
+   llist_add(_data->dev_data_list, _seg->dev_data_list);
return dev_data;
 }
 
-static struct iommu_dev_data *search_dev_data(u16 devid)
+static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 
devid)
 {
struct iommu_dev_data *dev_data;
struct llist_node *node;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
-   if (llist_empty(_data_list))
+   if (llist_empty(_seg->dev_data_list))
return NULL;
 
-   node = dev_data_list.first;
+   node = pci_seg->dev_data_list.first;
llist_for_each_entry(dev_data, node, dev_data_list) {
if (dev_data->devid == devid)
return dev_data;
@@ -288,10 +287,10 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   dev_data = search_dev_data(devid);
+   dev_data = search_dev_data(iommu, devid);
 
if (dev_data == NULL) {
-   dev_data = alloc_dev_data(devid);
+   dev_data = alloc_dev_data(iommu, devid);
if (!dev_data)
return NULL;
 
@@ -3464,7 +3463,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
struct irq_2_irte *irte_info = _data->irq_2_irte;
-   struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+   struct iommu_dev_data *dev_data = search_dev_data(NULL, 
irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 05/37] iommu/amd: Introduce per PCI segment irq_lookup_table

2022-04-25 Thread Vasant Hegde via iommu
This will replace global irq lookup table (irq_lookup_table).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++
 drivers/iommu/amd/init.c| 27 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 9c008662be1b..d507c96598a7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -557,6 +557,12 @@ struct amd_iommu_pci_seg {
 * device id.
 */
struct amd_iommu **rlookup_table;
+
+   /*
+* This table is used to find the irq remapping table for a given
+* device id quickly.
+*/
+   struct irq_remap_table **irq_lookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a2efc02ba80a..48db6c3164aa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -682,6 +682,26 @@ static inline void free_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->rlookup_table = NULL;
 }
 
+static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg 
*pci_seg)
+{
+   pci_seg->irq_lookup_table = (void *)__get_free_pages(
+GFP_KERNEL | __GFP_ZERO,
+get_order(rlookup_table_size));
+   kmemleak_alloc(pci_seg->irq_lookup_table,
+  rlookup_table_size, 1, GFP_KERNEL);
+   if (pci_seg->irq_lookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   kmemleak_free(pci_seg->irq_lookup_table);
+   free_pages((unsigned long)pci_seg->irq_lookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->irq_lookup_table = NULL;
+}
 
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
@@ -1533,6 +1553,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
@@ -2896,6 +2917,7 @@ static int __init early_amd_iommu_init(void)
amd_iommu_irq_remap = check_ioapic_information();
 
if (amd_iommu_irq_remap) {
+   struct amd_iommu_pci_seg *pci_seg;
/*
 * Interrupt remapping enabled, create kmem_cache for the
 * remapping tables.
@@ -2912,6 +2934,11 @@ static int __init early_amd_iommu_init(void)
if (!amd_iommu_irq_cache)
goto out;
 
+   for_each_pci_segment(pci_seg) {
+   if (alloc_irq_lookup_table(pci_seg))
+   goto out;
+   }
+
irq_lookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(rlookup_table_size));
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/37] iommu/amd: Introduce per PCI segment rlookup table

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global rlookup table (amd_iommu_rlookup_table).
Also add helper functions to set/get rlookup table for the given device.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  8 ++
 drivers/iommu/amd/init.c| 23 +++
 drivers/iommu/amd/iommu.c   | 44 +
 4 files changed, 76 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 885570cd0d77..2947239700ce 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -19,6 +19,7 @@ extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
+extern void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 404feb7995cc..9c008662be1b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -486,6 +486,7 @@ struct amd_iommu_fault {
 };
 
 
+struct amd_iommu;
 struct iommu_domain;
 struct irq_domain;
 struct amd_irte_ops;
@@ -549,6 +550,13 @@ struct amd_iommu_pci_seg {
 * page table root pointer.
 */
struct dev_table_entry *dev_table;
+
+   /*
+* The rlookup iommu table is used to find the IOMMU which is
+* responsible for a specific device. It is indexed by the PCI
+* device id.
+*/
+   struct amd_iommu **rlookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 0fd1071bfc85..a2efc02ba80a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -663,6 +663,26 @@ static inline void free_dev_table(struct amd_iommu_pci_seg 
*pci_seg)
pci_seg->dev_table = NULL;
 }
 
+/* Allocate per PCI segment IOMMU rlookup table. */
+static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->rlookup_table = (void *)__get_free_pages(
+   GFP_KERNEL | __GFP_ZERO,
+   get_order(rlookup_table_size));
+   if (pci_seg->rlookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->rlookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->rlookup_table = NULL;
+}
+
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1489,6 +1509,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_rlookup_table(pci_seg))
+   return NULL;
 
return pci_seg;
 }
@@ -1511,6 +1533,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 54b8eb764530..d9cf5b3187b5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -146,6 +146,50 @@ struct dev_table_entry *get_dev_table(struct amd_iommu 
*iommu)
return dev_table;
 }
 
+static inline u16 get_device_segment(struct device *dev)
+{
+   u16 seg;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   seg = pci_domain_nr(pdev->bus);
+   } else {
+   u32 devid = get_acpihid_device_id(dev, NULL);
+
+   seg = (devid >> 16) & 0x;
+   }
+
+   return seg;
+}
+
+/* Writes the specific IOMMU for a device into the PCI segment rlookup table */
+void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->rlookup_table[devid] = iommu;
+}
+
+static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == seg)
+   return pci_seg->rlookup_table[devid];
+   }
+   return NULL;
+}
+
+static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
+{
+   u16 seg = get_device_segment(dev);
+   u16 devid = get_device_id(dev);
+
+   return __rlookup_amd_iommu(seg, devid);
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return 

[PATCH v2 03/37] iommu/amd: Introduce per PCI segment device table

2022-04-25 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Introduce per PCI segment device table. All IOMMUs within the segment
will share this device table. This will replace global device
table i.e. amd_iommu_dev_table.

Also introduce helper function to get the device table for the given IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 10 ++
 drivers/iommu/amd/init.c| 26 --
 drivers/iommu/amd/iommu.c   | 12 
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1ab31074f5b3..885570cd0d77 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -128,4 +128,5 @@ static inline void amd_iommu_apply_ivrs_quirks(void) { }
 
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
+extern struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 #endif
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 62442d88978f..404feb7995cc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -539,6 +539,16 @@ struct amd_iommu_pci_seg {
 
/* PCI segment number */
u16 id;
+
+   /*
+* device table virtual address
+*
+* Pointer to the per PCI segment device table.
+* It is indexed by the PCI device id or the HT unit id and contains
+* information about the domain the device belongs to as well as the
+* page table root pointer.
+*/
+   struct dev_table_entry *dev_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e01eae9dcbc1..0fd1071bfc85 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -640,11 +640,29 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table)
  *
  * The following functions belong to the code path which parses the ACPI table
  * the second time. In this ACPI parsing iteration we allocate IOMMU specific
- * data structures, initialize the device/alias/rlookup table and also
- * basically initialize the hardware.
+ * data structures, initialize the per PCI segment device/alias/rlookup table
+ * and also basically initialize the hardware.
  *
  /
 
+/* Allocate per PCI segment device table */
+static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | 
GFP_DMA32,
+ 
get_order(dev_table_size));
+   if (!pci_seg->dev_table)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->dev_table,
+   get_order(dev_table_size));
+   pci_seg->dev_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1469,6 +1487,9 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
pci_seg->id = id;
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
+   if (alloc_dev_table(pci_seg))
+   return NULL;
+
return pci_seg;
 }
 
@@ -1490,6 +1511,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_dev_table(pci_seg);
kfree(pci_seg);
}
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cf57ffcc8d54..54b8eb764530 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -134,6 +134,18 @@ static inline int get_device_id(struct device *dev)
return devid;
 }
 
+struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
+{
+   struct dev_table_entry *dev_table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   BUG_ON(pci_seg == NULL);
+   dev_table = pci_seg->dev_table;
+   BUG_ON(dev_table == NULL);
+
+   return dev_table;
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return container_of(dom, struct protection_domain, domain);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 02/37] iommu/amd: Introduce pci segment structure

2022-04-25 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes that system contains only one pci segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

Introducing per PCI segment data structure, which contains segment
specific data structures. This will eventually replace the global
data structures.

Also update `amd_iommu->pci_seg` variable to point to PCI segment
structure instead of PCI segment ID.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 23 ++-
 drivers/iommu/amd/init.c| 46 -
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 06235b7cb13d..62442d88978f 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,11 @@ extern bool amd_iommu_irq_remap;
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
+/* Make iterating over all pci segment easier */
+#define for_each_pci_segment(pci_seg) \
+   list_for_each_entry((pci_seg), _iommu_pci_seg_list, list)
+#define for_each_pci_segment_safe(pci_seg, next) \
+   list_for_each_entry_safe((pci_seg), (next), _iommu_pci_seg_list, 
list)
 /*
  * Make iterating over all IOMMUs easier
  */
@@ -526,6 +531,16 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
 
+/*
+ * This structure contains information about one PCI segment in the system.
+ */
+struct amd_iommu_pci_seg {
+   struct list_head list;
+
+   /* PCI segment number */
+   u16 id;
+};
+
 /*
  * Structure where we save information about one hardware AMD IOMMU in the
  * system.
@@ -577,7 +592,7 @@ struct amd_iommu {
u16 cap_ptr;
 
/* pci domain of this IOMMU */
-   u16 pci_seg;
+   struct amd_iommu_pci_seg *pci_seg;
 
/* start of exclusion range of that IOMMU */
u64 exclusion_start;
@@ -705,6 +720,12 @@ extern struct list_head ioapic_map;
 extern struct list_head hpet_map;
 extern struct list_head acpihid_map;
 
+/*
+ * List with all PCI segments in the system. This list is not locked because
+ * it is only written at driver initialization time
+ */
+extern struct list_head amd_iommu_pci_seg_list;
+
 /*
  * List with all IOMMUs in the system. This list is not locked because it is
  * only written and read at driver initialization or suspend time
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b4a798c7b347..e01eae9dcbc1 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -165,6 +165,7 @@ u16 amd_iommu_last_bdf; /* largest PCI 
device id we have
 LIST_HEAD(amd_iommu_unity_map);/* a list of required unity 
mappings
   we find in ACPI */
 
+LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
 
@@ -1456,6 +1457,43 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
return 0;
 }
 
+/* Allocate PCI segment data structure */
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
+   if (pci_seg == NULL)
+   return NULL;
+
+   pci_seg->id = id;
+   list_add_tail(_seg->list, _iommu_pci_seg_list);
+
+   return pci_seg;
+}
+
+static struct amd_iommu_pci_seg *__init get_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == id)
+   return pci_seg;
+   }
+
+   return alloc_pci_segment(id);
+}
+
+static void __init free_pci_segment(void)
+{
+   struct amd_iommu_pci_seg *pci_seg, *next;
+
+   for_each_pci_segment_safe(pci_seg, next) {
+   list_del(_seg->list);
+   kfree(pci_seg);
+   }
+}
+
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
free_cwwb_sem(iommu);
@@ -1542,8 +1580,14 @@ static void amd_iommu_ats_write_check_workaround(struct 
amd_iommu *iommu)
  */
 static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header 
*h)
 {
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
+   pci_seg = get_pci_segment(h->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+   iommu->pci_seg = pci_seg;
+
raw_spin_lock_init(>lock);
iommu->cmd_sem_val = 0;
 
@@ -1564,7 +1608,6 @@ static int __init 

[PATCH v2 01/37] iommu/amd: Update struct iommu_dev_data defination

2022-04-25 Thread Vasant Hegde via iommu
struct iommu_dev_data contains member "pdev" to point to pci_dev. This is
valid for only PCI devices and for other devices this will be NULL. This
causes unnecessary "pdev != NULL" check at various places.

Replace "struct pci_dev" member with "struct device" and use to_pci_dev()
to get pci device reference as needed. Also adjust setup_aliases() and
clone_aliases() function.

No functional change intended.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 32 +
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 47108ed44fbb..06235b7cb13d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -685,7 +685,7 @@ struct iommu_dev_data {
struct list_head list;/* For domain->dev_list */
struct llist_node dev_data_list;  /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
-   struct pci_dev *pdev;
+   struct device *dev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
struct {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..cf57ffcc8d54 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -188,10 +188,13 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, 
void *data)
return 0;
 }
 
-static void clone_aliases(struct pci_dev *pdev)
+static void clone_aliases(struct device *dev)
 {
-   if (!pdev)
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(dev))
return;
+   pdev = to_pci_dev(dev);
 
/*
 * The IVRS alias stored in the alias table may not be
@@ -203,14 +206,14 @@ static void clone_aliases(struct pci_dev *pdev)
pci_for_each_dma_alias(pdev, clone_alias, NULL);
 }
 
-static struct pci_dev *setup_aliases(struct device *dev)
+static void setup_aliases(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
u16 ivrs_alias;
 
/* For ACPI HID devices, there are no aliases */
if (!dev_is_pci(dev))
-   return NULL;
+   return;
 
/*
 * Add the IVRS alias to the pci aliases if it is on the same
@@ -221,9 +224,7 @@ static struct pci_dev *setup_aliases(struct device *dev)
PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
-   clone_aliases(pdev);
-
-   return pdev;
+   clone_aliases(dev);
 }
 
 static struct iommu_dev_data *find_dev_data(u16 devid)
@@ -331,7 +332,8 @@ static int iommu_init_device(struct device *dev)
if (!dev_data)
return -ENOMEM;
 
-   dev_data->pdev = setup_aliases(dev);
+   dev_data->dev = dev;
+   setup_aliases(dev);
 
/*
 * By default we use passthrough mode for IOMMUv2 capable device.
@@ -1232,13 +1234,17 @@ static int device_flush_dte_alias(struct pci_dev *pdev, 
u16 alias, void *data)
 static int device_flush_dte(struct iommu_dev_data *dev_data)
 {
struct amd_iommu *iommu;
+   struct pci_dev *pdev = NULL;
u16 alias;
int ret;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
 
-   if (dev_data->pdev)
-   ret = pci_for_each_dma_alias(dev_data->pdev,
+   if (dev_is_pci(dev_data->dev))
+   pdev = to_pci_dev(dev_data->dev);
+
+   if (pdev)
+   ret = pci_for_each_dma_alias(pdev,
 device_flush_dte_alias, iommu);
else
ret = iommu_flush_dte(iommu, dev_data->devid);
@@ -1561,7 +1567,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
/* Update device table */
set_dte_entry(dev_data->devid, domain,
  ats, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
device_flush_dte(dev_data);
 }
@@ -1577,7 +1583,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
dev_data->domain = NULL;
list_del(_data->list);
clear_dte_entry(dev_data->devid);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
/* Flush the DTE entry */
device_flush_dte(dev_data);
@@ -1818,7 +1824,7 @@ static void update_device_table(struct protection_domain 
*domain)
list_for_each_entry(dev_data, >dev_list, list) {
set_dte_entry(dev_data->devid, domain,
  dev_data->ats.enabled, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
}
 }
 
-- 
2.27.0


[PATCH v2 00/37] iommu/amd: Add multiple PCI segments support

2022-04-25 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes a system contains only one PCI segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

This series introduces per-PCI-segment data structure, which contains
device table, alias table, etc. For each PCI segment, all IOMMUs
share the same data structure. The series also makes necessary code
adjustment and logging enhancements. Finally it removes global data
structures like device table, alias table, etc.

In case of system w/ single PCI segment (e.g. PCI segment ID is zero),
IOMMU driver allocates one PCI segment data structure, which will
be shared by all IOMMUs.

Patch 1 Updates struct iommu_dev_data defination.

Patch 2 - 13 introduce new PCI segment structure and allocate per
data structures, and introduce the amd_iommu.pci_seg pointer to point
to the corresponded pci_segment structure. Also, we have introduced
a helper function rlookup_amd_iommu() to reverse-lookup each iommu
for a particular device.

Patch 14 - 29 adopt to per PCI segment data structure and removes
global data structure.

Patch 30 fixes flushing logic to flush upto last_bdf.

Patch 31 - 37 convert usages of 16-bit PCI device ID to include
16-bit segment ID.


Changes from v1 -> v2:
  - Updated patch 1 to include dev_is_pci() check

v1 patchset  : 
https://lore.kernel.org/linux-iommu/20220404100023.324645-1-vasant.he...@amd.com/T/#t

Changes from RFC -> v1:
  - Rebased patches on top of iommu/next tree.
  - Update struct iommu_dev_data defination
  - Updated few log message to print segment ID
  - Fix smatch warnings

RFC patchset : 
https://lore.kernel.org/linux-iommu/20220311094854.31595-1-vasant.he...@amd.com/T/#t

Regards,
Vasant

Suravee Suthikulpanit (21):
  iommu/amd: Introduce per PCI segment device table
  iommu/amd: Introduce per PCI segment rlookup table
  iommu/amd: Introduce per PCI segment old_dev_tbl_cpy
  iommu/amd: Introduce per PCI segment alias_table
  iommu/amd: Convert to use rlookup_amd_iommu helper function
  iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function
  iommu/amd: Introduce struct amd_ir_data.iommu
  iommu/amd: Update amd_irte_ops functions
  iommu/amd: Update alloc_irq_table and alloc_irq_index
  iommu/amd: Update set_dte_entry and clear_dte_entry
  iommu/amd: Update iommu_ignore_device
  iommu/amd: Update dump_dte_entry
  iommu/amd: Update set_dte_irq_entry
  iommu/amd: Update (un)init_device_table_dma()
  iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()
  iommu/amd: Remove global amd_iommu_dev_table
  iommu/amd: Remove global amd_iommu_alias_table
  iommu/amd: Introduce get_device_sbdf_id() helper function
  iommu/amd: Include PCI segment ID when initialize IOMMU
  iommu/amd: Specify PCI segment ID when getting pci device
  iommu/amd: Add PCI segment support for ivrs_ioapic, ivrs_hpet, ivrs_acpihid 
commands

Vasant Hegde (16):
  iommu/amd: Update struct iommu_dev_data defination
  iommu/amd: Introduce pci segment structure
  iommu/amd: Introduce per PCI segment irq_lookup_table
  iommu/amd: Introduce per PCI segment dev_data_list
  iommu/amd: Introduce per PCI segment unity map list
  iommu/amd: Introduce per PCI segment last_bdf
  iommu/amd: Introduce per PCI segment device table size
  iommu/amd: Introduce per PCI segment alias table size
  iommu/amd: Introduce per PCI segment rlookup table size
  iommu/amd: Convert to use per PCI segment irq_lookup_table
  iommu/amd: Convert to use per PCI segment rlookup_table
  iommu/amd: Remove global amd_iommu_last_bdf
  iommu/amd: Flush upto last_bdf only
  iommu/amd: Print PCI segment ID in error log messages
  iommu/amd: Update device_state structure to include PCI seg ID
  iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

 .../admin-guide/kernel-parameters.txt |  34 +-
 drivers/iommu/amd/amd_iommu.h |  13 +-
 drivers/iommu/amd/amd_iommu_types.h   | 127 +++-
 drivers/iommu/amd/init.c  | 683 +++---
 drivers/iommu/amd/iommu.c | 545 --
 drivers/iommu/amd/iommu_v2.c  |  67 +-
 drivers/iommu/amd/quirks.c|   4 +-
 7 files changed, 888 insertions(+), 585 deletions(-)

-- 
2.27.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu-v3: check return value after calling platform_get_resource()

2022-04-25 Thread Yang Yingliang via iommu
It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.

Signed-off-by: Yang Yingliang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..88817a3376ef 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3770,6 +3770,8 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
 
/* Base address */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res)
+   return -EINVAL;
if (resource_size(res) < arm_smmu_resource_size(smmu)) {
dev_err(dev, "MMIO region too small (%pr)\n", res);
return -EINVAL;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()

2022-04-25 Thread Yang Yingliang via iommu
It will cause null-ptr-deref when using 'res', if platform_get_resource()
returns NULL, so move using 'res' after devm_ioremap_resource() that
will check it to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.

Fixes: 9648cbc9625b ("iommu/arm-smmu: Make use of the iommu_register interface")
Signed-off-by: Yang Yingliang 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc..52b71f6aee3f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2092,11 +2092,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   ioaddr = res->start;
-   smmu->base = devm_ioremap_resource(dev, res);
+   smmu->base = devm_platform_get_and_ioremap_resource(pdev, 0, );
if (IS_ERR(smmu->base))
return PTR_ERR(smmu->base);
+   ioaddr = res->start;
/*
 * The resource size should effectively match the value of SMMU_TOP;
 * stash that temporarily until we know PAGESIZE to validate it with.
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/mediatek: fix NULL pointer dereference when printing dev_name

2022-04-25 Thread Robin Murphy

On 2022-04-25 09:24, Miles Chen via iommu wrote:

When larbdev is NULL (in the case I hit, the node is incorrectly set
iommus = < NUM>), it will cause device_link_add() fail and
kernel crashes when we try to print dev_name(larbdev).

Fix it by adding a NULL pointer check before
device_link_add/device_link_remove.

It should work for normal correct setting and avoid the crash caused
by my incorrect setting.

Error log:
[   18.189042][  T301] Unable to handle kernel NULL pointer dereference at 
virtual address 0050
[   18.190247][  T301] Mem abort info:
[   18.190255][  T301]   ESR = 0x9605
[   18.190263][  T301]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.192142][  T301]   SET = 0, FnV = 0
[   18.192151][  T301]   EA = 0, S1PTW = 0
[   18.194710][  T301]   FSC = 0x05: level 1 translation fault
[   18.195424][  T301] Data abort info:
[   18.195888][  T301]   ISV = 0, ISS = 0x0005
[   18.196500][  T301]   CM = 0, WnR = 0
[   18.196977][  T301] user pgtable: 4k pages, 39-bit VAs, pgdp=000104f9e000
[   18.197889][  T301] [0050] pgd=, 
p4d=, pud=
[   18.199220][  T301] Internal error: Oops: 9605 [#1] PREEMPT SMP
[   18.343152][  T301] Kernel Offset: 0x144408 from 0xffc00800
[   18.343988][  T301] PHYS_OFFSET: 0x4000
[   18.344519][  T301] pstate: a045 (NzCv daif +PAN -UAO)
[   18.345213][  T301] pc : mtk_iommu_probe_device+0xf8/0x118 [mtk_iommu]
[   18.346050][  T301] lr : mtk_iommu_probe_device+0xd0/0x118 [mtk_iommu]
[   18.346884][  T301] sp : ffc00a5635e0
[   18.347392][  T301] x29: ffc00a5635e0 x28: ffd44a46c1d8
[   18.348156][  T301] x27: ff80c39a8000 x26: ffd44a80cc38
[   18.348917][  T301] x25:  x24: ffd44a80cc38
[   18.349677][  T301] x23: ffd44e4da4c6 x22: ffd44a80cc38
[   18.350438][  T301] x21: ff80cecd1880 x20: 
[   18.351198][  T301] x19: ff80c439f010 x18: ffc00a50d0c0
[   18.351959][  T301] x17:  x16: 0004
[   18.352719][  T301] x15: 0004 x14: ffd44eb5d420
[   18.353480][  T301] x13: 0ad2 x12: 0003
[   18.354241][  T301] x11: fad2 x10: c000fad2
[   18.355003][  T301] x9 : a0d288d8d7142d00 x8 : a0d288d8d7142d00
[   18.355763][  T301] x7 : ffd44c2bc640 x6 : 
[   18.356524][  T301] x5 : 0080 x4 : 0001
[   18.357284][  T301] x3 :  x2 : 0005
[   18.358045][  T301] x1 :  x0 : 
[   18.360208][  T301] Hardware name: MT6873 (DT)
[   18.360771][  T301] Call trace:
[   18.361168][  T301]  dump_backtrace+0xf8/0x1f0
[   18.361737][  T301]  dump_stack_lvl+0xa8/0x11c
[   18.362305][  T301]  dump_stack+0x1c/0x2c
[   18.362816][  T301]  mrdump_common_die+0x184/0x40c [mrdump]
[   18.363575][  T301]  ipanic_die+0x24/0x38 [mrdump]
[   18.364230][  T301]  atomic_notifier_call_chain+0x128/0x2b8
[   18.364937][  T301]  die+0x16c/0x568
[   18.365394][  T301]  __do_kernel_fault+0x1e8/0x214
[   18.365402][  T301]  do_page_fault+0xb8/0x678
[   18.366934][  T301]  do_translation_fault+0x48/0x64
[   18.368645][  T301]  do_mem_abort+0x68/0x148
[   18.368652][  T301]  el1_abort+0x40/0x64
[   18.368660][  T301]  el1h_64_sync_handler+0x54/0x88
[   18.368668][  T301]  el1h_64_sync+0x68/0x6c
[   18.368673][  T301]  mtk_iommu_probe_device+0xf8/0x118 [mtk_iommu]
[   18.369840][  T301]  __iommu_probe_device+0x12c/0x358
[   18.370880][  T301]  iommu_probe_device+0x3c/0x31c
[   18.372026][  T301]  of_iommu_configure+0x200/0x274
[   18.373587][  T301]  of_dma_configure_id+0x1b8/0x230
[   18.375200][  T301]  platform_dma_configure+0x24/0x3c
[   18.376456][  T301]  really_probe+0x110/0x504
[   18.376464][  T301]  __driver_probe_device+0xb4/0x188
[   18.376472][  T301]  driver_probe_device+0x5c/0x2b8
[   18.376481][  T301]  __driver_attach+0x338/0x42c
[   18.377992][  T301]  bus_add_driver+0x218/0x4c8
[   18.379389][  T301]  driver_register+0x84/0x17c
[   18.380580][  T301]  __platform_driver_register+0x28/0x38
...

Reported-by: kernel test robot 
Fixes: 635319a4a744 ("media: iommu/mediatek: Add device_link between the consumer 
and the larb devices")
Signed-off-by: Miles Chen 

---

Change since v1
fix a build warning reported by kernel test robot
https://lore.kernel.org/lkml/202204231446.iykdz674-...@intel.com/

---
  drivers/iommu/mtk_iommu.c| 13 -
  drivers/iommu/mtk_iommu_v1.c | 13 -
  2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..03e0133f346a 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -581,10 +581,12 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
}
}
larbdev = data->larb_imu[larbid].dev;
-   link = device_link_add(dev, larbdev,
- 

[PATCH] iommu/dart: check return value after calling platform_get_resource()

2022-04-25 Thread Yang Yingliang via iommu
It will cause null-ptr-deref in resource_size(), if platform_get_resource()
returns NULL, move calling resource_size() after devm_ioremap_resource() that
will check 'res' to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.

Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver")
Signed-off-by: Yang Yingliang 
---
 drivers/iommu/apple-dart.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index decafb07ad08..15b77f16cfa3 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -859,16 +859,15 @@ static int apple_dart_probe(struct platform_device *pdev)
dart->dev = dev;
spin_lock_init(>lock);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   dart->regs = devm_platform_get_and_ioremap_resource(pdev, 0, );
+   if (IS_ERR(dart->regs))
+   return PTR_ERR(dart->regs);
+
if (resource_size(res) < 0x4000) {
dev_err(dev, "MMIO region too small (%pr)\n", res);
return -EINVAL;
}
 
-   dart->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(dart->regs))
-   return PTR_ERR(dart->regs);
-
dart->irq = platform_get_irq(pdev, 0);
if (dart->irq < 0)
return -ENODEV;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu/mediatek: fix NULL pointer dereference when printing dev_name

2022-04-25 Thread Miles Chen via iommu
When larbdev is NULL (in the case I hit, the node is incorrectly set
iommus = < NUM>), it will cause device_link_add() fail and
kernel crashes when we try to print dev_name(larbdev).

Fix it by adding a NULL pointer check before
device_link_add/device_link_remove.

It should work for normal correct setting and avoid the crash caused
by my incorrect setting.

Error log:
[   18.189042][  T301] Unable to handle kernel NULL pointer dereference at 
virtual address 0050
[   18.190247][  T301] Mem abort info:
[   18.190255][  T301]   ESR = 0x9605
[   18.190263][  T301]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.192142][  T301]   SET = 0, FnV = 0
[   18.192151][  T301]   EA = 0, S1PTW = 0
[   18.194710][  T301]   FSC = 0x05: level 1 translation fault
[   18.195424][  T301] Data abort info:
[   18.195888][  T301]   ISV = 0, ISS = 0x0005
[   18.196500][  T301]   CM = 0, WnR = 0
[   18.196977][  T301] user pgtable: 4k pages, 39-bit VAs, pgdp=000104f9e000
[   18.197889][  T301] [0050] pgd=, 
p4d=, pud=
[   18.199220][  T301] Internal error: Oops: 9605 [#1] PREEMPT SMP
[   18.343152][  T301] Kernel Offset: 0x144408 from 0xffc00800
[   18.343988][  T301] PHYS_OFFSET: 0x4000
[   18.344519][  T301] pstate: a045 (NzCv daif +PAN -UAO)
[   18.345213][  T301] pc : mtk_iommu_probe_device+0xf8/0x118 [mtk_iommu]
[   18.346050][  T301] lr : mtk_iommu_probe_device+0xd0/0x118 [mtk_iommu]
[   18.346884][  T301] sp : ffc00a5635e0
[   18.347392][  T301] x29: ffc00a5635e0 x28: ffd44a46c1d8
[   18.348156][  T301] x27: ff80c39a8000 x26: ffd44a80cc38
[   18.348917][  T301] x25:  x24: ffd44a80cc38
[   18.349677][  T301] x23: ffd44e4da4c6 x22: ffd44a80cc38
[   18.350438][  T301] x21: ff80cecd1880 x20: 
[   18.351198][  T301] x19: ff80c439f010 x18: ffc00a50d0c0
[   18.351959][  T301] x17:  x16: 0004
[   18.352719][  T301] x15: 0004 x14: ffd44eb5d420
[   18.353480][  T301] x13: 0ad2 x12: 0003
[   18.354241][  T301] x11: fad2 x10: c000fad2
[   18.355003][  T301] x9 : a0d288d8d7142d00 x8 : a0d288d8d7142d00
[   18.355763][  T301] x7 : ffd44c2bc640 x6 : 
[   18.356524][  T301] x5 : 0080 x4 : 0001
[   18.357284][  T301] x3 :  x2 : 0005
[   18.358045][  T301] x1 :  x0 : 
[   18.360208][  T301] Hardware name: MT6873 (DT)
[   18.360771][  T301] Call trace:
[   18.361168][  T301]  dump_backtrace+0xf8/0x1f0
[   18.361737][  T301]  dump_stack_lvl+0xa8/0x11c
[   18.362305][  T301]  dump_stack+0x1c/0x2c
[   18.362816][  T301]  mrdump_common_die+0x184/0x40c [mrdump]
[   18.363575][  T301]  ipanic_die+0x24/0x38 [mrdump]
[   18.364230][  T301]  atomic_notifier_call_chain+0x128/0x2b8
[   18.364937][  T301]  die+0x16c/0x568
[   18.365394][  T301]  __do_kernel_fault+0x1e8/0x214
[   18.365402][  T301]  do_page_fault+0xb8/0x678
[   18.366934][  T301]  do_translation_fault+0x48/0x64
[   18.368645][  T301]  do_mem_abort+0x68/0x148
[   18.368652][  T301]  el1_abort+0x40/0x64
[   18.368660][  T301]  el1h_64_sync_handler+0x54/0x88
[   18.368668][  T301]  el1h_64_sync+0x68/0x6c
[   18.368673][  T301]  mtk_iommu_probe_device+0xf8/0x118 [mtk_iommu]
[   18.369840][  T301]  __iommu_probe_device+0x12c/0x358
[   18.370880][  T301]  iommu_probe_device+0x3c/0x31c
[   18.372026][  T301]  of_iommu_configure+0x200/0x274
[   18.373587][  T301]  of_dma_configure_id+0x1b8/0x230
[   18.375200][  T301]  platform_dma_configure+0x24/0x3c
[   18.376456][  T301]  really_probe+0x110/0x504
[   18.376464][  T301]  __driver_probe_device+0xb4/0x188
[   18.376472][  T301]  driver_probe_device+0x5c/0x2b8
[   18.376481][  T301]  __driver_attach+0x338/0x42c
[   18.377992][  T301]  bus_add_driver+0x218/0x4c8
[   18.379389][  T301]  driver_register+0x84/0x17c
[   18.380580][  T301]  __platform_driver_register+0x28/0x38
...

Reported-by: kernel test robot 
Fixes: 635319a4a744 ("media: iommu/mediatek: Add device_link between the 
consumer and the larb devices")
Signed-off-by: Miles Chen 

---

Change since v1
fix a build warning reported by kernel test robot
https://lore.kernel.org/lkml/202204231446.iykdz674-...@intel.com/

---
 drivers/iommu/mtk_iommu.c| 13 -
 drivers/iommu/mtk_iommu_v1.c | 13 -
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..03e0133f346a 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -581,10 +581,12 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
}
}
larbdev = data->larb_imu[larbid].dev;
-   link = device_link_add(dev, larbdev,
-  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);