Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-06-23 Thread Joerg Roedel
Hi Baolu,

On Wed, May 25, 2022 at 09:40:26AM +0800, Baolu Lu wrote:
> How do you like it? If you agree, I can queue it in my next pull request
> for fixes.

Would it help to tie DMAR and IOMMU components together, so that
selecting DMAR for IRQ remapping also selects IOMMU? The IOMMU can be in
PT mode and I think it would simplify a lot of things.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Nicolin Chen via iommu
On Fri, Jun 24, 2022 at 01:38:58PM +0800, Yong Wu wrote:

> > > > diff --git a/drivers/iommu/mtk_iommu_v1.c
> > > > b/drivers/iommu/mtk_iommu_v1.c
> > > > index e1cb51b9866c..5386d889429d 100644
> > > > --- a/drivers/iommu/mtk_iommu_v1.c
> > > > +++ b/drivers/iommu/mtk_iommu_v1.c
> > > > @@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct
> > > > iommu_domain *domain, struct device
> > > >   /* Only allow the domain created internally. */
> > > >   mtk_mapping = data->mapping;
> > > >   if (mtk_mapping->domain != domain)
> > > > - return 0;
> > > > + return -EMEDIUMTYPE;
> > > >
> > > >   if (!data->m4u_dom) {
> > > >   data->m4u_dom = dom;
> > >
> > > This change looks odd. It turns the return value from success to
> > > failure. Is it a bug? If so, it should go through a separated fix
> > > patch.
> 
> Thanks for the review:)
> 
> >
> > Makes sense.
> >
> > I read the commit log of the original change:
> >
> https://lore.kernel.org/r/1589530123-30240-1-git-send-email-yong...@mediatek.com
> >
> > It doesn't seem to allow devices to get attached to different
> > domains other than the shared mapping->domain, created in the
> > in the mtk_iommu_probe_device(). So it looks like returning 0
> > is intentional. Though I am still very confused by this return
> > value here, I doubt it has ever been used in a VFIO context.
> 
> It's not used in VFIO context. "return 0" just satisfy the iommu
> framework to go ahead. and yes, here we only allow the shared "mapping-
> >domain" (All the devices share a domain created internally).
> 
> thus I think we should still keep "return 0" here.

Thanks for the reply. I will just drop the change of this file.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Yong Wu via iommu
On Thu, 2022-06-23 at 19:44 -0700, Nicolin Chen wrote:
> On Fri, Jun 24, 2022 at 09:35:49AM +0800, Baolu Lu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On 2022/6/24 04:00, Nicolin Chen wrote:
> > > diff --git a/drivers/iommu/mtk_iommu_v1.c
> > > b/drivers/iommu/mtk_iommu_v1.c
> > > index e1cb51b9866c..5386d889429d 100644
> > > --- a/drivers/iommu/mtk_iommu_v1.c
> > > +++ b/drivers/iommu/mtk_iommu_v1.c
> > > @@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct
> > > iommu_domain *domain, struct device
> > >   /* Only allow the domain created internally. */
> > >   mtk_mapping = data->mapping;
> > >   if (mtk_mapping->domain != domain)
> > > - return 0;
> > > + return -EMEDIUMTYPE;
> > > 
> > >   if (!data->m4u_dom) {
> > >   data->m4u_dom = dom;
> > 
> > This change looks odd. It turns the return value from success to
> > failure. Is it a bug? If so, it should go through a separated fix
> > patch.

Thanks for the review:)

> 
> Makes sense.
> 
> I read the commit log of the original change:
> 
https://lore.kernel.org/r/1589530123-30240-1-git-send-email-yong...@mediatek.com
> 
> It doesn't seem to allow devices to get attached to different
> domains other than the shared mapping->domain, created in the
> in the mtk_iommu_probe_device(). So it looks like returning 0
> is intentional. Though I am still very confused by this return
> value here, I doubt it has ever been used in a VFIO context.

It's not used in VFIO context. "return 0" just satisfy the iommu
framework to go ahead. and yes, here we only allow the shared "mapping-
>domain" (All the devices share a domain created internally).

thus I think we should still keep "return 0" here.

Thanks:)

> 
> Young, would you please give us some input?
> 
> Overall, I feel it's better to play it safe here by dropping
> this part. If we later confirm there is a need to fix it, we
> will do that in a separate patch anyway.
> 
> Thanks
> Nic

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Nicolin Chen via iommu
On Fri, Jun 24, 2022 at 09:35:49AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2022/6/24 04:00, Nicolin Chen wrote:
> > diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> > index e1cb51b9866c..5386d889429d 100644
> > --- a/drivers/iommu/mtk_iommu_v1.c
> > +++ b/drivers/iommu/mtk_iommu_v1.c
> > @@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct 
> > iommu_domain *domain, struct device
> >   /* Only allow the domain created internally. */
> >   mtk_mapping = data->mapping;
> >   if (mtk_mapping->domain != domain)
> > - return 0;
> > + return -EMEDIUMTYPE;
> > 
> >   if (!data->m4u_dom) {
> >   data->m4u_dom = dom;
> 
> This change looks odd. It turns the return value from success to
> failure. Is it a bug? If so, it should go through a separated fix patch.

Makes sense.

I read the commit log of the original change:
https://lore.kernel.org/r/1589530123-30240-1-git-send-email-yong...@mediatek.com

It doesn't seem to allow devices to get attached to different
domains other than the shared mapping->domain, created in the
in the mtk_iommu_probe_device(). So it looks like returning 0
is intentional. Though I am still very confused by this return
value here, I doubt it has ever been used in a VFIO context.

Young, would you please give us some input?

Overall, I feel it's better to play it safe here by dropping
this part. If we later confirm there is a need to fix it, we
will do that in a separate patch anyway.

Thanks
Nic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Wed, Jun 22, 2022 at 01:04:11PM +0100, Robin Murphy wrote:

> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> *iommu_group)
> +{
> + struct vfio_group *group = vfio_group_get_from_iommu(iommu_group);
> + struct vfio_device *device;
> +
> + mutex_lock(>device_lock);
> + list_for_each_entry(device, >device_list, group_next) {
> + if (vfio_device_try_get(device)) {
> + mutex_unlock(>device_lock);
> + return device;
> + }
> + }
> + mutex_unlock(>device_lock);
> + return NULL;
> +}

FWIW, I have no objection to this general approach, and I don't think
we should make any broader API just for this.

Though I might call it something like
'vfio_get_group_representor_device()' which more strongly suggests
what it is only used for.

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-23 Thread Baolu Lu

On 2022/6/24 04:00, Nicolin Chen wrote:

From: Jason Gunthorpe 

The KVM mechanism for controlling wbinvd is based on OR of the coherency
property of all devices attached to a guest, no matter whether those
devices are attached to a single domain or multiple domains.

On the other hand, the benefit to using separate domains was that those
devices attached to domains supporting enforced cache coherency always
mapped with the attributes necessary to provide that feature, therefore
if a non-enforced domain was dropped, the associated group removal would
re-trigger an evaluation by KVM.

In practice however, the only known cases of such mixed domains included
an Intel IGD device behind an IOMMU lacking snoop control, where such
devices do not support hotplug, therefore this scenario lacks testing and
is not considered sufficiently relevant to support.

After all, KVM won't take advantage of trying to push a device that could
do enforced cache coherency to a dedicated domain vs re-using an existing
domain, which is non-coherent.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in related
emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
we don't see an usage requiring such optimization as the only device
which imposes such non-coherency is Intel GPU which even doesn't
support hotplug/hot remove.

Signed-off-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
  drivers/vfio/vfio_iommu_type1.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..f4e3b423a453 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 * testing if they're on the same bus_type.
 */
list_for_each_entry(d, >domain_list, next) {
-   if (d->domain->ops == domain->domain->ops &&
-   d->enforce_cache_coherency ==
-   domain->enforce_cache_coherency) {
+   if (d->domain->ops == domain->domain->ops) {
iommu_detach_group(domain->domain, group->iommu_group);
if (!iommu_attach_group(d->domain,
group->iommu_group)) {


Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Thu, Jun 23, 2022 at 05:00:44PM -0600, Alex Williamson wrote:

> > >> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> > >> *iommu_group)
> > >> +{
> > >> +struct vfio_group *group = 
> > >> vfio_group_get_from_iommu(iommu_group);
> > >> +struct vfio_device *device;  
> > > 
> > > Check group for NULL.  
> > 
> > OK - FWIW in context this should only ever make sense to call with an 
> > iommu_group which has already been derived from a vfio_group, and I did 
> > initially consider a check with a WARN_ON(), but then decided that the 
> > unguarded dereference would be a sufficiently strong message. No problem 
> > with bringing that back to make it more defensive if that's what you prefer.
> 
> A while down the road, that's a bit too much implicit knowledge of the
> intent and single purpose of this function just to simply avoid a test.

I think we should just pass the 'struct vfio_group *' into the
attach_group op and have this API take that type in and forget the
vfio_group_get_from_iommu().

At this point there is little justification for
vfio_group_get_from_iommu() existing at all, it should be folded into
the one use in vfio_group_find_or_alloc() and the locking widened so
we don't have the unlock/alloc/lock race that requires it to be called
twice.

> I'd lean towards Kevin's idea that we could store bus_type on the
> vfio_group and pass that to type1, with the same assumptions we're
> making in the commit log that it's consistent, but that doesn't get us
> closer to the long term plan of dropping the bus_type interfaces
> AIUI.

Right, the point is to get a representative struct device here to use.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu_sva_bind_device question

2022-06-23 Thread Baolu Lu

On 2022/6/24 09:14, Jerry Snitselaar wrote:

On Fri, Jun 24, 2022 at 08:55:08AM +0800, Baolu Lu wrote:

On 2022/6/24 01:02, Jerry Snitselaar wrote:

Hi Baolu & Dave,

I noticed last night that on a Sapphire Rapids system if you boot without
intel_iommu=on, the idxd driver will crash during probe in 
iommu_sva_bind_device().
Should there be a sanity check before calling dev_iommu_ops(), or is the 
expectation
that the caller would verify it is safe to call? This seemed to be uncovered by
the combination of 3f6634d997db ("iommu: Use right way to retrieve iommu_ops"), 
and
42a1b73852c4 ("dmaengine: idxd: Separate user and kernel pasid enabling").

[   21.423729] BUG: kernel NULL pointer dereference, address: 0038
[   21.445108] #PF: supervisor read access in kernel mode
[   21.450912] #PF: error_code(0x) - not-present page
[   21.456706] PGD 0
[   21.459047] Oops:  [#1] PREEMPT SMP NOPTI
[   21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 
5.19.0-0.rc3.27.eln120.x86_64 #1
[   21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS 
EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021
[   21.464015] Workqueue: events work_for_cpu_fn
[   21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0
[   21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 
56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 
8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00
[   21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296
[   21.464054] RAX:  RBX: ff1eadeec8a51000 RCX: 
[   21.464058] RDX: ff7245d9096b7e24 RSI:  RDI: ff1eadeec8a510d0
[   21.464060] RBP: ff1eadeec8a51000 R08: b1a12300 R09: ff1eadffbfce25b4
[   21.464062] R10:  R11: 0038 R12: c09f8000
[   21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000
[   21.464067] FS:  () GS:ff1eadee7f60() 
knlGS:
[   21.464070] CS:  0010 DS:  ES:  CR0: 80050033
[   21.464072] CR2: 0038 CR3: 0008c0e10006 CR4: 00771ef0
[   21.464074] DR0:  DR1:  DR2: 
[   21.464076] DR3:  DR6: fffe07f0 DR7: 0400
[   21.464078] PKRU: 5554
[   21.464079] Call Trace:
[   21.464083]  
[   21.464092]  idxd_pci_probe+0x259/0x1070 [idxd]
[   21.464121]  local_pci_probe+0x3e/0x80
[   21.464132]  work_for_cpu_fn+0x13/0x20
[   21.464136]  process_one_work+0x1c4/0x380
[   21.464143]  worker_thread+0x1ab/0x380
[   21.464147]  ? _raw_spin_lock_irqsave+0x23/0x50
[   21.464158]  ? process_one_work+0x380/0x380
[   21.464161]  kthread+0xe6/0x110
[   21.464168]  ? kthread_complete_and_exit+0x20/0x20
[   21.464172]  ret_from_fork+0x1f/0x30

I figure either there needs to be a check in iommu_sva_bind_device, or
idxd needs to check in idxd_enable_system_pasid that that
idxd->pdev->dev.iommu is not null before it tries calling iommu_sva_bind_device.


As documented around the iommu_sva_bind_device() interface:

  * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first,
to
  * initialize the required SVA features.

idxd->pdev->dev.iommu should be checked in there.

Dave, any thoughts?

Best regards,
baolu


Duh, sorry I missed that in the comments.

It calls iommu_dev_enable_feature(), but then goes into code that
calls iommu_sva_bind_device whether or not iommu_dev_enable_feature()
fails.

You also will get the following warning if you don't have scalable
mode enabled (either not enabled by default, or if enabled by default
and passed intel_iommu=on,sm_off):


If scalable mode is disabled, iommu_dev_enable_feature(IOMMU_SVA) will
return failure, hence driver should not call iommu_sva_bind_device().
I guess below will disappear if above is fixed in the idxd driver.

Best regards,
baolu



[   24.645784] idxd :6a:01.0: enabling device (0144 -> 0146)
[   24.645871] idxd :6a:01.0: Unable to turn on user SVA feature.
[   24.645932] [ cut here ]
[   24.645935] WARNING: CPU: 0 PID: 422 at drivers/iommu/intel/pasid.c:253 
intel_pasid_get_entry.isra.0+0xcd/0xe0
[   24.675872] Modules linked in: intel_uncore(+) drm_ttm_helper 
isst_if_mbox_pci(+) idxd(+) snd i2c_i801(+) isst_if_mmio ttm isst_if_common mei 
fjes(+) soundcore intel_vsec i2c_ismt i2c_smbus idxd_bus ipmi_ssif acpi_ipmi 
ipmi_si acpi_pad acpi_power_meter pfr_telemetry pfr_update vfat fat fuse xfs 
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel igc wmi 
pinctrl_emmitsburg ipmi_devintf ipmi_msghandler
[   24.716612] CPU: 0 PID: 422 Comm: kworker/0:2 Not tainted 
5.19.0-0.rc3.27.eln120.x86_64 #1
[   24.716621] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS 
EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021
[   24.716625] Workqueue: events work_for_cpu_fn
[   24.716645] RIP: 0010:intel_pasid_get_entry.isra.0+0xcd/0xe0
[   

Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Baolu Lu

On 2022/6/24 04:00, Nicolin Chen wrote:

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index e1cb51b9866c..5386d889429d 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct iommu_domain 
*domain, struct device
/* Only allow the domain created internally. */
mtk_mapping = data->mapping;
if (mtk_mapping->domain != domain)
-   return 0;
+   return -EMEDIUMTYPE;
  
  	if (!data->m4u_dom) {

data->m4u_dom = dom;


This change looks odd. It turns the return value from success to
failure. Is it a bug? If so, it should go through a separated fix patch.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu_sva_bind_device question

2022-06-23 Thread Jerry Snitselaar
On Fri, Jun 24, 2022 at 08:55:08AM +0800, Baolu Lu wrote:
> On 2022/6/24 01:02, Jerry Snitselaar wrote:
> > Hi Baolu & Dave,
> > 
> > I noticed last night that on a Sapphire Rapids system if you boot without
> > intel_iommu=on, the idxd driver will crash during probe in 
> > iommu_sva_bind_device().
> > Should there be a sanity check before calling dev_iommu_ops(), or is the 
> > expectation
> > that the caller would verify it is safe to call? This seemed to be 
> > uncovered by
> > the combination of 3f6634d997db ("iommu: Use right way to retrieve 
> > iommu_ops"), and
> > 42a1b73852c4 ("dmaengine: idxd: Separate user and kernel pasid enabling").
> > 
> > [   21.423729] BUG: kernel NULL pointer dereference, address: 
> > 0038
> > [   21.445108] #PF: supervisor read access in kernel mode
> > [   21.450912] #PF: error_code(0x) - not-present page
> > [   21.456706] PGD 0
> > [   21.459047] Oops:  [#1] PREEMPT SMP NOPTI
> > [   21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 
> > 5.19.0-0.rc3.27.eln120.x86_64 #1
> > [   21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, 
> > BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021
> > [   21.464015] Workqueue: events work_for_cpu_fn
> > [   21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0
> > [   21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
> > 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 
> > <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00
> > [   21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296
> > [   21.464054] RAX:  RBX: ff1eadeec8a51000 RCX: 
> > 
> > [   21.464058] RDX: ff7245d9096b7e24 RSI:  RDI: 
> > ff1eadeec8a510d0
> > [   21.464060] RBP: ff1eadeec8a51000 R08: b1a12300 R09: 
> > ff1eadffbfce25b4
> > [   21.464062] R10:  R11: 0038 R12: 
> > c09f8000
> > [   21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: 
> > ff1eaddf54429000
> > [   21.464067] FS:  () GS:ff1eadee7f60() 
> > knlGS:
> > [   21.464070] CS:  0010 DS:  ES:  CR0: 80050033
> > [   21.464072] CR2: 0038 CR3: 0008c0e10006 CR4: 
> > 00771ef0
> > [   21.464074] DR0:  DR1:  DR2: 
> > 
> > [   21.464076] DR3:  DR6: fffe07f0 DR7: 
> > 0400
> > [   21.464078] PKRU: 5554
> > [   21.464079] Call Trace:
> > [   21.464083]  
> > [   21.464092]  idxd_pci_probe+0x259/0x1070 [idxd]
> > [   21.464121]  local_pci_probe+0x3e/0x80
> > [   21.464132]  work_for_cpu_fn+0x13/0x20
> > [   21.464136]  process_one_work+0x1c4/0x380
> > [   21.464143]  worker_thread+0x1ab/0x380
> > [   21.464147]  ? _raw_spin_lock_irqsave+0x23/0x50
> > [   21.464158]  ? process_one_work+0x380/0x380
> > [   21.464161]  kthread+0xe6/0x110
> > [   21.464168]  ? kthread_complete_and_exit+0x20/0x20
> > [   21.464172]  ret_from_fork+0x1f/0x30
> > 
> > I figure either there needs to be a check in iommu_sva_bind_device, or
> > idxd needs to check in idxd_enable_system_pasid that that
> > idxd->pdev->dev.iommu is not null before it tries calling 
> > iommu_sva_bind_device.
> 
> As documented around the iommu_sva_bind_device() interface:
> 
>  * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first,
> to
>  * initialize the required SVA features.
> 
> idxd->pdev->dev.iommu should be checked in there.
> 
> Dave, any thoughts?
> 
> Best regards,
> baolu

Duh, sorry I missed that in the comments.

It calls iommu_dev_enable_feature(), but then goes into code that
calls iommu_sva_bind_device whether or not iommu_dev_enable_feature()
fails.

You also will get the following warning if you don't have scalable
mode enabled (either not enabled by default, or if enabled by default
and passed intel_iommu=on,sm_off):

[   24.645784] idxd :6a:01.0: enabling device (0144 -> 0146)
[   24.645871] idxd :6a:01.0: Unable to turn on user SVA feature.
[   24.645932] [ cut here ]
[   24.645935] WARNING: CPU: 0 PID: 422 at drivers/iommu/intel/pasid.c:253 
intel_pasid_get_entry.isra.0+0xcd/0xe0
[   24.675872] Modules linked in: intel_uncore(+) drm_ttm_helper 
isst_if_mbox_pci(+) idxd(+) snd i2c_i801(+) isst_if_mmio ttm isst_if_common mei 
fjes(+) soundcore intel_vsec i2c_ismt i2c_smbus idxd_bus ipmi_ssif acpi_ipmi 
ipmi_si acpi_pad acpi_power_meter pfr_telemetry pfr_update vfat fat fuse xfs 
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel igc wmi 
pinctrl_emmitsburg ipmi_devintf ipmi_msghandler
[   24.716612] CPU: 0 PID: 422 Comm: kworker/0:2 Not tainted 
5.19.0-0.rc3.27.eln120.x86_64 #1
[   24.716621] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS 
EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021
[   24.716625] Workqueue: events work_for_cpu_fn
[   24.716645] RIP: 

Re: iommu_sva_bind_device question

2022-06-23 Thread Baolu Lu

On 2022/6/24 01:02, Jerry Snitselaar wrote:

Hi Baolu & Dave,

I noticed last night that on a Sapphire Rapids system if you boot without
intel_iommu=on, the idxd driver will crash during probe in 
iommu_sva_bind_device().
Should there be a sanity check before calling dev_iommu_ops(), or is the 
expectation
that the caller would verify it is safe to call? This seemed to be uncovered by
the combination of 3f6634d997db ("iommu: Use right way to retrieve iommu_ops"), 
and
42a1b73852c4 ("dmaengine: idxd: Separate user and kernel pasid enabling").

[   21.423729] BUG: kernel NULL pointer dereference, address: 0038
[   21.445108] #PF: supervisor read access in kernel mode
[   21.450912] #PF: error_code(0x) - not-present page
[   21.456706] PGD 0
[   21.459047] Oops:  [#1] PREEMPT SMP NOPTI
[   21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 
5.19.0-0.rc3.27.eln120.x86_64 #1
[   21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS 
EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021
[   21.464015] Workqueue: events work_for_cpu_fn
[   21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0
[   21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 
56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 
8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00
[   21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296
[   21.464054] RAX:  RBX: ff1eadeec8a51000 RCX: 
[   21.464058] RDX: ff7245d9096b7e24 RSI:  RDI: ff1eadeec8a510d0
[   21.464060] RBP: ff1eadeec8a51000 R08: b1a12300 R09: ff1eadffbfce25b4
[   21.464062] R10:  R11: 0038 R12: c09f8000
[   21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000
[   21.464067] FS:  () GS:ff1eadee7f60() 
knlGS:
[   21.464070] CS:  0010 DS:  ES:  CR0: 80050033
[   21.464072] CR2: 0038 CR3: 0008c0e10006 CR4: 00771ef0
[   21.464074] DR0:  DR1:  DR2: 
[   21.464076] DR3:  DR6: fffe07f0 DR7: 0400
[   21.464078] PKRU: 5554
[   21.464079] Call Trace:
[   21.464083]  
[   21.464092]  idxd_pci_probe+0x259/0x1070 [idxd]
[   21.464121]  local_pci_probe+0x3e/0x80
[   21.464132]  work_for_cpu_fn+0x13/0x20
[   21.464136]  process_one_work+0x1c4/0x380
[   21.464143]  worker_thread+0x1ab/0x380
[   21.464147]  ? _raw_spin_lock_irqsave+0x23/0x50
[   21.464158]  ? process_one_work+0x380/0x380
[   21.464161]  kthread+0xe6/0x110
[   21.464168]  ? kthread_complete_and_exit+0x20/0x20
[   21.464172]  ret_from_fork+0x1f/0x30

I figure either there needs to be a check in iommu_sva_bind_device, or
idxd needs to check in idxd_enable_system_pasid that that
idxd->pdev->dev.iommu is not null before it tries calling iommu_sva_bind_device.


As documented around the iommu_sva_bind_device() interface:

 * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called 
first, to

 * initialize the required SVA features.

idxd->pdev->dev.iommu should be checked in there.

Dave, any thoughts?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Saravana Kannan via iommu
On Thu, Jun 23, 2022 at 1:37 PM sascha hauer  wrote:
>
> On Thu, Jun 23, 2022 at 10:26:46AM -0700, Saravana Kannan wrote:
> > On Thu, Jun 23, 2022 at 3:05 AM sascha hauer  wrote:
> > >
> > > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > > enabled iommus and dmas dependency enforcement by default. On some
> > > > systems, this caused the console device's probe to get delayed until the
> > > > deferred_probe_timeout expires.
> > > >
> > > > We need consoles to work as soon as possible, so mark the console device
> > > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > > > the probe of the console device for suppliers without drivers. The
> > > > driver can then make the decision on where it can probe without those
> > > > suppliers or defer its probe.
> > > >
> > > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > > Reported-by: Sascha Hauer 
> > > > Reported-by: Peng Fan 
> > > > Signed-off-by: Saravana Kannan 
> > > > Tested-by: Peng Fan 
> > > > ---
> > > >  drivers/of/base.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > > index d4f98c8469ed..a19cd0c73644 100644
> > > > --- a/drivers/of/base.c
> > > > +++ b/drivers/of/base.c
> > > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, 
> > > > u64 align))
> > > >   of_property_read_string(of_aliases, "stdout", 
> > > > );
> > > >   if (name)
> > > >   of_stdout = of_find_node_opts_by_path(name, 
> > > > _stdout_options);
> > > > + if (of_stdout)
> > > > + of_stdout->fwnode.flags |= 
> > > > FWNODE_FLAG_BEST_EFFORT;
> > >
> > > The device given in the stdout-path property doesn't necessarily have to
> > > be consistent with the console= parameter. The former is usually
> > > statically set in the device trees contained in the kernel while the
> > > latter is dynamically set by the bootloader. So if you change the
> > > console uart in the bootloader then you'll still run into this trap.
> > >
> > > It's problematic to consult only the device tree for dependencies. I
> > > found several examples of drivers in the tree for which dma support
> > > is optional. They use it if they can, but continue without it when
> > > not available. "hwlock" is another property which consider several
> > > drivers as optional. Also consider SoCs in early upstreaming phases
> > > when the device tree is merged with "dmas" or "hwlock" properties,
> > > but the corresponding drivers are not yet upstreamed. It's not nice
> > > to defer probing of all these devices for a long time.
> > >
> > > I wonder if it wouldn't be a better approach to just probe all devices
> > > and record the device(node) they are waiting on. Then you know that you
> > > don't need to probe them again until the device they are waiting for
> > > is available.
> >
> > That actually breaks things in a worse sense. There are cases where
> > the consumer driver is built in and the optional supplier driver is
> > loaded at boot. Without fw_devlink and the deferred probe timeout, we
> > end up probing the consumer with limited functionality. With the
> > current setup, sure we delay some probes a bit but at least everything
> > works with the right functionality. And you can reduce or remove the
> > delay if you want to optimize it.
>
> We have optional and mandatory resources. In this situation a driver has
> to decide what to do. Either it continues with limited resources or it
> defers probing. Some drivers try to allocate the optional resources at
> open time so that they are able to use them once they are available.  We
> could even think of an asynchronous callback into a driver when a
> resource becomes available. Whether we put this decision what is
> optional or not into the driver or in the framework doesn't make a
> difference to the problem, it is still the same: When a resource is not
> yet available we have no idea if and when it becomes available, if it's
> worth waiting for it or not.
>
> The difference is that with my proposal (which isn't actually mine but
> from my collegue Lucas) a driver can decide very fine grained how it
> wants to deal with the situation. With fw_devlink we try to put this
> intelligence into the framework and it seems there are quite some quirks
> necessary to get that running for everyone.

That's one possible solution, but for that to work, all drivers with
optional suppliers would need to be changed to take advantage of this
callback to work correctly when the optional suppliers become
available. We could add this callback, but it would be a long time
before the callback handles all/most cases of optional suppliers.

One of the goals of fw_devlink is so that people can stop having to
play initcall chicken where they try to tune their 

Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Alex Williamson
On Thu, 23 Jun 2022 13:23:05 +0100
Robin Murphy  wrote:

> On 2022-06-22 23:17, Alex Williamson wrote:
> > On Wed, 22 Jun 2022 13:04:11 +0100
> > Robin Murphy  wrote:
> >   
> >> Since IOMMU groups are mandatory for drivers to support, it stands to
> >> reason that any device which has been successfully be added to a group  
> > 
> > s/be //  
> 
> Oops.
> 
> >> must be on a bus supported by that IOMMU driver, and therefore a domain
> >> viable for any device in the group must be viable for all devices in
> >> the group. This already has to be the case for the IOMMU API's internal
> >> default domain, for instance. Thus even if the group contains devices on
> >> different buses, that can only mean that the IOMMU driver actually
> >> supports such an odd topology, and so without loss of generality we can
> >> expect the bus type of any device in a group to be suitable for IOMMU
> >> API calls.
> >>
> >> Replace vfio_bus_type() with a simple call to resolve an appropriate
> >> member device from which to then derive a bus type. This is also a step
> >> towards removing the vague bus-based interfaces from the IOMMU API, when
> >> we can subsequently switch to using this device directly.
> >>
> >> Furthermore, scrutiny reveals a lack of protection for the bus being
> >> removed while vfio_iommu_type1_attach_group() is using it; the reference
> >> that VFIO holds on the iommu_group ensures that data remains valid, but
> >> does not prevent the group's membership changing underfoot. Holding the
> >> vfio_device for as long as we need here also neatly solves this.
> >>
> >> Signed-off-by: Robin Murphy 
> >> ---
> >>
> >> After sleeping on it, I decided to type up the helper function approach
> >> to see how it looked in practice, and in doing so realised that with one
> >> more tweak it could also subsume the locking out of the common paths as
> >> well, so end up being a self-contained way for type1 to take care of its
> >> own concern, which I rather like.
> >>
> >>   drivers/vfio/vfio.c | 18 +-
> >>   drivers/vfio/vfio.h |  3 +++
> >>   drivers/vfio/vfio_iommu_type1.c | 30 +++---
> >>   3 files changed, 31 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> >> index 61e71c1154be..73bab04880d0 100644
> >> --- a/drivers/vfio/vfio.c
> >> +++ b/drivers/vfio/vfio.c
> >> @@ -448,7 +448,7 @@ static void vfio_group_get(struct vfio_group *group)
> >>* Device objects - create, release, get, put, search
> >>*/
> >>   /* Device reference always implies a group reference */
> >> -static void vfio_device_put(struct vfio_device *device)
> >> +void vfio_device_put(struct vfio_device *device)
> >>   {
> >>if (refcount_dec_and_test(>refcount))
> >>complete(>comp);
> >> @@ -475,6 +475,22 @@ static struct vfio_device 
> >> *vfio_group_get_device(struct vfio_group *group,
> >>return NULL;
> >>   }
> >>   
> >> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> >> *iommu_group)
> >> +{
> >> +  struct vfio_group *group = vfio_group_get_from_iommu(iommu_group);
> >> +  struct vfio_device *device;  
> > 
> > Check group for NULL.  
> 
> OK - FWIW in context this should only ever make sense to call with an 
> iommu_group which has already been derived from a vfio_group, and I did 
> initially consider a check with a WARN_ON(), but then decided that the 
> unguarded dereference would be a sufficiently strong message. No problem 
> with bringing that back to make it more defensive if that's what you prefer.

A while down the road, that's a bit too much implicit knowledge of the
intent and single purpose of this function just to simply avoid a test.

> >> +
> >> +  mutex_lock(>device_lock);
> >> +  list_for_each_entry(device, >device_list, group_next) {
> >> +  if (vfio_device_try_get(device)) {
> >> +  mutex_unlock(>device_lock);
> >> +  return device;
> >> +  }
> >> +  }
> >> +  mutex_unlock(>device_lock);
> >> +  return NULL;  
> > 
> > No vfio_group_put() on either path.  
> 
> Oops indeed.
> 
> >> +}
> >> +
> >>   /*
> >>* VFIO driver API
> >>*/
> >> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> >> index a67130221151..e8f21e64541b 100644
> >> --- a/drivers/vfio/vfio.h
> >> +++ b/drivers/vfio/vfio.h
> >> @@ -70,3 +70,6 @@ struct vfio_iommu_driver_ops {
> >>   
> >>   int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> >>   void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops 
> >> *ops);
> >> +
> >> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> >> *iommu_group);
> >> +void vfio_device_put(struct vfio_device *device);
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >> b/drivers/vfio/vfio_iommu_type1.c
> >> index c13b9290e357..e38b8bfde677 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -1679,18 

Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Thu, Jun 23, 2022 at 01:23:05PM +0100, Robin Murphy wrote:

> So yes, technically we could implement an iommu_group_capable() and an
> iommu_group_domain_alloc(), which would still just internally resolve the
> IOMMU ops and instance data from a member device to perform the driver-level
> call, but once again it would be for the benefit of precisely one
> user. 

Benefit one user and come with a fairly complex locking situation to
boot.

Alex, I'd rather think about moving the type 1 code so that the iommu
attach happens during device FD creation (then we have a concrete
non-fake device), not during group FD opening.

That is the model we need for iommufd anyhow.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread sascha hauer
On Thu, Jun 23, 2022 at 10:26:46AM -0700, Saravana Kannan wrote:
> On Thu, Jun 23, 2022 at 3:05 AM sascha hauer  wrote:
> >
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > enabled iommus and dmas dependency enforcement by default. On some
> > > systems, this caused the console device's probe to get delayed until the
> > > deferred_probe_timeout expires.
> > >
> > > We need consoles to work as soon as possible, so mark the console device
> > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > > the probe of the console device for suppliers without drivers. The
> > > driver can then make the decision on where it can probe without those
> > > suppliers or defer its probe.
> > >
> > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > Reported-by: Sascha Hauer 
> > > Reported-by: Peng Fan 
> > > Signed-off-by: Saravana Kannan 
> > > Tested-by: Peng Fan 
> > > ---
> > >  drivers/of/base.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index d4f98c8469ed..a19cd0c73644 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> > > align))
> > >   of_property_read_string(of_aliases, "stdout", 
> > > );
> > >   if (name)
> > >   of_stdout = of_find_node_opts_by_path(name, 
> > > _stdout_options);
> > > + if (of_stdout)
> > > + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >
> > The device given in the stdout-path property doesn't necessarily have to
> > be consistent with the console= parameter. The former is usually
> > statically set in the device trees contained in the kernel while the
> > latter is dynamically set by the bootloader. So if you change the
> > console uart in the bootloader then you'll still run into this trap.
> >
> > It's problematic to consult only the device tree for dependencies. I
> > found several examples of drivers in the tree for which dma support
> > is optional. They use it if they can, but continue without it when
> > not available. "hwlock" is another property which consider several
> > drivers as optional. Also consider SoCs in early upstreaming phases
> > when the device tree is merged with "dmas" or "hwlock" properties,
> > but the corresponding drivers are not yet upstreamed. It's not nice
> > to defer probing of all these devices for a long time.
> >
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
> 
> That actually breaks things in a worse sense. There are cases where
> the consumer driver is built in and the optional supplier driver is
> loaded at boot. Without fw_devlink and the deferred probe timeout, we
> end up probing the consumer with limited functionality. With the
> current setup, sure we delay some probes a bit but at least everything
> works with the right functionality. And you can reduce or remove the
> delay if you want to optimize it.

We have optional and mandatory resources. In this situation a driver has
to decide what to do. Either it continues with limited resources or it
defers probing. Some drivers try to allocate the optional resources at
open time so that they are able to use them once they are available.  We
could even think of an asynchronous callback into a driver when a
resource becomes available. Whether we put this decision what is
optional or not into the driver or in the framework doesn't make a
difference to the problem, it is still the same: When a resource is not
yet available we have no idea if and when it becomes available, if it's
worth waiting for it or not.

The difference is that with my proposal (which isn't actually mine but
from my collegue Lucas) a driver can decide very fine grained how it
wants to deal with the situation. With fw_devlink we try to put this
intelligence into the framework and it seems there are quite some quirks
necessary to get that running for everyone.

Anyway, we have fw_devlink now and actually I think the dependency graph
that we have with fw_devlink is quite nice to resolve the natural probe
order. But why do we have to put an extra penalty on drivers whose
resources are not yet available?  Probe devices with complete resources
as long as you find them, execute more initcalls as long as there are
any, but when there are no more left, you could start probing devices
with incomplete resources, why wait for another ten seconds?

For me it's no problem when the UART probes late, we have earlycon which
can be used to debug problems that arise before the UART probes, but
what nags is the ten seconds delay. zero would be a 

Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Alex Williamson
On Thu, 23 Jun 2022 08:46:45 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Thursday, June 23, 2022 6:17 AM
> >   
> > >
> > >   ret = -EIO;
> > > - domain->domain = iommu_domain_alloc(bus);
> > > + domain->domain = iommu_domain_alloc(iommu_api_dev->dev-
> > >bus);  
> > 
> > It makes sense to move away from a bus centric interface to iommu ops
> > and I can see that having a device interface when we have device level
> > address-ability within a group makes sense, but does it make sense to
> > only have that device level interface?  For example, if an iommu_group
> > is going to remain an aspect of the iommu subsystem, shouldn't we be
> > able to allocate a domain and test capabilities based on the group and
> > the iommu driver should have enough embedded information reachable
> > from
> > the struct iommu_group to do those things?  This "perform group level
> > operations based on an arbitrary device in the group" is pretty klunky.
> > Thanks,
> >   
> 
> This sounds a right thing to do.
> 
> btw another alternative which I'm thinking of is whether vfio_group
> can record the bus info when the first device is added to it in
> __vfio_register_dev(). Then we don't need a group interface from
> iommu to test if vfio is the only user having such requirement.

That might be more simple, but it's just another variation on vfio
picking an arbitrary device from a group to satisfy the iommu interface
rather than operating on an iommu subsystem provided object.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-23 Thread Nicolin Chen via iommu
The domain->ops validation was added, as a precaution, for mixed-driver
systems.

Per Robin's remarks,
* While bus_set_iommu() still exists, the core code prevents multiple
  drivers from registering, so we can't really run into a situation of
  having a mixed-driver system:
  https://lore.kernel.org/kvm/6e1280c5-4b22-ebb3-3912-6c72bc169...@arm.com/

* And there's plenty more significant problems than this to fix; in future
  when many can be permitted, we will rely on the IOMMU core code to check
  the domain->ops:
  https://lore.kernel.org/kvm/6575de6d-94ba-c427-5b1e-967750ddf...@arm.com/

So remove the check in VFIO for simplicity.

Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f4e3b423a453..11be5f95580b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2277,29 +2277,19 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
domain->domain->ops->enforce_cache_coherency(
domain->domain);
 
-   /*
-* Try to match an existing compatible domain.  We don't want to
-* preclude an IOMMU driver supporting multiple bus_types and being
-* able to include different bus_types in the same IOMMU domain, so
-* we test whether the domains use the same iommu_ops rather than
-* testing if they're on the same bus_type.
-*/
+   /* Try to match an existing compatible domain */
list_for_each_entry(d, >domain_list, next) {
-   if (d->domain->ops == domain->domain->ops) {
-   iommu_detach_group(domain->domain, group->iommu_group);
-   if (!iommu_attach_group(d->domain,
-   group->iommu_group)) {
-   list_add(>next, >group_list);
-   iommu_domain_free(domain->domain);
-   kfree(domain);
-   goto done;
-   }
-
-   ret = iommu_attach_group(domain->domain,
-group->iommu_group);
-   if (ret)
-   goto out_domain;
+   iommu_detach_group(domain->domain, group->iommu_group);
+   if (!iommu_attach_group(d->domain, group->iommu_group)) {
+   list_add(>next, >group_list);
+   iommu_domain_free(domain->domain);
+   kfree(domain);
+   goto done;
}
+
+   ret = iommu_attach_group(domain->domain,  group->iommu_group);
+   if (ret)
+   goto out_domain;
}
 
vfio_test_domain_fgsp(domain);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group()

2022-06-23 Thread Nicolin Chen via iommu
All devices in emulated_iommu_groups have pinned_page_dirty_scope
set, so the update_dirty_scope in the first list_for_each_entry
is always false. Clean it up, and move the "if update_dirty_scope"
part from the detach_group_done routine to the domain_list part.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 11be5f95580b..b9ccb3cfac5d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2453,14 +2453,12 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
struct vfio_iommu *iommu = iommu_data;
struct vfio_domain *domain;
struct vfio_iommu_group *group;
-   bool update_dirty_scope = false;
LIST_HEAD(iova_copy);
 
mutex_lock(>lock);
list_for_each_entry(group, >emulated_iommu_groups, next) {
if (group->iommu_group != iommu_group)
continue;
-   update_dirty_scope = !group->pinned_page_dirty_scope;
list_del(>next);
kfree(group);
 
@@ -2469,7 +2467,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
WARN_ON(iommu->notifier.head);
vfio_iommu_unmap_unpin_all(iommu);
}
-   goto detach_group_done;
+   mutex_unlock(>lock);
+   return;
}
 
/*
@@ -2485,9 +2484,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
continue;
 
iommu_detach_group(domain->domain, group->iommu_group);
-   update_dirty_scope = !group->pinned_page_dirty_scope;
list_del(>next);
-   kfree(group);
/*
 * Group ownership provides privilege, if the group list is
 * empty, the domain goes away. If it's the last domain with
@@ -2510,6 +2507,16 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
vfio_iommu_aper_expand(iommu, _copy);
vfio_update_pgsize_bitmap(iommu);
}
+   /*
+* Removal of a group without dirty tracking may allow
+* the iommu scope to be promoted.
+*/
+   if (!group->pinned_page_dirty_scope) {
+   iommu->num_non_pinned_groups--;
+   if (iommu->dirty_page_tracking)
+   vfio_iommu_populate_bitmap_full(iommu);
+   }
+   kfree(group);
break;
}
 
@@ -2518,16 +2525,6 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
else
vfio_iommu_iova_free(_copy);
 
-detach_group_done:
-   /*
-* Removal of a group without dirty tracking may allow the iommu scope
-* to be promoted.
-*/
-   if (update_dirty_scope) {
-   iommu->num_non_pinned_groups--;
-   if (iommu->dirty_page_tracking)
-   vfio_iommu_populate_bitmap_full(iommu);
-   }
mutex_unlock(>lock);
 }
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-23 Thread Nicolin Chen via iommu
Un-inline the domain specific logic from the attach/detach_group ops into
two paired functions vfio_iommu_alloc_attach_domain() and
vfio_iommu_detach_destroy_domain() that strictly deal with creating and
destroying struct vfio_domains.

Add the logic to check for EMEDIUMTYPE return code of iommu_attach_group()
and avoid the extra domain allocations and attach/detach sequences of the
old code. This allows properly detecting an actual attach error, like
-ENOMEM, vs treating all attach errors as an incompatible domain.

Co-developed-by: Jason Gunthorpe 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 321 +---
 1 file changed, 174 insertions(+), 147 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b9ccb3cfac5d..3ffa4e2d9d18 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2153,15 +2153,174 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
list_splice_tail(iova_copy, iova);
 }
 
+static struct vfio_domain *
+vfio_iommu_alloc_attach_domain(struct bus_type *bus, struct vfio_iommu *iommu,
+  struct vfio_iommu_group *group,
+  struct list_head *group_resv_regions)
+{
+   struct iommu_domain *new_domain;
+   struct vfio_domain *domain;
+   phys_addr_t resv_msi_base;
+   int ret = 0;
+
+   /* Try to match an existing compatible domain */
+   list_for_each_entry (domain, >domain_list, next) {
+   ret = iommu_attach_group(domain->domain, group->iommu_group);
+   /* -EMEDIUMTYPE means an incompatible domain, so try next one */
+   if (ret == -EMEDIUMTYPE)
+   continue;
+   if (ret)
+   return ERR_PTR(ret);
+   goto done;
+   }
+
+   new_domain = iommu_domain_alloc(bus);
+   if (!new_domain)
+   return ERR_PTR(-EIO);
+
+   if (iommu->nesting) {
+   ret = iommu_enable_nesting(new_domain);
+   if (ret)
+   goto out_free_iommu_domain;
+   }
+
+   ret = iommu_attach_group(new_domain, group->iommu_group);
+   if (ret)
+   goto out_free_iommu_domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (!domain) {
+   ret = -ENOMEM;
+   goto out_detach;
+   }
+
+   domain->domain = new_domain;
+   vfio_test_domain_fgsp(domain);
+
+   /*
+* If the IOMMU can block non-coherent operations (ie PCIe TLPs with
+* no-snoop set) then VFIO always turns this feature on because on Intel
+* platforms it optimizes KVM to disable wbinvd emulation.
+*/
+   if (new_domain->ops->enforce_cache_coherency)
+   domain->enforce_cache_coherency =
+   new_domain->ops->enforce_cache_coherency(new_domain);
+
+   /* replay mappings on new domains */
+   ret = vfio_iommu_replay(iommu, domain);
+   if (ret)
+   goto out_free_domain;
+
+   if (vfio_iommu_has_sw_msi(group_resv_regions, _msi_base)) {
+   ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
+   if (ret && ret != -ENODEV)
+   goto out_free_domain;
+   }
+
+   INIT_LIST_HEAD(>group_list);
+   list_add(>next, >domain_list);
+   vfio_update_pgsize_bitmap(iommu);
+
+done:
+   list_add(>next, >group_list);
+
+   /*
+* An iommu backed group can dirty memory directly and therefore
+* demotes the iommu scope until it declares itself dirty tracking
+* capable via the page pinning interface.
+*/
+   iommu->num_non_pinned_groups++;
+
+   return domain;
+
+out_free_domain:
+   kfree(domain);
+out_detach:
+   iommu_detach_group(new_domain, group->iommu_group);
+out_free_iommu_domain:
+   iommu_domain_free(new_domain);
+   return ERR_PTR(ret);
+}
+
+static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *node;
+
+   while ((node = rb_first(>dma_list)))
+   vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
+}
+
+static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
+{
+   struct rb_node *n, *p;
+
+   n = rb_first(>dma_list);
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma;
+   long locked = 0, unlocked = 0;
+
+   dma = rb_entry(n, struct vfio_dma, node);
+   unlocked += vfio_unmap_unpin(iommu, dma, false);
+   p = rb_first(>pfn_list);
+   for (; p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
+node);
+
+   if (!is_invalid_reserved_pfn(vpfn->pfn))
+   

[PATCH v3 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-23 Thread Nicolin Chen via iommu
From: Jason Gunthorpe 

The KVM mechanism for controlling wbinvd is based on OR of the coherency
property of all devices attached to a guest, no matter whether those
devices are attached to a single domain or multiple domains.

On the other hand, the benefit to using separate domains was that those
devices attached to domains supporting enforced cache coherency always
mapped with the attributes necessary to provide that feature, therefore
if a non-enforced domain was dropped, the associated group removal would
re-trigger an evaluation by KVM.

In practice however, the only known cases of such mixed domains included
an Intel IGD device behind an IOMMU lacking snoop control, where such
devices do not support hotplug, therefore this scenario lacks testing and
is not considered sufficiently relevant to support.

After all, KVM won't take advantage of trying to push a device that could
do enforced cache coherency to a dedicated domain vs re-using an existing
domain, which is non-coherent.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in related
emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
we don't see an usage requiring such optimization as the only device
which imposes such non-coherency is Intel GPU which even doesn't
support hotplug/hot remove.

Signed-off-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..f4e3b423a453 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2285,9 +2285,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 * testing if they're on the same bus_type.
 */
list_for_each_entry(d, >domain_list, next) {
-   if (d->domain->ops == domain->domain->ops &&
-   d->enforce_cache_coherency ==
-   domain->enforce_cache_coherency) {
+   if (d->domain->ops == domain->domain->ops) {
iommu_detach_group(domain->domain, group->iommu_group);
if (!iommu_attach_group(d->domain,
group->iommu_group)) {
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-23 Thread Nicolin Chen via iommu
Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.

This is a soft failure that can be resolved by using a different domain.

Provide a dedicated errno from the IOMMU driver during attach that the
reason attached failed is because of domain incompatability. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.

VFIO can use this to know attach is a soft failure and it should continue
searching. Otherwise the attach will be a hard failure and VFIO will
return the code to userspace.

Update all drivers to return EMEDIUMTYPE in their failure paths that are
related to domain incompatability. Also remove adjacent error prints for
these soft failures, to prevent a kernel log spam, since -EMEDIUMTYPE is
clear enough to indicate an incompatability error.

Add kdocs describing this behavior.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/apple-dart.c  |  4 +--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  6 ++---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |  9 ++-
 drivers/iommu/intel/iommu.c | 10 +++-
 drivers/iommu/iommu.c   | 28 +
 drivers/iommu/ipmmu-vmsa.c  |  4 +--
 drivers/iommu/mtk_iommu_v1.c|  2 +-
 drivers/iommu/omap-iommu.c  |  3 +--
 drivers/iommu/s390-iommu.c  |  2 +-
 drivers/iommu/sprd-iommu.c  |  6 ++---
 drivers/iommu/tegra-gart.c  |  2 +-
 drivers/iommu/virtio-iommu.c|  3 +--
 14 files changed, 49 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 840831d5d2ad..ad499658a6b6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1662,7 +1662,7 @@ static int attach_device(struct device *dev,
if (domain->flags & PD_IOMMUV2_MASK) {
struct iommu_domain *def_domain = iommu_get_dma_domain(dev);
 
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
if (def_domain->type != IOMMU_DOMAIN_IDENTITY)
goto out;
 
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 8af0242a90d9..e58dc310afd7 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -495,10 +495,10 @@ static int apple_dart_attach_dev(struct iommu_domain 
*domain,
 
if (cfg->stream_maps[0].dart->force_bypass &&
domain->type != IOMMU_DOMAIN_IDENTITY)
-   return -EINVAL;
+   return -EMEDIUMTYPE;
if (!cfg->stream_maps[0].dart->supports_bypass &&
domain->type == IOMMU_DOMAIN_IDENTITY)
-   return -EINVAL;
+   return -EMEDIUMTYPE;
 
ret = apple_dart_finalize_domain(domain, cfg);
if (ret)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..5b64138f549d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2420,24 +2420,15 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
goto out_unlock;
}
} else if (smmu_domain->smmu != smmu) {
-   dev_err(dev,
-   "cannot attach to SMMU %s (upstream of %s)\n",
-   dev_name(smmu_domain->smmu->dev),
-   dev_name(smmu->dev));
-   ret = -ENXIO;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
-   dev_err(dev,
-   "cannot attach to incompatible domain (%u SSID bits != 
%u)\n",
-   smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
   smmu_domain->stall_enabled != master->stall_enabled) {
-   dev_err(dev, "cannot attach to stall-%s domain\n",
-   smmu_domain->stall_enabled ? "enabled" : "disabled");
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
}
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..072cac5ab5a4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c

[PATCH v3 0/5] Simplify vfio_iommu_type1 attach/detach routine

2022-06-23 Thread Nicolin Chen via iommu
This is a preparatory series for IOMMUFD v2 patches. It enforces error
code -EMEDIUMTYPE in iommu_attach_device() and iommu_attach_group() when
an IOMMU domain and a device/group are incompatible. It also drops the
useless domain->ops check since it won't fail in current environment.

These allow VFIO iommu code to simplify its group attachment routine, by
avoiding the extra IOMMU domain allocations and attach/detach sequences
of the old code.

Worths mentioning the exact match for enforce_cache_coherency is removed
with this series, since there's very less value in doing that since KVM
won't be able to take advantage of it -- this just wastes domain memory.
Instead, we rely on Intel IOMMU driver taking care of that internally.

This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_iommu_attach

Changelog
v3:
 * Dropped all dev_err since -EMEDIUMTYPE clearly indicates what error.
 * Updated commit message of enforce_cache_coherency removing patch.
 * Updated commit message of domain->ops removing patch.
 * Replaced "goto out_unlock" with simply mutex_unlock() and return.
 * Added a line of comments for -EMEDIUMTYPE return check.
 * Moved iommu_get_msi_cookie() into alloc_attach_domain() as a cookie
   should be logically tied to the lifetime of a domain itself.
 * Added Kevin's "Reviewed-by".
v2: https://lore.kernel.org/kvm/20220616000304.23890-1-nicol...@nvidia.com/
 * Added -EMEDIUMTYPE to more IOMMU drivers that fit the category.
 * Changed dev_err to dev_dbg for -EMEDIUMTYPE to avoid kernel log spam.
 * Dropped iommu_ops patch, and removed domain->ops in VFIO directly,
   since there's no mixed-driver use case that would fail the sanity.
 * Updated commit log of the patch removing enforce_cache_coherency.
 * Fixed a misplace of "num_non_pinned_groups--" in detach_group patch.
 * Moved "num_non_pinned_groups++" in PATCH-5 to the common path between
   domain-reusing and new-domain pathways, like the code previously did.
 * Fixed a typo in EMEDIUMTYPE patch.
v1: https://lore.kernel.org/kvm/20220606061927.26049-1-nicol...@nvidia.com/

Jason Gunthorpe (1):
  vfio/iommu_type1: Prefer to reuse domains vs match enforced cache
coherency

Nicolin Chen (4):
  iommu: Return -EMEDIUMTYPE for incompatible domain and device/group
  vfio/iommu_type1: Remove the domain->ops comparison
  vfio/iommu_type1: Clean up update_dirty_scope in detach_group()
  vfio/iommu_type1: Simplify group attachment

 drivers/iommu/amd/iommu.c   |   2 +-
 drivers/iommu/apple-dart.c  |   4 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  15 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |   6 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |   9 +-
 drivers/iommu/intel/iommu.c |  10 +-
 drivers/iommu/iommu.c   |  28 ++
 drivers/iommu/ipmmu-vmsa.c  |   4 +-
 drivers/iommu/mtk_iommu_v1.c|   2 +-
 drivers/iommu/omap-iommu.c  |   3 +-
 drivers/iommu/s390-iommu.c  |   2 +-
 drivers/iommu/sprd-iommu.c  |   6 +-
 drivers/iommu/tegra-gart.c  |   2 +-
 drivers/iommu/virtio-iommu.c|   3 +-
 drivers/vfio/vfio_iommu_type1.c | 340 ++--
 15 files changed, 225 insertions(+), 211 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Ahmad Fatoum
Hello Saravana,

On 23.06.22 19:26, Saravana Kannan wrote:
> On Thu, Jun 23, 2022 at 3:05 AM sascha hauer  wrote:
>>
>> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>>> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
>>> enabled iommus and dmas dependency enforcement by default. On some
>>> systems, this caused the console device's probe to get delayed until the
>>> deferred_probe_timeout expires.
>>>
>>> We need consoles to work as soon as possible, so mark the console device
>>> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
>>> the probe of the console device for suppliers without drivers. The
>>> driver can then make the decision on where it can probe without those
>>> suppliers or defer its probe.
>>>
>>> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
>>> Reported-by: Sascha Hauer 
>>> Reported-by: Peng Fan 
>>> Signed-off-by: Saravana Kannan 
>>> Tested-by: Peng Fan 
>>> ---
>>>  drivers/of/base.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index d4f98c8469ed..a19cd0c73644 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
>>> align))
>>>   of_property_read_string(of_aliases, "stdout", );
>>>   if (name)
>>>   of_stdout = of_find_node_opts_by_path(name, 
>>> _stdout_options);
>>> + if (of_stdout)
>>> + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
>>
>> The device given in the stdout-path property doesn't necessarily have to
>> be consistent with the console= parameter. The former is usually
>> statically set in the device trees contained in the kernel while the
>> latter is dynamically set by the bootloader. So if you change the
>> console uart in the bootloader then you'll still run into this trap.
>>
>> It's problematic to consult only the device tree for dependencies. I
>> found several examples of drivers in the tree for which dma support
>> is optional. They use it if they can, but continue without it when
>> not available. "hwlock" is another property which consider several
>> drivers as optional. Also consider SoCs in early upstreaming phases
>> when the device tree is merged with "dmas" or "hwlock" properties,
>> but the corresponding drivers are not yet upstreamed. It's not nice
>> to defer probing of all these devices for a long time.
>>
>> I wonder if it wouldn't be a better approach to just probe all devices
>> and record the device(node) they are waiting on. Then you know that you
>> don't need to probe them again until the device they are waiting for
>> is available.
> 
> That actually breaks things in a worse sense. There are cases where
> the consumer driver is built in and the optional supplier driver is
> loaded at boot. Without fw_devlink and the deferred probe timeout, we
> end up probing the consumer with limited functionality. With the
> current setup, sure we delay some probes a bit but at least everything
> works with the right functionality. And you can reduce or remove the
> delay if you want to optimize it.

I have a system that doesn't use stdout-path and has the bootloader
set console= either to ttynull when secure booting or to an UART
when booting normally. How would I optimize the kernel to avoid
my UART being loaded after DMA controller probe without touching
the bootloader?

Cheers,
Ahmad

> 
> -Saravana
> 
> 


-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Saravana Kannan via iommu
On Thu, Jun 23, 2022 at 10:36 AM Ahmad Fatoum  wrote:
>
> Hello Saravana,
>
> On 23.06.22 19:26, Saravana Kannan wrote:
> > On Thu, Jun 23, 2022 at 3:05 AM sascha hauer  wrote:
> >>
> >> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> >>> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> >>> enabled iommus and dmas dependency enforcement by default. On some
> >>> systems, this caused the console device's probe to get delayed until the
> >>> deferred_probe_timeout expires.
> >>>
> >>> We need consoles to work as soon as possible, so mark the console device
> >>> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> >>> the probe of the console device for suppliers without drivers. The
> >>> driver can then make the decision on where it can probe without those
> >>> suppliers or defer its probe.
> >>>
> >>> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> >>> Reported-by: Sascha Hauer 
> >>> Reported-by: Peng Fan 
> >>> Signed-off-by: Saravana Kannan 
> >>> Tested-by: Peng Fan 
> >>> ---
> >>>  drivers/of/base.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c
> >>> index d4f98c8469ed..a19cd0c73644 100644
> >>> --- a/drivers/of/base.c
> >>> +++ b/drivers/of/base.c
> >>> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> >>> align))
> >>>   of_property_read_string(of_aliases, "stdout", 
> >>> );
> >>>   if (name)
> >>>   of_stdout = of_find_node_opts_by_path(name, 
> >>> _stdout_options);
> >>> + if (of_stdout)
> >>> + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >>
> >> The device given in the stdout-path property doesn't necessarily have to
> >> be consistent with the console= parameter. The former is usually
> >> statically set in the device trees contained in the kernel while the
> >> latter is dynamically set by the bootloader. So if you change the
> >> console uart in the bootloader then you'll still run into this trap.
> >>
> >> It's problematic to consult only the device tree for dependencies. I
> >> found several examples of drivers in the tree for which dma support
> >> is optional. They use it if they can, but continue without it when
> >> not available. "hwlock" is another property which consider several
> >> drivers as optional. Also consider SoCs in early upstreaming phases
> >> when the device tree is merged with "dmas" or "hwlock" properties,
> >> but the corresponding drivers are not yet upstreamed. It's not nice
> >> to defer probing of all these devices for a long time.
> >>
> >> I wonder if it wouldn't be a better approach to just probe all devices
> >> and record the device(node) they are waiting on. Then you know that you
> >> don't need to probe them again until the device they are waiting for
> >> is available.
> >
> > That actually breaks things in a worse sense. There are cases where
> > the consumer driver is built in and the optional supplier driver is
> > loaded at boot. Without fw_devlink and the deferred probe timeout, we
> > end up probing the consumer with limited functionality. With the
> > current setup, sure we delay some probes a bit but at least everything
> > works with the right functionality. And you can reduce or remove the
> > delay if you want to optimize it.
>
> I have a system that doesn't use stdout-path and has the bootloader
> set console= either to ttynull when secure booting or to an UART
> when booting normally. How would I optimize the kernel to avoid
> my UART being loaded after DMA controller probe without touching
> the bootloader?
>

Thanks for the report Ahmad. I think someone else reported a similar
thing in another thread. I plan to take a look at it. It should be
possible to find the device and set the flag for those devices too.

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Saravana Kannan via iommu
On Thu, Jun 23, 2022 at 9:39 AM Andy Shevchenko
 wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>
> ...
>
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> There may be no device, but resource. And we become again to the something 
> like
> deferred probe ugly hack.
>
> The real solution is to rework device driver model in the kernel that it will
> create a graph of dependencies and then simply follow it. But actually it 
> should
> be more than 1 graph, because there are resources and there are power, clock 
> and
> resets that may be orthogonal to the higher dependencies (like driver X 
> provides
> a resource to driver Y).

We already do this with fw_devlink for DT based systems and we do
effectively just probe the devices in graph order (by deferring any
attempts that happen too early and before it even gets to the driver).
The problem is the knowledge of what's considered an optional vs
mandatory dependency and that's affected by the global state of driver
support in the kernel.

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Saravana Kannan via iommu
On Thu, Jun 23, 2022 at 3:05 AM sascha hauer  wrote:
>
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > enabled iommus and dmas dependency enforcement by default. On some
> > systems, this caused the console device's probe to get delayed until the
> > deferred_probe_timeout expires.
> >
> > We need consoles to work as soon as possible, so mark the console device
> > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > the probe of the console device for suppliers without drivers. The
> > driver can then make the decision on where it can probe without those
> > suppliers or defer its probe.
> >
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer 
> > Reported-by: Peng Fan 
> > Signed-off-by: Saravana Kannan 
> > Tested-by: Peng Fan 
> > ---
> >  drivers/of/base.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > index d4f98c8469ed..a19cd0c73644 100644
> > --- a/drivers/of/base.c
> > +++ b/drivers/of/base.c
> > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> > align))
> >   of_property_read_string(of_aliases, "stdout", );
> >   if (name)
> >   of_stdout = of_find_node_opts_by_path(name, 
> > _stdout_options);
> > + if (of_stdout)
> > + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
>
> The device given in the stdout-path property doesn't necessarily have to
> be consistent with the console= parameter. The former is usually
> statically set in the device trees contained in the kernel while the
> latter is dynamically set by the bootloader. So if you change the
> console uart in the bootloader then you'll still run into this trap.
>
> It's problematic to consult only the device tree for dependencies. I
> found several examples of drivers in the tree for which dma support
> is optional. They use it if they can, but continue without it when
> not available. "hwlock" is another property which consider several
> drivers as optional. Also consider SoCs in early upstreaming phases
> when the device tree is merged with "dmas" or "hwlock" properties,
> but the corresponding drivers are not yet upstreamed. It's not nice
> to defer probing of all these devices for a long time.
>
> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

That actually breaks things in a worse sense. There are cases where
the consumer driver is built in and the optional supplier driver is
loaded at boot. Without fw_devlink and the deferred probe timeout, we
end up probing the consumer with limited functionality. With the
current setup, sure we delay some probes a bit but at least everything
works with the right functionality. And you can reduce or remove the
delay if you want to optimize it.

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Rafael J. Wysocki
On Thu, Jun 23, 2022 at 6:39 PM Andy Shevchenko
 wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>
> ...
>
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> There may be no device, but resource. And we become again to the something 
> like
> deferred probe ugly hack.
>
> The real solution is to rework device driver model in the kernel that it will
> create a graph of dependencies and then simply follow it. But actually it 
> should
> be more than 1 graph, because there are resources and there are power, clock 
> and
> resets that may be orthogonal to the higher dependencies (like driver X 
> provides
> a resource to driver Y).

There is one graph, or it wouldn't be possible to shut down the system orderly.

The problem is that this graph is generally dynamic, especially during
system init, and following dependencies in transient states is
generally hard.

Device links allow the already known dependencies to be recorded and
taken into account later, so we already have a graph for those.

The unknown dependencies obviously cannot be used for creating a graph
of any sort, though, and here we are in the business of guessing what
the unknown dependencies may be IIUC.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


iommu_sva_bind_device question

2022-06-23 Thread Jerry Snitselaar
Hi Baolu & Dave,

I noticed last night that on a Sapphire Rapids system if you boot without
intel_iommu=on, the idxd driver will crash during probe in 
iommu_sva_bind_device().
Should there be a sanity check before calling dev_iommu_ops(), or is the 
expectation
that the caller would verify it is safe to call? This seemed to be uncovered by
the combination of 3f6634d997db ("iommu: Use right way to retrieve iommu_ops"), 
and
42a1b73852c4 ("dmaengine: idxd: Separate user and kernel pasid enabling").

[   21.423729] BUG: kernel NULL pointer dereference, address: 0038 
[   21.445108] #PF: supervisor read access in kernel mode 
[   21.450912] #PF: error_code(0x) - not-present page 
[   21.456706] PGD 0  
[   21.459047] Oops:  [#1] PREEMPT SMP NOPTI 
[   21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 
5.19.0-0.rc3.27.eln120.x86_64 #1 
[   21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS 
EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 
[   21.464015] Workqueue: events work_for_cpu_fn 
[   21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 
[   21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 
57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 
38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 
[   21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 
[   21.464054] RAX:  RBX: ff1eadeec8a51000 RCX: 
 
[   21.464058] RDX: ff7245d9096b7e24 RSI:  RDI: 
ff1eadeec8a510d0 
[   21.464060] RBP: ff1eadeec8a51000 R08: b1a12300 R09: 
ff1eadffbfce25b4 
[   21.464062] R10:  R11: 0038 R12: 
c09f8000 
[   21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: 
ff1eaddf54429000 
[   21.464067] FS:  () GS:ff1eadee7f60() 
knlGS: 
[   21.464070] CS:  0010 DS:  ES:  CR0: 80050033 
[   21.464072] CR2: 0038 CR3: 0008c0e10006 CR4: 
00771ef0 
[   21.464074] DR0:  DR1:  DR2: 
 
[   21.464076] DR3:  DR6: fffe07f0 DR7: 
0400 
[   21.464078] PKRU: 5554 
[   21.464079] Call Trace: 
[   21.464083]   
[   21.464092]  idxd_pci_probe+0x259/0x1070 [idxd] 
[   21.464121]  local_pci_probe+0x3e/0x80 
[   21.464132]  work_for_cpu_fn+0x13/0x20 
[   21.464136]  process_one_work+0x1c4/0x380 
[   21.464143]  worker_thread+0x1ab/0x380 
[   21.464147]  ? _raw_spin_lock_irqsave+0x23/0x50 
[   21.464158]  ? process_one_work+0x380/0x380 
[   21.464161]  kthread+0xe6/0x110 
[   21.464168]  ? kthread_complete_and_exit+0x20/0x20 
[   21.464172]  ret_from_fork+0x1f/0x30

I figure either there needs to be a check in iommu_sva_bind_device, or
idxd needs to check in idxd_enable_system_pasid that that
idxd->pdev->dev.iommu is not null before it tries calling iommu_sva_bind_device.

Regards,
Jerry

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Andy Shevchenko
On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:

...

> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

There may be no device, but resource. And we become again to the something like
deferred probe ugly hack.

The real solution is to rework device driver model in the kernel that it will
create a graph of dependencies and then simply follow it. But actually it should
be more than 1 graph, because there are resources and there are power, clock and
resets that may be orthogonal to the higher dependencies (like driver X provides
a resource to driver Y).

-- 
With Best Regards,
Andy Shevchenko


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: New IOMMU mailing list coming

2022-06-23 Thread Jason Gunthorpe via iommu
On Tue, Jun 14, 2022 at 10:30:20AM +0200, Jörg Rödel wrote:
> Hi all,
> 
> after several problems with the current IOMMU mailing list (no DKIM
> support, poor b4 interoperability) we have decided to move the IOMMU
> development discussions to a new list hosted at lists.linux.dev.
> 
> The new list is up and running already, to subscribe please send an
> email to iommu+subscr...@lists.linux.dev. It is not yet archived, but
> there will be a hard switch between the old and the new list on
> 
>   July 5th, 2022
> 
> After that date the old list will no longer work and the archive will
> switch to the new mailing list.

Nice! Thanks for doing it

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock

2022-06-23 Thread Tianyu Lan

On 6/22/2022 6:54 PM, Christoph Hellwig wrote:

Thanks,

this looks pretty good to me.  A few comments below:



Thanks for your review.


On Fri, Jun 17, 2022 at 10:47:41AM -0400, Tianyu Lan wrote:

+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB block.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};


This can go into swiotlb.c.


struct io_tlb_area is used in the struct io_tlb_mem.




+void __init swiotlb_adjust_nareas(unsigned int nareas);


And this should be marked static.


+#define DEFAULT_NUM_AREAS 1


I'd drop this define, the magic 1 and a > 1 comparism seems to
convey how it is used much better as the checks aren't about default
or not, but about larger than one.

I also think that we want some good way to size the default, e.g.
by number of CPUs or memory size.


swiotlb_adjust_nareas() is exposed to platforms to set area number.
When swiotlb_init() is called, smp_init() isn't called at that point and
so standard API of checking cpu number (e.g, num_online_cpus()) doesn't
work. Platforms may have other ways to get cpu number(e.g x86 may ACPI
MADT table entries to get cpu nubmer) and set area number. I will post 
following patch to set cpu number via swiotlb_adjust_nareas(),





+void __init swiotlb_adjust_nareas(unsigned int nareas)
+{
+   if (!is_power_of_2(nareas)) {
+   pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
+   return;
+   }
+
+   default_nareas = nareas;
+
+   pr_info("area num %d.\n", nareas);
+   /* Round up number of slabs to the next power of 2.
+* The last area is going be smaller than the rest if
+* default_nslabs is not power of two.
+*/


Please follow the normal kernel comment style with a /* on its own line.



OK. Will update.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/3] scsi: BusLogic remove bus_to_virt

2022-06-23 Thread Arnd Bergmann
On Tue, Jun 21, 2022 at 11:56 PM Khalid Aziz  wrote:
> >   while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) {
> > - /*
> > -We are only allowed to do this because we limit our
> > -architectures we run on to machines where bus_to_virt(
> > -actually works.  There *needs* to be a dma_addr_to_virt()
> > -in the new PCI DMA mapping interface to replace
> > -bus_to_virt() or else this code is going to become very
> > -innefficient.
> > -  */
> > - struct blogic_ccb *ccb =
> > - (struct blogic_ccb *) bus_to_virt(next_inbox->ccb);
> > + struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter, 
> > adapter->next_inbox);
>
> This change looks good enough as workaround to not use bus_to_virt() for
> now. There are two problems I see though. One, I do worry about
> blogic_inbox_to_ccb() returning NULL for ccb which should not happen
> unless the mailbox pointer was corrupted which would indicate a bigger
> problem. Nevertheless a NULL pointer causing kernel panic concerns me.
> How about adding a check before we dereference ccb?

Right, makes sense

> Second, with this patch applied, I am seeing errors from the driver:
>
> =
> [ 1623.902685]  sdb: sdb1 sdb2
> [ 1623.903245] sd 2:0:0:0: [sdb] Attached SCSI disk
> [ 1623.911000] scsi2: Illegal CCB #76 status 2 in Incoming Mailbox
> [ 1623.911005] scsi2: Illegal CCB #76 status 2 in Incoming Mailbox
> [ 1623.911070] scsi2: Illegal CCB #79 status 2 in Incoming Mailbox
> [ 1651.458008] scsi2: Warning: Partition Table appears to have Geometry
> 256/63 which is
> [ 1651.458013] scsi2: not compatible with current BusLogic Host Adapter
> Geometry 255/63
> [ 1658.797609] scsi2: Resetting BusLogic BT-958D Failed
> [ 1659.533208] sd 2:0:0:0: Device offlined - not ready after error recovery
> [ 1659.51] sd 2:0:0:0: Device offlined - not ready after error recovery
> [ 1659.53] sd 2:0:0:0: Device offlined - not ready after error recovery
> [ 1659.533342] sd 2:0:0:0: [sdb] tag#101 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=35s
> [ 1659.533345] sd 2:0:0:0: [sdb] tag#101 CDB: Read(10) 28 00 00 00 00 28
> 00 00 10 00
> [ 1659.533346] I/O error, dev sdb, sector 40 op 0x0:(READ) flags 0x80700
> phys_seg 1 prio class 0
>
> =
>
> This is on VirtualBox using emulated BusLogic adapter.
>
> This patch needs more refinement.

Thanks for testing the patch, too bad it didn't work. At least I quickly found
one stupid mistake on my end, hope it's the only one.

Can you test it again with this patch on top?

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index d057abfcdd5c..9e67f2ee25ee 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -2554,8 +2554,14 @@ static void blogic_scan_inbox(struct
blogic_adapter *adapter)
enum blogic_cmplt_code comp_code;

while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) {
-   struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter,
adapter->next_inbox);
-   if (comp_code != BLOGIC_CMD_NOTFOUND) {
+   struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter,
next_inbox);
+   if (!ccb) {
+   /*
+* This should never happen, unless the CCB list is
+* corrupted in memory.
+*/
+   blogic_warn("Could not find CCB for dma
address 0x%x\n", adapter, next_inbox->ccb);
+   } else if (comp_code != BLOGIC_CMD_NOTFOUND) {
if (ccb->status == BLOGIC_CCB_ACTIVE ||
ccb->status == BLOGIC_CCB_RESET) {
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: use the correct size for dma_set_encrypted()

2022-06-23 Thread Christoph Hellwig
On Thu, Jun 23, 2022 at 07:00:58AM +, Dexuan Cui wrote:
> It looks like commit 4a37f3dd9a831 fixed a different issue?
> 
> Here my patch is for the latest mainline:
> 
> In dma_direct_alloc()'s error handling path, we pass 'size' to 
> dma_set_encrypted():
>   out_encrypt_pages:
> if (dma_set_encrypted(dev, page_address(page), size))
> 
> However, in dma_direct_free(), we pass ' 1 << page_order ' to 
> dma_set_encrypted().
> I think the ' 1 << page_order' is incorrect and it should be 'size' as well?

Indeed.  I've applied the patch now.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/exynos: Make driver independent of the system page size

2022-06-23 Thread Sam Protsenko
Hi Marek,

On Thu, 23 Jun 2022 at 12:36, Marek Szyprowski  wrote:
>
> PAGE_{SIZE,SHIFT} macros depend on the configured kernel's page size, but
> the driver expects values calculated as for 4KB pages. Fix this.
>
> Reported-by: Robin Murphy 
> Signed-off-by: Marek Szyprowski 
> ---

Reviewed-by: Sam Protsenko 

with one note: SPAGE_SIZE and SPAGE_ORDER could be used instead of
SZ_4K. But that's just a matter of taste, I'm ok with that as is,
hence R-b tag.

Thanks!

> Untested, because Exynos based boards I have doesn't boot with non-4KB
> page size for other reasons.
> ---
>  drivers/iommu/exynos-iommu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 71f2018e23fe..9c060505a46e 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -340,8 +340,7 @@ static void __sysmmu_set_ptbase(struct sysmmu_drvdata 
> *data, phys_addr_t pgd)
> if (MMU_MAJ_VER(data->version) < 5)
> writel(pgd, data->sfrbase + REG_PT_BASE_ADDR);
> else
> -   writel(pgd >> PAGE_SHIFT,
> -data->sfrbase + REG_V5_PT_BASE_PFN);
> +   writel(pgd / SZ_4K, data->sfrbase + REG_V5_PT_BASE_PFN);
>
> __sysmmu_tlb_invalidate(data);
>  }
> @@ -551,7 +550,7 @@ static void sysmmu_tlb_invalidate_entry(struct 
> sysmmu_drvdata *data,
>  * 64KB page can be one of 16 consecutive sets.
>  */
> if (MMU_MAJ_VER(data->version) == 2)
> -   num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
> +   num_inv = min_t(unsigned int, size / SZ_4K, 64);
>
> if (sysmmu_block(data)) {
> __sysmmu_tlb_invalidate_entry(data, iova, num_inv);
> --
> 2.17.1
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Robin Murphy

On 2022-06-22 23:17, Alex Williamson wrote:

On Wed, 22 Jun 2022 13:04:11 +0100
Robin Murphy  wrote:


Since IOMMU groups are mandatory for drivers to support, it stands to
reason that any device which has been successfully be added to a group


s/be //


Oops.


must be on a bus supported by that IOMMU driver, and therefore a domain
viable for any device in the group must be viable for all devices in
the group. This already has to be the case for the IOMMU API's internal
default domain, for instance. Thus even if the group contains devices on
different buses, that can only mean that the IOMMU driver actually
supports such an odd topology, and so without loss of generality we can
expect the bus type of any device in a group to be suitable for IOMMU
API calls.

Replace vfio_bus_type() with a simple call to resolve an appropriate
member device from which to then derive a bus type. This is also a step
towards removing the vague bus-based interfaces from the IOMMU API, when
we can subsequently switch to using this device directly.

Furthermore, scrutiny reveals a lack of protection for the bus being
removed while vfio_iommu_type1_attach_group() is using it; the reference
that VFIO holds on the iommu_group ensures that data remains valid, but
does not prevent the group's membership changing underfoot. Holding the
vfio_device for as long as we need here also neatly solves this.

Signed-off-by: Robin Murphy 
---

After sleeping on it, I decided to type up the helper function approach
to see how it looked in practice, and in doing so realised that with one
more tweak it could also subsume the locking out of the common paths as
well, so end up being a self-contained way for type1 to take care of its
own concern, which I rather like.

  drivers/vfio/vfio.c | 18 +-
  drivers/vfio/vfio.h |  3 +++
  drivers/vfio/vfio_iommu_type1.c | 30 +++---
  3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 61e71c1154be..73bab04880d0 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -448,7 +448,7 @@ static void vfio_group_get(struct vfio_group *group)
   * Device objects - create, release, get, put, search
   */
  /* Device reference always implies a group reference */
-static void vfio_device_put(struct vfio_device *device)
+void vfio_device_put(struct vfio_device *device)
  {
if (refcount_dec_and_test(>refcount))
complete(>comp);
@@ -475,6 +475,22 @@ static struct vfio_device *vfio_group_get_device(struct 
vfio_group *group,
return NULL;
  }
  
+struct vfio_device *vfio_device_get_from_iommu(struct iommu_group *iommu_group)

+{
+   struct vfio_group *group = vfio_group_get_from_iommu(iommu_group);
+   struct vfio_device *device;


Check group for NULL.


OK - FWIW in context this should only ever make sense to call with an 
iommu_group which has already been derived from a vfio_group, and I did 
initially consider a check with a WARN_ON(), but then decided that the 
unguarded dereference would be a sufficiently strong message. No problem 
with bringing that back to make it more defensive if that's what you prefer.



+
+   mutex_lock(>device_lock);
+   list_for_each_entry(device, >device_list, group_next) {
+   if (vfio_device_try_get(device)) {
+   mutex_unlock(>device_lock);
+   return device;
+   }
+   }
+   mutex_unlock(>device_lock);
+   return NULL;


No vfio_group_put() on either path.


Oops indeed.


+}
+
  /*
   * VFIO driver API
   */
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a67130221151..e8f21e64541b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -70,3 +70,6 @@ struct vfio_iommu_driver_ops {
  
  int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);

  void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
+
+struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
*iommu_group);
+void vfio_device_put(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..e38b8bfde677 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1679,18 +1679,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
  }
  
-static int vfio_bus_type(struct device *dev, void *data)

-{
-   struct bus_type **bus = data;
-
-   if (*bus && *bus != dev->bus)
-   return -EINVAL;
-
-   *bus = dev->bus;
-
-   return 0;
-}
-
  static int vfio_iommu_replay(struct vfio_iommu *iommu,
 struct vfio_domain *domain)
  {
@@ -2159,7 +2147,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_iommu *iommu = iommu_data;
struct vfio_iommu_group *group;
struct vfio_domain *domain, *d;
-   struct 

Re: [PATCH] iommu/exynos: Make driver independent of the system page size

2022-06-23 Thread Krzysztof Kozlowski
On 23/06/2022 11:36, Marek Szyprowski wrote:
> PAGE_{SIZE,SHIFT} macros depend on the configured kernel's page size, but
> the driver expects values calculated as for 4KB pages. Fix this.
> 
> Reported-by: Robin Murphy 
> Signed-off-by: Marek Szyprowski 
> ---
> Untested, because Exynos based boards I have doesn't boot with non-4KB
> page size for other reasons.
> ---
>  drivers/iommu/exynos-iommu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)


Acked-by: Krzysztof Kozlowski 


Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-06-23 Thread Alexander Stein
Hi,

Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> Hi,
> 
> * Saravana Kannan  [700101 02:00]:
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> > 
> > So, delete the call and replace it with -ENODEV.
> 
> Looks like this causes omaps to not boot in Linux next. With this
> simple-pm-bus fails to probe initially as the power-domain is not
> yet available. On platform_probe() genpd_get_from_provider() returns
> -ENOENT.
> 
> Seems like other stuff is potentially broken too, any ideas on
> how to fix this?

I think I'm hit by this as well, although I do not get a lockup.
In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts 
and probing of 3832.blk-ctrl fails as the power-domain is not (yet) 
registed. See the (filtered) dmesg output:

> [0.744245] PM: Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@0 [0.744756] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@2 [0.745012] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@3 [0.745268] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@4 [0.746121] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@7 [0.746400] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@8 [0.746665] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@9 [0.746927] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@a [0.748870]
> imx8m-blk-ctrl 3832.blk-ctrl: error -ENODEV: failed to attach bus power
> domain [1.265279] PM: Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@5 [1.265861] PM:
> Added domain provider from
> /soc@0/bus@3000/gpc@303a/pgc/power-domain@6

blk-ctrl@3832 requires the power-domain 'pgc_vpu', which is power-domain@6 
in pgc.

Best regards,
Alexander

> > Signed-off-by: Saravana Kannan 
> > ---
> > 
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev,
> > struct device *base_dev,> 
> > mutex_unlock(_list_lock);
> > dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> > 
> > __func__, PTR_ERR(pd));
> > 
> > -   return driver_deferred_probe_check_state(base_dev);
> > +   return -ENODEV;
> > 
> > }
> > 
> > dev_dbg(dev, "adding to PM domain %s\n", pd->name);




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/dma: Add config for PCI SAC address trick

2022-06-23 Thread Robin Murphy

On 2022-06-23 12:33, Joerg Roedel wrote:

On Wed, Jun 22, 2022 at 02:12:39PM +0100, Robin Murphy wrote:

Thanks for your bravery!


It already starts, with that patch I am getting:

xhci_hcd :02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x000f address=0xff00ffefe000 flags=0x]

In my kernel log. The device is an AMD XHCI controller and seems to
funciton normally after boot. The message disappears with
iommu.forcedac=0.

Need to look more into that...


Given how amd_iommu_domain_alloc() sets the domain aperture, presumably 
the DMA address allocated was 0xffefe000? Odd that it gets bits 
punched out in the middle rather than simply truncated off the top as I 
would have expected :/


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/dma: Add config for PCI SAC address trick

2022-06-23 Thread Joerg Roedel
On Wed, Jun 22, 2022 at 02:12:39PM +0100, Robin Murphy wrote:
> Thanks for your bravery!

It already starts, with that patch I am getting:

xhci_hcd :02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x000f address=0xff00ffefe000 flags=0x]

In my kernel log. The device is an AMD XHCI controller and seems to
funciton normally after boot. The message disappears with
iommu.forcedac=0.

Need to look more into that...

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 3/7] iommu/amd: Fix sparse warning

2022-06-23 Thread Joerg Roedel
On Thu, Jun 23, 2022 at 10:42:52AM +0100, Robin Murphy wrote:
> TBH it's probably time to retire iommu_ops->pgsize_bitmap anyway. At the
> very least it would be logical to move it to iommu_domain_ops now, but maybe
> we could skip ahead and just rely on drivers initialising
> domain->pgsize_bitmap directly in their .domain_alloc?
> 
> (and FWIW I'm leaning towards the same for the domain->ops as well; the more
> I look at ops->default_domain_ops, the more it starts looking like a stupid
> mess... my stupid mess... sorry about that)

Good idea, that would be even better.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/5] iommu/mediatek: Add error path for loop of mm_dts_parse

2022-06-23 Thread Robin Murphy

On 2022-06-23 02:54, Yong Wu wrote:

On Thu, 2022-06-16 at 11:31 +0100, Robin Murphy wrote:

On 2022-06-16 11:08, Yong Wu wrote:

On Thu, 2022-06-16 at 09:59 +0100, Robin Murphy wrote:

On 2022-06-16 06:42, Yong Wu wrote:

The mtk_iommu_mm_dts_parse will parse the smi larbs nodes. if
the
i+1
larb is parsed fail(return -EINVAL), we should of_node_put for
the
0..i
larbs. In the fail path, one of_node_put matches with
of_parse_phandle in
it.

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow
with
the MM TYPE")
Signed-off-by: Yong Wu 
---
drivers/iommu/mtk_iommu.c | 21 -
1 file changed, 16 insertions(+), 5 deletions(-)


[snip..]


+err_larbnode_put:
+   while (i--) {
+   larbnode = of_parse_phandle(dev->of_node,
"mediatek,larbs", i);
+   if (larbnode &&
of_device_is_available(larbnode)) {
+   of_node_put(larbnode);
+   of_node_put(larbnode);
+   }


This looks a bit awkward - could we not just iterate through
data->larb_imu and put dev->of_node for each valid dev?


It should work. Thanks very much.



Also, of_find_device_by_node() takes a reference on the struct
device
itself, so strictly we should be doing put_device() on those as
well
if we're bailing out.


Thanks for this hint. A new reference for me. I will add it.


In fact, thinking about it some more we may as well do the
of_node_put()
unconditionally immediately after the of_find_device_by_node() call,


of_node_put is called in component_release_of in the normal case, thus
we shouldn't call of_node_put unconditionally. Right?


As it stands, yes. However I'm also figuring that we could just use 
component_match_add() there, and probably also switch to 
component_compare_dev as the the comparison, since we've already 
resolved the larb device, and it is the device itself that we're 
interested in here, rather than just its of_node.


I *think* this idea could end up with simpler code overall, but as 
always, feel free to ignore the suggestion if you think it wouldn't make 
enough difference to be worth the bother.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread sascha hauer
On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> enabled iommus and dmas dependency enforcement by default. On some
> systems, this caused the console device's probe to get delayed until the
> deferred_probe_timeout expires.
> 
> We need consoles to work as soon as possible, so mark the console device
> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> the probe of the console device for suppliers without drivers. The
> driver can then make the decision on where it can probe without those
> suppliers or defer its probe.
> 
> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> Reported-by: Sascha Hauer 
> Reported-by: Peng Fan 
> Signed-off-by: Saravana Kannan 
> Tested-by: Peng Fan 
> ---
>  drivers/of/base.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index d4f98c8469ed..a19cd0c73644 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> align))
>   of_property_read_string(of_aliases, "stdout", );
>   if (name)
>   of_stdout = of_find_node_opts_by_path(name, 
> _stdout_options);
> + if (of_stdout)
> + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;

The device given in the stdout-path property doesn't necessarily have to
be consistent with the console= parameter. The former is usually
statically set in the device trees contained in the kernel while the
latter is dynamically set by the bootloader. So if you change the
console uart in the bootloader then you'll still run into this trap.

It's problematic to consult only the device tree for dependencies. I
found several examples of drivers in the tree for which dma support
is optional. They use it if they can, but continue without it when
not available. "hwlock" is another property which consider several
drivers as optional. Also consider SoCs in early upstreaming phases
when the device tree is merged with "dmas" or "hwlock" properties,
but the corresponding drivers are not yet upstreamed. It's not nice
to defer probing of all these devices for a long time.

I wonder if it wouldn't be a better approach to just probe all devices
and record the device(node) they are waiting on. Then you know that you
don't need to probe them again until the device they are waiting for
is available.

Sascha


-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 3/7] iommu/amd: Fix sparse warning

2022-06-23 Thread Robin Murphy

On 2022-06-23 09:03, Joerg Roedel wrote:

On Fri, Jun 03, 2022 at 04:51:03PM +0530, Vasant Hegde wrote:

Fix below sparse warning:
   CHECK   drivers/iommu/amd/iommu.c
   drivers/iommu/amd/iommu.c:73:24: warning: symbol 'amd_iommu_ops' was not 
declared. Should it be static?

Also we are going to introduce v2 page table which has different
pgsize_bitmaps. Hence remove 'const' qualifier.


I am not a fan of removing the consts. Please use separate ops
structures for v2 page-tables and make then const as well. This probably
also has some optimization potential in the future when we can make the
ops call-back functions page-table specific.


TBH it's probably time to retire iommu_ops->pgsize_bitmap anyway. At the 
very least it would be logical to move it to iommu_domain_ops now, but 
maybe we could skip ahead and just rely on drivers initialising 
domain->pgsize_bitmap directly in their .domain_alloc?


(and FWIW I'm leaning towards the same for the domain->ops as well; the 
more I look at ops->default_domain_ops, the more it starts looking like 
a stupid mess... my stupid mess... sorry about that)


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/exynos: Make driver independent of the system page size

2022-06-23 Thread Marek Szyprowski
PAGE_{SIZE,SHIFT} macros depend on the configured kernel's page size, but
the driver expects values calculated as for 4KB pages. Fix this.

Reported-by: Robin Murphy 
Signed-off-by: Marek Szyprowski 
---
Untested, because Exynos based boards I have doesn't boot with non-4KB
page size for other reasons.
---
 drivers/iommu/exynos-iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 71f2018e23fe..9c060505a46e 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -340,8 +340,7 @@ static void __sysmmu_set_ptbase(struct sysmmu_drvdata 
*data, phys_addr_t pgd)
if (MMU_MAJ_VER(data->version) < 5)
writel(pgd, data->sfrbase + REG_PT_BASE_ADDR);
else
-   writel(pgd >> PAGE_SHIFT,
-data->sfrbase + REG_V5_PT_BASE_PFN);
+   writel(pgd / SZ_4K, data->sfrbase + REG_V5_PT_BASE_PFN);
 
__sysmmu_tlb_invalidate(data);
 }
@@ -551,7 +550,7 @@ static void sysmmu_tlb_invalidate_entry(struct 
sysmmu_drvdata *data,
 * 64KB page can be one of 16 consecutive sets.
 */
if (MMU_MAJ_VER(data->version) == 2)
-   num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
+   num_inv = min_t(unsigned int, size / SZ_4K, 64);
 
if (sysmmu_block(data)) {
__sysmmu_tlb_invalidate_entry(data, iova, num_inv);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: use the correct size for dma_set_encrypted()

2022-06-23 Thread Robin Murphy

On 2022-06-23 08:00, Dexuan Cui wrote:

From: Christoph Hellwig 
Sent: Wednesday, June 22, 2022 10:44 PM
To: Dexuan Cui 
  ...
On Wed, Jun 22, 2022 at 12:14:24PM -0700, Dexuan Cui wrote:

The third parameter of dma_set_encrypted() is a size in bytes rather than
the number of pages.

Fixes: 4d0564785bb0 ("dma-direct: factor out dma_set_{de,en}crypted

helpers")

Signed-off-by: Dexuan Cui 


see:

commit 4a37f3dd9a83186cb88d44808ab35b78375082c9 (tag:
dma-mapping-5.19-2022-05-25)
Author: Robin Murphy 
Date:   Fri May 20 18:10:13 2022 +0100

 dma-direct: don't over-decrypt memory


It looks like commit 4a37f3dd9a831 fixed a different issue?

Here my patch is for the latest mainline:

In dma_direct_alloc()'s error handling path, we pass 'size' to 
dma_set_encrypted():
   out_encrypt_pages:
  if (dma_set_encrypted(dev, page_address(page), size))

However, in dma_direct_free(), we pass ' 1 << page_order ' to 
dma_set_encrypted().
I think the ' 1 << page_order' is incorrect and it should be 'size' as well?


I think technically you're both right - these instances clearly have a 
history tracing back to the original bug that my patch addressed, but 
the refactoring then made them into their own distinct bug in terms of 
the internal dma_set_encrypted() interface, per the commit message here. 
Apparently I failed to spot this when forward-porting 4a37f3dd9a831 from 
5.10 (as the commit message says, don't ask... ;) ) - I guess I was only 
looking at where the set_memory_*() callsites had moved to. For this patch,


Reviewed-by: Robin Murphy 

Thanks
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Thursday, June 23, 2022 6:17 AM
> 
> >
> > ret = -EIO;
> > -   domain->domain = iommu_domain_alloc(bus);
> > +   domain->domain = iommu_domain_alloc(iommu_api_dev->dev-
> >bus);
> 
> It makes sense to move away from a bus centric interface to iommu ops
> and I can see that having a device interface when we have device level
> address-ability within a group makes sense, but does it make sense to
> only have that device level interface?  For example, if an iommu_group
> is going to remain an aspect of the iommu subsystem, shouldn't we be
> able to allocate a domain and test capabilities based on the group and
> the iommu driver should have enough embedded information reachable
> from
> the struct iommu_group to do those things?  This "perform group level
> operations based on an arbitrary device in the group" is pretty klunky.
> Thanks,
> 

This sounds a right thing to do.

btw another alternative which I'm thinking of is whether vfio_group
can record the bus info when the first device is added to it in
__vfio_register_dev(). Then we don't need a group interface from
iommu to test if vfio is the only user having such requirement.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-23 Thread John Garry via iommu

On 14/06/2022 14:12, John Garry wrote:

On 06/06/2022 10:30, John Garry wrote:

Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.



Can I please get some sort of ack from the IOMMU people on this one?



Another request for an ack please.

Thanks,
john




Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
---
  drivers/iommu/dma-iommu.c | 6 ++
  drivers/iommu/iova.c  | 5 +
  include/linux/iova.h  | 2 ++
  3 files changed, 13 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..9e1586447ee8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1459,6 +1459,11 @@ static unsigned long 
iommu_dma_get_merge_boundary(struct device *dev)

  return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
  }
+static size_t iommu_dma_opt_mapping_size(void)
+{
+    return iova_rcache_range();
+}
+
  static const struct dma_map_ops iommu_dma_ops = {
  .alloc    = iommu_dma_alloc,
  .free    = iommu_dma_free,
@@ -1479,6 +1484,7 @@ static const struct dma_map_ops iommu_dma_ops = {
  .map_resource    = iommu_dma_map_resource,
  .unmap_resource    = iommu_dma_unmap_resource,
  .get_merge_boundary    = iommu_dma_get_merge_boundary,
+    .opt_mapping_size    = iommu_dma_opt_mapping_size,
  };
  /*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..9f00b58d546e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct 
iova_domain *iovad,
  static void free_cpu_cached_iovas(unsigned int cpu, struct 
iova_domain *iovad);

  static void free_iova_rcaches(struct iova_domain *iovad);
+unsigned long iova_rcache_range(void)
+{
+    return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+
  static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
  {
  struct iova_domain *iovad;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..c6ba6d95d79c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct 
iova_domain *iovad, dma_addr_t iova)

  int iova_cache_get(void);
  void iova_cache_put(void);
+unsigned long iova_rcache_range(void);
+
  void free_iova(struct iova_domain *iovad, unsigned long pfn);
  void __free_iova(struct iova_domain *iovad, struct iova *iova);
  struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-23 Thread John Garry via iommu

On 10/06/2022 16:37, John Garry via iommu wrote:



On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind 
an IOMMU? I had the impression that UFS controllers would be mostly 
found in embedded systems and IOMMUs are not as common on there.


Modern phones have an IOMMU. Below one can find an example from a 
Pixel 6 phone. The UFS storage controller is not controller by the 
IOMMU as far as I can see but I wouldn't be surprised if the security 
team would ask us one day to enable the IOMMU for the UFS controller.


OK, then unfortunately it seems that you have no method to test. I might 
be able to test USB MSC but I am not even sure if I can even get DMA 
mappings who length exceeds the IOVA rcache limit there.


I was able to do some testing on USB MSC for an XHCI controller. The 
result is that limiting the max HW sectors there does not affect 
performance in normal conditions.


However if I hack the USB driver and fiddle with request queue settings 
then it can:

- lift max_sectors limit in usb_stor_host_template 120KB -> 256KB
- lift request queue read_ahead_kb 128KB -> 256KB

In this scenario I can get 42.5MB/s read throughput, as opposed to 
39.5MB/s in normal conditions. Since .can_queue=1 for that host it would 
not fall foul of some issues I experience in IOVA allocator performance, 
so limiting max_sectors would not be required for that reason.


So this is an artificial test, but it may be worth considering only 
applying this DMA mapping optimal max_sectors limit to SAS controllers 
which I know can benefit.


Christoph, any opinion?

thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 7/7] iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled

2022-06-23 Thread Joerg Roedel
On Wed, Jun 22, 2022 at 12:11:31PM -0500, Suravee Suthikulpanit wrote:
>  bool amd_iommu_v2_supported(void)
>  {
> - return amd_iommu_v2_present;
> + /*
> +  * Since DTE[Mode]=0 is prohibited on SNP-enabled system
> +  * (i.e. EFR[SNPSup]=1), IOMMUv2 page table cannot be used without
> +  * setting up IOMMUv1 page table.
> +  */
> + return amd_iommu_v2_present && !amd_iommu_snp_en;

IOMMU_v2 APIs could actually be supported with GIOV and IOMMUv2
page-tables in-use, no?

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/7] iommu/amd: Warn when found inconsistency EFR mask

2022-06-23 Thread Joerg Roedel
On Wed, Jun 22, 2022 at 12:11:25PM -0500, Suravee Suthikulpanit wrote:
>  #ifdef CONFIG_IRQ_REMAP
> +/*
> + * Iterate through all the IOMMUs to verify if the specified
> + * EFR bitmask of IOMMU feature are set.
> + * Warn and return false if found inconsistency.
> + */
>  static bool check_feature_on_all_iommus(u64 mask)
>  {
>   bool ret = false;
>   struct amd_iommu *iommu;
>  
>   for_each_iommu(iommu) {
> - ret = iommu_feature(iommu, mask);
> - if (!ret)
> + bool tmp = iommu_feature(iommu, mask);
> +
> + if ((ret != tmp) &&
> + !list_is_first(>list, _iommu_list)) {
> + pr_err(FW_BUG "Found inconsistent EFR mask (%#llx) on 
> iommu%d (%04x:%02x:%02x.%01x).\n",
> +mask, iommu->index, iommu->pci_seg->id, 
> PCI_BUS_NUM(iommu->devid),
> +PCI_SLOT(iommu->devid), PCI_FUNC(iommu->devid));
>   return false;
> + }
> + ret = tmp;

It is better to implement this by introducing a global feature mask,
which represents the minial set of features supported by any IOMMU in
the system.

The warning is then something like:

if ((global_feature_mask & iommu_features) != global_feature_mask)
pr_warn(...);

This also makes the global variable to track SNP support obsolete.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-06-23 Thread Saravana Kannan via iommu
On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren  wrote:
>
> * Saravana Kannan  [220622 19:05]:
> > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren  wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan  [220621 19:29]:
> > > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > * Saravana Kannan  [700101 02:00]:
> > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > "power-domains" property, the execution will never get to the point
> > > > > > where driver_deferred_probe_check_state() is called before the 
> > > > > > supplier
> > > > > > has probed successfully or before deferred probe timeout has 
> > > > > > expired.
> > > > > >
> > > > > > So, delete the call and replace it with -ENODEV.
> > > > >
> > > > > Looks like this causes omaps to not boot in Linux next.
> > > >
> > > > Can you please point me to an example DTS I could use for debugging
> > > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > > off or putting it in permissive mode.
> > >
> > > Sure, this seems to happen at least with simple-pm-bus as the top
> > > level interconnect with a configured power-domains property:
> > >
> > > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 
> > > simple-pm-bus
> >
> > Thanks for the example. I generally start looking from dts (not dtsi)
> > files in case there are some DT property override/additions after the
> > dtsi files are included in the dts file. But I'll assume for now
> > that's not the case. If there's a specific dts file for a board I can
> > look from that'd be helpful to rule out those kinds of issues.
> >
> > For now, I looked at arch/arm/boot/dts/omap4.dtsi.
>
> OK it should be very similar for all the affected SoCs.
>
> > > This issue is no directly related fw_devlink. It is a side effect of
> > > removing driver_deferred_probe_check_state(). We no longer return
> > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> >
> > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > was deleted because fw_devlink=on should have short circuited the
> > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > probe function and hitting this -ENOENT failure. That's why I was
> > asking the other questions.
>
> OK. So where is the -EPROBE_DEFER supposed to happen without
> driver_deferred_probe_check_state() then?

device_links_check_suppliers() call inside really_probe() would short
circuit and return an -EPROBE_DEFER if the device links are created as
expected.

>
> > > > > On platform_probe() genpd_get_from_provider() returns
> > > > > -ENOENT.
> > > >
> > > > This error is with the series I assume?
> > >
> > > On the first probe genpd_get_from_provider() will return -ENOENT in
> > > both cases. The list is empty on the first probe and there are no
> > > genpd providers at this point.
> > >
> > > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > > ends up getting changed to -EPROBE_DEFER at the end of
> > > driver_deferred_probe_check_state(), we are now missing that.
> >
> > Right, I was aware -ENOENT would be returned if we got this far. But
> > the point of this series is that you shouldn't have gotten that far
> > before your pm domain device is ready. Hence my questions from the
> > earlier reply.
>
> OK
>
> > Can I get answers to rest of my questions in the first reply please?
> > That should help us figure out why fw_devlink let us get this far.
> > Summarize them here to make it easy:
> > * Are you running with fw_devlink=on?
>
> Yes with the default with no specific kernel params so looks like
> FW_DEVLINK_FLAGS_ON.
>
> > * Is the"ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?
>
> Yes
>
> > * If it's not built-in, can you please try deferred_probe_timeout=0
> > and deferred_probe_timeout=30 and see if either one of them help?
>
> It's built in so I did not try these.
>
> > * Can I get the output of "ls -d supplier:*" and "cat
> > supplier:*/status" output from the sysfs dir for the ocp device
> > without this series where it boots properly.
>
> Hmm so I'm not seeing any supplier for the top level ocp device in
> the booting case without your patches. I see the suppliers for the
> ocp child device instances only.

Hmmm... this is strange (that the device link isn't there), but this
is what I suspected.

Now we need to figure out why it's missing. There are only a few
things that could cause this and I don't see any of those. I already
checked to make sure the power domain in this instance had a proper
driver with a probe() function -- if it didn't, then that's one thing
that'd could have caused the missing device link. The device does seem
to have a proper driver, so looks like I can rule that out.

Can you point me to the dts file that corresponds to the specific
board you are testing this one? I probably won't find anything, but I
want to rule out some of the possibilities.

All the 

Re: [PATCH v1 0/7] iommu/amd: Add Generic IO Page Table Framework Support for v2 Page Table

2022-06-23 Thread Joerg Roedel
On Fri, Jun 03, 2022 at 04:51:00PM +0530, Vasant Hegde wrote:
> - Part 1 (patch 1-4 and 6)
>   Refactor the current IOMMU page table code to adopt the generic IO page
>   table framework, and add AMD IOMMU Guest (v2) page table management code.
> 
> - Part 2 (patch 5)
>   Add support for the AMD IOMMU Guest IO Protection feature (GIOV)
>   where requests from the I/O device without a PASID are treated as
>   if they have PASID of 0.
> 
> - Part 3 (patch 7)
>   Introduce new "amd_iommu_pgtable" command-line to allow users
>   to select the mode of operation (v1 or v2).

Something I didn't get entirely from the review is support level of the
amd_iommu_v2 driver. I think it will continue to work and the
requirement that the device identity maps DMA requests without PASID is
removed, right?

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 7/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

2022-06-23 Thread Joerg Roedel
On Fri, Jun 03, 2022 at 04:51:07PM +0530, Vasant Hegde wrote:
> + amd_iommu_pgtable= [HW,X86-64]
> + Specifies one of the following AMD IOMMU page table to
> + be used for DMA remapping for DMA-API:
> + v1 - Use v1 page table (Default)
> + v2 - Use v2 page table
> +

Can we handle this somehow in the amd_iommu= option? Something like
amd_iommu=pgtbl_v2|pgtbl_v1?

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 4/7] iommu/amd: Initial support for AMD IOMMU v2 page table

2022-06-23 Thread Joerg Roedel
On Fri, Jun 03, 2022 at 04:51:04PM +0530, Vasant Hegde wrote:
> +/* Allocate page table */
> +static u64 *v2_alloc_pte(u64 *pgd, unsigned long iova,
> +  unsigned long pg_size, bool *updated)
> +{
> + u64 *pte, *page;
> + int level, end_level;
> +
> + BUG_ON(!is_power_of_2(pg_size));
> +
> + level = get_pgtable_level() - 1;
> + end_level = page_size_to_level(pg_size);
> + pte = [PM_LEVEL_INDEX(level, iova)];
> + iova = PAGE_SIZE_ALIGN(iova, PAGE_SIZE);
> +
> + while (level >= end_level) {
> + u64 __pte, __npte;
> +
> + __pte = *pte;
> +
> + if (IOMMU_PTE_PRESENT(__pte) && is_large_pte(__pte)) {
> + /* Unmap large pte */
> + cmpxchg64(pte, *pte, 0ULL);
> + *updated = true;
> + continue;
> + }
> +
> + if (!IOMMU_PTE_PRESENT(__pte)) {
> + page = alloc_pgtable_page();
> + if (!page)
> + return NULL;
> +
> + __npte = set_pgtable_attr(page);
> + /* pte could have been changed somewhere. */
> + if (cmpxchg64(pte, __pte, __npte) != __pte)
> + free_pgtable_page(page);
> + else if (IOMMU_PTE_PRESENT(__pte))
> + *updated = true;
> +
> + continue;
> + }
> +
> + level -= 1;
> + pte = get_pgtable_pte(__pte);
> + pte = [PM_LEVEL_INDEX(level, iova)];
> + }

I know that the V1 page-table code also uses loops for the allocation
path, but the main reason there is the variable amount of page-table
levels. The v2 page-tables have a fixed amount levels, so it is better
to unroll this loop here (and other loops iterating over page-table
levels). This makes the code more clear.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-23 Thread Saravana Kannan via iommu
Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
enabled iommus and dmas dependency enforcement by default. On some
systems, this caused the console device's probe to get delayed until the
deferred_probe_timeout expires.

We need consoles to work as soon as possible, so mark the console device
node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
the probe of the console device for suppliers without drivers. The
driver can then make the decision on where it can probe without those
suppliers or defer its probe.

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer 
Reported-by: Peng Fan 
Signed-off-by: Saravana Kannan 
Tested-by: Peng Fan 
---
 drivers/of/base.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index d4f98c8469ed..a19cd0c73644 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
align))
of_property_read_string(of_aliases, "stdout", );
if (name)
of_stdout = of_find_node_opts_by_path(name, 
_stdout_options);
+   if (of_stdout)
+   of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
}
 
if (!of_aliases)
-- 
2.37.0.rc0.161.g10f37bed90-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] driver core: fw_devlink: Allow firmware to mark devices as best effort

2022-06-23 Thread Saravana Kannan via iommu
When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
fw_devlink will do a best effort ordering for that device where it'll
only enforce the probe/suspend/resume ordering of that device with
suppliers that have drivers. The driver of that device can then decide
if it wants to defer probe or probe without the suppliers.

This will be useful for avoid probe delays of the console device that
were caused by commit 71066545b48e ("driver core: Set
fw_devlink.strict=1 by default").

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer 
Reported-by: Peng Fan 
Signed-off-by: Saravana Kannan 
Tested-by: Peng Fan 
---
 drivers/base/core.c| 3 ++-
 include/linux/fwnode.h | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 839f64485a55..ccdd5b4295de 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device 
*dev)
 
 static bool dev_is_best_effort(struct device *dev)
 {
-   return fw_devlink_best_effort && dev->can_match;
+   return (fw_devlink_best_effort && dev->can_match) ||
+   (dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT));
 }
 
 /**
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9a81c4410b9f..89b9bdfca925 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -27,11 +27,15 @@ struct device;
  *  driver needs its child devices to be bound with
  *  their respective drivers as soon as they are
  *  added.
+ * BEST_EFFORT: The fwnode/device needs to probe early and might be missing 
some
+ * suppliers. Only enforce ordering with suppliers that have
+ * drivers.
  */
 #define FWNODE_FLAG_LINKS_ADDEDBIT(0)
 #define FWNODE_FLAG_NOT_DEVICE BIT(1)
 #define FWNODE_FLAG_INITIALIZEDBIT(2)
 #define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD   BIT(3)
+#define FWNODE_FLAG_BEST_EFFORTBIT(4)
 
 struct fwnode_handle {
struct fwnode_handle *secondary;
-- 
2.37.0.rc0.161.g10f37bed90-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] Fix console probe delay due to fw_devlink

2022-06-23 Thread Saravana Kannan via iommu
fw_devlink.strict=1 has been enabled by default. This was delaying the
probe of console devices. This series fixes that.

v1->v2:
- Added missing NULL check
- Added Tested-by tags

-Saravana

cc: sascha hauer 
cc: peng fan 
cc: kevin hilman 
cc: ulf hansson 
cc: len brown 
cc: pavel machek 
cc: joerg roedel 
cc: will deacon 
cc: andrew lunn 
cc: heiner kallweit 
cc: russell king 
cc: "david s. miller" 
cc: eric dumazet 
cc: jakub kicinski 
cc: paolo abeni 
cc: linus walleij 
cc: hideaki yoshifuji 
cc: david ahern 
cc: kernel-t...@android.com
cc: linux-ker...@vger.kernel.org
cc: linux...@vger.kernel.org
cc: iommu@lists.linux-foundation.org
cc: net...@vger.kernel.org
cc: linux-g...@vger.kernel.org
Cc: ker...@pengutronix.de

Saravana Kannan (2):
  driver core: fw_devlink: Allow firmware to mark devices as best effort
  of: base: Avoid console probe delay when fw_devlink.strict=1

 drivers/base/core.c| 3 ++-
 drivers/of/base.c  | 2 ++
 include/linux/fwnode.h | 4 
 3 files changed, 8 insertions(+), 1 deletion(-)

-- 
2.37.0.rc0.161.g10f37bed90-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 1/2] driver core: fw_devlink: Allow firmware to mark devices as best effort

2022-06-23 Thread Saravana Kannan via iommu
On Wed, Jun 22, 2022 at 11:50 PM Sascha Hauer  wrote:
>
> On Wed, Jun 22, 2022 at 02:59:10PM -0700, Saravana Kannan wrote:
> > When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
> > fw_devlink will do a best effort ordering for that device where it'll
> > only enforce the probe/suspend/resume ordering of that device with
> > suppliers that have drivers. The driver of that device can then decide
> > if it wants to defer probe or probe without the suppliers.
> >
> > This will be useful for avoid probe delays of the console device that
> > were caused by commit 71066545b48e ("driver core: Set
> > fw_devlink.strict=1 by default").
> >
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer 
> > Reported-by: Peng Fan 
> > Signed-off-by: Saravana Kannan 
> > ---
> >  drivers/base/core.c| 3 ++-
> >  include/linux/fwnode.h | 4 
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 839f64485a55..61edd18b7bf3 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device 
> > *dev)
> >
> >  static bool dev_is_best_effort(struct device *dev)
> >  {
> > - return fw_devlink_best_effort && dev->can_match;
> > + return (fw_devlink_best_effort && dev->can_match) ||
> > + dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT;
>
> Check for dev->fwnode first. I am running in a NULL pointer exception
> here for a device that doesn't have a fwnode.

Oops. Fixed and sent out a v2.

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 3/7] iommu/amd: Fix sparse warning

2022-06-23 Thread Joerg Roedel
On Fri, Jun 03, 2022 at 04:51:03PM +0530, Vasant Hegde wrote:
> Fix below sparse warning:
>   CHECK   drivers/iommu/amd/iommu.c
>   drivers/iommu/amd/iommu.c:73:24: warning: symbol 'amd_iommu_ops' was not 
> declared. Should it be static?
> 
> Also we are going to introduce v2 page table which has different
> pgsize_bitmaps. Hence remove 'const' qualifier.

I am not a fan of removing the consts. Please use separate ops
structures for v2 page-tables and make then const as well. This probably
also has some optimization potential in the future when we can make the
ops call-back functions page-table specific.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 00/35] iommu/amd: Add multiple PCI segments support

2022-06-23 Thread Joerg Roedel
Hi Vasant,

On Wed, May 11, 2022 at 12:51:06PM +0530, Vasant Hegde wrote:
>  .../admin-guide/kernel-parameters.txt |  34 +-
>  drivers/iommu/amd/amd_iommu.h |  13 +-
>  drivers/iommu/amd/amd_iommu_types.h   | 133 +++-
>  drivers/iommu/amd/init.c  | 687 +++---
>  drivers/iommu/amd/iommu.c | 563 --
>  drivers/iommu/amd/iommu_v2.c  |  67 +-
>  drivers/iommu/amd/quirks.c|   4 +-
>  7 files changed, 904 insertions(+), 597 deletions(-)

So this is applied now to the IOMMU tree, thanks for the work. Something
that bothered me while looking at this was the almost complete lack of
locking while accessing the global data structures. Some of them are
lock-less, so it is partially fine, and most of them are used read-only
during system runtime. But I would appreciate if you and/or Suravee
could look over that again and check again if there needs to be more
locking.

The current situation will fire back at the point where you want to
implement IOMMU hotplug. Note that device hotplug is already possible
today, either with real devices or SR-IOV.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-23 Thread Nicolin Chen via iommu
On Thu, Jun 23, 2022 at 03:50:22AM +, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Robin Murphy 
> > Sent: Wednesday, June 22, 2022 3:55 PM
> >
> > On 2022-06-16 23:23, Nicolin Chen wrote:
> > > On Thu, Jun 16, 2022 at 06:40:14AM +, Tian, Kevin wrote:
> > >
> > >>> The domain->ops validation was added, as a precaution, for mixed-
> > driver
> > >>> systems. However, at this moment only one iommu driver is possible. So
> > >>> remove it.
> > >>
> > >> It's true on a physical platform. But I'm not sure whether a virtual
> > platform
> > >> is allowed to include multiple e.g. one virtio-iommu alongside a virtual 
> > >> VT-
> > d
> > >> or a virtual smmu. It might be clearer to claim that (as Robin pointed 
> > >> out)
> > >> there is plenty more significant problems than this to solve instead of
> > simply
> > >> saying that only one iommu driver is possible if we don't have explicit
> > code
> > >> to reject such configuration. 
> > >
> > > Will edit this part. Thanks!
> >
> > Oh, physical platforms with mixed IOMMUs definitely exist already. The
> > main point is that while bus_set_iommu still exists, the core code
> > effectively *does* prevent multiple drivers from registering - even in
> > emulated cases like the example above, virtio-iommu and VT-d would both
> > try to bus_set_iommu(_bus_type), and one of them will lose. The
> > aspect which might warrant clarification is that there's no combination
> > of supported drivers which claim non-overlapping buses *and* could
> > appear in the same system - even if you tried to contrive something by
> > emulating, say, VT-d (PCI) alongside rockchip-iommu (platform), you
> > could still only describe one or the other due to ACPI vs. Devicetree.
> >
> 
> This explanation is much clearer! thanks.

Thanks +1

I've also updated the commit log.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/amd: Use try_cmpxchg64 in alloc_pte and free_clear_pte

2022-06-23 Thread Joerg Roedel
On Wed, May 25, 2022 at 04:54:16PM +0200, Uros Bizjak wrote:
> Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in
> alloc_pte and free_clear_pte.  cmpxchg returns success in ZF flag, so this
> change saves a compare after cmpxchg (and related move instruction
> in front of cmpxchg). Also, remove racy explicit assignment to pteval
> when cmpxchg fails, this is what try_cmpxchg does implicitly from
> *pte in an atomic way.
> 
> Signed-off-by: Uros Bizjak 
> Cc: Joerg Roedel 
> Cc: Suravee Suthikulpanit 
> Cc: Will Deacon 
> ---
>  drivers/iommu/amd/io_pgtable.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/5] mtk_iommu: Specify phandles to infracfg and pericfg

2022-06-23 Thread Joerg Roedel
Hi Matthias,

On Wed, Jun 22, 2022 at 04:12:47PM +0200, Matthias Brugger wrote:
> I wanted to check if you took also 3 and 4, as these should go through my
> tree. Unfortunately you haven't pushed your tree (yet). In case you took the
> whole series, can you please drop the dts patches. I'll apply them now on my
> v5.19-next/dts64 branch.

Yes, I applied the whole series, will drop patches 3 and 4 now.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-06-23 Thread Tony Lindgren
* Saravana Kannan  [220622 19:05]:
> On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren  wrote:
> >
> > Hi,
> >
> > * Saravana Kannan  [220621 19:29]:
> > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren  wrote:
> > > >
> > > > Hi,
> > > >
> > > > * Saravana Kannan  [700101 02:00]:
> > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > "power-domains" property, the execution will never get to the point
> > > > > where driver_deferred_probe_check_state() is called before the 
> > > > > supplier
> > > > > has probed successfully or before deferred probe timeout has expired.
> > > > >
> > > > > So, delete the call and replace it with -ENODEV.
> > > >
> > > > Looks like this causes omaps to not boot in Linux next.
> > >
> > > Can you please point me to an example DTS I could use for debugging
> > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > off or putting it in permissive mode.
> >
> > Sure, this seems to happen at least with simple-pm-bus as the top
> > level interconnect with a configured power-domains property:
> >
> > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 
> > simple-pm-bus
> 
> Thanks for the example. I generally start looking from dts (not dtsi)
> files in case there are some DT property override/additions after the
> dtsi files are included in the dts file. But I'll assume for now
> that's not the case. If there's a specific dts file for a board I can
> look from that'd be helpful to rule out those kinds of issues.
> 
> For now, I looked at arch/arm/boot/dts/omap4.dtsi.

OK it should be very similar for all the affected SoCs.

> > This issue is no directly related fw_devlink. It is a side effect of
> > removing driver_deferred_probe_check_state(). We no longer return
> > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> 
> Yes, I understand the issue. But driver_deferred_probe_check_state()
> was deleted because fw_devlink=on should have short circuited the
> probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> probe function and hitting this -ENOENT failure. That's why I was
> asking the other questions.

OK. So where is the -EPROBE_DEFER supposed to happen without
driver_deferred_probe_check_state() then?

> > > > On platform_probe() genpd_get_from_provider() returns
> > > > -ENOENT.
> > >
> > > This error is with the series I assume?
> >
> > On the first probe genpd_get_from_provider() will return -ENOENT in
> > both cases. The list is empty on the first probe and there are no
> > genpd providers at this point.
> >
> > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > ends up getting changed to -EPROBE_DEFER at the end of
> > driver_deferred_probe_check_state(), we are now missing that.
> 
> Right, I was aware -ENOENT would be returned if we got this far. But
> the point of this series is that you shouldn't have gotten that far
> before your pm domain device is ready. Hence my questions from the
> earlier reply.

OK

> Can I get answers to rest of my questions in the first reply please?
> That should help us figure out why fw_devlink let us get this far.
> Summarize them here to make it easy:
> * Are you running with fw_devlink=on?

Yes with the default with no specific kernel params so looks like
FW_DEVLINK_FLAGS_ON.

> * Is the"ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?

Yes

> * If it's not built-in, can you please try deferred_probe_timeout=0
> and deferred_probe_timeout=30 and see if either one of them help?

It's built in so I did not try these.

> * Can I get the output of "ls -d supplier:*" and "cat
> supplier:*/status" output from the sysfs dir for the ocp device
> without this series where it boots properly.

Hmm so I'm not seeing any supplier for the top level ocp device in
the booting case without your patches. I see the suppliers for the
ocp child device instances only.

Without your patches I see simple-pm-bus probe initially with
EPROBE_DEFER like I described earlier, and then simple-pm-bus probes
on the second try.

Regards,

Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/1] iommu/vt-d: Fix RID2PASID setup/teardown failure

2022-06-23 Thread Lu Baolu
The IOMMU driver shares the pasid table for PCI alias devices. When the
RID2PASID entry of the shared pasid table has been filled by the first
device, the subsequent device will encounter the "DMAR: Setup RID2PASID
failed" failure as the pasid entry has already been marked as present.
As the result, the IOMMU probing process will be aborted.

On the contrary, when any alias device is hot-removed from the system,
for example, by writing to /sys/bus/pci/devices/.../remove, the shared
RID2PASID will be cleared without any notifications to other devices.
As the result, any DMAs from those rest devices are blocked.

Sharing pasid table among PCI alias devices could save two memory pages
for devices underneath the PCIe-to-PCI bridges. Anyway, considering that
those devices are rare on modern platforms that support VT-d in scalable
mode and the saved memory is negligible, it's reasonable to remove this
part of immature code to make the driver feasible and stable.

Fixes: ef848b7e5a6a0 ("iommu/vt-d: Setup pasid entry for RID2PASID support")
Reported-by: Chenyi Qiang 
Reported-by: Ethan Zhao 
Cc: sta...@vger.kernel.org
Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h |  3 --
 drivers/iommu/intel/pasid.h |  1 -
 drivers/iommu/intel/iommu.c | 24 -
 drivers/iommu/intel/pasid.c | 69 ++---
 4 files changed, 3 insertions(+), 94 deletions(-)

Change log:
v3:
 - Ethan pointed out that there's also problem in the device release
   path. Let's remove this part of immature code for now.

v2:
 - Add domain validity check in RID2PASID entry setup.

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4f29139bbfc3..5fcf89faa31a 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -612,7 +612,6 @@ struct intel_iommu {
 struct device_domain_info {
struct list_head link;  /* link to domain siblings */
struct list_head global; /* link to global list */
-   struct list_head table; /* link to pasid table */
u32 segment;/* PCI segment number */
u8 bus; /* PCI bus number */
u8 devfn;   /* PCI devfn number */
@@ -729,8 +728,6 @@ extern int dmar_ir_support(void);
 void *alloc_pgtable_page(int node);
 void free_pgtable_page(void *vaddr);
 struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
-int for_each_device_domain(int (*fn)(struct device_domain_info *info,
-void *data), void *data);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 583ea67fc783..bf5b937848b4 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -74,7 +74,6 @@ struct pasid_table {
void*table; /* pasid table pointer */
int order;  /* page order of pasid table */
u32 max_pasid;  /* max pasid */
-   struct list_headdev;/* device list */
 };
 
 /* Get PRESENT bit of a PASID directory entry. */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 44016594831d..5c0dce78586a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -320,30 +320,6 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
-/*
- * Iterate over elements in device_domain_list and call the specified
- * callback @fn against each element.
- */
-int for_each_device_domain(int (*fn)(struct device_domain_info *info,
-void *data), void *data)
-{
-   int ret = 0;
-   unsigned long flags;
-   struct device_domain_info *info;
-
-   spin_lock_irqsave(_domain_lock, flags);
-   list_for_each_entry(info, _domain_list, global) {
-   ret = fn(info, data);
-   if (ret) {
-   spin_unlock_irqrestore(_domain_lock, flags);
-   return ret;
-   }
-   }
-   spin_unlock_irqrestore(_domain_lock, flags);
-
-   return 0;
-}
-
 const struct iommu_ops intel_iommu_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index cb4c1d0cf25c..17cad7c1f62d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -86,54 +86,6 @@ void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid)
 /*
  * Per device pasid table management:
  */
-static inline void
-device_attach_pasid_table(struct device_domain_info *info,
- struct pasid_table *pasid_table)
-{
-   info->pasid_table = pasid_table;
-   list_add(>table, _table->dev);
-}
-

RE: [PATCH] dma-direct: use the correct size for dma_set_encrypted()

2022-06-23 Thread Dexuan Cui via iommu
> From: Christoph Hellwig 
> Sent: Wednesday, June 22, 2022 10:44 PM
> To: Dexuan Cui 
>  ...
> On Wed, Jun 22, 2022 at 12:14:24PM -0700, Dexuan Cui wrote:
> > The third parameter of dma_set_encrypted() is a size in bytes rather than
> > the number of pages.
> >
> > Fixes: 4d0564785bb0 ("dma-direct: factor out dma_set_{de,en}crypted
> helpers")
> > Signed-off-by: Dexuan Cui 
> 
> see:
> 
> commit 4a37f3dd9a83186cb88d44808ab35b78375082c9 (tag:
> dma-mapping-5.19-2022-05-25)
> Author: Robin Murphy 
> Date:   Fri May 20 18:10:13 2022 +0100
> 
> dma-direct: don't over-decrypt memory

It looks like commit 4a37f3dd9a831 fixed a different issue?

Here my patch is for the latest mainline:

In dma_direct_alloc()'s error handling path, we pass 'size' to 
dma_set_encrypted():
  out_encrypt_pages:
  if (dma_set_encrypted(dev, page_address(page), size))

However, in dma_direct_free(), we pass ' 1 << page_order ' to 
dma_set_encrypted().
I think the ' 1 << page_order' is incorrect and it should be 'size' as well?

Thanks,
-- Dexuan

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 1/2] driver core: fw_devlink: Allow firmware to mark devices as best effort

2022-06-23 Thread Sascha Hauer
On Wed, Jun 22, 2022 at 02:59:10PM -0700, Saravana Kannan wrote:
> When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
> fw_devlink will do a best effort ordering for that device where it'll
> only enforce the probe/suspend/resume ordering of that device with
> suppliers that have drivers. The driver of that device can then decide
> if it wants to defer probe or probe without the suppliers.
> 
> This will be useful for avoid probe delays of the console device that
> were caused by commit 71066545b48e ("driver core: Set
> fw_devlink.strict=1 by default").
> 
> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> Reported-by: Sascha Hauer 
> Reported-by: Peng Fan 
> Signed-off-by: Saravana Kannan 
> ---
>  drivers/base/core.c| 3 ++-
>  include/linux/fwnode.h | 4 
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 839f64485a55..61edd18b7bf3 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device 
> *dev)
>  
>  static bool dev_is_best_effort(struct device *dev)
>  {
> - return fw_devlink_best_effort && dev->can_match;
> + return (fw_devlink_best_effort && dev->can_match) ||
> + dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT;

Check for dev->fwnode first. I am running in a NULL pointer exception
here for a device that doesn't have a fwnode.

Sascha


-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts

2022-06-23 Thread Sai Prakash Ranjan

On 5/26/2022 9:44 AM, Sai Prakash Ranjan wrote:

TLB sync timeouts can be due to various reasons such as TBU power down
or pending TCU/TBU invalidation/sync and so on. Debugging these often
require dumping of some implementation defined registers to know the
status of TBU/TCU operations and some of these registers are not
accessible in non-secure world such as from kernel and requires SMC
calls to read them in the secure world. So, add this debug support
to dump implementation defined registers for TLB sync timeout issues.

Signed-off-by: Sai Prakash Ranjan 
---

Changes in v2:
  * Use scm call consistently so that it works on older chipsets where
some of these regs are secure registers.
  * Add device specific data to get the implementation defined register
offsets.

---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c  |   2 +
  drivers/iommu/arm/arm-smmu/arm-smmu.h  |   1 +
  3 files changed, 146 insertions(+), 18 deletions(-)


Any comments on this patch?

Thanks,
Sai
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu