[PATCH] swiotlb: fix setting ->force_bounce

2022-06-01 Thread Christoph Hellwig
The swiotlb_init refactor messed up assigning ->force_bounce by doing
it in different places based on what caused the setting of the flag.

Fix this by passing the SWIOTLB_* flags to swiotlb_init_io_tlb_mem
and just setting it there.

Fixes: c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful")
Reported-by: Nathan Chancellor 
Signed-off-by: Christoph Hellwig 
Tested-by: Nathan Chancellor 
---
 kernel/dma/swiotlb.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc944..cb50f8d383606 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-   unsigned long nslabs, bool late_alloc)
+   unsigned long nslabs, unsigned int flags, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->index = 0;
mem->late_alloc = late_alloc;
 
-   if (swiotlb_force_bounce)
-   mem->force_bounce = true;
+   mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
@@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-   mem->force_bounce = flags & SWIOTLB_FORCE;
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
set_memory_decrypted((unsigned long)vstart,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
 
swiotlb_print_info();
return 0;
@@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 rmem->size >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
-   mem->force_bounce = true;
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
+   false);
mem->for_alloc = true;
 
rmem->priv = mem;
-- 
2.30.2



Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-01 Thread Jason Gunthorpe
On Fri, May 27, 2022 at 04:41:08PM -0600, Logan Gunthorpe wrote:
> > 
> > IIRC this is the last part:
> > 
> > https://lore.kernel.org/linux-mm/20220524190632.3304-1-alex.sie...@amd.com/
> > 
> > And the earlier bit with Christoph's pieces looks like it might get
> > merged to v5.19..
> > 
> > The general idea is once pte_devmap is not set then all the
> > refcounting works the way it should. This is what all new ZONE_DEVICE
> > users should do..
> 
> Ok, I don't actually follow how those patches relate to this.
> 
> Based on your description I guess I don't need to set PFN_DEV and

Yes

> perhaps not use vmf_insert_mixed()? And then just use vm_normal_page()?

I'm not sure ATM the best function to use, but yes, a function that
doesn't set PFN_DEV is needed here.
 
> But the refcounting of the pages seemed like it was already sane to me,
> unless you mean that the code no longer has to synchronize_rcu() before
> returning the pages... 

Right. It also doesn't need to call unmap range or keep track of the
inode, or do any of that stuff unless it really needs mmap revocation
semantics (which I doubt this use case does)

unmap range was only necessary because the refcounting is wrong -
since the PTEs don't hold a ref on the page in PFN_DEV mode it is
necessary to wipe all the PTEs explicitly before going ahead to
decrement the refcount on this path.

Just stuff the pages into the mmap, and your driver unprobe will
automatically block until all the mmaps are closed - no different than
having an open file descriptor or something.
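
For illustration only, a minimal sketch of that direction (not the actual
P2PDMA patches): the fault handler inserts a normally refcounted page, with
vmf_insert_page() standing in for whichever insertion helper ends up being
the right one, and ctx/pages[] being hypothetical driver state:

static vm_fault_t p2pmem_fault(struct vm_fault *vmf)
{
	struct p2pmem_ctx *ctx = vmf->vma->vm_private_data;	/* hypothetical */
	struct page *page = ctx->pages[vmf->pgoff];		/* hypothetical */

	/* The inserted PTE holds a page reference, so driver unprobe
	 * naturally waits until the last mapping is torn down. */
	return vmf_insert_page(vmf->vma, vmf->address, page);
}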

Jason


Re: [PATCH v2 1/7] dt-bindings: iommu: mediatek: Add phandles for mediatek infra/pericfg

2022-06-01 Thread Rob Herring
On Wed, May 18, 2022 at 01:42:20PM +0200, AngeloGioacchino Del Regno wrote:
> Il 18/05/22 13:29, Matthias Brugger ha scritto:
> > 
> > 
> > On 18/05/2022 12:04, AngeloGioacchino Del Regno wrote:
> > > Add properties "mediatek,infracfg" and "mediatek,pericfg" to let the
> > > mtk_iommu driver retrieve phandles to the infracfg and pericfg syscon(s)
> > > instead of performing a per-soc compatible lookup.
> > > 
> > > Signed-off-by: AngeloGioacchino Del Regno 
> > > 
> > > ---
> > >   .../devicetree/bindings/iommu/mediatek,iommu.yaml | 8 
> > >   1 file changed, 8 insertions(+)
> > > 
> > > diff --git
> > > a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > index 2ae3bbad7f1a..c4af41947593 100644
> > > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > @@ -101,6 +101,10 @@ properties:
> > >   items:
> > >     - const: bclk
> > > +  mediatek,infracfg:
> > > +    $ref: /schemas/types.yaml#/definitions/phandle
> > > +    description: The phandle to the mediatek infracfg syscon
> > > +
> > >     mediatek,larbs:
> > >   $ref: /schemas/types.yaml#/definitions/phandle-array
> > >   minItems: 1
> > > @@ -112,6 +116,10 @@ properties:
> > >     Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It 
> > > must sort
> > >     according to the local arbiter index, like larb0, larb1, larb2...
> > > +  mediatek,pericfg:
> > > +    $ref: /schemas/types.yaml#/definitions/phandle
> > > +    description: The phandle to the mediatek pericfg syscon
> > > +
> > 
> > I didn't explain myself. What I was suggesting was to squash the patch
> > that adds required mediatek,infracfg with the patch that adds
> > mediatek,infracfg to the binding description. And then squash both
> > patches adding pericfg as well.
> 
> Sorry Matthias, I'm not sure ... I think I'm misunderstanding you again...
> ...but if I'm not, I don't think that squashing actual code and bindings 
> together
> is something acceptable?
> 
> I've made that kind of mistake in the past and I was told multiple times that
> dt-bindings changes shall be sent separately from the actual driver changes.

Combining patches 1 and 6 is the suggestion, not combining them with driver changes.

Rob


Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Nathan Chancellor
On Wed, Jun 01, 2022 at 08:21:41PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 11:11:57AM -0700, Nathan Chancellor wrote:
> > On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> > > On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > > > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > > > Can you send me the full dmesg and the content of
> > > > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > > > 
> > > > Sure thing, they are attached! If there is anything else I can provide
> > > > or test, I am more than happy to do so.
> > > 
> > > Nothing interesting.  But the performance numbers almost look like
> > > swiotlb=force got ignored before (even if I can't explain why).
> > 
> > I was able to get my performance back with this diff but I don't know if
> > this is a hack or a proper fix in the context of the series.
> 
> This looks good, but needs a little tweak.  I'd go for this variant of
> it:

Tested-by: Nathan Chancellor 

Thanks a lot for the quick fix!

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index dfa1de89dc944..cb50f8d383606 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
>  }
>  
>  static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> - unsigned long nslabs, bool late_alloc)
> + unsigned long nslabs, unsigned int flags, bool late_alloc)
>  {
>   void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> @@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->index = 0;
>   mem->late_alloc = late_alloc;
>  
> - if (swiotlb_force_bounce)
> - mem->force_bounce = true;
> + mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
>  
>   spin_lock_init(&mem->lock);
>   for (i = 0; i < mem->nslabs; i++) {
> @@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
> unsigned int flags,
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
>  
> - swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> - mem->force_bounce = flags & SWIOTLB_FORCE;
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
>  
>   if (flags & SWIOTLB_VERBOSE)
>   swiotlb_print_info();
> @@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  
>   set_memory_decrypted((unsigned long)vstart,
>(nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> - swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
>  
>   swiotlb_print_info();
>   return 0;
> @@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
> *rmem,
>  
>   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>rmem->size >> PAGE_SHIFT);
> - swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
> - mem->force_bounce = true;
> + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
> + false);
>   mem->for_alloc = true;
>  
>   rmem->priv = mem;
> 

Cheers,
Nathan


Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2022 at 11:11:57AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> > On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > > Can you send me the full dmesg and the content of
> > > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > > 
> > > Sure thing, they are attached! If there is anything else I can provide
> > > or test, I am more than happy to do so.
> > 
> > Nothing interesting.  But the performance numbers almost look like
> > swiotlb=force got ignored before (even if I can't explain why).
> 
> I was able to get my performance back with this diff but I don't know if
> this is a hack or a proper fix in the context of the series.

This looks good, but needs a little tweak.  I'd go for this variant of
it:


diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc944..cb50f8d383606 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-   unsigned long nslabs, bool late_alloc)
+   unsigned long nslabs, unsigned int flags, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
mem->index = 0;
mem->late_alloc = late_alloc;
 
-   if (swiotlb_force_bounce)
-   mem->force_bounce = true;
+   mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
@@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-   mem->force_bounce = flags & SWIOTLB_FORCE;
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
set_memory_decrypted((unsigned long)vstart,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
 
swiotlb_print_info();
return 0;
@@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
*rmem,
 
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 rmem->size >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
-   mem->force_bounce = true;
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
+   false);
mem->for_alloc = true;
 
rmem->priv = mem;



Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Nathan Chancellor
On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > Can you send me the full dmesg and the content of
> > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > 
> > Sure thing, they are attached! If there is anything else I can provide
> > or test, I am more than happy to do so.
> 
> Nothing interesting.  But the performance numbers almost look like
> swiotlb=force got ignored before (even if I can't explain why).

I was able to get my performance back with this diff but I don't know if
this is a hack or a proper fix in the context of the series.

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc94..0bfb2fe3d8c5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -276,7 +276,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
  __func__, alloc_size, PAGE_SIZE);
 
swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-   mem->force_bounce = flags & SWIOTLB_FORCE;
+   mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();

> Do you get a similar performance with the new kernel without
> swiotlb=force as the old one with that argument by any chance?

I'll see if I can test that, as I am not sure I have control over those
cmdline arguments.

Cheers,
Nathan


Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > Can you send me the full dmesg and the content of
> > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> 
> Sure thing, they are attached! If there is anything else I can provide
> or test, I am more than happy to do so.

Nothing interesting.  But the performance numbers almost look like
swiotlb=force got ignored before (even if I can't explain why).
Do you get a similar performance with the new kernel without
swiotlb=force as the old one with that argument by any chance?



Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Nathan Chancellor
On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> Can you send me the full dmesg and the content of
> /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?

Sure thing, they are attached! If there is anything else I can provide
or test, I am more than happy to do so.

Cheers,
Nathan
# cat /sys/kernel/debug/swiotlb/io_tlb_nslabs
32768

# dmesg
[0.00] Linux version 
5.18.0-rc3-microsoft-standard-WSL2-8-ga3e230926708 
(nathan@dev-arch.thelio-3990X) (gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) 
#1 SMP PREEMPT_DYNAMIC Wed Jun 1 10:38:34 MST 2022
[0.00] Command line: initrd=\initrd.img panic=-1 nr_cpus=8 
swiotlb=force earlycon=uart8250,io,0x3f8,115200 console=hvc0 debug 
pty.legacy_count=0
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x000e-0x000e0fff] reserved
[0.00] BIOS-e820: [mem 0x0010-0x001f] ACPI data
[0.00] BIOS-e820: [mem 0x0020-0xf7ff] usable
[0.00] BIOS-e820: [mem 0x0001-0x000407ff] usable
[0.00] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[0.00] printk: bootconsole [uart8250] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] Hypervisor detected: Microsoft Hyper-V
[0.00] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, hints 0xc2c, 
misc 0xe0bed7b6
[0.00] Hyper-V: Host Build 10.0.22000.708-0-0
[0.00] Hyper-V: Nested features: 0x4a
[0.00] Hyper-V: LAPIC Timer Frequency: 0x1e8480
[0.00] Hyper-V: Using hypercall for remote TLB flush
[0.00] clocksource: hyperv_clocksource_tsc_page: mask: 
0x max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[0.05] tsc: Detected 3800.008 MHz processor
[0.001901] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.004593] e820: remove [mem 0x000a-0x000f] usable
[0.006806] last_pfn = 0x408000 max_arch_pfn = 0x4
[0.009042] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.011760] last_pfn = 0xf8000 max_arch_pfn = 0x4
[0.013959] Using GB pages for direct mapping
[0.015749] RAMDISK: [mem 0x0371f000-0x03779fff]
[0.017616] ACPI: Early table checksum verification disabled
[0.019854] ACPI: RSDP 0x000E 24 (v02 VRTUAL)
[0.022162] ACPI: XSDT 0x0010 44 (v01 VRTUAL MICROSFT 
0001 MSFT 0001)
[0.025624] ACPI: FACP 0x00101000 000114 (v06 VRTUAL MICROSFT 
0001 MSFT 0001)
[0.029022] ACPI: DSDT 0x001011B8 01E184 (v02 MSFTVM DSDT01   
0001 MSFT 0500)
[0.032413] ACPI: FACS 0x00101114 40
[0.034280] ACPI: OEM0 0x00101154 64 (v01 VRTUAL MICROSFT 
0001 MSFT 0001)
[0.037699] ACPI: SRAT 0x0011F33C 000330 (v02 VRTUAL MICROSFT 
0001 MSFT 0001)
[0.041089] ACPI: APIC 0x0011F66C 88 (v04 VRTUAL MICROSFT 
0001 MSFT 0001)
[0.044475] ACPI: Reserving FACP table memory at [mem 0x101000-0x101113]
[0.047159] ACPI: Reserving DSDT table memory at [mem 0x1011b8-0x11f33b]
[0.049905] ACPI: Reserving FACS table memory at [mem 0x101114-0x101153]
[0.052693] ACPI: Reserving OEM0 table memory at [mem 0x101154-0x1011b7]
[0.055404] ACPI: Reserving SRAT table memory at [mem 0x11f33c-0x11f66b]
[0.058040] ACPI: Reserving APIC table memory at [mem 0x11f66c-0x11f6f3]
[0.061078] Zone ranges:
[0.062074]   DMA  [mem 0x1000-0x00ff]
[0.066106]   DMA32[mem 0x0100-0x]
[0.068763]   Normal   [mem 0x0001-0x000407ff]
[0.071235]   Device   empty
[0.072384] Movable zone start for each node
[0.074058] Early memory node ranges
[0.075515]   node   0: [mem 0x1000-0x0009]
[0.077979]   node   0: [mem 0x0020-0xf7ff]
[0.080483]   node   0: [mem 0x0001-0x000407ff]
[0.082980] Initmem setup node 0 [mem 0x1000-0x000407ff]
[0.085954] On node 0, zone DMA: 1 pages in unavailable ranges
[0.085972] On node 0, zone DMA: 352 

Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
Can you send me the full dmesg and the content of
/sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?

Thanks!


Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Nathan Chancellor
Hi Christoph,

On Mon, Apr 04, 2022 at 07:05:53AM +0200, Christoph Hellwig wrote:
> Pass a bool to pass if swiotlb needs to be enabled based on the
> addressing needs and replace the verbose argument with a set of
> flags, including one to force enable bounce buffering.
> 
> Note that this patch removes the possibility to force xen-swiotlb
> use using swiotlb=force on the command line on x86 (arm and arm64
> never supported that), but this interface will be restored shortly.
> 
> Signed-off-by: Christoph Hellwig 

I bisected a performance regression in WSL2 to this change as commit
c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful") in
mainline (bisect log below). I initially noticed it because accessing the
Windows filesystem through the /mnt/c mount is about 40x slower if I am doing
my math right based on the benchmarks below.

Before:

$ uname -r; and hyperfine "ls -l /mnt/c/Users/natec/Downloads"
5.18.0-rc3-microsoft-standard-WSL2-8-ga3e230926708
Benchmark 1: ls -l /mnt/c/Users/natec/Downloads
  Time (mean ± σ): 564.5 ms ±  24.1 ms[User: 2.5 ms, System: 130.3 ms]
  Range (min … max):   510.2 ms … 588.0 ms10 runs

After

$ uname -r; and hyperfine "ls -l /mnt/c/Users/natec/Downloads"
5.18.0-rc3-microsoft-standard-WSL2-9-gc6af2aa9ffc9
Benchmark 1: ls -l /mnt/c/Users/natec/Downloads
  Time (mean ± σ): 23.282 s ±  1.220 s[User: 0.013 s, System: 0.101 s]
  Range (min … max):   21.793 s … 25.317 s10 runs

I do see 'swiotlb=force' on the cmdline:

$ cat /proc/cmdline
initrd=\initrd.img panic=-1 nr_cpus=8 swiotlb=force 
earlycon=uart8250,io,0x3f8,115200 console=hvc0 debug pty.legacy_count=0

/mnt/c appears to be a 9p mount, not sure if that is relevant here:

$ mount &| grep /mnt/c
drvfs on /mnt/c type 9p 
(rw,noatime,dirsync,aname=drvfs;path=C:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)

If there is any other information I can provide, please let me know.

Cheers,
Nathan

# bad: [700170bf6b4d773e328fa54ebb70ba444007c702] Merge tag 'nfs-for-5.19-1' of 
git://git.linux-nfs.org/projects/anna/linux-nfs
# good: [4b0986a3613c92f4ec1bdc7f60ec66fea135991f] Linux 5.18
git bisect start '700170bf6b4d773e328fa54ebb70ba444007c702' 'v5.18'
# good: [86c87bea6b42100c67418af690919c44de6ede6e] Merge tag 
'devicetree-for-5.19' of 
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good 86c87bea6b42100c67418af690919c44de6ede6e
# bad: [ae862183285cbb2ef9032770d98ffa9becffe9d5] Merge tag 'arm-dt-5.19' of 
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad ae862183285cbb2ef9032770d98ffa9becffe9d5
# good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 
'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
# bad: [babf0bb978e3c9fce6c4eba6b744c8754fd43d8e] Merge tag 
'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad babf0bb978e3c9fce6c4eba6b744c8754fd43d8e
# good: [beed983621fbdfd291e6e3a0cdc4d10517e60af8] ASoC: Intel: avs: Machine 
board registration
git bisect good beed983621fbdfd291e6e3a0cdc4d10517e60af8
# good: [fbe86daca0ba878b04fa241b85e26e54d17d4229] Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good fbe86daca0ba878b04fa241b85e26e54d17d4229
# good: [166afc45ed5523298541fd0297f9ad585cc2708c] Merge tag 
'reflink-speedups-5.19_2022-04-28' of 
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into 
xfs-5.19-for-next
git bisect good 166afc45ed5523298541fd0297f9ad585cc2708c
# bad: [e375780b631a5fc2a61a3b4fa12429255361a31e] Merge tag 
'fsnotify_for_v5.19-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad e375780b631a5fc2a61a3b4fa12429255361a31e
# bad: [4a37f3dd9a83186cb88d44808ab35b78375082c9] dma-direct: don't 
over-decrypt memory
git bisect bad 4a37f3dd9a83186cb88d44808ab35b78375082c9
# bad: [742519538e6b07250c8085bbff4bd358bc03bf16] swiotlb: pass a gfp_mask 
argument to swiotlb_init_late
git bisect bad 742519538e6b07250c8085bbff4bd358bc03bf16
# good: [9bbe7a7fc126e3d14fefa4b035854aba080926d9] arm/xen: don't check for 
xen_initial_domain() in xen_create_contiguous_region
git bisect good 9bbe7a7fc126e3d14fefa4b035854aba080926d9
# good: [a3e230926708125205ffd06d3dc2175a8263ae7e] x86: centralize setting 
SWIOTLB_FORCE when guest memory encryption is enabled
git bisect good a3e230926708125205ffd06d3dc2175a8263ae7e
# bad: [8ba2ed1be90fc210126f68186564707478552c95] swiotlb: add a SWIOTLB_ANY 
flag to lift the low memory restriction
git bisect bad 8ba2ed1be90fc210126f68186564707478552c95
# bad: [c6af2aa9ffc9763826607bc2664ef3ea4475ed18] swiotlb: make the 
swiotlb_init interface more useful
git bisect bad c6af2aa9ffc9763826607bc2664ef3ea4475ed18
# first bad commit: [c6af2aa9ffc9763826607bc2664ef3ea4475ed18] swiotlb: make 
the swiotlb_init interface more useful

[PATCH v2] dma-debug: Make things less spammy under memory pressure

2022-06-01 Thread Rob Clark
From: Rob Clark 

Limit the error msg to avoid flooding the console.  If you have a lot of
threads hitting this at once, they could have already gotten past the
dma_debug_disabled() check before they get to the point of allocation
failure, resulting in quite a lot of this error message spamming the
log.  Use pr_err_once() to limit that.

Signed-off-by: Rob Clark 
---
v2: Use pr_err_once() instead of ratelimited, and spiff out commit msg a bit.

 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ff598596b8..754e3456f017 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -564,7 +564,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 
rc = active_cacheline_insert(entry);
if (rc == -ENOMEM) {
-   pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
+   pr_err_once("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
} else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
err_printk(entry->dev, entry,
-- 
2.36.1



Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Jason Gunthorpe via iommu
On Wed, Jun 01, 2022 at 02:52:05PM +0100, Joao Martins wrote:
> On 6/1/22 13:33, Jason Gunthorpe wrote:
> > On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:
> > 
> >>> So having safe racy reading in the kernel is probably best, and so RCU
> >>> would be good here too.
> >>
> >> Reading dirties ought to be similar to map/unmap but slightly simpler as
> >> I supposedly don't need to care about the pte changing under the hood (or
> >> so I initially thought). I was wrestling at some point if test-and-clear
> >> was enough or whether I switch back cmpxchg to detect the pte has changed
> >> and only mark dirty based on the old value[*]. The latter would align with
> >> how map/unmap performs the pte updates.
> > 
> > test-and-clear should be fine, but this all needs to be done under a
> > RCU context while the page tables themsevles are freed by RCU. Then
> > you can safely chase the page table pointers down to each level
> > without fear of UAF.
> > 
> 
> I was actually thinking more towards holding the same IOVA range lock to
> align with the rest of map/unmap/demote/etc? All these IO page table
> manipulations have the same performance requirements.

IMHO ideally we would not make read dirty use the IOVA range lock because
we want to read dirty big swaths of IOVA at a time and it shouldn't be
forced to chunk based on the arbitrary area construction.

But yes this could also be an option.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Joao Martins
On 6/1/22 13:33, Jason Gunthorpe wrote:
> On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:
> 
>>> So having safe racy reading in the kernel is probably best, and so RCU
>>> would be good here too.
>>
>> Reading dirties ought to be similar to map/unmap but slightly simpler as
>> I supposedly don't need to care about the pte changing under the hood (or
>> so I initially thought). I was wrestling at some point if test-and-clear
>> was enough or whether I switch back cmpxchg to detect the pte has changed
>> and only mark dirty based on the old value[*]. The latter would align with
>> how map/unmap performs the pte updates.
> 
> test-and-clear should be fine, but this all needs to be done under a
> RCU context while the page tables themselves are freed by RCU. Then
> you can safely chase the page table pointers down to each level
> without fear of UAF.
> 

I was actually thinking more towards holding the same IOVA range lock to
align with the rest of map/unmap/demote/etc? All these IO page table
manipulations have the same performance requirements.

>> I am not sure yet on dynamic demote/promote of page sizes if it changes this.
> 
> For this kind of primitive the caller must provide the locking, just
> like map/unmap.
> 
Ah OK.

> Effectively you can consider the iommu_domain as having externally
> provided range-locks over the IOVA space. map/unmap/demote/promote
> must run serially over intersecting IOVA ranges.
> 
> In terms of iommufd this means we always have to hold a lock related
> to the area (which is the IOVA range) before issuing any iommu call on
> the domain.

/me nods


Re: [PATCH V3 5/8] dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops

2022-06-01 Thread Oleksandr



On 01.06.22 03:34, Stefano Stabellini wrote:

Hello Stefano


On Tue, 31 May 2022, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

The main purpose of this binding is to communicate Xen specific
information using generic IOMMU device tree bindings (which is
a good fit here) rather than introducing a custom property.

Introduce Xen specific IOMMU for the virtualized device (e.g. virtio)
to be used by Xen grant DMA-mapping layer in the subsequent commit.

The reference to Xen specific IOMMU node using "iommus" property
indicates that Xen grant mappings need to be enabled for the device,
and it specifies the ID of the domain where the corresponding backend
resides. The domid (domain ID) is used as an argument to the Xen grant
mapping APIs.

This is needed for the option to restrict memory access using Xen grant
mappings to work, whose primary goal is to enable using virtio devices
in Xen guests.

Signed-off-by: Oleksandr Tyshchenko 
---
Changes RFC -> V1:
- update commit subject/description and text in description
- move to devicetree/bindings/arm/

Changes V1 -> V2:
- update text in description
- change the maintainer of the binding
- fix validation issue
- reference xen,dev-domid.yaml schema from virtio/mmio.yaml

Change V2 -> V3:
- Stefano already gave his Reviewed-by, I dropped it due to the changes 
(significant)
- use generic IOMMU device tree bindings instead of custom property
  "xen,dev-domid"
- change commit subject and description, was
  "dt-bindings: Add xen,dev-domid property description for xen-grant DMA 
ops"
---
  .../devicetree/bindings/iommu/xen,grant-dma.yaml   | 49 ++
  1 file changed, 49 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml

diff --git a/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml 
b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
new file mode 100644
index ..ab5765c
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/xen,grant-dma.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xen specific IOMMU for virtualized devices (e.g. virtio)
+
+maintainers:
+  - Stefano Stabellini 
+
+description:
+  The reference to Xen specific IOMMU node using "iommus" property indicates
+  that Xen grant mappings need to be enabled for the device, and it specifies
+  the ID of the domain where the corresponding backend resides.
+  The binding is required to restrict memory access using Xen grant mappings.

I think this is OK and in line with the discussion we had on the list. I
propose the following wording instead:

"""
The Xen IOMMU represents the Xen grant table interface. Grant mappings
are to be used with devices connected to the Xen IOMMU using the
"iommus" property, which also specifies the ID of the backend domain.
The binding is required to restrict memory access using Xen grant
mappings.
"""



+properties:
+  compatible:
+const: xen,grant-dma
+
+  '#iommu-cells':
+const: 1
+description:
+  Xen specific IOMMU is multiple-master IOMMU device.
+  The single cell describes the domid (domain ID) of the domain where
+  the backend is running.

Here I would say:

"""
The single cell is the domid (domain ID) of the domain where the backend
is running.
"""

With the two wording improvements:


I am happy with proposed wording improvements, will update.




Reviewed-by: Stefano Stabellini 


Thanks!






+required:
+  - compatible
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+xen_iommu {
+compatible = "xen,grant-dma";
+#iommu-cells = <1>;
+};
+
+virtio@3000 {
+compatible = "virtio,mmio";
+reg = <0x3000 0x100>;
+interrupts = <41>;
+
+/* The backend is located in Xen domain with ID 1 */
+iommus = <&xen_iommu 1>;
+};
--
2.7.4




--
Regards,

Oleksandr Tyshchenko



Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Jason Gunthorpe via iommu
On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:

> > So having safe racy reading in the kernel is probably best, and so RCU
> > would be good here too.
> 
> Reading dirties ought to be similar to map/unmap but slightly simpler as
> I supposedly don't need to care about the pte changing under the hood (or
> so I initially thought). I was wrestling at some point if test-and-clear
> was enough or whether I switch back cmpxchg to detect the pte has changed
> and only mark dirty based on the old value[*]. The latter would align with
> how map/unmap performs the pte updates.

test-and-clear should be fine, but this all needs to be done under a
RCU context while the page tables themselves are freed by RCU. Then
you can safely chase the page table pointers down to each level
without fear of UAF.
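
As a rough sketch of what that looks like (leaf_pte_of(), IO_PTE_DIRTY_BIT
and record_dirty() are made-up placeholders, not real driver code): the walk
runs under rcu_read_lock() so tables freed by a concurrent unmap via RCU
can't be used after free, and the leaf dirty bit is harvested with a
test-and-clear:

static void read_and_clear_dirty(struct io_pgtable *iop, unsigned long iova,
				 size_t size,
				 void (*record_dirty)(unsigned long iova))
{
	unsigned long addr;

	rcu_read_lock();
	for (addr = iova; addr < iova + size; addr += SZ_4K) {
		/* placeholder walk down to the leaf entry for this address */
		unsigned long *ptep = leaf_pte_of(iop, addr);

		if (ptep && test_and_clear_bit(IO_PTE_DIRTY_BIT, ptep))
			record_dirty(addr);
	}
	rcu_read_unlock();
}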

> I am not sure yet on dynamic demote/promote of page sizes if it changes this.

For this kind of primitive the caller must provide the locking, just
like map/unmap.

Effectively you can consider the iommu_domain as having externally
provided range-locks over the IOVA space. map/unmap/demote/promote
must run serially over intersecting IOVA ranges.

In terms of iommufd this means we always have to hold a lock related
to the area (which is the IOVA range) before issuing any iommu call on
the domain.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Joao Martins
On 6/1/22 00:10, Jason Gunthorpe wrote:
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
>> There are only 3 instances where we'll free a table while the domain is
>> live. The first is the one legitimate race condition, where two map requests
>> targeting relatively nearby PTEs both go to fill in an intermediate level of
>> table; whoever loses that race frees the table they allocated, but it was
>> never visible to anyone else so that's definitely fine. The second is if
>> we're mapping a block entry, and find that there's already a table entry
>> there, wherein we assume the table must be empty, clear the entry,
>> invalidate any walk caches, install the block entry, then free the orphaned
>> table; since we're mapping the entire IOVA range covered by that table,
>> there should be no other operations on that IOVA range attempting to walk
>> the table at the same time, so it's fine. 
> 
> I saw these two in the Intel driver
> 
>> The third is effectively the inverse, if we get a block-sized unmap
>> but find a table entry rather than a block at that point (on the
>> assumption that it's de-facto allowed for a single unmap to cover
>> multiple adjacent mappings as long as it does so exactly); similarly
>> we assume that the table must be full, and no other operations
>> should be racing because we're unmapping its whole IOVA range, so we
>> remove the table entry, invalidate, and free as before.
> 
> Not sure I noticed this one though
> 
> This all it all makes sense though.
> 

I saw all three incantations in AMD iommu /I think/

AMD has a sort of replicated-PTE concept, representing a power-of-2 page size
mapped with the 'default' page sizes (4K, 2M, 1G), and we need to check every
single one of those entries. While writing the RFC I realized it's not really
the most efficient thing to keep, considering the IOMMU hardware doesn't
guarantee the dirty bit is set on every replicated PTE. And maybe it's
clobbering the fact that we don't attempt to pick the best page size for the
IOVA mapping (like Intel does), and instead allow all power-of-2 page sizes
and let the IOMMU hw try its best to cope.

>> Although we don't have debug dumping for io-pgtable-arm, it's good to be
>> thinking about this, since it's made me realise that dirty-tracking sweeps
>> per that proposal might pose a similar kind of concern, so we might still
>> need to harden these corners for the sake of that.
> 
> Lets make sure Joao sees this..
> 
> It is interesting because we probably don't want the big latency
> spikes that would come from using locking to block map/unmap while
> dirty reading is happening - eg at the iommufd level.
> 
Specially when we might be scanning big IOVA ranges.

> From a consistency POV the only case that matters is unmap and unmap
> should already be doing a dedicated dirty read directly prior to the
> unmap (as per that other long thread)
> 
/me nods, yes

FWIW, I've already removed the unmap_read_dirty ops.

> So having safe racy reading in the kernel is probably best, and so RCU
> would be good here too.
> 

Reading dirties ought to be similar to map/unmap but slightly simpler as
I supposedly don't need to care about the pte changing under the hood (or
so I initially thought). I was wrestling at some point if test-and-clear
was enough or whether I switch back cmpxchg to detect the pte has changed
and only mark dirty based on the old value[*]. The latter would align with
how map/unmap performs the pte updates.

[*] e.g. potentially the new entry hasn't been 'seen' by the IOMMU walker, or
might be a smaller size than what got dirtied with an iopte split, or racing
with a new map

>> that somewhere I have some half-finished patches making io-pgtable-arm use
>> the iommu_iotlb_gather freelist, so maybe I'll tackle both concerns at once
>> (perhaps we might even be able to RCU-ify the freelist generically? I'll see
>> how it goes when I get there).
> 
> This is all very similar to how the mm's tlb gather stuff works,
> especially on PPC which requires RCU protection of page tables instead
> of the IPI trick.
> 
> Currently the rcu_head and the lru overlap in the struct page. To do
> this I'd suggest following what was done for slab - see ca1a46d6f506
> ("Merge tag 'slab-for-5.17'..) and create a 'struct page' sub-class
> like 'struct slab' for use with iommu page tables and put the
> list_head and rcu_head sequentially in the struct iommu_pt_page.
> 
> Continue to use put_pages_list() just with the list_head in the new
> struct not the lru.
> 
> If we need it for dirty tracking then it makes sense to start
> progressing parts at least..
> 
The suggestions here seem to apply to both map/unmap too, not
just read_dirty() alone IIUC

I am not sure yet on dynamic demote/promote of page sizes if it changes this.


Re: [PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring

2022-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2022 at 09:56:13AM +0200, Lukas Bulwahn wrote:
> +F:   arch/x86/kernel/pci-dma.c

I think this file is better left for the x86 maintainers.


Re: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-06-01 Thread Baolu Lu

On 2022/6/1 17:28, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Friday, May 27, 2022 2:30 PM

When the IOMMU domain is about to be freed, it should not be set on any
device. Instead of silently dealing with some bug cases, it's better to
trigger a warning to report and fix any potential bugs at the first time.





  static void domain_exit(struct dmar_domain *domain)
  {
-
-   /* Remove associated devices and clear attached or cached domains */
-   domain_remove_dev_info(domain);
+   if (WARN_ON(!list_empty(&domain->devices)))
+   return;



warning is good but it doesn't mean the driver shouldn't deal with
that situation to make it safer e.g. blocking DMA from all attached
devices...


I have ever thought the same thing. :-)

Blocking DMA from an attached device should be done when setting the
blocking domain to the device. It should not be part of freeing a domain.

Here, the caller asks the driver to free the domain, but the driver
finds that something is wrong. Therefore, it warns and returns directly.
The domain will still be there in use until the next set_domain().

Best regards,
baolu


Re: [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers

2022-06-01 Thread Baolu Lu

On 2022/6/1 17:18, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Friday, May 27, 2022 2:30 PM

The iommu->lock is used to protect the per-IOMMU pasid directory table
and pasid table. Move the spinlock acquisition/release into the helpers
to make the code self-contained.

Signed-off-by: Lu Baolu 


Reviewed-by: Kevin Tian , with one nit



-   /* Caller must ensure PASID entry is not in use. */
-   if (pasid_pte_is_present(pte))
-   return -EBUSY;
+   spin_lock(&iommu->lock);
+   pte = get_non_present_pasid_entry(dev, pasid);
+   if (!pte) {
+   spin_unlock(&iommu->lock);
+   return -ENODEV;
+   }


I don't think the above is a good abstraction and it changes the error
code for a present entry from -EBUSY to -ENODEV.


Sure. I will roll it back to -EBUSY. I added this helper because the
same code appears at least three times in this file.

Best regards,
baolu




Re: [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-06-01 Thread Baolu Lu

Hi Kevin,

Thank you for the comments.

On 2022/6/1 17:09, Tian, Kevin wrote:

From: Lu Baolu
Sent: Friday, May 27, 2022 2:30 PM

The iommu->lock is used to protect the per-IOMMU domain ID resource.
Move the spinlock acquisition/release into the helpers where domain
IDs are allocated and freed. The device_domain_lock is irrelevant to
domain ID resources, remove its assertion as well.

while moving the lock you also replace spin_lock_irqsave() with spin_lock().
It'd be cleaner to just do movement here and then replace all _irqsave()
in patch 8.


Yeah, that will be clearer.

Best regards,
baolu


RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-06-01 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, June 1, 2022 5:37 PM
> 
> On 2022/6/1 09:43, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Wednesday, June 1, 2022 1:30 AM
>  In both cases the pasid is stored in the attach data instead of the
>  domain.
> 
> >> So during IOTLB flush for the domain, do we loop through the attach data?
> > Yes and it's required.
> 
> What does the attach data mean here? Do you mean group->pasid_array?

any structure describing the attaching relationship from {device, pasid} to
a domain

> 
> Why not tracking the {device, pasid} info in the iommu driver when
> setting domain to {device, pasid}? We have tracked device in a list when
> setting a domain to device.
> 

Yes, that tracking structure is the attach data. 
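
Purely as an illustration of the shape of such a record (not an existing
kernel structure): one entry per {device, pasid} attachment, chained off the
domain so a domain-wide IOTLB flush can walk them:

struct domain_attach_data {		/* illustrative only */
	struct list_head link;		/* on the domain's attach list */
	struct device *dev;
	ioasid_t pasid;			/* PASID used for this attachment */
};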

Re: [PATCH 10/10] dmapool: improve scalability of dma_pool_free

2022-06-01 Thread Robin Murphy

On 2022-05-31 23:10, Tony Battersby wrote:

On 5/31/22 17:54, Keith Busch wrote:

On Tue, May 31, 2022 at 02:23:44PM -0400, Tony Battersby wrote:

dma_pool_free() scales poorly when the pool contains many pages because
pool_find_page() does a linear scan of all allocated pages.  Improve its
scalability by replacing the linear scan with a red-black tree lookup.
In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

The improvement should say O(n) to O(log n), right?


That would be the improvement for a single call to dma_pool_alloc or
dma_pool_free, but I was going with the improvement for "n" calls
instead, which is consistent with the improvement for the example in the
cover letter for mpt3sas.  I would have to look up the convention to be
sure of the proper notation in a situation like this, but I usually
think "inserting N items takes N^2 time"; to me it makes less sense to
say "inserting 1 item takes N time", because the N seems to come out of
nowhere.


No, n represents the size of the set that the algorithm is inserting 
into (or removing from, searching within, etc.). Hence why constant time 
is represented by O(1), i.e. not involving the variable at all.
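
(For completeness, the two conventions describe the same change: per call the
lookup goes from O(n) to O(log n), and summed over n calls,
\sum_{k=1}^{n} O(k) = O(n^2) versus \sum_{k=1}^{n} O(\log k) = O(n \log n).)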


Robin.

Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-06-01 Thread Baolu Lu

On 2022/6/1 09:43, Tian, Kevin wrote:

From: Jacob Pan
Sent: Wednesday, June 1, 2022 1:30 AM

In both cases the pasid is stored in the attach data instead of the
domain.


So during IOTLB flush for the domain, do we loop through the attach data?

Yes and it's required.


What does the attach data mean here? Do you mean group->pasid_array?

Why not tracking the {device, pasid} info in the iommu driver when
setting domain to {device, pasid}? We have tracked device in a list when
setting a domain to device.

Best regards,
baolu


RE: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> When the IOMMU domain is about to be freed, it should not be set on any
> device. Instead of silently dealing with some bug cases, it's better to
> trigger a warning to report and fix any potential bugs at the first time.
> 


>  static void domain_exit(struct dmar_domain *domain)
>  {
> -
> - /* Remove associated devices and clear attached or cached domains */
> - domain_remove_dev_info(domain);
> + if (WARN_ON(!list_empty(&domain->devices)))
> + return;
> 

warning is good but it doesn't mean the driver shouldn't deal with
that situation to make it safer e.g. blocking DMA from all attached
devices...


RE: [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The iommu->lock is used to protect the per-IOMMU pasid directory table
> and pasid table. Move the spinlock acquisition/release into the helpers
> to make the code self-contained.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian , with one nit

> 
> - /* Caller must ensure PASID entry is not in use. */
> - if (pasid_pte_is_present(pte))
> - return -EBUSY;
> + spin_lock(&iommu->lock);
> + pte = get_non_present_pasid_entry(dev, pasid);
> + if (!pte) {
> + spin_unlock(&iommu->lock);
> + return -ENODEV;
> + }

I don't think the above is a good abstraction and it changes the error
code for a present entry from -EBUSY to -ENODEV.



RE: [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Move the spinlock acquisition/release into the helpers where domain
> IDs are allocated and freed. The device_domain_lock is irrelevant to
> domain ID resources, remove its assertion as well.

while moving the lock you also replace spin_lock_irqsave() with spin_lock().
It'd be cleaner to just do movement here and then replace all _irqsave()
in patch 8.

> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2d5f02b85de8..0da937ce0534 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1774,16 +1774,13 @@ static struct dmar_domain
> *alloc_domain(unsigned int type)
>   return domain;
>  }
> 
> -/* Must be called with iommu->lock */
>  static int domain_attach_iommu(struct dmar_domain *domain,
>  struct intel_iommu *iommu)
>  {
>   unsigned long ndomains;
> - int num;
> -
> - assert_spin_locked(&device_domain_lock);
> - assert_spin_locked(&iommu->lock);
> + int num, ret = 0;
> 
> + spin_lock(&iommu->lock);
>   domain->iommu_refcnt[iommu->seq_id] += 1;
>   if (domain->iommu_refcnt[iommu->seq_id] == 1) {
>   ndomains = cap_ndoms(iommu->cap);
> @@ -1792,7 +1789,8 @@ static int domain_attach_iommu(struct
> dmar_domain *domain,
>   if (num >= ndomains) {
>   pr_err("%s: No free domain ids\n", iommu->name);
>   domain->iommu_refcnt[iommu->seq_id] -= 1;
> - return -ENOSPC;
> + ret = -ENOSPC;
> + goto out_unlock;
>   }
> 
>   set_bit(num, iommu->domain_ids);
> @@ -1801,7 +1799,9 @@ static int domain_attach_iommu(struct
> dmar_domain *domain,
>   domain_update_iommu_cap(domain);
>   }
> 
> - return 0;
> +out_unlock:
> + spin_unlock(&iommu->lock);
> + return ret;
>  }
> 
>  static void domain_detach_iommu(struct dmar_domain *domain,
> @@ -1809,9 +1809,7 @@ static void domain_detach_iommu(struct
> dmar_domain *domain,
>  {
>   int num;
> 
> - assert_spin_locked(&device_domain_lock);
> - assert_spin_locked(&iommu->lock);
> -
> + spin_lock(&iommu->lock);
>   domain->iommu_refcnt[iommu->seq_id] -= 1;
>   if (domain->iommu_refcnt[iommu->seq_id] == 0) {
>   num = domain->iommu_did[iommu->seq_id];
> @@ -1819,6 +1817,7 @@ static void domain_detach_iommu(struct
> dmar_domain *domain,
>   domain_update_iommu_cap(domain);
>   domain->iommu_did[iommu->seq_id] = 0;
>   }
> + spin_unlock(&iommu->lock);
>  }
> 
>  static inline int guestwidth_to_adjustwidth(int gaw)
> @@ -2471,9 +2470,7 @@ static int domain_add_dev_info(struct
> dmar_domain *domain, struct device *dev)
> 
>   spin_lock_irqsave(&device_domain_lock, flags);
>   info->domain = domain;
> - spin_lock(&iommu->lock);
>   ret = domain_attach_iommu(domain, iommu);
> - spin_unlock(&iommu->lock);
>   if (ret) {
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   return ret;
> @@ -4158,7 +4155,6 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
>  {
>   struct dmar_domain *domain;
>   struct intel_iommu *iommu;
> - unsigned long flags;
> 
>   assert_spin_locked(&device_domain_lock);
> 
> @@ -4179,10 +4175,7 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
>   }
> 
>   list_del(&info->link);
> -
> - spin_lock_irqsave(&iommu->lock, flags);
>   domain_detach_iommu(domain, iommu);
> - spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> 
>  static void dmar_remove_one_dev_info(struct device *dev)
> --
> 2.25.1



RE: [PATCH 05/12] iommu/vt-d: Unncessary spinlock for root table alloc and free

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The IOMMU root table is allocated and freed in the IOMMU initialization
> code in static boot or hot-plug paths. There's no need for a spinlock.

s/hot-plug/hot-remove/

> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index bbdd3417a1b1..2d5f02b85de8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -809,14 +809,12 @@ static int device_context_mapped(struct
> intel_iommu *iommu, u8 bus, u8 devfn)
> 
>  static void free_context_table(struct intel_iommu *iommu)
>  {
> - int i;
> - unsigned long flags;
>   struct context_entry *context;
> + int i;
> +
> + if (!iommu->root_entry)
> + return;
> 
> - spin_lock_irqsave(&iommu->lock, flags);
> - if (!iommu->root_entry) {
> - goto out;
> - }
>   for (i = 0; i < ROOT_ENTRY_NR; i++) {
>   context = iommu_context_addr(iommu, i, 0, 0);
>   if (context)
> @@ -828,12 +826,10 @@ static void free_context_table(struct intel_iommu
> *iommu)
>   context = iommu_context_addr(iommu, i, 0x80, 0);
>   if (context)
>   free_pgtable_page(context);
> -
>   }
> +
>   free_pgtable_page(iommu->root_entry);
>   iommu->root_entry = NULL;
> -out:
> - spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> 
>  #ifdef CONFIG_DMAR_DEBUG
> @@ -1232,7 +1228,6 @@ static void domain_unmap(struct dmar_domain
> *domain, unsigned long start_pfn,
>  static int iommu_alloc_root_entry(struct intel_iommu *iommu)
>  {
>   struct root_entry *root;
> - unsigned long flags;
> 
>   root = (struct root_entry *)alloc_pgtable_page(iommu->node);
>   if (!root) {
> @@ -1242,10 +1237,7 @@ static int iommu_alloc_root_entry(struct
> intel_iommu *iommu)
>   }
> 
>   __iommu_flush_cache(iommu, root, ROOT_SIZE);
> -
> - spin_lock_irqsave(&iommu->lock, flags);
>   iommu->root_entry = root;
> - spin_unlock_irqrestore(&iommu->lock, flags);
> 
>   return 0;
>  }
> --
> 2.25.1



RE: [PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> Use pci_get_domain_bus_and_slot() instead of searching the global list
> to retrieve the pci device pointer. This removes device_domain_list
> global list as there are no consumers anymore.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.h |  1 -
>  drivers/iommu/intel/iommu.c | 33 ++---
>  2 files changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 2f4a5b9509c0..6724703d573b 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -609,7 +609,6 @@ struct intel_iommu {
>  /* PCI domain-device relationship */
>  struct device_domain_info {
>   struct list_head link;  /* link to domain siblings */
> - struct list_head global; /* link to global list */
>   struct list_head table; /* link to pasid table */
>   u32 segment;/* PCI segment number */
>   u8 bus; /* PCI bus number */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 25d4c5200526..bbdd3417a1b1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -131,8 +131,6 @@ static struct intel_iommu **g_iommus;
> 
>  static void __init check_tylersburg_isoch(void);
>  static int rwbf_quirk;
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn);
> 
>  /*
>   * set to 1 to panic kernel if can't successfully enable VT-d
> @@ -315,7 +313,6 @@ static int iommu_skip_te_disable;
>  #define IDENTMAP_AZALIA  4
> 
>  static DEFINE_SPINLOCK(device_domain_lock);
> -static LIST_HEAD(device_domain_list);
>  const struct iommu_ops intel_iommu_ops;
> 
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> @@ -845,9 +842,14 @@ static void pgtable_walk(struct intel_iommu *iommu, unsigned long pfn, u8 bus, u
>   struct device_domain_info *info;
>   struct dma_pte *parent, *pte;
>   struct dmar_domain *domain;
> + struct pci_dev *pdev;
>   int offset, level;
> 
> - info = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
> + pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
> + if (!pdev)
> + return;
> +
> + info = dev_iommu_priv_get(&pdev->dev);
>   if (!info || !info->domain) {
>   pr_info("device [%02x:%02x.%d] not probed\n",
>   bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> @@ -2348,19 +2350,6 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>  }
> 
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
> -{
> - struct device_domain_info *info;
> -
> - list_for_each_entry(info, &device_domain_list, global)
> - if (info->segment == segment && info->bus == bus &&
> - info->devfn == devfn)
> - return info;
> -
> - return NULL;
> -}
> -
>  static int domain_setup_first_level(struct intel_iommu *iommu,
>   struct dmar_domain *domain,
>   struct device *dev,
> @@ -4564,7 +4553,6 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>   struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
>   struct device_domain_info *info;
>   struct intel_iommu *iommu;
> - unsigned long flags;
>   u8 bus, devfn;
> 
>   iommu = device_to_iommu(dev, , );
> @@ -4607,10 +4595,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>   }
>   }
> 
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_add(&info->global, &device_domain_list);
>   dev_iommu_priv_set(dev, info);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> 
>   return >iommu;
>  }
> @@ -4618,15 +4603,9 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  static void intel_iommu_release_device(struct device *dev)
>  {
>   struct device_domain_info *info = dev_iommu_priv_get(dev);
> - unsigned long flags;
> 
>   dmar_remove_one_dev_info(dev);
> -
> - spin_lock_irqsave(&device_domain_lock, flags);
>   dev_iommu_priv_set(dev, NULL);
> - list_del(&info->global);
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> -
>   kfree(info);
>   set_dma_ops(dev, NULL);
>  }
> --
> 2.25.1



RE: [PATCH 02/12] iommu/vt-d: Remove for_each_device_domain()

2022-06-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 27, 2022 2:30 PM
> 
> The per-device device_domain_info data could be retrieved from the
> device itself. There's no need to search a global list.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/iommu.h |  2 --
>  drivers/iommu/intel/iommu.c | 25 -
>  drivers/iommu/intel/pasid.c | 37 +++--
>  3 files changed, 11 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 8a6d64d726c0..2f4a5b9509c0 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -727,8 +727,6 @@ extern int dmar_ir_support(void);
>  void *alloc_pgtable_page(int node);
>  void free_pgtable_page(void *vaddr);
>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -  void *data), void *data);
>  void iommu_flush_write_buffer(struct intel_iommu *iommu);
>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cacae8bdaa65..6549b09d7f32 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -316,31 +316,6 @@ static int iommu_skip_te_disable;
> 
>  static DEFINE_SPINLOCK(device_domain_lock);
>  static LIST_HEAD(device_domain_list);
> -
> -/*
> - * Iterate over elements in device_domain_list and call the specified
> - * callback @fn against each element.
> - */
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -  void *data), void *data)
> -{
> - int ret = 0;
> - unsigned long flags;
> - struct device_domain_info *info;
> -
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry(info, &device_domain_list, global) {
> - ret = fn(info, data);
> - if (ret) {
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> - return ret;
> - }
> - }
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> -
> - return 0;
> -}
> -
>  const struct iommu_ops intel_iommu_ops;
> 
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index b2ac5869286f..0627d6465f25 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -103,36 +103,20 @@ device_detach_pasid_table(struct device_domain_info *info,
>  }
> 
>  struct pasid_table_opaque {
> - struct pasid_table  **pasid_table;
> - int segment;
> - int bus;
> - int devfn;
> + struct pasid_table  *pasid_table;
>  };
> 
> -static int search_pasid_table(struct device_domain_info *info, void *opaque)
> -{
> - struct pasid_table_opaque *data = opaque;
> -
> - if (info->iommu->segment == data->segment &&
> - info->bus == data->bus &&
> - info->devfn == data->devfn &&
> - info->pasid_table) {
> - *data->pasid_table = info->pasid_table;
> - return 1;
> - }
> -
> - return 0;
> -}
> -
>  static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void *opaque)
>  {
>   struct pasid_table_opaque *data = opaque;
> + struct device_domain_info *info;
> 
> - data->segment = pci_domain_nr(pdev->bus);
> - data->bus = PCI_BUS_NUM(alias);
> - data->devfn = alias & 0xff;
> + info = dev_iommu_priv_get(&pdev->dev);
> + if (!info || !info->pasid_table)
> + return 0;
> 
> - return for_each_device_domain(&search_pasid_table, data);
> + data->pasid_table = info->pasid_table;
> + return 1;
>  }
> 
>  /*
> @@ -141,9 +125,9 @@ static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void *opaque)
>   */
>  int intel_pasid_alloc_table(struct device *dev)
>  {
> + struct pasid_table_opaque data = { NULL };
>   struct device_domain_info *info;
>   struct pasid_table *pasid_table;
> - struct pasid_table_opaque data;
>   struct page *pages;
>   u32 max_pasid = 0;
>   int ret, order;
> @@ -155,11 +139,12 @@ int intel_pasid_alloc_table(struct device *dev)
>   return -EINVAL;
> 
>   /* DMA alias device already has a pasid table, use it: */
> - data.pasid_table = &pasid_table;
>   ret = pci_for_each_dma_alias(to_pci_dev(dev),
>        &get_alias_pasid_table, &data);
> - if (ret)
> + if (ret) {
> + pasid_table = data.pasid_table;
>   goto attach_out;
> + }
> 
>   pasid_table = kzalloc(sizeof(*pasid_table), GFP_KERNEL);
>   if (!pasid_table)
> --
> 2.25.1


RE: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 1, 2022 7:11 AM
> 
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
> 
> > There are only 3 instances where we'll free a table while the domain is
> > live. The first is the one legitimate race condition, where two map requests
> > targeting relatively nearby PTEs both go to fill in an intermediate level of
> > table; whoever loses that race frees the table they allocated, but it was
> > never visible to anyone else so that's definitely fine. The second is if
> > we're mapping a block entry, and find that there's already a table entry
> > there, wherein we assume the table must be empty, clear the entry,
> > invalidate any walk caches, install the block entry, then free the orphaned
> > table; since we're mapping the entire IOVA range covered by that table,
> > there should be no other operations on that IOVA range attempting to walk
> > the table at the same time, so it's fine.
> 
> I saw these two in the Intel driver
> 
> > The third is effectively the inverse, if we get a block-sized unmap
> > but find a table entry rather than a block at that point (on the
> > assumption that it's de-facto allowed for a single unmap to cover
> > multiple adjacent mappings as long as it does so exactly); similarly
> > we assume that the table must be full, and no other operations
> > should be racing because we're unmapping its whole IOVA range, so we
> > remove the table entry, invalidate, and free as before.
> 
> Not sure I noticed this one though
> 
> This all makes sense though.

Intel driver also does this. See dma_pte_clear_level():

/* If range covers entire pagetable, free it */
if (start_pfn <= level_pfn &&
 last_pfn >= level_pfn + level_size(level) - 1) {
...


Re: [PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring

2022-06-01 Thread Juergen Gross via iommu

On 01.06.22 09:56, Lukas Bulwahn wrote:

Commit 78013eaadf69 ("x86: remove the IOMMU table infrastructure")
refactored the generic swiotlb/swiotlb-xen setup into pci-dma.c, but
failed to adjust MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about
broken references.

Update the SWIOTLB SUBSYSTEM entry to include arch/x86/kernel/pci-dma.c,
which now contains the swiotlb setup, and drop the file pattern that no
longer matches any files.

Further, update the XEN SWIOTLB SUBSYSTEM entry to include all swiotlb-xen
headers and replace the drivers/ glob with the one file it actually matches.

Signed-off-by: Lukas Bulwahn 


Acked-by: Juergen Gross 


Juergen



[PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring

2022-06-01 Thread Lukas Bulwahn
Commit 78013eaadf69 ("x86: remove the IOMMU table infrastructure")
refactored the generic swiotlb/swiotlb-xen setup into pci-dma.c, but
failed to adjust MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about
broken references.

Update the SWIOTLB SUBSYSTEM entry to include arch/x86/kernel/pci-dma.c,
which now contains the swiotlb setup, and drop the file pattern that no
longer matches any files.

Further, update the XEN SWIOTLB SUBSYSTEM entry to include all swiotlb-xen
headers and replace the drivers/ glob with the one file it actually matches.

Signed-off-by: Lukas Bulwahn 
---
Christoph, please pick this minor non-urgent clean-up patch for swiotlb.

 MAINTAINERS | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d5ea4ef223f8..cc12a3aaad45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19166,7 +19166,7 @@ L:  iommu@lists.linux-foundation.org
 S: Supported
 W: http://git.infradead.org/users/hch/dma-mapping.git
 T: git git://git.infradead.org/users/hch/dma-mapping.git
-F: arch/*/kernel/pci-swiotlb.c
+F: arch/x86/kernel/pci-dma.c
 F: include/linux/swiotlb.h
 F: kernel/dma/swiotlb.c
 
@@ -21831,8 +21831,10 @@ M: Stefano Stabellini 
 L: xen-de...@lists.xenproject.org (moderated for non-subscribers)
 L: iommu@lists.linux-foundation.org
 S: Supported
-F: arch/x86/xen/*swiotlb*
-F: drivers/xen/*swiotlb*
+F: arch/*/include/asm/xen/swiotlb-xen.h
+F: drivers/xen/swiotlb-xen.c
+F: include/xen/arm/swiotlb-xen.h
+F: include/xen/swiotlb-xen.h
 
 XFS FILESYSTEM
 C: irc://irc.oftc.net/xfs
-- 
2.17.1



[PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state()

2022-06-01 Thread Saravana Kannan via iommu
The function is no longer used. So delete it.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 30 --
 include/linux/device/driver.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 335e71d3a618..e600dd2afc35 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str)
 }
 __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);
 
-/**
- * driver_deferred_probe_check_state() - Check deferred probe state
- * @dev: device to check
- *
- * Return:
- * * -ENODEV if initcalls have completed and modules are disabled.
- * * -ETIMEDOUT if the deferred probe timeout was set and has expired
- *   and modules are enabled.
- * * -EPROBE_DEFER in other cases.
- *
- * Drivers or subsystems can opt-in to calling this function instead of directly
- * returning -EPROBE_DEFER.
- */
-int driver_deferred_probe_check_state(struct device *dev)
-{
-   if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) {
-   dev_warn(dev, "ignoring dependency for device, assuming no 
driver\n");
-   return -ENODEV;
-   }
-
-   if (!driver_deferred_probe_timeout && initcalls_done) {
-   dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
-   return -ETIMEDOUT;
-   }
-
-   return -EPROBE_DEFER;
-}
-EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state);
-
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
struct device_private *p;
 
fw_devlink_drivers_done();
 
-   driver_deferred_probe_timeout = 0;
driver_deferred_probe_trigger();
	flush_work(&deferred_probe_work);
 
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 2114d65b862f..7acaabde5396 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -242,7 +242,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev)
 
 extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
-int driver_deferred_probe_check_state(struct device *dev);
 void driver_init(void);
 
 /**
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state()

2022-06-01 Thread Saravana Kannan via iommu
Now that fw_devlink=on and fw_devlink.strict=1 by default and fw_devlink
supports iommu DT properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..41f4eb005219 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev,
 * a proper probe-ordering dependency mechanism in future.
 */
if (!ops)
-   return driver_deferred_probe_check_state(dev);
+   return -ENODEV;
 
if (!try_module_get(ops->owner))
return -ENODEV;
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default

2022-06-01 Thread Saravana Kannan via iommu
Now that deferred_probe_timeout is non-zero by default, fw_devlink will
never permanently block the probing of devices. It'll try its best to
probe the devices in the right order and then finally let devices probe
even if their suppliers don't have any drivers.

Signed-off-by: Saravana Kannan 
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 61fdfe99b348..977b379a495b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1613,7 +1613,7 @@ static int __init fw_devlink_setup(char *arg)
 }
 early_param("fw_devlink", fw_devlink_setup);
 
-static bool fw_devlink_strict;
+static bool fw_devlink_strict = true;
 static int __init fw_devlink_strict_setup(char *arg)
 {
	return strtobool(arg, &fw_devlink_strict);
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."

2022-06-01 Thread Saravana Kannan via iommu
This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.
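
The default can still be overridden from the kernel command line via the
existing deferred_probe_timeout= boot parameter, for example (purely an
illustrative value):

    deferred_probe_timeout=30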

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4a55fbb7e0da..335e71d3a618 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs

2022-06-01 Thread Saravana Kannan via iommu
If there are network devices that could probe without some of their
suppliers probing and those network devices are needed to mount a
network rootfs, then fw_devlink=on might break that use case by blocking
the network devices from probing by the time IP auto config starts.

So, if no network devices are available when IP auto config is enabled
and we have a network rootfs, make sure fw_devlink doesn't block the
probing of any device that has a driver and then retry finding a network
device.

Signed-off-by: Saravana Kannan 
---
 net/ipv4/ipconfig.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 9d41d5d5cd1e..2342debd7066 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1434,6 +1434,7 @@ __be32 __init root_nfs_parse_addr(char *name)
 static int __init wait_for_devices(void)
 {
int i;
+   bool try_init_devs = true;
 
for (i = 0; i < DEVICE_WAIT_MAX; i++) {
struct net_device *dev;
@@ -1452,6 +1453,11 @@ static int __init wait_for_devices(void)
rtnl_unlock();
if (found)
return 0;
+   if (try_init_devs &&
+   (ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS)) {
+   try_init_devs = false;
+   wait_for_init_devices_probe();
+   }
ssleep(1);
}
return -ENODEV;
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function

2022-06-01 Thread Saravana Kannan via iommu
Some devices might need to be probed and bound successfully before the
kernel boot sequence can finish and move on to init/userspace. For
example, a network interface might need to be bound to be able to mount
a NFS rootfs.

With fw_devlink=on by default, some of these devices might be blocked
from probing because they are waiting on an optional supplier that
doesn't have a driver. While fw_devlink will eventually identify such
devices and unblock their probing automatically, it might do so too
late. For example, the IP4 autoconfig might time out before fw_devlink
unblocks probing of the network interface.

This function is available to temporarily try and probe all devices that
have a driver even if some of their suppliers haven't been added or
don't have drivers.

The drivers can then decide which of the suppliers are optional vs
mandatory and probe the device if possible. By the time this function
returns, all such "best effort" probes are guaranteed to be completed.
If a device successfully probes in this mode, we delete all fw_devlink
discovered dependencies of that device where the supplier hasn't yet
probed successfully because they have to be optional dependencies.

This also means that some devices that aren't needed for init and could
have waited for their optional supplier to probe (when the supplier's
module is loaded later on) would end up probing prematurely with limited
functionality.  So call this function only when boot would fail without
it.
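
As a rough illustration of the intended call pattern (a sketch only; the
helper and the readiness check below are hypothetical, and the ipconfig
change in patch 5/9 of this series is the real user):

	/* Hypothetical boot-critical caller, not part of this patch. */
	static int __init wait_for_boot_critical_dev(void)
	{
		bool tried_best_effort = false;
		int i;

		for (i = 0; i < 12; i++) {
			if (boot_critical_dev_present())	/* hypothetical check */
				return 0;
			if (!tried_best_effort) {
				tried_best_effort = true;
				/* Probe whatever can probe in the current state. */
				wait_for_init_devices_probe();
				continue;
			}
			ssleep(1);
		}

		return -ENODEV;
	}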

Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h   |   1 +
 drivers/base/core.c   | 100 --
 drivers/base/dd.c |  19 +--
 include/linux/device/driver.h |   1 +
 4 files changed, 110 insertions(+), 11 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index ab71403d102f..b3a43a164dcd 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
 extern void deferred_probe_extend_timeout(void);
+extern void driver_deferred_probe_trigger(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7cd789c4985d..61fdfe99b348 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1;
 static DEFINE_MUTEX(fwnode_link_lock);
 static bool fw_devlink_is_permissive(void);
 static bool fw_devlink_drv_reg_done;
+static bool fw_devlink_best_effort;
 
 /**
  * fwnode_link_add - Create a link between two fwnode_handles.
@@ -965,6 +966,11 @@ static void device_links_missing_supplier(struct device *dev)
}
 }
 
+static bool dev_is_best_effort(struct device *dev)
+{
+   return fw_devlink_best_effort && dev->can_match;
+}
+
 /**
  * device_links_check_suppliers - Check presence of supplier drivers.
  * @dev: Consumer device.
@@ -984,7 +990,7 @@ static void device_links_missing_supplier(struct device *dev)
 int device_links_check_suppliers(struct device *dev)
 {
struct device_link *link;
-   int ret = 0;
+   int ret = 0, fwnode_ret = 0;
struct fwnode_handle *sup_fw;
 
/*
@@ -997,12 +1003,17 @@ int device_links_check_suppliers(struct device *dev)
	sup_fw = list_first_entry(&dev->fwnode->suppliers,
  struct fwnode_link,
  c_hook)->supplier;
-   dev_err_probe(dev, -EPROBE_DEFER, "wait for supplier %pfwP\n",
- sup_fw);
-   mutex_unlock(&fwnode_link_lock);
-   return -EPROBE_DEFER;
+   if (!dev_is_best_effort(dev)) {
+   fwnode_ret = -EPROBE_DEFER;
+   dev_err_probe(dev, -EPROBE_DEFER,
+   "wait for supplier %pfwP\n", sup_fw);
+   } else {
+   fwnode_ret = -EAGAIN;
+   }
}
	mutex_unlock(&fwnode_link_lock);
+   if (fwnode_ret == -EPROBE_DEFER)
+   return fwnode_ret;
 
device_links_write_lock();
 
@@ -1012,6 +1023,14 @@ int device_links_check_suppliers(struct device *dev)
 
if (link->status != DL_STATE_AVAILABLE &&
!(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
+
+   if (dev_is_best_effort(dev) &&
+   link->flags & DL_FLAG_INFERRED &&
+   !link->supplier->can_match) {
+   ret = -EAGAIN;
+   continue;
+   }
+
device_links_missing_supplier(dev);
dev_err_probe(dev, -EPROBE_DEFER,
  "supplier %s not ready\n",
@@ -1024,7 +1043,8 @@ int 

[PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()

2022-06-01 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports interrupt
properties, the execution will never get to the point where
driver_deferred_probe_check_state() is called before the supplier has
probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/net/mdio/fwnode_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1c1584fca632..3e79c2c51929 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
 * just fall back to poll mode
 */
if (rc == -EPROBE_DEFER)
-   rc = driver_deferred_probe_check_state(&phy->mdio.dev);
-   if (rc == -EPROBE_DEFER)
-   return rc;
+   rc = -ENODEV;
 
if (rc > 0) {
phy->irq = rc;
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()

2022-06-01 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"pinctrl-[0-8]" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/pinctrl/devicetree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 3fb238714718..ef898ee8ca6b 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p,
np_pctldev = of_get_next_parent(np_pctldev);
if (!np_pctldev || of_node_is_root(np_pctldev)) {
of_node_put(np_pctldev);
-   ret = driver_deferred_probe_check_state(p->dev);
+   ret = -ENODEV;
/* keep deferring if modules are enabled */
		if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret < 0)
ret = -EPROBE_DEFER;
-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 0/9] deferred_probe_timeout logic clean up

2022-06-01 Thread Saravana Kannan via iommu
This series is based on linux-next plus these 2 small patches applied on top:
https://lore.kernel.org/lkml/20220526034609.480766-1-sarava...@google.com/

A lot of the deferred_probe_timeout logic is redundant with
fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
a few cases.

This series tries to delete the redundant logic, simplify the frameworks
that use driver_deferred_probe_check_state(), enable
deferred_probe_timeout=10 by default, and fixes the nfsroot failure
case.

The overall idea of this series is to replace the global behavior of
driver_deferred_probe_check_state() where all devices give up waiting on
supplier at the same time with a more granular behavior:

1. Devices with all their suppliers successfully probed by late_initcall
   probe as usual and avoid unnecessary deferred probe attempts.

2. At or after late_initcall, in cases where boot would break because of
   fw_devlink=on being strict about the ordering, we

   a. Temporarily relax the enforcement to probe any unprobed devices
  that can probe successfully in the current state of the system.
  For example, when we boot with a NFS rootfs and no network device
  has probed.
   b. Go back to enforcing the ordering for any devices that haven't
  probed.

3. After deferred probe timeout expires, we permanently give up waiting
   on supplier devices without drivers. At this point, whatever devices
   can probe without some of their optional suppliers end up probing.

In the case where module support is disabled, it's fairly
straightforward and all device probes are completed before the initcalls
are done.

Patches 1 to 3 are fairly straightforward and can probably be applied
right away.

Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
default deferred_probe_timeout back to 10 seconds when modules are
enabled.

Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
so that no framework has to know/care about deferred_probe_timeout.

Yoshihiro/Geert,

If you can test this patch series and confirm that the NFS root case
works, I'd really appreciate that.

Thanks,
Saravana

v1 -> v2:
Rewrote the NFS rootfs fix to be a lot less destructive on the
fw_devlink ordering for devices that don't end up probing during the
"best effort" attempt at probing all devices needed for a network rootfs

Saravana Kannan (9):
  PM: domains: Delete usage of driver_deferred_probe_check_state()
  pinctrl: devicetree: Delete usage of
driver_deferred_probe_check_state()
  net: mdio: Delete usage of driver_deferred_probe_check_state()
  driver core: Add wait_for_init_devices_probe helper function
  net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
  Revert "driver core: Set default deferred_probe_timeout back to 0."
  driver core: Set fw_devlink.strict=1 by default
  iommu/of: Delete usage of driver_deferred_probe_check_state()
  driver core: Delete driver_deferred_probe_check_state()

 drivers/base/base.h|   1 +
 drivers/base/core.c| 102 ++---
 drivers/base/dd.c  |  54 ++---
 drivers/base/power/domain.c|   2 +-
 drivers/iommu/of_iommu.c   |   2 +-
 drivers/net/mdio/fwnode_mdio.c |   4 +-
 drivers/pinctrl/devicetree.c   |   2 +-
 include/linux/device/driver.h  |   2 +-
 net/ipv4/ipconfig.c|   6 ++
 9 files changed, 126 insertions(+), 49 deletions(-)

-- 
2.36.1.255.ge46751e96f-goog



[PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-06-01 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
		mutex_unlock(&gpd_list_lock);
dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
__func__, PTR_ERR(pd));
-   return driver_deferred_probe_check_state(base_dev);
+   return -ENODEV;
}
 
dev_dbg(dev, "adding to PM domain %s\n", pd->name);
-- 
2.36.1.255.ge46751e96f-goog



Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Baolu Lu

On 2022/6/1 02:51, Jason Gunthorpe wrote:

Oh, I've spent the last couple of weeks hacking up horrible things
manipulating entries in init_mm, and never realised that that was actually
the special case. Oh well, live and learn.

The init_mm is sort of different, it doesn't have zap in quite the
same way, for example. I was talking about the typical process mm.

Anyhow, the right solution is to use RCU as I described before, Baolu
do you want to try?


Yes, of course.

Your discussion with Robin gave me a lot of inspiration. Much
appreciated! I will address this debugfs problem in a separate patch,
since it goes beyond the original scope of this series.
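
For readers following along, the RCU direction being suggested amounts to
roughly the following (a rough sketch with invented names, not the actual
VT-d code):

	struct pt_page {
		u64 *entries;
		struct rcu_head rcu;
	};

	/* debugfs side: walk the tables without taking the domain lock */
	static void debugfs_walk(struct pt_page *root)
	{
		rcu_read_lock();
		/* ... dereference table pages; none can be freed under us ... */
		rcu_read_unlock();
	}

	static void pt_page_free_rcu(struct rcu_head *head)
	{
		struct pt_page *pg = container_of(head, struct pt_page, rcu);

		free_page((unsigned long)pg->entries);
		kfree(pg);
	}

	/* unmap side: detach first, free only after a grace period */
	static void detach_and_free(u64 *parent_pte, struct pt_page *pg)
	{
		WRITE_ONCE(*parent_pte, 0);
		/* flush IOTLB / paging-structure caches here */
		call_rcu(&pg->rcu, pt_page_free_rcu);
	}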

Best regards,
baolu