[PATCH] swiotlb: fix setting ->force_bounce
The swiotlb_init refactor messed up assigning ->force_bounce by doing it
in different places based on what caused the setting of the flag. Fix
this by passing the SWIOTLB_* flags to swiotlb_init_io_tlb_mem and just
setting it there.

Fixes: c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful")
Reported-by: Nathan Chancellor
Signed-off-by: Christoph Hellwig
Tested-by: Nathan Chancellor
---
 kernel/dma/swiotlb.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc944..cb50f8d383606 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-		unsigned long nslabs, bool late_alloc)
+		unsigned long nslabs, unsigned int flags, bool late_alloc)
 {
 	void *vaddr = phys_to_virt(start);
 	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 	mem->index = 0;
 	mem->late_alloc = late_alloc;
 
-	if (swiotlb_force_bounce)
-		mem->force_bounce = true;
+	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
 	spin_lock_init(&mem->lock);
 	for (i = 0; i < mem->nslabs; i++) {
@@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-	mem->force_bounce = flags & SWIOTLB_FORCE;
+	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
 
 	if (flags & SWIOTLB_VERBOSE)
 		swiotlb_print_info();
@@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
 	set_memory_decrypted((unsigned long)vstart,
 			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
+	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
 
 	swiotlb_print_info();
 	return 0;
@@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 
 		set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
				     rmem->size >> PAGE_SHIFT);
-		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
-		mem->force_bounce = true;
+		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
+					false);
 
 		mem->for_alloc = true;
 
 		rmem->priv = mem;
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
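[Editorial note: the effect of the patch above — all four init paths now funnel their SWIOTLB_* flags into one assignment — can be checked in isolation. The sketch below is a minimal userspace model of that consolidated logic, not kernel code; the value of `SWIOTLB_FORCE` and the struct layout are stand-ins.]

```c
#include <assert.h>
#include <stdbool.h>

#define SWIOTLB_FORCE (1u << 1)  /* stand-in value, not the kernel header's */

static bool swiotlb_force_bounce; /* models the "swiotlb=force" cmdline flag */

struct io_tlb_mem {
	bool force_bounce;
};

/*
 * Model of the fixed helper: every init path passes its flags down and
 * force_bounce is decided in exactly one place, so neither the cmdline
 * nor a per-caller SWIOTLB_FORCE can be silently lost.
 */
static void init_io_tlb_mem(struct io_tlb_mem *mem, unsigned int flags)
{
	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
}
```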
Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()
On Fri, May 27, 2022 at 04:41:08PM -0600, Logan Gunthorpe wrote:
> > IIRC this is the last part:
> >
> > https://lore.kernel.org/linux-mm/20220524190632.3304-1-alex.sie...@amd.com/
> >
> > And the earlier bit with Christoph's pieces looks like it might get
> > merged to v5.19..
> >
> > The general idea is once pte_devmap is not set then all the
> > refcounting works the way it should. This is what all new ZONE_DEVICE
> > users should do..
>
> Ok, I don't actually follow how those patches relate to this.
>
> Based on your description I guess I don't need to set PFN_DEV and

Yes

> perhaps not use vmf_insert_mixed()? And then just use vm_normal_page()?

I'm not sure ATM the best function to use, but yes, a function that
doesn't set PFN_DEV is needed here.

> But the refcounting of the pages seemed like it was already sane to me,
> unless you mean that the code no longer has to synchronize_rcu() before
> returning the pages...

Right. It also doesn't need to call unmap range or keep track of the
inode, or do any of that stuff unless it really needs mmap revocation
semantics (which I doubt this use case does).

unmap range was only necessary because the refcounting is wrong - since
the PTEs don't hold a ref on the page in PFN_DEV mode it is necessary to
wipe all the PTEs explicitly before going ahead to decrement the
refcount on this path.

Just stuff the pages into the mmap, and your driver unprobe will
automatically block until all the mmaps are closed - no different than
having an open file descriptor or something.

Jason
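[Editorial note: the lifetime rule Jason describes — once PFN_DEV is not used, each PTE holds a real page reference, so unprobe simply waits for the last mapping to go away — can be illustrated with a toy refcount model. All names below are hypothetical; this is a sketch of the invariant, not the kernel API.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: a device page whose refcount counts live PTE mappings,
 * exactly as for ordinary pages once PFN_DEV is no longer set.
 */
struct devpage {
	int refcount; /* references held by live PTEs */
};

static void vma_map_page(struct devpage *p)   { p->refcount++; }
static void vma_unmap_page(struct devpage *p) { p->refcount--; }

/*
 * Unprobe may complete only when no mapping still references the page;
 * no explicit unmap_mapping_range() sweep of the PTEs is required,
 * because the references themselves keep the page alive.
 */
static bool unprobe_may_proceed(const struct devpage *p)
{
	return p->refcount == 0;
}
```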
Re: [PATCH v2 1/7] dt-bindings: iommu: mediatek: Add phandles for mediatek infra/pericfg
On Wed, May 18, 2022 at 01:42:20PM +0200, AngeloGioacchino Del Regno wrote:
> Il 18/05/22 13:29, Matthias Brugger ha scritto:
> >
> > On 18/05/2022 12:04, AngeloGioacchino Del Regno wrote:
> > > Add properties "mediatek,infracfg" and "mediatek,pericfg" to let the
> > > mtk_iommu driver retrieve phandles to the infracfg and pericfg syscon(s)
> > > instead of performing a per-soc compatible lookup.
> > >
> > > Signed-off-by: AngeloGioacchino Del Regno
> > >
> > > ---
> > >  .../devicetree/bindings/iommu/mediatek,iommu.yaml | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > index 2ae3bbad7f1a..c4af41947593 100644
> > > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> > > @@ -101,6 +101,10 @@ properties:
> > >        items:
> > >          - const: bclk
> > >
> > > +  mediatek,infracfg:
> > > +    $ref: /schemas/types.yaml#/definitions/phandle
> > > +    description: The phandle to the mediatek infracfg syscon
> > > +
> > >    mediatek,larbs:
> > >      $ref: /schemas/types.yaml#/definitions/phandle-array
> > >      minItems: 1
> > > @@ -112,6 +116,10 @@ properties:
> > >        Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must sort
> > >        according to the local arbiter index, like larb0, larb1, larb2...
> > >
> > > +  mediatek,pericfg:
> > > +    $ref: /schemas/types.yaml#/definitions/phandle
> > > +    description: The phandle to the mediatek pericfg syscon
> > > +
> >
> > I didn't explain myself. What I was suggesting was to squash the patch
> > that adds the required mediatek,infracfg with the patch that adds
> > mediatek,infracfg to the binding description. And then squash both
> > patches adding pericfg as well.
>
> Sorry Matthias, I'm not sure ... I think I'm misunderstanding you again...
> ...but if I'm not, I don't think that squashing actual code and bindings
> together is something acceptable?
>
> I've made that kind of mistake in the past and I was told multiple times that
> dt-bindings changes shall be sent separately from the actual driver changes.

Combine patches 1 and 6 is the suggestion, not driver changes.

Rob
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
On Wed, Jun 01, 2022 at 08:21:41PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 11:11:57AM -0700, Nathan Chancellor wrote:
> > On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> > > On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > > > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > > > Can you send me the full dmesg and the content of
> > > > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > > >
> > > > Sure thing, they are attached! If there is anything else I can provide
> > > > or test, I am more than happy to do so.
> > >
> > > Nothing interesting. But the performance numbers almost look like
> > > swiotlb=force got ignored before (even if I can't explain why).
> >
> > I was able to get my performance back with this diff but I don't know if
> > this is a hack or a proper fix in the context of the series.
>
> This looks good, but needs a little tweak. I'd go for this variant of
> it:

Tested-by: Nathan Chancellor

Thanks a lot for the quick fix!

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index dfa1de89dc944..cb50f8d383606 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
>  }
>  
>  static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> -		unsigned long nslabs, bool late_alloc)
> +		unsigned long nslabs, unsigned int flags, bool late_alloc)
>  {
>  	void *vaddr = phys_to_virt(start);
>  	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> @@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>  	mem->index = 0;
>  	mem->late_alloc = late_alloc;
>  
> -	if (swiotlb_force_bounce)
> -		mem->force_bounce = true;
> +	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
>  
>  	spin_lock_init(&mem->lock);
>  	for (i = 0; i < mem->nslabs; i++) {
> @@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>  		      __func__, alloc_size, PAGE_SIZE);
>  
> -	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> -	mem->force_bounce = flags & SWIOTLB_FORCE;
> +	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
>  
>  	if (flags & SWIOTLB_VERBOSE)
>  		swiotlb_print_info();
> @@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  
>  	set_memory_decrypted((unsigned long)vstart,
>  			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> -	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
> +	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
>  
>  	swiotlb_print_info();
>  	return 0;
> @@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>  
>  		set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>  				     rmem->size >> PAGE_SHIFT);
> -		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
> -		mem->force_bounce = true;
> +		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
> +					false);
>  
>  		mem->for_alloc = true;
>  
>  		rmem->priv = mem;

Cheers,
Nathan
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
On Wed, Jun 01, 2022 at 11:11:57AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> > On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > > Can you send me the full dmesg and the content of
> > > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > >
> > > Sure thing, they are attached! If there is anything else I can provide
> > > or test, I am more than happy to do so.
> >
> > Nothing interesting. But the performance numbers almost look like
> > swiotlb=force got ignored before (even if I can't explain why).
>
> I was able to get my performance back with this diff but I don't know if
> this is a hack or a proper fix in the context of the series.

This looks good, but needs a little tweak. I'd go for this variant of
it:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc944..cb50f8d383606 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-		unsigned long nslabs, bool late_alloc)
+		unsigned long nslabs, unsigned int flags, bool late_alloc)
 {
 	void *vaddr = phys_to_virt(start);
 	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 	mem->index = 0;
 	mem->late_alloc = late_alloc;
 
-	if (swiotlb_force_bounce)
-		mem->force_bounce = true;
+	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
 	spin_lock_init(&mem->lock);
 	for (i = 0; i < mem->nslabs; i++) {
@@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-	mem->force_bounce = flags & SWIOTLB_FORCE;
+	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
 
 	if (flags & SWIOTLB_VERBOSE)
 		swiotlb_print_info();
@@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
 	set_memory_decrypted((unsigned long)vstart,
 			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
+	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
 
 	swiotlb_print_info();
 	return 0;
@@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 
 		set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
				     rmem->size >> PAGE_SHIFT);
-		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
-		mem->force_bounce = true;
+		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
+					false);
 
 		mem->for_alloc = true;
 
 		rmem->priv = mem;
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > Can you send me the full dmesg and the content of
> > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> >
> > Sure thing, they are attached! If there is anything else I can provide
> > or test, I am more than happy to do so.
>
> Nothing interesting. But the performance numbers almost look like
> swiotlb=force got ignored before (even if I can't explain why).

I was able to get my performance back with this diff but I don't know if
this is a hack or a proper fix in the context of the series.

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc94..0bfb2fe3d8c5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -276,7 +276,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 		      __func__, alloc_size, PAGE_SIZE);
 
 	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-	mem->force_bounce = flags & SWIOTLB_FORCE;
+	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
 	if (flags & SWIOTLB_VERBOSE)
 		swiotlb_print_info();

> Do you get a similar performance with the new kernel without
> swiotlb=force as the old one with that argument by any chance?

I'll see if I can test that, as I am not sure I have control over those
cmdline arguments.

Cheers,
Nathan
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > Can you send me the full dmesg and the content of
> > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
>
> Sure thing, they are attached! If there is anything else I can provide
> or test, I am more than happy to do so.

Nothing interesting. But the performance numbers almost look like
swiotlb=force got ignored before (even if I can't explain why).

Do you get a similar performance with the new kernel without
swiotlb=force as the old one with that argument by any chance?
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> Can you send me the full dmesg and the content of
> /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?

Sure thing, they are attached! If there is anything else I can provide
or test, I am more than happy to do so.

Cheers,
Nathan

# cat /sys/kernel/debug/swiotlb/io_tlb_nslabs
32768

# dmesg
[0.00] Linux version 5.18.0-rc3-microsoft-standard-WSL2-8-ga3e230926708 (nathan@dev-arch.thelio-3990X) (gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT_DYNAMIC Wed Jun 1 10:38:34 MST 2022
[0.00] Command line: initrd=\initrd.img panic=-1 nr_cpus=8 swiotlb=force earlycon=uart8250,io,0x3f8,115200 console=hvc0 debug pty.legacy_count=0
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x000e-0x000e0fff] reserved
[0.00] BIOS-e820: [mem 0x0010-0x001f] ACPI data
[0.00] BIOS-e820: [mem 0x0020-0xf7ff] usable
[0.00] BIOS-e820: [mem 0x0001-0x000407ff] usable
[0.00] earlycon: uart8250 at I/O port 0x3f8 (options '115200')
[0.00] printk: bootconsole [uart8250] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] Hypervisor detected: Microsoft Hyper-V
[0.00] Hyper-V: privilege flags low 0xae7f, high 0x3b8030, hints 0xc2c, misc 0xe0bed7b6
[0.00] Hyper-V: Host Build 10.0.22000.708-0-0
[0.00] Hyper-V: Nested features: 0x4a
[0.00] Hyper-V: LAPIC Timer Frequency: 0x1e8480
[0.00] Hyper-V: Using hypercall for remote TLB flush
[0.00] clocksource: hyperv_clocksource_tsc_page: mask: 0x max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[0.05] tsc: Detected 3800.008 MHz processor
[0.001901] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.004593] e820: remove [mem 0x000a-0x000f] usable
[0.006806] last_pfn = 0x408000 max_arch_pfn = 0x4
[0.009042] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[0.011760] last_pfn = 0xf8000 max_arch_pfn = 0x4
[0.013959] Using GB pages for direct mapping
[0.015749] RAMDISK: [mem 0x0371f000-0x03779fff]
[0.017616] ACPI: Early table checksum verification disabled
[0.019854] ACPI: RSDP 0x000E 24 (v02 VRTUAL)
[0.022162] ACPI: XSDT 0x0010 44 (v01 VRTUAL MICROSFT 0001 MSFT 0001)
[0.025624] ACPI: FACP 0x00101000 000114 (v06 VRTUAL MICROSFT 0001 MSFT 0001)
[0.029022] ACPI: DSDT 0x001011B8 01E184 (v02 MSFTVM DSDT01 0001 MSFT 0500)
[0.032413] ACPI: FACS 0x00101114 40
[0.034280] ACPI: OEM0 0x00101154 64 (v01 VRTUAL MICROSFT 0001 MSFT 0001)
[0.037699] ACPI: SRAT 0x0011F33C 000330 (v02 VRTUAL MICROSFT 0001 MSFT 0001)
[0.041089] ACPI: APIC 0x0011F66C 88 (v04 VRTUAL MICROSFT 0001 MSFT 0001)
[0.044475] ACPI: Reserving FACP table memory at [mem 0x101000-0x101113]
[0.047159] ACPI: Reserving DSDT table memory at [mem 0x1011b8-0x11f33b]
[0.049905] ACPI: Reserving FACS table memory at [mem 0x101114-0x101153]
[0.052693] ACPI: Reserving OEM0 table memory at [mem 0x101154-0x1011b7]
[0.055404] ACPI: Reserving SRAT table memory at [mem 0x11f33c-0x11f66b]
[0.058040] ACPI: Reserving APIC table memory at [mem 0x11f66c-0x11f6f3]
[0.061078] Zone ranges:
[0.062074]   DMA      [mem 0x1000-0x00ff]
[0.066106]   DMA32    [mem 0x0100-0x]
[0.068763]   Normal   [mem 0x0001-0x000407ff]
[0.071235]   Device   empty
[0.072384] Movable zone start for each node
[0.074058] Early memory node ranges
[0.075515]   node 0: [mem 0x1000-0x0009]
[0.077979]   node 0: [mem 0x0020-0xf7ff]
[0.080483]   node 0: [mem 0x0001-0x000407ff]
[0.082980] Initmem setup node 0 [mem 0x1000-0x000407ff]
[0.085954] On node 0, zone DMA: 1 pages in unavailable ranges
[0.085972] On node 0, zone DMA: 352
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
Can you send me the full dmesg and the content of
/sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?

Thanks!
Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful
Hi Christoph,

On Mon, Apr 04, 2022 at 07:05:53AM +0200, Christoph Hellwig wrote:
> Pass a bool to pass if swiotlb needs to be enabled based on the
> addressing needs and replace the verbose argument with a set of
> flags, including one to force enable bounce buffering.
>
> Note that this patch removes the possibility to force xen-swiotlb
> use using swiotlb=force on the command line on x86 (arm and arm64
> never supported that), but this interface will be restored shortly.
>
> Signed-off-by: Christoph Hellwig

I bisected a performance regression in WSL2 to this change as commit
c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful") in
mainline (bisect log below). I initially noticed it because accessing
the Windows filesystem through the /mnt/c mount is about 40x slower, if
I am doing my math right, based on the benchmarks below.

Before:

$ uname -r; and hyperfine "ls -l /mnt/c/Users/natec/Downloads"
5.18.0-rc3-microsoft-standard-WSL2-8-ga3e230926708
Benchmark 1: ls -l /mnt/c/Users/natec/Downloads
  Time (mean ± σ):     564.5 ms ±  24.1 ms    [User: 2.5 ms, System: 130.3 ms]
  Range (min … max):   510.2 ms … 588.0 ms    10 runs

After:

$ uname -r; and hyperfine "ls -l /mnt/c/Users/natec/Downloads"
5.18.0-rc3-microsoft-standard-WSL2-9-gc6af2aa9ffc9
Benchmark 1: ls -l /mnt/c/Users/natec/Downloads
  Time (mean ± σ):     23.282 s ±  1.220 s    [User: 0.013 s, System: 0.101 s]
  Range (min … max):   21.793 s … 25.317 s    10 runs

I do see 'swiotlb=force' on the cmdline:

$ cat /proc/cmdline
initrd=\initrd.img panic=-1 nr_cpus=8 swiotlb=force earlycon=uart8250,io,0x3f8,115200 console=hvc0 debug pty.legacy_count=0

/mnt/c appears to be a 9p mount, not sure if that is relevant here:

$ mount &| grep /mnt/c
drvfs on /mnt/c type 9p (rw,noatime,dirsync,aname=drvfs;path=C:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)

If there is any other information I can provide, please let me know.

Cheers,
Nathan

# bad: [700170bf6b4d773e328fa54ebb70ba444007c702] Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
# good: [4b0986a3613c92f4ec1bdc7f60ec66fea135991f] Linux 5.18
git bisect start '700170bf6b4d773e328fa54ebb70ba444007c702' 'v5.18'
# good: [86c87bea6b42100c67418af690919c44de6ede6e] Merge tag 'devicetree-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good 86c87bea6b42100c67418af690919c44de6ede6e
# bad: [ae862183285cbb2ef9032770d98ffa9becffe9d5] Merge tag 'arm-dt-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad ae862183285cbb2ef9032770d98ffa9becffe9d5
# good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
# bad: [babf0bb978e3c9fce6c4eba6b744c8754fd43d8e] Merge tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad babf0bb978e3c9fce6c4eba6b744c8754fd43d8e
# good: [beed983621fbdfd291e6e3a0cdc4d10517e60af8] ASoC: Intel: avs: Machine board registration
git bisect good beed983621fbdfd291e6e3a0cdc4d10517e60af8
# good: [fbe86daca0ba878b04fa241b85e26e54d17d4229] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good fbe86daca0ba878b04fa241b85e26e54d17d4229
# good: [166afc45ed5523298541fd0297f9ad585cc2708c] Merge tag 'reflink-speedups-5.19_2022-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.19-for-next
git bisect good 166afc45ed5523298541fd0297f9ad585cc2708c
# bad: [e375780b631a5fc2a61a3b4fa12429255361a31e] Merge tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad e375780b631a5fc2a61a3b4fa12429255361a31e
# bad: [4a37f3dd9a83186cb88d44808ab35b78375082c9] dma-direct: don't over-decrypt memory
git bisect bad 4a37f3dd9a83186cb88d44808ab35b78375082c9
# bad: [742519538e6b07250c8085bbff4bd358bc03bf16] swiotlb: pass a gfp_mask argument to swiotlb_init_late
git bisect bad 742519538e6b07250c8085bbff4bd358bc03bf16
# good: [9bbe7a7fc126e3d14fefa4b035854aba080926d9] arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
git bisect good 9bbe7a7fc126e3d14fefa4b035854aba080926d9
# good: [a3e230926708125205ffd06d3dc2175a8263ae7e] x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
git bisect good a3e230926708125205ffd06d3dc2175a8263ae7e
# bad: [8ba2ed1be90fc210126f68186564707478552c95] swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
git bisect bad 8ba2ed1be90fc210126f68186564707478552c95
# bad: [c6af2aa9ffc9763826607bc2664ef3ea4475ed18] swiotlb: make the swiotlb_init interface more useful
git bisect bad c6af2aa9ffc9763826607bc2664ef3ea4475ed18
# first bad commit: [c6af2aa9ffc9763826607bc2664ef3ea4475ed18] swiotlb: make the swiotlb_init interface more useful
[PATCH v2] dma-debug: Make things less spammy under memory pressure
From: Rob Clark

Limit the error msg to avoid flooding the console. If you have a lot of
threads hitting this at once, they could have already gotten past the
dma_debug_disabled() check before they get to the point of allocation
failure, resulting in quite a lot of this error message spamming the
log. Use pr_err_once() to limit that.

Signed-off-by: Rob Clark
---
v2: Use pr_err_once() instead of ratelimited, and spiff out commit msg a bit.

 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ff598596b8..754e3456f017 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -564,7 +564,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 
 	rc = active_cacheline_insert(entry);
 	if (rc == -ENOMEM) {
-		pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
+		pr_err_once("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
 	} else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
 		err_printk(entry->dev, entry,
-- 
2.36.1
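[Editorial note: the property the patch above relies on — the message is emitted at most once no matter how many threads race into the error path — can be sketched in userspace C. This models the once-only behaviour only; the kernel's pr_err_once() is implemented differently (a DO_ONCE_LITE-style static flag), and `err_once`/`messages_emitted` are hypothetical names.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

static int messages_emitted; /* instrumentation for the sketch */

/*
 * Userspace model of a print-once helper: an atomic flag guarantees the
 * message is emitted exactly once even if many callers race past the
 * "already disabled" check and all reach the failure path.
 */
static void err_once(const char *msg)
{
	static atomic_flag printed = ATOMIC_FLAG_INIT;

	if (!atomic_flag_test_and_set(&printed)) {
		fprintf(stderr, "%s\n", msg);
		messages_emitted++;
	}
}
```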
Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
On Wed, Jun 01, 2022 at 02:52:05PM +0100, Joao Martins wrote:
> On 6/1/22 13:33, Jason Gunthorpe wrote:
> > On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:
> >
> >>> So having safe racy reading in the kernel is probably best, and so RCU
> >>> would be good here too.
> >>
> >> Reading dirties ought to be similar to map/unmap but slightly simpler as
> >> I supposedly don't need to care about the pte changing under the hood (or
> >> so I initially thought). I was wrestling at some point if test-and-clear
> >> was enough or whether I switch back cmpxchg to detect the pte has changed
> >> and only mark dirty based on the old value[*]. The latter would align with
> >> how map/unmap performs the pte updates.
> >
> > test-and-clear should be fine, but this all needs to be done under a
> > RCU context while the page tables themselves are freed by RCU. Then
> > you can safely chase the page table pointers down to each level
> > without fear of UAF.
>
> I was actually thinking more towards holding the same IOVA range lock to
> align with the rest of map/unmap/demote/etc? All these IO page table
> manipulations have the same performance requirements.

IMHO ideally we would not make read dirty use the IOVA range lock
because we want to read dirty big swaths of IOVA at a time and it
shouldn't be forced to chunk based on the arbitrary area construction.

But yes this could also be an option.

Jason
Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
On 6/1/22 13:33, Jason Gunthorpe wrote:
> On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:
>
>>> So having safe racy reading in the kernel is probably best, and so RCU
>>> would be good here too.
>>
>> Reading dirties ought to be similar to map/unmap but slightly simpler as
>> I supposedly don't need to care about the pte changing under the hood (or
>> so I initially thought). I was wrestling at some point if test-and-clear
>> was enough or whether I switch back cmpxchg to detect the pte has changed
>> and only mark dirty based on the old value[*]. The latter would align with
>> how map/unmap performs the pte updates.
>
> test-and-clear should be fine, but this all needs to be done under a
> RCU context while the page tables themselves are freed by RCU. Then
> you can safely chase the page table pointers down to each level
> without fear of UAF.

I was actually thinking more towards holding the same IOVA range lock to
align with the rest of map/unmap/demote/etc? All these IO page table
manipulations have the same performance requirements.

>> I am not sure yet on dynamic demote/promote of page sizes if it changes this.
>
> For this kind of primitive the caller must provide the locking, just
> like map/unmap.

Ah OK.

> Effectively you can consider the iommu_domain as having externally
> provided range-locks over the IOVA space. map/unmap/demote/promote
> must run serially over intersecting IOVA ranges.
>
> In terms of iommufd this means we always have to hold a lock related
> to the area (which is the IOVA range) before issuing any iommu call on
> the domain.

/me nods
Re: [PATCH V3 5/8] dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops
On 01.06.22 03:34, Stefano Stabellini wrote:

Hello Stefano

> On Tue, 31 May 2022, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko
>>
>> The main purpose of this binding is to communicate Xen specific
>> information using generic IOMMU device tree bindings (which is a good
>> fit here) rather than introducing a custom property.
>>
>> Introduce Xen specific IOMMU for the virtualized device (e.g. virtio)
>> to be used by Xen grant DMA-mapping layer in the subsequent commit.
>>
>> The reference to Xen specific IOMMU node using "iommus" property
>> indicates that Xen grant mappings need to be enabled for the device,
>> and it specifies the ID of the domain where the corresponding backend
>> resides. The domid (domain ID) is used as an argument to the Xen grant
>> mapping APIs.
>>
>> This is needed for the option to restrict memory access using Xen grant
>> mappings to work, whose primary goal is to enable using virtio devices
>> in Xen guests.
>>
>> Signed-off-by: Oleksandr Tyshchenko
>> ---
>> Changes RFC -> V1:
>>    - update commit subject/description and text in description
>>    - move to devicetree/bindings/arm/
>>
>> Changes V1 -> V2:
>>    - update text in description
>>    - change the maintainer of the binding
>>    - fix validation issue
>>    - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
>>
>> Changes V2 -> V3:
>>    - Stefano already gave his Reviewed-by, I dropped it due to the changes (significant)
>>    - use generic IOMMU device tree bindings instead of custom property "xen,dev-domid"
>>    - change commit subject and description, was "dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops"
>> ---
>>   .../devicetree/bindings/iommu/xen,grant-dma.yaml | 49 ++++++++++++++++++
>>   1 file changed, 49 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
>> new file mode 100644
>> index ..ab5765c
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
>> @@ -0,0 +1,49 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/iommu/xen,grant-dma.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Xen specific IOMMU for virtualized devices (e.g. virtio)
>> +
>> +maintainers:
>> +  - Stefano Stabellini
>> +
>> +description:
>> +  The reference to Xen specific IOMMU node using "iommus" property indicates
>> +  that Xen grant mappings need to be enabled for the device, and it specifies
>> +  the ID of the domain where the corresponding backend resides.
>> +  The binding is required to restrict memory access using Xen grant mappings.
>
> I think this is OK and in line with the discussion we had on the list. I
> propose the following wording instead:
>
> """
> The Xen IOMMU represents the Xen grant table interface. Grant mappings
> are to be used with devices connected to the Xen IOMMU using the
> "iommus" property, which also specifies the ID of the backend domain.
> The binding is required to restrict memory access using Xen grant
> mappings.
> """
>
>> +properties:
>> +  compatible:
>> +    const: xen,grant-dma
>> +
>> +  '#iommu-cells':
>> +    const: 1
>> +    description:
>> +      Xen specific IOMMU is multiple-master IOMMU device.
>> +      The single cell describes the domid (domain ID) of the domain where
>> +      the backend is running.
>
> Here I would say:
>
> """
> The single cell is the domid (domain ID) of the domain where the
> backend is running.
> """
>
> With the two wording improvements:

I am happy with proposed wording improvements, will update.

> Reviewed-by: Stefano Stabellini

Thanks!

>> +required:
>> +  - compatible
>> +  - "#iommu-cells"
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +    xen_iommu {
>> +        compatible = "xen,grant-dma";
>> +        #iommu-cells = <1>;
>> +    };
>> +
>> +    virtio@3000 {
>> +        compatible = "virtio,mmio";
>> +        reg = <0x3000 0x100>;
>> +        interrupts = <41>;
>> +
>> +        /* The backend is located in Xen domain with ID 1 */
>> +        iommus = <&xen_iommu 1>;
>> +    };
>> --
>> 2.7.4

--
Regards,
Oleksandr Tyshchenko
Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:

> > So having safe racy reading in the kernel is probably best, and so RCU
> > would be good here too.
>
> Reading dirties ought to be similar to map/unmap but slightly simpler as
> I supposedly don't need to care about the pte changing under the hood (or
> so I initially thought). I was wrestling at some point if test-and-clear
> was enough or whether I switch back cmpxchg to detect the pte has changed
> and only mark dirty based on the old value[*]. The latter would align with
> how map/unmap performs the pte updates.

test-and-clear should be fine, but this all needs to be done under a RCU context while the page tables themselves are freed by RCU. Then you can safely chase the page table pointers down to each level without fear of UAF.

> I am not sure yet on dynamic demote/promote of page sizes if it changes this.

For this kind of primitive the caller must provide the locking, just like map/unmap.

Effectively you can consider the iommu_domain as having externally provided range-locks over the IOVA space. map/unmap/demote/promote must run serially over intersecting IOVA ranges.

In terms of iommufd this means we always have to hold a lock related to the area (which is the IOVA range) before issuing any iommu call on the domain.

Jason
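Jason's point that a plain test-and-clear suffices (no cmpxchg retry loop) can be sketched in userspace C. The PTE layout and helper names below are made up for illustration — they are not the VT-d or AMD layouts, and this is only a sketch of the idea, not the driver's real code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only:
 * bit 0 = present, bit 6 = dirty. */
#define TOY_PTE_PRESENT (1u << 0)
#define TOY_PTE_DIRTY   (1u << 6)

/*
 * Atomic test-and-clear of the dirty bit. No cmpxchg loop is needed:
 * fetch-and returns the old word, so we learn whether dirty was set in
 * the same atomic step that clears it, and concurrent updates to other
 * bits in the entry are left untouched.
 */
static bool toy_pte_test_and_clear_dirty(uint32_t *pte)
{
        uint32_t old = __atomic_fetch_and(pte, (uint32_t)~TOY_PTE_DIRTY,
                                          __ATOMIC_ACQ_REL);
        return (old & TOY_PTE_DIRTY) != 0;
}
```

The read side still has to run inside an RCU read section (with the table pages themselves freed via RCU) so a table page cannot be freed mid-walk, which is the UAF point above.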
Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
On 6/1/22 00:10, Jason Gunthorpe wrote:
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
>> There are only 3 instances where we'll free a table while the domain is
>> live. The first is the one legitimate race condition, where two map requests
>> targeting relatively nearby PTEs both go to fill in an intermediate level of
>> table; whoever loses that race frees the table they allocated, but it was
>> never visible to anyone else so that's definitely fine. The second is if
>> we're mapping a block entry, and find that there's already a table entry
>> there, wherein we assume the table must be empty, clear the entry,
>> invalidate any walk caches, install the block entry, then free the orphaned
>> table; since we're mapping the entire IOVA range covered by that table,
>> there should be no other operations on that IOVA range attempting to walk
>> the table at the same time, so it's fine.
>
> I saw these two in the Intel driver
>
>> The third is effectively the inverse, if we get a block-sized unmap
>> but find a table entry rather than a block at that point (on the
>> assumption that it's de-facto allowed for a single unmap to cover
>> multiple adjacent mappings as long as it does so exactly); similarly
>> we assume that the table must be full, and no other operations
>> should be racing because we're unmapping its whole IOVA range, so we
>> remove the table entry, invalidate, and free as before.
>
> Not sure I noticed this one though
>
> This all makes sense though.

I saw all three incantations in AMD iommu /I think/. AMD has sort of a replicated PTEs concept representing a power of 2 page size mapped in 'default' page sizes (4K, 2M, 1G), which we need to check every single one of them. Which upon writing the RFC I realized it's not really the most efficient thing to keep considering the IOMMU hardware doesn't guarantee the dirty bit is on every replicated pte. And maybe it's clobbering the fact we don't attempt to pick the best page-size for the IOVA mapping (like Intel), to instead have all powers of 2 page sizes allowed and IOMMU hw tries its best to cope.

>> Although we don't have debug dumping for io-pgtable-arm, it's good to be
>> thinking about this, since it's made me realise that dirty-tracking sweeps
>> per that proposal might pose a similar kind of concern, so we might still
>> need to harden these corners for the sake of that.
>
> Lets make sure Joao sees this..
>
> It is interesting because we probably don't want the big latency
> spikes that would come from using locking to block map/unmap while
> dirty reading is happening - eg at the iommufd level.

Specially when we might be scanning big IOVA ranges.

> From a consistency POV the only case that matters is unmap and unmap
> should already be doing a dedicated dirty read directly prior to the
> unmap (as per that other long thread)

/me nods, yes. FWIW, I've already removed the unmap_read_dirty ops.

> So having safe racy reading in the kernel is probably best, and so RCU
> would be good here too.

Reading dirties ought to be similar to map/unmap but slightly simpler as I supposedly don't need to care about the pte changing under the hood (or so I initially thought). I was wrestling at some point if test-and-clear was enough or whether I switch back cmpxchg to detect the pte has changed and only mark dirty based on the old value[*]. The latter would align with how map/unmap performs the pte updates.

[*] e.g. potentially the new entry hasn't been 'seen' by the IOMMU walker or might be a smaller size than what got dirtied with iopte split or racing with a new map

>> that somewhere I have some half-finished patches making io-pgtable-arm use
>> the iommu_iotlb_gather freelist, so maybe I'll tackle both concerns at once
>> (perhaps we might even be able to RCU-ify the freelist generically? I'll see
>> how it goes when I get there).
>
> This is all very similar to how the mm's tlb gather stuff works,
> especially on PPC which requires RCU protection of page tables instead
> of the IPI trick.
>
> Currently the rcu_head and the lru overlap in the struct page. To do
> this I'd suggest following what was done for slab - see ca1a46d6f506
> ("Merge tag 'slab-for-5.17'..) and create a 'struct page' sub-class
> like 'struct slab' for use with iommu page tables and put the
> list_head and rcu_head sequentially in the struct iommu_pt_page.
>
> Continue to use put_pages_list() just with the list_head in the new
> struct not the lru.
>
> If we need it for dirty tracking then it makes sense to start
> progressing parts at least..

The suggestions here seem to apply to both map/unmap too, not just read_dirty() alone IIUC.

I am not sure yet on dynamic demote/promote of page sizes if it changes this.
Re: [PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring
On Wed, Jun 01, 2022 at 09:56:13AM +0200, Lukas Bulwahn wrote:
> +F:	arch/x86/kernel/pci-dma.c

I think this file is better left for the x86 maintainers.
Re: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path
On 2022/6/1 17:28, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Friday, May 27, 2022 2:30 PM
>>
>> When the IOMMU domain is about to be freed, it should not be set on any
>> device. Instead of silently dealing with some bug cases, it's better to
>> trigger a warning to report and fix any potential bugs at the first time.
>>
>>  static void domain_exit(struct dmar_domain *domain)
>>  {
>> -
>> -	/* Remove associated devices and clear attached or cached domains */
>> -	domain_remove_dev_info(domain);
>> +	if (WARN_ON(!list_empty(&domain->devices)))
>> +		return;
>
> warning is good but it doesn't mean the driver shouldn't deal with that
> situation to make it safer e.g. blocking DMA from all attached device...

I have ever thought the same thing. :-)

Blocking DMA from attached device should be done when setting blocking domain to the device. It should not be part of freeing a domain.

Here, the caller asks the driver to free the domain, but the driver finds that something is wrong. Therefore, it warns and returns directly. The domain will still be there in use until the next set_domain().

Best regards,
baolu
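The warn-and-bail pattern being discussed can be sketched outside the kernel with a toy intrusive list. `list_empty` here mirrors the kernel's, and `toy_domain_exit` stands in for `domain_exit()`; all names are hypothetical and this is only a sketch of the behaviour, not the driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly-linked list, mirroring the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
}

static void list_del(struct list_head *n)
{
        n->prev->next = n->next; n->next->prev = n->prev;
        n->next = n->prev = NULL;
}

/* Toy stand-in for struct dmar_domain and its free path. */
struct toy_domain { struct list_head devices; bool freed; };

/*
 * Mirrors the proposed domain_exit() behaviour: if any device is still
 * attached, refuse (the kernel additionally WARNs) and leave the domain
 * alive until the next set_domain(); otherwise actually free it.
 */
static bool toy_domain_exit(struct toy_domain *d)
{
        if (!list_empty(&d->devices))
                return false;   /* caller bug: device still attached */
        d->freed = true;
        return true;
}
```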
Re: [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers
On 2022/6/1 17:18, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Friday, May 27, 2022 2:30 PM
>>
>> The iommu->lock is used to protect the per-IOMMU pasid directory table
>> and pasid table. Move the spinlock acquisition/release into the helpers
>> to make the code self-contained.
>>
>> Signed-off-by: Lu Baolu
>
> Reviewed-by: Kevin Tian, with one nit
>
>> -	/* Caller must ensure PASID entry is not in use. */
>> -	if (pasid_pte_is_present(pte))
>> -		return -EBUSY;
>> +	spin_lock(&iommu->lock);
>> +	pte = get_non_present_pasid_entry(dev, pasid);
>> +	if (!pte) {
>> +		spin_unlock(&iommu->lock);
>> +		return -ENODEV;
>> +	}
>
> I don't think above is a good abstraction and it changes the error code
> for a present entry from -EBUSY to -ENODEV.

Sure. I will roll it back to -EBUSY. I added this helper because the same code appears at least three times in this file.

Best regards,
baolu
Re: [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers
Hi Kevin,

Thank you for the comments.

On 2022/6/1 17:09, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Friday, May 27, 2022 2:30 PM
>>
>> The iommu->lock is used to protect the per-IOMMU domain ID resource.
>> Move the spinlock acquisition/release into the helpers where domain
>> IDs are allocated and freed. The device_domain_lock is irrelevant to
>> domain ID resources, remove its assertion as well.
>
> while moving the lock you also replace spin_lock_irqsave() with
> spin_lock(). It'd be cleaner to just do movement here and then replace
> all _irqsave() in patch 8.

Yeah, that will be clearer.

Best regards,
baolu
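The refactor under review — moving lock acquisition into the helper so call sites lose their locking boilerplate — can be sketched in userspace C with a toy test-and-set lock. Every name here is illustrative, standing in for `iommu->lock` and the domain-ID refcounting, not the driver's real API:

```c
#include <assert.h>

/* Toy lock and per-"IOMMU" ID counter, standing in for iommu->lock
 * and the domain-ID bitmap. */
struct toy_iommu {
        volatile char lock;     /* test-and-set spinlock */
        int ids_in_use;
};

static void toy_lock(struct toy_iommu *iommu)
{
        while (__atomic_test_and_set((void *)&iommu->lock, __ATOMIC_ACQUIRE))
                ;       /* spin */
}

static void toy_unlock(struct toy_iommu *iommu)
{
        __atomic_clear((void *)&iommu->lock, __ATOMIC_RELEASE);
}

/*
 * After the refactor, the helper takes and releases the lock itself,
 * so call sites drop their assert_spin_locked()/_irqsave boilerplate
 * and simply call the helper.
 */
static int toy_domain_attach_iommu(struct toy_iommu *iommu)
{
        toy_lock(iommu);
        iommu->ids_in_use++;
        toy_unlock(iommu);
        return 0;
}

static void toy_domain_detach_iommu(struct toy_iommu *iommu)
{
        toy_lock(iommu);
        iommu->ids_in_use--;
        toy_unlock(iommu);
}
```

Kevin's review point is orthogonal to this sketch: whether the helper uses the plain or `_irqsave` lock flavour should be changed in its own patch, not mixed into the movement.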
RE: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API
> From: Baolu Lu
> Sent: Wednesday, June 1, 2022 5:37 PM
>
> On 2022/6/1 09:43, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Wednesday, June 1, 2022 1:30 AM
> >>
> >> In both cases the pasid is stored in the attach data instead of the
> >> domain.
> >>
> >> So during IOTLB flush for the domain, do we loop through the attach data?
> >
> > Yes and it's required.
>
> What does the attach data mean here? Do you mean group->pasid_array?

any structure describing the attaching relationship from {device, pasid} to a domain

> Why not tracking the {device, pasid} info in the iommu driver when
> setting domain to {device, pasid}? We have tracked device in a list when
> setting a domain to device.

Yes, that tracking structure is the attach data.
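The "attach data" idea Kevin describes — one record per {device, pasid} attachment, tracked per domain so a domain-wide IOTLB flush can visit every attachment — can be sketched as follows. All names and the fixed-size array are hypothetical simplifications for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTACH 8

/* One record per {device, pasid} attachment to a domain. */
struct toy_attach_data {
        int dev_id;     /* stand-in for a struct device pointer */
        int pasid;
};

struct toy_flush_domain {
        struct toy_attach_data attach[TOY_MAX_ATTACH];
        int nr_attach;
};

/* Record an attachment when a domain is set on {device, pasid}. */
static int toy_set_dev_pasid(struct toy_flush_domain *d, int dev_id, int pasid)
{
        if (d->nr_attach >= TOY_MAX_ATTACH)
                return -1;
        d->attach[d->nr_attach++] =
                (struct toy_attach_data){ dev_id, pasid };
        return 0;
}

/*
 * Domain-wide flush: loop through the attach data and issue one
 * per-{device, pasid} invalidation each; returns how many were sent.
 */
static int toy_flush_domain_iotlb(const struct toy_flush_domain *d)
{
        int sent = 0;

        for (int i = 0; i < d->nr_attach; i++)
                sent++; /* here: would queue an invalidation for attach[i] */
        return sent;
}
```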
Re: [PATCH 10/10] dmapool: improve scalability of dma_pool_free
On 2022-05-31 23:10, Tony Battersby wrote:
> On 5/31/22 17:54, Keith Busch wrote:
>> On Tue, May 31, 2022 at 02:23:44PM -0400, Tony Battersby wrote:
>>> dma_pool_free() scales poorly when the pool contains many pages because
>>> pool_find_page() does a linear scan of all allocated pages. Improve its
>>> scalability by replacing the linear scan with a red-black tree lookup.
>>> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
>>
>> The improvement should say O(n) to O(log n), right?
>
> That would be the improvement for a single call to dma_pool_alloc or
> dma_pool_free, but I was going with the improvement for "n" calls instead,
> which is consistent with the improvement for the example in the cover
> letter for mpt3sas. I would have to look up the convention to be sure of
> the proper notation in a situation like this, but I usually think
> "inserting N items takes N^2 time"; to me it makes less sense to say
> "inserting 1 item takes N time", because the N seems to come out of nowhere.

No, n represents the size of the set that the algorithm is inserting into (or removing from, searching within, etc.). Hence why constant time is represented by O(1), i.e. not involving the variable at all.

Robin.
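Robin's point — n is the size of the searched set, so one lookup costs O(n) with a linear scan versus O(log n) with a balanced tree — can be made concrete by counting probes. The sketch below uses a sorted array plus binary search as a stand-in for the red-black tree (same logarithmic lookup depth); it is illustrative, not the dmapool code:

```c
#include <assert.h>
#include <stddef.h>

/* Count probes needed to find `key` by a linear scan, as the old
 * pool_find_page() effectively does over the page list. */
static int linear_probes(const long *a, size_t n, long key)
{
        int probes = 0;

        for (size_t i = 0; i < n; i++) {
                probes++;
                if (a[i] == key)
                        break;
        }
        return probes;
}

/* Count probes for a binary search over the same sorted set — the
 * logarithmic behaviour an rbtree lookup gives the real code. */
static int binary_probes(const long *a, size_t n, long key)
{
        int probes = 0;
        size_t lo = 0, hi = n;

        while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;

                probes++;
                if (a[mid] == key)
                        break;
                if (a[mid] < key)
                        lo = mid + 1;
                else
                        hi = mid;
        }
        return probes;
}
```

With 1024 entries and a worst-case key, the linear scan touches all 1024 entries while the binary search needs at most about log2(1024) + 1 = 11 probes — per call, which is the O(n) vs O(log n) distinction; over n calls the totals are O(n^2) vs O(n log n) as in the patch description.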
Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API
On 2022/6/1 09:43, Tian, Kevin wrote:
>> From: Jacob Pan
>> Sent: Wednesday, June 1, 2022 1:30 AM
>>
>> In both cases the pasid is stored in the attach data instead of the
>> domain.
>>
>> So during IOTLB flush for the domain, do we loop through the attach data?
>
> Yes and it's required.

What does the attach data mean here? Do you mean group->pasid_array?

Why not tracking the {device, pasid} info in the iommu driver when setting domain to {device, pasid}? We have tracked device in a list when setting a domain to device.

Best regards,
baolu
RE: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> When the IOMMU domain is about to be freed, it should not be set on any
> device. Instead of silently dealing with some bug cases, it's better to
> trigger a warning to report and fix any potential bugs at the first time.
>
>  static void domain_exit(struct dmar_domain *domain)
>  {
> -
> -	/* Remove associated devices and clear attached or cached domains */
> -	domain_remove_dev_info(domain);
> +	if (WARN_ON(!list_empty(&domain->devices)))
> +		return;

warning is good but it doesn't mean the driver shouldn't deal with that situation to make it safer e.g. blocking DMA from all attached device...
RE: [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> The iommu->lock is used to protect the per-IOMMU pasid directory table
> and pasid table. Move the spinlock acquisition/release into the helpers
> to make the code self-contained.
>
> Signed-off-by: Lu Baolu

Reviewed-by: Kevin Tian, with one nit

> -	/* Caller must ensure PASID entry is not in use. */
> -	if (pasid_pte_is_present(pte))
> -		return -EBUSY;
> +	spin_lock(&iommu->lock);
> +	pte = get_non_present_pasid_entry(dev, pasid);
> +	if (!pte) {
> +		spin_unlock(&iommu->lock);
> +		return -ENODEV;
> +	}

I don't think above is a good abstraction and it changes the error code for a present entry from -EBUSY to -ENODEV.
RE: [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Move the spinlock acquisition/release into the helpers where domain
> IDs are allocated and freed. The device_domain_lock is irrelevant to
> domain ID resources, remove its assertion as well.

while moving the lock you also replace spin_lock_irqsave() with spin_lock(). It'd be cleaner to just do movement here and then replace all _irqsave() in patch 8.

> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel/iommu.c | 25 +++++++++----------------
>  1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2d5f02b85de8..0da937ce0534 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1774,16 +1774,13 @@ static struct dmar_domain *alloc_domain(unsigned int type)
>  	return domain;
>  }
>
> -/* Must be called with iommu->lock */
>  static int domain_attach_iommu(struct dmar_domain *domain,
>  			       struct intel_iommu *iommu)
>  {
>  	unsigned long ndomains;
> -	int num;
> -
> -	assert_spin_locked(&device_domain_lock);
> -	assert_spin_locked(&iommu->lock);
> +	int num, ret = 0;
>
> +	spin_lock(&iommu->lock);
>  	domain->iommu_refcnt[iommu->seq_id] += 1;
>  	if (domain->iommu_refcnt[iommu->seq_id] == 1) {
>  		ndomains = cap_ndoms(iommu->cap);
> @@ -1792,7 +1789,8 @@ static int domain_attach_iommu(struct dmar_domain *domain,
>  		if (num >= ndomains) {
>  			pr_err("%s: No free domain ids\n", iommu->name);
>  			domain->iommu_refcnt[iommu->seq_id] -= 1;
> -			return -ENOSPC;
> +			ret = -ENOSPC;
> +			goto out_unlock;
>  		}
>
>  		set_bit(num, iommu->domain_ids);
> @@ -1801,7 +1799,9 @@ static int domain_attach_iommu(struct dmar_domain *domain,
>  		domain_update_iommu_cap(domain);
>  	}
>
> -	return 0;
> +out_unlock:
> +	spin_unlock(&iommu->lock);
> +	return ret;
>  }
>
>  static void domain_detach_iommu(struct dmar_domain *domain,
> @@ -1809,9 +1809,7 @@ static void domain_detach_iommu(struct dmar_domain *domain,
>  {
>  	int num;
>
> -	assert_spin_locked(&device_domain_lock);
> -	assert_spin_locked(&iommu->lock);
> -
> +	spin_lock(&iommu->lock);
>  	domain->iommu_refcnt[iommu->seq_id] -= 1;
>  	if (domain->iommu_refcnt[iommu->seq_id] == 0) {
>  		num = domain->iommu_did[iommu->seq_id];
> @@ -1819,6 +1817,7 @@ static void domain_detach_iommu(struct dmar_domain *domain,
>  		domain_update_iommu_cap(domain);
>  		domain->iommu_did[iommu->seq_id] = 0;
>  	}
> +	spin_unlock(&iommu->lock);
>  }
>
>  static inline int guestwidth_to_adjustwidth(int gaw)
> @@ -2471,9 +2470,7 @@ static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
>
>  	spin_lock_irqsave(&device_domain_lock, flags);
>  	info->domain = domain;
> -	spin_lock(&iommu->lock);
>  	ret = domain_attach_iommu(domain, iommu);
> -	spin_unlock(&iommu->lock);
>  	if (ret) {
>  		spin_unlock_irqrestore(&device_domain_lock, flags);
>  		return ret;
> @@ -4158,7 +4155,6 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
>  {
>  	struct dmar_domain *domain;
>  	struct intel_iommu *iommu;
> -	unsigned long flags;
>
>  	assert_spin_locked(&device_domain_lock);
>
> @@ -4179,10 +4175,7 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
>  	}
>
>  	list_del(&info->link);
> -
> -	spin_lock_irqsave(&iommu->lock, flags);
>  	domain_detach_iommu(domain, iommu);
> -	spin_unlock_irqrestore(&iommu->lock, flags);
>  }
>
>  static void dmar_remove_one_dev_info(struct device *dev)
> --
> 2.25.1
RE: [PATCH 05/12] iommu/vt-d: Unncessary spinlock for root table alloc and free
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> The IOMMU root table is allocated and freed in the IOMMU initialization
> code in static boot or hot-plug paths. There's no need for a spinlock.

s/hot-plug/hot-remove/

> Signed-off-by: Lu Baolu

Reviewed-by: Kevin Tian

> ---
>  drivers/iommu/intel/iommu.c | 18 +++++-------------
>  1 file changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index bbdd3417a1b1..2d5f02b85de8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -809,14 +809,12 @@ static int device_context_mapped(struct intel_iommu *iommu, u8 bus, u8 devfn)
>
>  static void free_context_table(struct intel_iommu *iommu)
>  {
> -	int i;
> -	unsigned long flags;
>  	struct context_entry *context;
> +	int i;
> +
> +	if (!iommu->root_entry)
> +		return;
>
> -	spin_lock_irqsave(&iommu->lock, flags);
> -	if (!iommu->root_entry) {
> -		goto out;
> -	}
>  	for (i = 0; i < ROOT_ENTRY_NR; i++) {
>  		context = iommu_context_addr(iommu, i, 0, 0);
>  		if (context)
> @@ -828,12 +826,10 @@ static void free_context_table(struct intel_iommu *iommu)
>  		context = iommu_context_addr(iommu, i, 0x80, 0);
>  		if (context)
>  			free_pgtable_page(context);
> -
>  	}
> +
>  	free_pgtable_page(iommu->root_entry);
>  	iommu->root_entry = NULL;
> -out:
> -	spin_unlock_irqrestore(&iommu->lock, flags);
>  }
>
>  #ifdef CONFIG_DMAR_DEBUG
> @@ -1232,7 +1228,6 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
>  static int iommu_alloc_root_entry(struct intel_iommu *iommu)
>  {
>  	struct root_entry *root;
> -	unsigned long flags;
>
>  	root = (struct root_entry *)alloc_pgtable_page(iommu->node);
>  	if (!root) {
> @@ -1242,10 +1237,7 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
>  	}
>
>  	__iommu_flush_cache(iommu, root, ROOT_SIZE);
> -
> -	spin_lock_irqsave(&iommu->lock, flags);
>  	iommu->root_entry = root;
> -	spin_unlock_irqrestore(&iommu->lock, flags);
>
>  	return 0;
>  }
> --
> 2.25.1
RE: [PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> Use pci_get_domain_bus_and_slot() instead of searching the global list
> to retrieve the pci device pointer. This removes device_domain_list
> global list as there are no consumers anymore.
>
> Signed-off-by: Lu Baolu

Reviewed-by: Kevin Tian

> ---
>  drivers/iommu/intel/iommu.h |  1 -
>  drivers/iommu/intel/iommu.c | 33 ++++-----------------------------
>  2 files changed, 6 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 2f4a5b9509c0..6724703d573b 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -609,7 +609,6 @@ struct intel_iommu {
>  /* PCI domain-device relationship */
>  struct device_domain_info {
>  	struct list_head link;	/* link to domain siblings */
> -	struct list_head global; /* link to global list */
>  	struct list_head table;	/* link to pasid table */
>  	u32 segment;		/* PCI segment number */
>  	u8 bus;			/* PCI bus number */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 25d4c5200526..bbdd3417a1b1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -131,8 +131,6 @@ static struct intel_iommu **g_iommus;
>
>  static void __init check_tylersburg_isoch(void);
>  static int rwbf_quirk;
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn);
>
>  /*
>   * set to 1 to panic kernel if can't successfully enable VT-d
> @@ -315,7 +313,6 @@ static int iommu_skip_te_disable;
>  #define IDENTMAP_AZALIA		4
>
>  static DEFINE_SPINLOCK(device_domain_lock);
> -static LIST_HEAD(device_domain_list);
>  const struct iommu_ops intel_iommu_ops;
>
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> @@ -845,9 +842,14 @@ static void pgtable_walk(struct intel_iommu *iommu, unsigned long pfn, u8 bus, u8 devfn)
>  	struct device_domain_info *info;
>  	struct dma_pte *parent, *pte;
>  	struct dmar_domain *domain;
> +	struct pci_dev *pdev;
>  	int offset, level;
>
> -	info = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
> +	pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
> +	if (!pdev)
> +		return;
> +
> +	info = dev_iommu_priv_get(&pdev->dev);
>  	if (!info || !info->domain) {
>  		pr_info("device [%02x:%02x.%d] not probed\n",
>  			bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> @@ -2348,19 +2350,6 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
>  	spin_unlock_irqrestore(&device_domain_lock, flags);
>  }
>
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
> -{
> -	struct device_domain_info *info;
> -
> -	list_for_each_entry(info, &device_domain_list, global)
> -		if (info->segment == segment && info->bus == bus &&
> -		    info->devfn == devfn)
> -			return info;
> -
> -	return NULL;
> -}
> -
>  static int domain_setup_first_level(struct intel_iommu *iommu,
>  				    struct dmar_domain *domain,
>  				    struct device *dev,
> @@ -4564,7 +4553,6 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  	struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
>  	struct device_domain_info *info;
>  	struct intel_iommu *iommu;
> -	unsigned long flags;
>  	u8 bus, devfn;
>
>  	iommu = device_to_iommu(dev, &bus, &devfn);
> @@ -4607,10 +4595,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  		}
>  	}
>
> -	spin_lock_irqsave(&device_domain_lock, flags);
> -	list_add(&info->global, &device_domain_list);
>  	dev_iommu_priv_set(dev, info);
> -	spin_unlock_irqrestore(&device_domain_lock, flags);
>
>  	return &info->iommu;
>  }
> @@ -4618,15 +4603,9 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  static void intel_iommu_release_device(struct device *dev)
>  {
>  	struct device_domain_info *info = dev_iommu_priv_get(dev);
> -	unsigned long flags;
>
>  	dmar_remove_one_dev_info(dev);
> -
> -	spin_lock_irqsave(&device_domain_lock, flags);
>  	dev_iommu_priv_set(dev, NULL);
> -	list_del(&info->global);
> -	spin_unlock_irqrestore(&device_domain_lock, flags);
> -
>  	kfree(info);
>  	set_dma_ops(dev, NULL);
>  }
> --
> 2.25.1
RE: [PATCH 02/12] iommu/vt-d: Remove for_each_device_domain()
> From: Lu Baolu
> Sent: Friday, May 27, 2022 2:30 PM
>
> The per-device device_domain_info data could be retrieved from the
> device itself. There's no need to search a global list.
>
> Signed-off-by: Lu Baolu

Reviewed-by: Kevin Tian

> ---
>  drivers/iommu/intel/iommu.h |  2 --
>  drivers/iommu/intel/iommu.c | 25 -------------------------
>  drivers/iommu/intel/pasid.c | 37 +++++++++++--------------------------
>  3 files changed, 11 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 8a6d64d726c0..2f4a5b9509c0 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -727,8 +727,6 @@ extern int dmar_ir_support(void);
>  void *alloc_pgtable_page(int node);
>  void free_pgtable_page(void *vaddr);
>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -				     void *data), void *data);
>  void iommu_flush_write_buffer(struct intel_iommu *iommu);
>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cacae8bdaa65..6549b09d7f32 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -316,31 +316,6 @@ static int iommu_skip_te_disable;
>
>  static DEFINE_SPINLOCK(device_domain_lock);
>  static LIST_HEAD(device_domain_list);
> -
> -/*
> - * Iterate over elements in device_domain_list and call the specified
> - * callback @fn against each element.
> - */
> -int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> -				     void *data), void *data)
> -{
> -	int ret = 0;
> -	unsigned long flags;
> -	struct device_domain_info *info;
> -
> -	spin_lock_irqsave(&device_domain_lock, flags);
> -	list_for_each_entry(info, &device_domain_list, global) {
> -		ret = fn(info, data);
> -		if (ret) {
> -			spin_unlock_irqrestore(&device_domain_lock, flags);
> -			return ret;
> -		}
> -	}
> -	spin_unlock_irqrestore(&device_domain_lock, flags);
> -
> -	return 0;
> -}
> -
>  const struct iommu_ops intel_iommu_ops;
>
>  static bool translation_pre_enabled(struct intel_iommu *iommu)
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index b2ac5869286f..0627d6465f25 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -103,36 +103,20 @@ device_detach_pasid_table(struct device_domain_info *info,
>  }
>
>  struct pasid_table_opaque {
> -	struct pasid_table **pasid_table;
> -	int segment;
> -	int bus;
> -	int devfn;
> +	struct pasid_table *pasid_table;
>  };
>
> -static int search_pasid_table(struct device_domain_info *info, void *opaque)
> -{
> -	struct pasid_table_opaque *data = opaque;
> -
> -	if (info->iommu->segment == data->segment &&
> -	    info->bus == data->bus &&
> -	    info->devfn == data->devfn &&
> -	    info->pasid_table) {
> -		*data->pasid_table = info->pasid_table;
> -		return 1;
> -	}
> -
> -	return 0;
> -}
> -
>  static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void *opaque)
>  {
>  	struct pasid_table_opaque *data = opaque;
> +	struct device_domain_info *info;
>
> -	data->segment = pci_domain_nr(pdev->bus);
> -	data->bus = PCI_BUS_NUM(alias);
> -	data->devfn = alias & 0xff;
> +	info = dev_iommu_priv_get(&pdev->dev);
> +	if (!info || !info->pasid_table)
> +		return 0;
>
> -	return for_each_device_domain(&search_pasid_table, data);
> +	data->pasid_table = info->pasid_table;
> +	return 1;
>  }
>
>  /*
> @@ -141,9 +125,9 @@ static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void *opaque)
>   */
>  int intel_pasid_alloc_table(struct device *dev)
>  {
> +	struct pasid_table_opaque data = { NULL };
>  	struct device_domain_info *info;
>  	struct pasid_table *pasid_table;
> -	struct pasid_table_opaque data;
>  	struct page *pages;
>  	u32 max_pasid = 0;
>  	int ret, order;
> @@ -155,11 +139,12 @@ int intel_pasid_alloc_table(struct device *dev)
>  		return -EINVAL;
>
>  	/* DMA alias device already has a pasid table, use it: */
> -	data.pasid_table = &pasid_table;
>  	ret = pci_for_each_dma_alias(to_pci_dev(dev),
> 				     &get_alias_pasid_table, &data);
> -	if (ret)
> +	if (ret) {
> +		pasid_table = data.pasid_table;
>  		goto attach_out;
> +	}
>
>  	pasid_table = kzalloc(sizeof(*pasid_table), GFP_KERNEL);
>  	if (!pasid_table)
> --
> 2.25.1
RE: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
> From: Jason Gunthorpe
> Sent: Wednesday, June 1, 2022 7:11 AM
>
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
>
> > There are only 3 instances where we'll free a table while the domain is
> > live. The first is the one legitimate race condition, where two map requests
> > targeting relatively nearby PTEs both go to fill in an intermediate level of
> > table; whoever loses that race frees the table they allocated, but it was
> > never visible to anyone else so that's definitely fine. The second is if
> > we're mapping a block entry, and find that there's already a table entry
> > there, wherein we assume the table must be empty, clear the entry,
> > invalidate any walk caches, install the block entry, then free the orphaned
> > table; since we're mapping the entire IOVA range covered by that table,
> > there should be no other operations on that IOVA range attempting to walk
> > the table at the same time, so it's fine.
>
> I saw these two in the Intel driver
>
> > The third is effectively the inverse, if we get a block-sized unmap
> > but find a table entry rather than a block at that point (on the
> > assumption that it's de-facto allowed for a single unmap to cover
> > multiple adjacent mappings as long as it does so exactly); similarly
> > we assume that the table must be full, and no other operations
> > should be racing because we're unmapping its whole IOVA range, so we
> > remove the table entry, invalidate, and free as before.
>
> Not sure I noticed this one though
>
> This all makes sense though.

Intel driver also does this. See dma_pte_clear_level():

	/* If range covers entire pagetable, free it */
	if (start_pfn <= level_pfn &&
	    last_pfn >= level_pfn + level_size(level) - 1) {
	...
Re: [PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring
On 01.06.22 09:56, Lukas Bulwahn wrote:
> Commit 78013eaadf69 ("x86: remove the IOMMU table infrastructure")
> refactored the generic swiotlb/swiotlb-xen setup into pci-dma.c, but
> misses to adjust MAINTAINERS. Hence, ./scripts/get_maintainer.pl
> --self-test=patterns complains about broken references.
>
> Update the SWIOTLB SUBSYSTEM to include arch/x86/kernel/pci-dma.c, which
> contains the swiotlb setup now, and drop the file pattern that does not
> match any files.
>
> Further, update the XEN SWIOTLB SUBSYSTEM to include all swiotlb-xen
> headers and replace the pattern in drivers with the specific one file
> that matches this pattern.
>
> Signed-off-by: Lukas Bulwahn

Acked-by: Juergen Gross

Juergen
[PATCH] MAINTAINERS: refurbish SWIOTLB SUBSYSTEM sections after refactoring
Commit 78013eaadf69 ("x86: remove the IOMMU table infrastructure")
refactored the generic swiotlb/swiotlb-xen setup into pci-dma.c, but
missed adjusting MAINTAINERS. Hence, ./scripts/get_maintainer.pl
--self-test=patterns complains about broken references.

Update the SWIOTLB SUBSYSTEM section to include arch/x86/kernel/pci-dma.c,
which contains the swiotlb setup now, and drop the file pattern that does
not match any files.

Further, update the XEN SWIOTLB SUBSYSTEM section to include all
swiotlb-xen headers and replace the pattern in drivers with the one
specific file that matches this pattern.

Signed-off-by: Lukas Bulwahn
---
Christoph, please pick this minor non-urgent clean-up patch for swiotlb.

 MAINTAINERS | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d5ea4ef223f8..cc12a3aaad45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19166,7 +19166,7 @@ L:	iommu@lists.linux-foundation.org
 S:	Supported
 W:	http://git.infradead.org/users/hch/dma-mapping.git
 T:	git git://git.infradead.org/users/hch/dma-mapping.git
-F:	arch/*/kernel/pci-swiotlb.c
+F:	arch/x86/kernel/pci-dma.c
 F:	include/linux/swiotlb.h
 F:	kernel/dma/swiotlb.c
@@ -21831,8 +21831,10 @@ M:	Stefano Stabellini
 L:	xen-de...@lists.xenproject.org (moderated for non-subscribers)
 L:	iommu@lists.linux-foundation.org
 S:	Supported
-F:	arch/x86/xen/*swiotlb*
-F:	drivers/xen/*swiotlb*
+F:	arch/*/include/asm/xen/swiotlb-xen.h
+F:	drivers/xen/swiotlb-xen.c
+F:	include/xen/arm/swiotlb-xen.h
+F:	include/xen/swiotlb-xen.h

 XFS FILESYSTEM
 C:	irc://irc.oftc.net/xfs
--
2.17.1
[PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state()
The function is no longer used. So delete it.

Signed-off-by: Saravana Kannan
---
 drivers/base/dd.c             | 30 ------------------------------
 include/linux/device/driver.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 335e71d3a618..e600dd2afc35 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str)
 }
 __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);

-/**
- * driver_deferred_probe_check_state() - Check deferred probe state
- * @dev: device to check
- *
- * Return:
- * * -ENODEV if initcalls have completed and modules are disabled.
- * * -ETIMEDOUT if the deferred probe timeout was set and has expired
- *   and modules are enabled.
- * * -EPROBE_DEFER in other cases.
- *
- * Drivers or subsystems can opt-in to calling this function instead of
- * directly returning -EPROBE_DEFER.
- */
-int driver_deferred_probe_check_state(struct device *dev)
-{
-	if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) {
-		dev_warn(dev, "ignoring dependency for device, assuming no driver\n");
-		return -ENODEV;
-	}
-
-	if (!driver_deferred_probe_timeout && initcalls_done) {
-		dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
-		return -ETIMEDOUT;
-	}
-
-	return -EPROBE_DEFER;
-}
-EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state);
-
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
	struct device_private *p;

	fw_devlink_drivers_done();

-	driver_deferred_probe_timeout = 0;
	driver_deferred_probe_trigger();
	flush_work(&deferred_probe_work);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 2114d65b862f..7acaabde5396 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -242,7 +242,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev)
 extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
-int driver_deferred_probe_check_state(struct device *dev);

 void driver_init(void);

 /**
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state()
Now that fw_devlink=on and fw_devlink.strict=1 by default, and fw_devlink
supports iommu DT properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before the deferred probe timeout has expired.
So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..41f4eb005219 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev,
	 * a proper probe-ordering dependency mechanism in future.
	 */
	if (!ops)
-		return driver_deferred_probe_check_state(dev);
+		return -ENODEV;

	if (!try_module_get(ops->owner))
		return -ENODEV;
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
Now that deferred_probe_timeout is non-zero by default, fw_devlink will
never permanently block the probing of devices. It'll try its best to
probe the devices in the right order and then finally let devices probe
even if their suppliers don't have any drivers.

Signed-off-by: Saravana Kannan
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 61fdfe99b348..977b379a495b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1613,7 +1613,7 @@ static int __init fw_devlink_setup(char *arg)
 }
 early_param("fw_devlink", fw_devlink_setup);

-static bool fw_devlink_strict;
+static bool fw_devlink_strict = true;
 static int __init fw_devlink_strict_setup(char *arg)
 {
	return strtobool(arg, &fw_devlink_strict);
 }
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."
This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.

Signed-off-by: Saravana Kannan
---
 drivers/base/dd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4a55fbb7e0da..335e71d3a618 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);

+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);

 static int __init deferred_probe_timeout_setup(char *str)
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
If there are network devices that could probe without some of their
suppliers probing, and those network devices are needed to mount a
network rootfs, then fw_devlink=on might break that usecase by blocking
the network devices from probing by the time IP auto config starts.

So, if no network devices are available when IP auto config is enabled
and we have a network rootfs, make sure fw_devlink doesn't block the
probing of any device that has a driver, and then retry finding a
network device.

Signed-off-by: Saravana Kannan
---
 net/ipv4/ipconfig.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 9d41d5d5cd1e..2342debd7066 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1434,6 +1434,7 @@ __be32 __init root_nfs_parse_addr(char *name)
 static int __init wait_for_devices(void)
 {
	int i;
+	bool try_init_devs = true;

	for (i = 0; i < DEVICE_WAIT_MAX; i++) {
		struct net_device *dev;
@@ -1452,6 +1453,11 @@ static int __init wait_for_devices(void)
		rtnl_unlock();
		if (found)
			return 0;
+		if (try_init_devs &&
+		    (ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS)) {
+			try_init_devs = false;
+			wait_for_init_devices_probe();
+		}
		ssleep(1);
	}
	return -ENODEV;
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function
Some devices might need to be probed and bound successfully before the
kernel boot sequence can finish and move on to init/userspace. For
example, a network interface might need to be bound to be able to mount
a NFS rootfs.

With fw_devlink=on by default, some of these devices might be blocked
from probing because they are waiting on an optional supplier that
doesn't have a driver. While fw_devlink will eventually identify such
devices and unblock their probing automatically, it might be too late by
the time it does. For example, the IPv4 autoconfig might time out before
fw_devlink unblocks probing of the network interface.

This function is available to temporarily try and probe all devices that
have a driver, even if some of their suppliers haven't been added or
don't have drivers yet. The drivers can then decide which of the
suppliers are optional vs mandatory and probe the device if possible. By
the time this function returns, all such "best effort" probes are
guaranteed to be completed. If a device successfully probes in this
mode, we delete all fw_devlink discovered dependencies of that device
where the supplier hasn't yet probed successfully, because they have to
be optional dependencies.

This also means that some devices that aren't needed for init, and could
have waited for their optional supplier to probe (when the supplier's
module is loaded later on), would end up probing prematurely with
limited functionality. So call this function only when boot would fail
without it.
Signed-off-by: Saravana Kannan
---
 drivers/base/base.h           |   1 +
 drivers/base/core.c           | 100 ++++++++++++++++++++++++++++++---
 drivers/base/dd.c             |  19 +++++--
 include/linux/device/driver.h |   1 +
 4 files changed, 110 insertions(+), 11 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index ab71403d102f..b3a43a164dcd 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
 extern void deferred_probe_extend_timeout(void);
+extern void driver_deferred_probe_trigger(void);

 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7cd789c4985d..61fdfe99b348 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1;
 static DEFINE_MUTEX(fwnode_link_lock);
 static bool fw_devlink_is_permissive(void);
 static bool fw_devlink_drv_reg_done;
+static bool fw_devlink_best_effort;

 /**
  * fwnode_link_add - Create a link between two fwnode_handles.
@@ -965,6 +966,11 @@ static void device_links_missing_supplier(struct device *dev)
	}
 }

+static bool dev_is_best_effort(struct device *dev)
+{
+	return fw_devlink_best_effort && dev->can_match;
+}
+
 /**
  * device_links_check_suppliers - Check presence of supplier drivers.
  * @dev: Consumer device.
@@ -984,7 +990,7 @@ static void device_links_missing_supplier(struct device *dev)
 int device_links_check_suppliers(struct device *dev)
 {
	struct device_link *link;
-	int ret = 0;
+	int ret = 0, fwnode_ret = 0;
	struct fwnode_handle *sup_fw;

	/*
@@ -997,12 +1003,17 @@ int device_links_check_suppliers(struct device *dev)
		sup_fw = list_first_entry(&dev->fwnode->suppliers,
					  struct fwnode_link,
					  c_hook)->supplier;
-		dev_err_probe(dev, -EPROBE_DEFER, "wait for supplier %pfwP\n",
-			      sup_fw);
-		mutex_unlock(&fwnode_link_lock);
-		return -EPROBE_DEFER;
+		if (!dev_is_best_effort(dev)) {
+			fwnode_ret = -EPROBE_DEFER;
+			dev_err_probe(dev, -EPROBE_DEFER,
+				      "wait for supplier %pfwP\n", sup_fw);
+		} else {
+			fwnode_ret = -EAGAIN;
+		}
	}
	mutex_unlock(&fwnode_link_lock);
+	if (fwnode_ret == -EPROBE_DEFER)
+		return fwnode_ret;

	device_links_write_lock();

@@ -1012,6 +1023,14 @@ int device_links_check_suppliers(struct device *dev)
		if (link->status != DL_STATE_AVAILABLE &&
		    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
+
+			if (dev_is_best_effort(dev) &&
+			    link->flags & DL_FLAG_INFERRED &&
+			    !link->supplier->can_match) {
+				ret = -EAGAIN;
+				continue;
+			}
+
			device_links_missing_supplier(dev);
			dev_err_probe(dev, -EPROBE_DEFER,
				      "supplier %s not ready\n",
@@ -1024,7 +1043,8 @@ int
[PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
Now that fw_devlink=on by default and fw_devlink supports interrupt
properties, the execution will never get to the point where
driver_deferred_probe_check_state() is called before the supplier has
probed successfully or before the deferred probe timeout has expired.
So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan
---
 drivers/net/mdio/fwnode_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1c1584fca632..3e79c2c51929 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
	 * just fall back to poll mode
	 */
	if (rc == -EPROBE_DEFER)
-		rc = driver_deferred_probe_check_state(&phy->mdio.dev);
-	if (rc == -EPROBE_DEFER)
-		return rc;
+		rc = -ENODEV;

	if (rc > 0) {
		phy->irq = rc;
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
Now that fw_devlink=on by default and fw_devlink supports the
"pinctrl-[0-8]" properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before the deferred probe timeout has
expired. So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan
---
 drivers/pinctrl/devicetree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 3fb238714718..ef898ee8ca6b 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p,
		np_pctldev = of_get_next_parent(np_pctldev);
		if (!np_pctldev || of_node_is_root(np_pctldev)) {
			of_node_put(np_pctldev);
-			ret = driver_deferred_probe_check_state(p->dev);
+			ret = -ENODEV;
			/* keep deferring if modules are enabled */
			if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret < 0)
				ret = -EPROBE_DEFER;
--
2.36.1.255.ge46751e96f-goog
[PATCH v2 0/9] deferred_probe_timeout logic clean up
This series is based on linux-next plus these 2 small patches applied on
top:

https://lore.kernel.org/lkml/20220526034609.480766-1-sarava...@google.com/

A lot of the deferred_probe_timeout logic is redundant with
fw_devlink=on. Also, enabling deferred_probe_timeout by default breaks a
few cases. This series tries to delete the redundant logic, simplify the
frameworks that use driver_deferred_probe_check_state(), enable
deferred_probe_timeout=10 by default, and fix the nfsroot failure case.

The overall idea of this series is to replace the global behavior of
driver_deferred_probe_check_state(), where all devices give up waiting
on suppliers at the same time, with a more granular behavior:

1. Devices with all their suppliers successfully probed by late_initcall
   probe as usual and avoid unnecessary deferred probe attempts.

2. At or after late_initcall, in cases where boot would break because of
   fw_devlink=on being strict about the ordering, we:

   a. Temporarily relax the enforcement to probe any unprobed devices
      that can probe successfully in the current state of the system.
      For example, when we boot with an NFS rootfs and no network device
      has probed.

   b. Go back to enforcing the ordering for any devices that haven't
      probed.

3. After the deferred probe timeout expires, we permanently give up
   waiting on supplier devices without drivers. At this point, whatever
   devices can probe without some of their optional suppliers end up
   probing.

In the case where module support is disabled, it's fairly
straightforward and all device probes are completed before the initcalls
are done.

Patches 1 to 3 are fairly straightforward and can probably be applied
right away. Patches 4 to 6 fix the NFS rootfs issue and set the default
deferred_probe_timeout back to 10 seconds when modules are enabled.
Patches 7 to 9 further clean up the deferred_probe_timeout logic so that
no framework has to know or care about deferred_probe_timeout.
Yoshihiro/Geert,

If you can test this patch series and confirm that the NFS root case
works, I'd really appreciate that.

Thanks,
Saravana

v1 -> v2:
- Rewrote the NFS rootfs fix to be a lot less destructive on the
  fw_devlink ordering for devices that don't end up probing during the
  "best effort" attempt at probing all devices needed for a network
  rootfs

Saravana Kannan (9):
  PM: domains: Delete usage of driver_deferred_probe_check_state()
  pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
  net: mdio: Delete usage of driver_deferred_probe_check_state()
  driver core: Add wait_for_init_devices_probe helper function
  net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
  Revert "driver core: Set default deferred_probe_timeout back to 0."
  driver core: Set fw_devlink.strict=1 by default
  iommu/of: Delete usage of driver_deferred_probe_check_state()
  driver core: Delete driver_deferred_probe_check_state()

 drivers/base/base.h            |   1 +
 drivers/base/core.c            | 102 +++++++++++++++++++++++++++-----
 drivers/base/dd.c              |  54 +++++++++--------
 drivers/base/power/domain.c    |   2 +-
 drivers/iommu/of_iommu.c       |   2 +-
 drivers/net/mdio/fwnode_mdio.c |   4 +-
 drivers/pinctrl/devicetree.c   |   2 +-
 include/linux/device/driver.h  |   2 +-
 net/ipv4/ipconfig.c            |   6 ++
 9 files changed, 126 insertions(+), 49 deletions(-)

--
2.36.1.255.ge46751e96f-goog
[PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
Now that fw_devlink=on by default and fw_devlink supports the
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before the deferred probe timeout has
expired. So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
		mutex_unlock(&gpd_list_lock);
		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
			__func__, PTR_ERR(pd));
-		return driver_deferred_probe_check_state(base_dev);
+		return -ENODEV;
	}

	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
--
2.36.1.255.ge46751e96f-goog
Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
On 2022/6/1 02:51, Jason Gunthorpe wrote:
>> Oh, I've spent the last couple of weeks hacking up horrible things
>> manipulating entries in init_mm, and never realised that that was
>> actually the special case. Oh well, live and learn.
>
> The init_mm is sort of different, it doesn't have zap in quite the same
> way, for example. I was talking about the typical process mm.
>
> Anyhow, the right solution is to use RCU as I described before, Baolu
> do you want to try?

Yes, of course. Your discussion with Robin gave me a lot of inspiration.
Very appreciated!

I want to use a separate patch to solve this debugfs problem, because it
has exceeded the original intention of this series.

Best regards,
baolu