Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry
On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
> +DPAA2, netdev maintainers
> Hi,
>
> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
> > Since, overlapping mappings are not supported by the DMA API we should
> > report an error if active_cacheline_insert returns -EEXIST.
>
> It seems this patch found a victim. I was trying to run iperf3 on a
> honeycomb (5.14.0, fedora 35) and the console is blasting this error message
> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, which
> is attached below.
>

Thanks for the report. I don't have access to hardware at the moment to
actually see what's happening since I'm on vacation. I'll work on it in a
few days.

Ioana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry
On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
> PS, it might not hurt to rate limit/_once this somehow to avoid a runtime
> problem if it starts to trigger.

Yes, that might be a good idea. Care to prepare a patch?
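Jeremy's "rate limit/_once" suggestion can be illustrated with a small user-space model of an interval/burst limiter. The kernel already ships helpers for this (pr_err_ratelimited(), WARN_ONCE(), with the limiter itself in lib/ratelimit.c); the sketch below is not that code. `struct rl_state`, `rl_allow()` and `report_eexist()` are hypothetical names, and time is an abstract tick counter supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space limiter: allow `burst` messages per `interval` ticks. */
struct rl_state {
	long interval; /* window length in caller-supplied ticks */
	int burst;     /* messages allowed per window */
	long begin;    /* tick at which the current window opened */
	int printed;   /* messages allowed in the current window */
	long missed;   /* messages suppressed so far */
};

static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Start a fresh window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical caller: print the EEXIST message only if the limiter allows it. */
static int report_eexist(struct rl_state *rs, long now)
{
	if (!rl_allow(rs, now))
		return 0;
	fprintf(stderr, "cacheline tracking EEXIST, overlapping mappings aren't supported\n");
	return 1;
}
```

With a 1000-tick window and a burst of 10, one EEXIST per tick for 100000 ticks prints 1000 lines instead of 100000; the cost is that suppressed messages survive only as a count in `missed`.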
[PATCH] dma-debug: prevent an error message from causing runtime problems
For some drivers, that call add_dma_entry() from somewhere down the call
stack. If this error condition is triggered once, it causes the error
message to spam the kernel's printk buffer and bring the CPU usage up to
100%. Also, since there is at least one driver that is in the mainline and
suffers from the error condition, it is more useful to WARN_ON() here
instead of just printing the error message (in hopes that it will make it
easier for other drivers that suffer from this issue to be spotted).

Link: https://lkml.kernel.org/r/fd67fbac-64bf-f0ea-01e1-5938ccfab...@arm.com
Reported-by: Jeremy Linton
Signed-off-by: Hamza Mahfooz
---
 kernel/dma/debug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 6c90c69e5311..d9806689666e 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -567,7 +567,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 		pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
 	} else if (rc == -EEXIST) {
-		pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
+		WARN_ONCE(1,
+			  pr_fmt("cacheline tracking EEXIST, overlapping mappings aren't supported\n"
+));
 	}
 }

-- 
2.33.0
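The v1 patch relies on WARN_ONCE() firing at most once. A minimal user-space model of that behaviour, assuming one static flag per call site (the real kernel macro also prints a backtrace, which is omitted here); `MY_WARN_ONCE`, `warn_count` and `add_dma_entry_model()` are hypothetical names:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count; /* warnings actually emitted */

/* Hypothetical stand-in for WARN_ONCE(): each expansion has its own static
 * flag, so the message is emitted at most once per call site. */
#define MY_WARN_ONCE(cond, msg)                                \
	do {                                                   \
		static bool warned_;                           \
		if ((cond) && !warned_) {                      \
			warned_ = true;                        \
			warn_count++;                          \
			fprintf(stderr, "WARNING: %s", (msg)); \
		}                                              \
	} while (0)

/* Hypothetical model of the -EEXIST branch in add_dma_entry() after v1. */
static void add_dma_entry_model(int rc)
{
	if (rc == -EEXIST)
		MY_WARN_ONCE(1, "cacheline tracking EEXIST, overlapping mappings aren't supported\n");
}
```

Because the flag belongs to the call site rather than to a device, one plausible drawback is that the first misbehaving driver silences the check for every other driver on the system.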
Re: [PATCH] dma-debug: prevent an error message from causing runtime problems
On 2021-09-10 13:05, Hamza Mahfooz wrote:
> For some drivers, that call add_dma_entry() from somewhere down the call
> stack.

Nit: strictly, drivers don't call add_dma_entry(). Drivers only call the
DMA API functions, and it is the DMA API internals which take a detour
through dma-debug when desired.

> If this error condition is triggered once, it causes the error
> message to spam the kernel's printk buffer

Is that true? It doesn't look like anything in dma-debug itself can
obviously lead to that; I was assuming that in Jeremy's case it's the
driver which has managed to do something such that every new mapping
call it makes ends up hitting the warning. A busy network interface is
probably more than capable of saturating the kernel log with a print for
every packet (particularly a great big 100GBE-capable multi-queue thing
like that one).

> and bring the CPU usage up to
> 100%. Also, since there is at least one driver that is in the mainline and
> suffers from the error condition, it is more useful to WARN_ON() here
> instead of just printing the error message (in hopes that it will make it
> easier for other drivers that suffer from this issue to be spotted).
>
> Link: https://lkml.kernel.org/r/fd67fbac-64bf-f0ea-01e1-5938ccfab...@arm.com
> Reported-by: Jeremy Linton
> Signed-off-by: Hamza Mahfooz
> ---
>  kernel/dma/debug.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
> index 6c90c69e5311..d9806689666e 100644
> --- a/kernel/dma/debug.c
> +++ b/kernel/dma/debug.c
> @@ -567,7 +567,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
>  		pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
>  		global_disable = true;
>  	} else if (rc == -EEXIST) {
> -		pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
> +		WARN_ONCE(1,
> +			  pr_fmt("cacheline tracking EEXIST, overlapping mappings aren't supported\n"
> +));

Unless there's some subtlety I'm missing, it would be better to use
err_printk() here - not only for consistency of output, but also to tie
in with dma-debug's existing output-limiting controls.

Robin.

>  	}
>  }
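Robin's point about dma-debug's "existing output-limiting controls" can be modelled roughly as follows. This is a simplified, hypothetical user-space rendering of the counting logic behind err_printk() in kernel/dma/debug.c: every error bumps a counter, but only a budget of `show_num_errors` (default 1) is printed unless `show_all_errors` is set; the per-driver filter is left out. In the kernel these knobs are exposed through debugfs (the dma-api/num_errors and dma-api/all_errors files); treat the exact names as something to confirm against the source.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified, hypothetical model of dma-debug's error accounting. */
static unsigned long error_count;        /* every error is counted */
static unsigned long printed_count;      /* errors actually printed */
static bool show_all_errors;             /* print everything when true */
static unsigned int show_num_errors = 1; /* otherwise, print this many */

static void model_err_printk(const char *dev_name, const char *msg)
{
	error_count++;
	if (show_all_errors || show_num_errors > 0) {
		printed_count++;
		fprintf(stderr, "WARNING: %s: %s", dev_name, msg);
	}
	/* Spend one unit of the print budget unless printing everything. */
	if (!show_all_errors && show_num_errors > 0)
		show_num_errors--;
}
```

Under this model a million EEXIST hits leave `error_count` at 1000000 while printing a single warning, and the budget can be raised at runtime instead of being fixed at compile time as with WARN_ONCE().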
Re: [PATCH v3 2/8] mm: Introduce a function to check for confidential computing features
On Wed, Sep 08, 2021 at 05:58:33PM -0500, Tom Lendacky wrote:
> In prep for other confidential computing technologies, introduce a generic

preparation

> helper function, cc_platform_has(), that can be used to check for specific
> active confidential computing attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks to
> the code (e.g. if (sev_active() || tdx_active())).

...

> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> new file mode 100644
> index ..253f3ea66cd8
> --- /dev/null
> +++ b/include/linux/cc_platform.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Confidential Computing Platform Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky
> + */
> +
> +#ifndef _CC_PLATFORM_H

_LINUX_CC_PLATFORM_H

> +#define _CC_PLATFORM_H

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
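The generic check described in the quoted commit message can be sketched in user space. `CC_ATTR_MEM_ENCRYPT` and `CC_ATTR_GUEST_MEM_ENCRYPT` follow the naming used by the series, but the vendor enum, the `platform_vendor` variable and both `*_stub_has()` helpers are hypothetical stand-ins for real technology detection, and what each stub reports is invented for illustration.

```c
#include <stdbool.h>

/* Attribute names follow the series; everything else here is hypothetical. */
enum cc_attr {
	CC_ATTR_MEM_ENCRYPT,
	CC_ATTR_GUEST_MEM_ENCRYPT,
};

enum cc_vendor { CC_VENDOR_NONE, CC_VENDOR_AMD, CC_VENDOR_INTEL };

/* Hypothetical: set once by early platform detection. */
static enum cc_vendor platform_vendor = CC_VENDOR_NONE;

/* Hypothetical stub: pretend this vendor reports both attributes. */
static bool amd_stub_has(enum cc_attr attr)
{
	return attr == CC_ATTR_MEM_ENCRYPT || attr == CC_ATTR_GUEST_MEM_ENCRYPT;
}

/* Hypothetical stub: pretend this vendor reports guest encryption only. */
static bool intel_stub_has(enum cc_attr attr)
{
	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
}

/* One generic query instead of `sev_active() || tdx_active()` at call sites. */
static bool cc_platform_has(enum cc_attr attr)
{
	switch (platform_vendor) {
	case CC_VENDOR_AMD:
		return amd_stub_has(attr);
	case CC_VENDOR_INTEL:
		return intel_stub_has(attr);
	default:
		return false;
	}
}
```

Call sites then ask a single question, e.g. `if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))`, and the per-technology knowledge stays behind the dispatch.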
[git pull] IOMMU Fixes for Linux v5.15-rc0
Hi Linus,

The following changes since commit d8768d7eb9c21ef928adb93402d9348bcc4a6915:

  Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next (2021-08-20 17:14:35 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.15-rc0

for you to fetch changes up to 8cc633190b524c678b740c87fa1fc37447151a6b:

  iommu: Clarify default domain Kconfig (2021-09-09 13:18:07 +0200)

IOMMU Fixes for v5.15-rc1

Including:

	- Intel VT-d:
	  - PASID leakage in intel_svm_unbind_mm();
	  - Deadlock in intel_svm_drain_prq().

	- AMD IOMMU: Fixes for an unhandled page-fault bug when AVIC is used
	  for a KVM guest.

	- Make CONFIG_IOMMU_DEFAULT_DMA_LAZY architecture instead of IOMMU
	  driver dependent

Fenghua Yu (2):
      iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()
      iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()

Robin Murphy (1):
      iommu: Clarify default domain Kconfig

Suravee Suthikulpanit (1):
      iommu/amd: Remove iommu_init_ga()

Wei Huang (1):
      iommu/amd: Relocate GAMSup check to early_enable_iommus

 drivers/iommu/Kconfig     |  2 +-
 drivers/iommu/amd/init.c  | 48 +++
 drivers/iommu/intel/svm.c | 15 ---
 3 files changed, 41 insertions(+), 24 deletions(-)

Please pull.

Thanks,

	Joerg
Re: [PATCH 0/3] iommu/amd: Fix unable to handle page fault due to AVIC
Thanks. We did verify the correctness of this patch: we didn't see a host
crash on guest reboot when this patch was applied.

Tested-by: Terry Bowman

Thanks,
-Wei

On 9/9/21 6:17 AM, Joerg Roedel wrote:
> Okay, after this triggered a defconfig compile warning, I squashed patch
> 1 and 2 into one and also #ifdef'ed check_feature_on_all_iommus(). The
> result is here:
>
> From c3811a50addd23b9bb5a36278609ee1638debcf6 Mon Sep 17 00:00:00 2001
> From: Wei Huang
> Date: Fri, 20 Aug 2021 15:29:55 -0500
> Subject: [PATCH] iommu/amd: Relocate GAMSup check to early_enable_iommus
>
> Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
> (i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
> However it forgets to clear IRQ_POSTING_CAP from the previously set
> amd_iommu_irq_ops.capability.
>
> This triggers an invalid page fault bug during guest VM warm reboot
> if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is
> incorrectly set, and crash the system with the following kernel trace.
>
>   BUG: unable to handle page fault for address: 00400dd8
>   RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
>   Call Trace:
>    svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
>    ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
>    kvm_request_apicv_update+0x10c/0x150 [kvm]
>    svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
>    svm_enable_irq_window+0x26/0xa0 [kvm_amd]
>    vcpu_enter_guest+0xbbe/0x1560 [kvm]
>    ? avic_vcpu_load+0xd5/0x120 [kvm_amd]
>    ? kvm_arch_vcpu_load+0x76/0x240 [kvm]
>    ? svm_get_segment_base+0xa/0x10 [kvm_amd]
>    kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
>    kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
>    __x64_sys_ioctl+0x84/0xc0
>    do_syscall_64+0x33/0x40
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Fixes by moving the initializing of AMD IOMMU interrupt remapping mode
> (amd_iommu_guest_ir) earlier before setting up the
> amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag.
>
> [joro:Squashed the two patches and limited
> check_features_on_all_iommus() to CONFIG_IRQ_REMAP
> to fix a compile warning.]
>
> Signed-off-by: Wei Huang
> Co-developed-by: Suravee Suthikulpanit
> Signed-off-by: Suravee Suthikulpanit
> Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpa...@amd.com
> Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpa...@amd.com
> Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
> Signed-off-by: Joerg Roedel
> ---
>  drivers/iommu/amd/init.c | 31 ---
>  1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index bdcf167b4afe..4e753d1860b3 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -297,6 +297,22 @@ int amd_iommu_get_num_iommus(void)
>  	return amd_iommus_present;
>  }
>
> +#ifdef CONFIG_IRQ_REMAP
> +static bool check_feature_on_all_iommus(u64 mask)
> +{
> +	bool ret = false;
> +	struct amd_iommu *iommu;
> +
> +	for_each_iommu(iommu) {
> +		ret = iommu_feature(iommu, mask);
> +		if (!ret)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +#endif
> +
>  /*
>   * For IVHD type 0x11/0x40, EFR is also available via IVHD.
>   * Default to IVHD EFR since it is available sooner
> @@ -853,13 +869,6 @@ static int iommu_init_ga(struct amd_iommu *iommu)
>  	int ret = 0;
>
>  #ifdef CONFIG_IRQ_REMAP
> -	/* Note: We have already checked GASup from IVRS table.
> -	 * Now, we need to make sure that GAMSup is set.
> -	 */
> -	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
> -	    !iommu_feature(iommu, FEATURE_GAM_VAPIC))
> -		amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY_GA;
> -
>  	ret = iommu_init_ga_log(iommu);
>  #endif /* CONFIG_IRQ_REMAP */
>
> @@ -2479,6 +2488,14 @@ static void early_enable_iommus(void)
>  	}
>
>  #ifdef CONFIG_IRQ_REMAP
> +	/*
> +	 * Note: We have already checked GASup from IVRS table.
> +	 * Now, we need to make sure that GAMSup is set.
> +	 */
> +	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
> +	    !check_feature_on_all_iommus(FEATURE_GAM_VAPIC))
> +		amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY_GA;
> +
>  	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
>  		amd_iommu_irq_ops.capability |= (1 << IRQ_POSTING_CAP);
>  #endif
>
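The essence of the fix is ordering: settle the guest interrupt-remapping mode with a check across all IOMMUs before deriving the posting capability from it. A hypothetical user-space model of that ordering follows; the struct, enum and function names are invented here, and the bit positions are illustrative only, not taken from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the kernel defines its own in the AMD IOMMU headers. */
#define MODEL_FEATURE_GAM_VAPIC (1ULL << 21)
#define MODEL_IRQ_POSTING_CAP   0

enum model_guest_ir { MODEL_IR_LEGACY_GA, MODEL_IR_VAPIC };

struct model_iommu {
	uint64_t features; /* extended feature bits reported by this IOMMU */
};

/* Modeled on check_feature_on_all_iommus(): false as soon as one IOMMU lacks
 * the feature, true otherwise (including for an empty list). */
static bool model_feature_on_all(const struct model_iommu *iommus, size_t n,
				 uint64_t mask)
{
	for (size_t i = 0; i < n; i++)
		if (!(iommus[i].features & mask))
			return false;
	return true;
}

/* Settle the guest IR mode first, then derive the capability from it. */
static unsigned int model_irq_caps(const struct model_iommu *iommus, size_t n,
				   enum model_guest_ir *mode)
{
	unsigned int caps = 0;

	if (*mode == MODEL_IR_VAPIC &&
	    !model_feature_on_all(iommus, n, MODEL_FEATURE_GAM_VAPIC))
		*mode = MODEL_IR_LEGACY_GA;

	if (*mode == MODEL_IR_VAPIC)
		caps |= 1u << MODEL_IRQ_POSTING_CAP;
	return caps;
}
```

If the downgrade instead ran later and per IOMMU, as in the removed iommu_init_ga() hunk, a capability computed from the earlier VAPIC mode would already be set, which corresponds to the stale IRQ_POSTING_CAP described in the commit message.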
Re: [git pull] IOMMU Fixes for Linux v5.15-rc0
The pull request you sent on Fri, 10 Sep 2021 17:48:20 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.15-rc0

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/589e5cab170843b2f7f8260168ab2d77163d4384

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
[PATCH v2] dma-debug: prevent an error message from causing runtime problems
For some drivers that use the DMA API, this error message can be reached
several million times per second, spamming the kernel's printk buffer and
bringing the CPU usage up to 100% (so, it should be rate limited).
However, since there is at least one driver that is in the mainline and
suffers from the error condition, it is more useful to err_printk() here
instead of just rate limiting the error message (in hopes that it will
make it easier for other drivers that suffer from this issue to be
spotted).

Link: https://lkml.kernel.org/r/fd67fbac-64bf-f0ea-01e1-5938ccfab...@arm.com
Reported-by: Jeremy Linton
Signed-off-by: Hamza Mahfooz
---
v2: use err_printk() and make some clarifications in the commit message.
---
 kernel/dma/debug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 6c90c69e5311..718a7f455df6 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -567,7 +567,8 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 		pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
 	} else if (rc == -EEXIST) {
-		pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
+		err_printk(entry->dev, entry, "cacheline tracking EEXIST, "
+			   "overlapping mappings aren't supported\n");
 	}
 }

-- 
2.33.0