[PATCH] swiotlb: add overflow checks to swiotlb_bounce
This is a follow-up on 5f89468e2f06 ("swiotlb: manipulate orig_addr when tlb_addr has offset") which fixed unaligned dma mappings, making sure the following overflows are caught:

- offset of the start of the slot within the device bigger than the requested address' offset, in other words if the base address given in swiotlb_tbl_map_single to create the mapping (orig_addr) was after the requested address for the sync (tlb_offset) in the same block:

 |----------------------------------------| block
           <---------------------------->  mapped part of the block
           ^ orig_addr
      ^ invalid tlb_addr for sync

- if the resulting offset was bigger than the allocation size; this one could happen if the mapping was not until the end, e.g.:

 |----------------------------------------| block
      <--------------------->  mapped part of the block
      ^ orig_addr            ^ invalid tlb_addr

Both should never happen, so print a warning and bail out without trying to adjust the sizes/offsets: the first one could try to sync from orig_addr to whatever is left of the requested size, but the latter really has nothing to sync there...

Signed-off-by: Dominique Martinet
Cc: Konrad Rzeszutek Wilk
Cc: Bumyong Lee
Cc: Chanho Park
Cc: Christoph Hellwig
---
Hi Konrad,

here's the follow-up for the swiotlb/caamjr regression I had promised.
It doesn't really change anything, and I confirmed I don't hit either of the warnings on our board, but it's probably best to have as either could really happen.
 kernel/dma/swiotlb.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e50df8d8f87e..23f8d0b168c5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -354,13 +354,27 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
-	unsigned int tlb_offset;
+	unsigned int tlb_offset, orig_addr_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
 
-	tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
-		     swiotlb_align_offset(dev, orig_addr);
+	tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
+	orig_addr_offset = swiotlb_align_offset(dev, orig_addr);
+	if (tlb_offset < orig_addr_offset) {
+		dev_WARN_ONCE(dev, 1,
+			"Access before mapping start detected. orig offset %u, requested offset %u.\n",
+			orig_addr_offset, tlb_offset);
+		return;
+	}
+
+	tlb_offset -= orig_addr_offset;
+	if (tlb_offset > alloc_size) {
+		dev_WARN_ONCE(dev, 1,
+			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
+			alloc_size, size, tlb_offset);
+		return;
+	}
 
 	orig_addr += tlb_offset;
 	alloc_size -= tlb_offset;
-- 
2.30.2
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
On Sun, Jul 4, 2021 at 11:16 AM Rob Clark wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline again for me (along with some reverts for a separate ext4 shrinker crash).

Tested-by: John Stultz

thanks
-john
[PATCH] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module
Allow the qcom_scm driver to be loadable as a permanent module.

This still uses the "depends on QCOM_SCM || !QCOM_SCM" bit to ensure that drivers that call into the qcom_scm driver are also built as modules. While not ideal in some cases, it's the only safe way I can find to avoid build errors without having those drivers select QCOM_SCM and force it on (as QCOM_SCM=n can be valid for those drivers).

Reviving this now that Saravana's fw_devlink defaults to on, which should avoid the loading troubles seen before.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Andy Gross
Cc: Bjorn Andersson
Cc: Joerg Roedel
Cc: Thomas Gleixner
Cc: Marc Zyngier
Cc: Linus Walleij
Cc: Vinod Koul
Cc: Kalle Valo
Cc: Maulik Shah
Cc: Saravana Kannan
Cc: Todd Kjos
Cc: Greg Kroah-Hartman
Cc: linux-arm-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-g...@vger.kernel.org
Acked-by: Kalle Valo
Acked-by: Greg Kroah-Hartman
Acked-by: Will Deacon
Reviewed-by: Bjorn Andersson
Signed-off-by: John Stultz
---
v3: * Fix __arm_smccc_smc build issue reported by kernel test robot
v4: * Add "depends on QCOM_SCM || !QCOM_SCM" bit to ath10k config that requires it.
v5: * Fix QCOM_QCM typo in Kconfig, it should be QCOM_SCM
---
 drivers/firmware/Kconfig                | 2 +-
 drivers/firmware/Makefile               | 3 ++-
 drivers/firmware/qcom_scm.c             | 4 ++++
 drivers/iommu/Kconfig                   | 2 ++
 drivers/net/wireless/ath/ath10k/Kconfig | 1 +
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index db0ea2d2d75a..af53778edc7e 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -235,7 +235,7 @@ config INTEL_STRATIX10_RSU
 	  Say Y here if you want Intel RSU support.
 
 config QCOM_SCM
-	bool
+	tristate "Qcom SCM driver"
 	depends on ARM || ARM64
 	depends on HAVE_ARM_SMCCC
 	select RESET_CONTROLLER
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 5e013b6a3692..523173cbff33 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -17,7 +17,8 @@ obj-$(CONFIG_ISCSI_IBFT)	+= iscsi_ibft.o
 obj-$(CONFIG_FIRMWARE_MEMMAP)	+= memmap.o
 obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
 obj-$(CONFIG_FW_CFG_SYSFS)	+= qemu_fw_cfg.o
-obj-$(CONFIG_QCOM_SCM)		+= qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
+obj-$(CONFIG_QCOM_SCM)		+= qcom-scm.o
+qcom-scm-objs += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
 obj-$(CONFIG_TI_SCI_PROTOCOL)	+= ti_sci.o
 obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
 obj-$(CONFIG_TURRIS_MOX_RWTM)	+= turris-mox-rwtm.o
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index ee9cb545e73b..bb9ce3f92931 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -1296,6 +1296,7 @@ static const struct of_device_id qcom_scm_dt_match[] = {
 	{ .compatible = "qcom,scm" },
 	{}
 };
+MODULE_DEVICE_TABLE(of, qcom_scm_dt_match);
 
 static struct platform_driver qcom_scm_driver = {
 	.driver = {
@@ -1312,3 +1313,6 @@ static int __init qcom_scm_init(void)
 	return platform_driver_register(&qcom_scm_driver);
 }
 subsys_initcall(qcom_scm_init);
+
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. SCM driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 07b7c25cbed8..f61516c17589 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -253,6 +253,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
 	tristate "ARM Ltd. System MMU (SMMU) Support"
 	depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+	depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU if ARM
@@ -382,6 +383,7 @@ config QCOM_IOMMU
 	# Note: iommu drivers cannot (yet?) be built as modules
 	bool "Qualcomm IOMMU Support"
 	depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+	depends on QCOM_SCM=y
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index 40f91bc8514d..741289e385d5 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -44,6 +44,7 @@ config ATH10K_SNOC
 	tristate "Qualcomm ath10k SNOC support"
 	depends on ATH10K
 	depends on ARCH_QCOM || COMPILE_TEST
+	depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
 	select QCOM_QMI_HELPERS
 	help
 	  This module adds support for integrated WCN3990 chip connected
-- 
2.25.1
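The "depends on QCOM_SCM || !QCOM_SCM" construct used above is easy to misread, so here is a hedged sketch of how it evaluates for a tristate symbol. EXAMPLE_CONSUMER is a hypothetical symbol invented for illustration, not part of the patch:

```kconfig
# Hypothetical consumer symbol illustrating the idiom. For a tristate
# QCOM_SCM, the expression "QCOM_SCM || !QCOM_SCM" evaluates to:
#   QCOM_SCM=y -> y  (EXAMPLE_CONSUMER may be y or m)
#   QCOM_SCM=m -> m  (!m is m, so m || m = m; EXAMPLE_CONSUMER is
#                     capped at m and can never be built-in)
#   QCOM_SCM=n -> y  (no restriction; the SCM stubs compile away)
config EXAMPLE_CONSUMER
	tristate "Hypothetical QCOM_SCM consumer"
	depends on QCOM_SCM || !QCOM_SCM
```

This is what prevents the broken configuration the comment in the patch warns about: a built-in consumer calling into an SCM core that only exists as a loadable module.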
Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote: > I suspect you are getting a dpu fault, and need: > > https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/ > > I suppose Bjorn was expecting me to send that patch > No, I left that discussion with the same understanding as you... But I ended up side tracked by some other craziness. Did you post this somewhere or would you still like me to test it and spin a patch? Regards, Bjorn > BR, > -R > > On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov > wrote: > > > > Hi, > > > > I've had splash screen disabled on my RB3. However once I've enabled it, > > I've got the attached crash during the boot on the msm/msm-next. It > > looks like it is related to this particular set of changes. > > > > On 11/06/2021 00:44, Rob Clark wrote: > > > From: Rob Clark > > > > > > This picks up an earlier series[1] from Jordan, and adds additional > > > support needed to generate GPU devcore dumps on iova faults. Original > > > description: > > > > > > This is a stack to add an Adreno GPU specific handler for pagefaults. The > > > first > > > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch > > > adds > > > a adreno-smmu-priv function hook to capture a handful of important > > > debugging > > > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by > > > the > > > third patch to print more detailed information on page fault such as the > > > TTBR0 > > > for the pagetable that caused the fault and the source of the fault as > > > determined by a combination of the FSYNR1 register and an internal GPU > > > register. > > > > > > This code provides a solid base that we can expand on later for even more > > > extensive GPU side page fault debugging capabilities. 
> > > > > > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where > > > GPU snapshotting needs to avoid crashdumper, and check the > > > RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths > > > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver > > > resume translation after it has had a chance to snapshot the GPUs > > > state > > > v3: Always clear FSR even if the target driver is going to handle resume > > > v2: Fix comment wording and function pointer check per Rob Clark > > > > > > [1] > > > https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/ > > > > > > Jordan Crouse (3): > > >iommu/arm-smmu: Add support for driver IOMMU fault handlers > > >iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault > > > info > > >drm/msm: Improve the a6xx page fault handler > > > > > > Rob Clark (2): > > >iommu/arm-smmu-qcom: Add stall support > > >drm/msm: devcoredump iommu fault support > > > > > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 23 +++- > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 110 +++- > > > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 42 ++-- > > > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 15 +++ > > > drivers/gpu/drm/msm/msm_gem.h | 1 + > > > drivers/gpu/drm/msm/msm_gem_submit.c| 1 + > > > drivers/gpu/drm/msm/msm_gpu.c | 48 + > > > drivers/gpu/drm/msm/msm_gpu.h | 17 +++ > > > drivers/gpu/drm/msm/msm_gpummu.c| 5 + > > > drivers/gpu/drm/msm/msm_iommu.c | 22 +++- > > > drivers/gpu/drm/msm/msm_mmu.h | 5 +- > > > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 50 + > > > drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +- > > > drivers/iommu/arm/arm-smmu/arm-smmu.h | 2 + > > > include/linux/adreno-smmu-priv.h| 38 ++- > > > 15 files changed, 367 insertions(+), 21 deletions(-) > > > > > > > > > -- > > With best wishes > > Dmitry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Hi Will and Robin,

On 7/6/2021 10:06 AM, Will Deacon wrote:

On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:

On 2021-07-06 15:05, Christoph Hellwig wrote:

On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:

FWIW I was pondering the question of whether to do something along those lines or just scrap the default assignment entirely, so since I hadn't got round to saying that I've gone ahead and hacked up the alternative (similarly untested) for comparison :)

TBH I'm still not sure which one I prefer...

Claire did implement something like your suggestion originally, but I don't really like it as it doesn't scale for adding multiple global pools, e.g. for the 64-bit addressable one for the various encrypted secure guest schemes.

Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're not concerned with a minimal fix for backports anyway I'm more than happy to focus on Will's approach. Another thing is that that looks to take us a quiet step closer to the possibility of dynamically resizing a SWIOTLB pool, which is something that some of the hypervisor protection schemes looking to build on top of this series may want to explore at some point.

Ok, I'll split that nasty diff I posted up into a reviewable series and we can take it from there.

For what it's worth, I attempted to boot Will's diff on top of Konrad's devel/for-linus-5.14 and it did not work; in fact, I got no output on my monitor at all, even with earlyprintk=, and I do not think this machine has a serial console.

Robin's fix does work: it survived ten reboots with no issues getting to X, and I no longer see the KASAN and slub debug messages. I understand that it is not the preferred solution, though (although Konrad did want to know if it works).

I am happy to test any further patches or follow-ups as needed, just keep me on CC.

Cheers,
Nathan
Re: [RFC PATCH 1/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()
On Tue, 6 Jul 2021 10:22:40 +0100 Robin Murphy wrote: > On 2021-07-05 19:52, Gerald Schaefer wrote: > > The following warning occurred sporadically on s390: > > DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or > > rodata [addr=48cc5e2f] [len=131072] > > WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 > > check_for_illegal_area+0xa8/0x138 > > > > It is a false-positive warning, due to a broken logic in debug_dma_map_sg(). > > check_for_illegal_area() should check for overlay of sg elements with kernel > > text or rodata. It is called with sg_dma_len(s) instead of s->length as > > parameter. After the call to ->map_sg(), sg_dma_len() contains the length > > of possibly combined sg elements in the DMA address space, and not the > > individual sg element length, which would be s->length. > > > > The check will then use the kernel start address of an sg element, and add > > the DMA length for overlap check, which can result in the false-positive > > warning because the DMA length can be larger than the actual single sg > > element length in kernel address space. > > > > In addition, the call to check_for_illegal_area() happens in the iteration > > over mapped_ents, which will not include all individual sg elements if > > any of them were combined in ->map_sg(). > > > > Fix this by using s->length instead of sg_dma_len(s). Also put the call to > > check_for_illegal_area() in a separate loop, iterating over all the > > individual sg elements ("nents" instead of "mapped_ents"). 
> > > > Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor") > > Tested-by: Niklas Schnelle > > Signed-off-by: Gerald Schaefer > > --- > > kernel/dma/debug.c | 10 ++ > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c > > index 14de1271463f..d7d44b7fe7e2 100644 > > --- a/kernel/dma/debug.c > > +++ b/kernel/dma/debug.c > > @@ -1299,6 +1299,12 @@ void debug_dma_map_sg(struct device *dev, struct > > scatterlist *sg, > > if (unlikely(dma_debug_disabled())) > > return; > > > > + for_each_sg(sg, s, nents, i) { > > + if (!PageHighMem(sg_page(s))) { > > + check_for_illegal_area(dev, sg_virt(s), s->length); > > + } > > + } > > + > > for_each_sg(sg, s, mapped_ents, i) { > > entry = dma_entry_alloc(); > > if (!entry) > > @@ -1316,10 +1322,6 @@ void debug_dma_map_sg(struct device *dev, struct > > scatterlist *sg, > > > > check_for_stack(dev, sg_page(s), s->offset); > > Strictly this should probably be moved to the new loop as well, as it is > similarly concerned with validating the source segments rather than the > DMA mappings - I think with virtually-mapped stacks it might technically > be possible for a stack page to be physically adjacent to a "valid" page > such that it could get merged and overlooked if it were near the end of > the list, although in fairness that would probably be indicative of > something having gone far more fundamentally wrong. Otherwise, the > overall reasoning looks sound to me. I see, good point. I think I can add this to my patch, and a different subject like "dma-debug: fix sg checks in debug_dma_map_sg()". However, I do not quite understand why check_for_stack() does not also consider s->length. It seems to check only the first page of an sg element. So, shouldn't check_for_stack() behave similar to check_for_illegal_area(), i.e. check all source sg elements for overlap with the task stack area? 
If yes, then this probably should be a separate patch, but I can try to come up with something and send a new RFC with two patches. Maybe check_for_stack() can also be integrated into check_for_illegal_area(); they are both called in the same places, and mapping memory from the stack also sounds rather illegal.
Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0
On 2021-07-06 17:21, Kai-Heng Feng wrote: On Tue, Jul 6, 2021 at 5:27 PM Robin Murphy wrote: On 2021-07-06 07:51, Kai-Heng Feng wrote: Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted device into core") not only moved the check for untrusted device to IOMMU core, it also introduced a behavioral change by returning def_domain_type() directly without checking its return value. That makes many devices no longer use the default IOMMU setting. So revert back to the old behavior which defaults to iommu_def_domain_type when driver callback returns 0. Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted device into core") Are you sure about that? From that same commit: @@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group, if (group->default_domain) return 0; - type = iommu_get_def_domain_type(dev); + type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type; return iommu_group_alloc_default_domain(dev->bus, group, type); } AFAICS the other two callers should also handle 0 correctly. Have you seen a problem in practice? Thanks for pointing out how the return value is being handled by the callers. However, the same check is missing in probe_get_default_domain_type(): static int probe_get_default_domain_type(struct device *dev, void *data) { struct __group_domain_type *gtype = data; unsigned int type = iommu_get_def_domain_type(dev); ... } I'm still not following - the next line right after that is "if (type)", which means it won't touch gtype, and if that happens for every iteration, probe_alloc_default_domain() subsequently hits its "if (!gtype.type)" condition and still ends up with iommu_def_domain_type. 
This *was* one of the other two callers I was talking about (the second being iommu_change_dev_def_domain()), and in fact on second look I think your proposed change will actually break this logic, since it's necessary to differentiate between a specific type being requested for the given device, and a "don't care" response which only implies to use the global default type if it's still standing after *all* the appropriate devices have been queried. I personally prefer the old way instead of open coding with ternary operator, so I'll do that in v2. In practice, this causes a kernel panic when probing Realtek WiFi. Because of the bug, dma_ops isn't set by probe_finalize(), dma_map_single() falls back to swiotlb which isn't set and caused a kernel panic. Hmm, but if that's the case, wouldn't it still be a problem anyway if the end result was IOMMU_DOMAIN_IDENTITY? I can't claim to fully understand the x86 swiotlb and no_iommu dance, but nothing really stands out to give me confidence that it handles the passthrough options correctly. Robin. I didn't attach the panic log because the system simply is frozen at that point so the message is not logged to the storage. I'll see if I can find another way to collect the log and attach it in v2. Kai-Heng Robin. Signed-off-by: Kai-Heng Feng --- drivers/iommu/iommu.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5419c4b9f27a..faac4f795025 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group); static int iommu_get_def_domain_type(struct device *dev) { const struct iommu_ops *ops = dev->bus->iommu_ops; + unsigned int type = 0; if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) return IOMMU_DOMAIN_DMA; if (ops->def_domain_type) - return ops->def_domain_type(dev); + type = ops->def_domain_type(dev); - return 0; + return (type == 0) ? 
iommu_def_domain_type : type; } static int iommu_group_alloc_default_domain(struct bus_type *bus,
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote: > On 2021-07-06 15:05, Christoph Hellwig wrote: > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote: > > > FWIW I was pondering the question of whether to do something along those > > > lines or just scrap the default assignment entirely, so since I hadn't got > > > round to saying that I've gone ahead and hacked up the alternative > > > (similarly untested) for comparison :) > > > > > > TBH I'm still not sure which one I prefer... > > > > Claire did implement something like your suggestion originally, but > > I don't really like it as it doesn't scale for adding multiple global > > pools, e.g. for the 64-bit addressable one for the various encrypted > > secure guest schemes. > > Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're > not concerned with a minimal fix for backports anyway I'm more than happy to > focus on Will's approach. Another thing is that that looks to take us a > quiet step closer to the possibility of dynamically resizing a SWIOTLB pool, > which is something that some of the hypervisor protection schemes looking to > build on top of this series may want to explore at some point. Ok, I'll split that nasty diff I posted up into a reviewable series and we can take it from there. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 05:57:21PM +0100, Will Deacon wrote: > On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote: > > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote: > > > > FWIW I was pondering the question of whether to do something along > > > > those > > > > lines or just scrap the default assignment entirely, so since I hadn't > > > > got > > > > round to saying that I've gone ahead and hacked up the alternative > > > > (similarly untested) for comparison :) > > > > > > > > TBH I'm still not sure which one I prefer... > > > > > > Claire did implement something like your suggestion originally, but > > > I don't really like it as it doesn't scale for adding multiple global > > > pools, e.g. for the 64-bit addressable one for the various encrypted > > > secure guest schemes. > > > > Couple of things: > > - I am not pushing to Linus the Claire's patchset until we have a > >resolution on this. I hope you all agree that is a sensible way > >forward as much as I hate doing that. > > Sure, it's a pity but we could clearly use a bit more time to get these > just right and we've run out of time for 5.14. > > I think the main question I have is how would you like to see patches for > 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else? Yes that would be perfect. If there are any dependencies on the rc1, I can rebase it on top of that. > > Cheers, > > Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote: > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote: > > > FWIW I was pondering the question of whether to do something along those > > > lines or just scrap the default assignment entirely, so since I hadn't > > > got > > > round to saying that I've gone ahead and hacked up the alternative > > > (similarly untested) for comparison :) > > > > > > TBH I'm still not sure which one I prefer... > > > > Claire did implement something like your suggestion originally, but > > I don't really like it as it doesn't scale for adding multiple global > > pools, e.g. for the 64-bit addressable one for the various encrypted > > secure guest schemes. > > Couple of things: > - I am not pushing to Linus the Claire's patchset until we have a >resolution on this. I hope you all agree that is a sensible way >forward as much as I hate doing that. Sure, it's a pity but we could clearly use a bit more time to get these just right and we've run out of time for 5.14. I think the main question I have is how would you like to see patches for 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else? Cheers, Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"
On Mon, Jul 05, 2021 at 08:56:57AM +0200, Marek Szyprowski wrote:
> QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
> what fails for the second and latter IOMMU devices. This is intended and
> must be not fatal to the driver registration process. Also the cleanup
> path should take care of the runtime PM state, what is missing in the
> current patch. Revert relevant changes to the QCOM IOMMU driver until
> a proper fix is prepared.
>
> This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.
>
> Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path")
> Suggested-by: Will Deacon
> Signed-off-by: Marek Szyprowski
> ---
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)

Thanks, Marek:

Acked-by: Will Deacon

Joerg -- please can you pick this up as a fix?

Cheers,

Will

> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 25ed444ff94d..021cf8f65ffc 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -849,12 +849,10 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
> 	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
> 	if (ret) {
> 		dev_err(dev, "Failed to register iommu\n");
> -		goto err_sysfs_remove;
> +		return ret;
> 	}
>
> -	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> -	if (ret)
> -		goto err_unregister_device;
> +	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
>
> 	if (qcom_iommu->local_base) {
> 		pm_runtime_get_sync(dev);
> @@ -863,13 +861,6 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
> 	}
>
> 	return 0;
> -
> -err_unregister_device:
> -	iommu_device_unregister(&qcom_iommu->iommu);
> -
> -err_sysfs_remove:
> -	iommu_device_sysfs_remove(&qcom_iommu->iommu);
> -	return ret;
> }
>
> static int qcom_iommu_device_remove(struct platform_device *pdev)
> --
> 2.17.1
>
Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0
On Tue, Jul 6, 2021 at 5:27 PM Robin Murphy wrote: > > On 2021-07-06 07:51, Kai-Heng Feng wrote: > > Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted > > device into core") not only moved the check for untrusted device to > > IOMMU core, it also introduced a behavioral change by returning > > def_domain_type() directly without checking its return value. That makes > > many devices no longer use the default IOMMU setting. > > > > So revert back to the old behavior which defaults to > > iommu_def_domain_type when driver callback returns 0. > > > > Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted > > device into core") > > Are you sure about that? From that same commit: > > @@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct > iommu_group *group, > if (group->default_domain) > return 0; > > - type = iommu_get_def_domain_type(dev); > + type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type; > > return iommu_group_alloc_default_domain(dev->bus, group, type); > } > > AFAICS the other two callers should also handle 0 correctly. Have you > seen a problem in practice? Thanks for pointing out how the return value is being handled by the callers. However, the same check is missing in probe_get_default_domain_type(): static int probe_get_default_domain_type(struct device *dev, void *data) { struct __group_domain_type *gtype = data; unsigned int type = iommu_get_def_domain_type(dev); ... } I personally prefer the old way instead of open coding with ternary operator, so I'll do that in v2. In practice, this causes a kernel panic when probing Realtek WiFi. Because of the bug, dma_ops isn't set by probe_finalize(), dma_map_single() falls back to swiotlb which isn't set and caused a kernel panic. I didn't attach the panic log because the system simply is frozen at that point so the message is not logged to the storage. I'll see if I can find another way to collect the log and attach it in v2. Kai-Heng > > Robin. 
> > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/iommu/iommu.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 5419c4b9f27a..faac4f795025 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group); > > static int iommu_get_def_domain_type(struct device *dev) > > { > > const struct iommu_ops *ops = dev->bus->iommu_ops; > > + unsigned int type = 0; > > > > if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) > > return IOMMU_DOMAIN_DMA; > > > > if (ops->def_domain_type) > > - return ops->def_domain_type(dev); > > + type = ops->def_domain_type(dev); > > > > - return 0; > > + return (type == 0) ? iommu_def_domain_type : type; > > } > > > > static int iommu_group_alloc_default_domain(struct bus_type *bus, > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On 2021-07-06 15:05, Christoph Hellwig wrote: On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote: FWIW I was pondering the question of whether to do something along those lines or just scrap the default assignment entirely, so since I hadn't got round to saying that I've gone ahead and hacked up the alternative (similarly untested) for comparison :) TBH I'm still not sure which one I prefer... Claire did implement something like your suggestion originally, but I don't really like it as it doesn't scale for adding multiple global pools, e.g. for the 64-bit addressable one for the various encrypted secure guest schemes. Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're not concerned with a minimal fix for backports anyway I'm more than happy to focus on Will's approach. Another thing is that that looks to take us a quiet step closer to the possibility of dynamically resizing a SWIOTLB pool, which is something that some of the hypervisor protection schemes looking to build on top of this series may want to explore at some point. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > FWIW I was pondering the question of whether to do something along those
> > lines or just scrap the default assignment entirely, so since I hadn't got
> > round to saying that I've gone ahead and hacked up the alternative
> > (similarly untested) for comparison :)
> >
> > TBH I'm still not sure which one I prefer...
>
> Claire did implement something like your suggestion originally, but
> I don't really like it as it doesn't scale for adding multiple global
> pools, e.g. for the 64-bit addressable one for the various encrypted
> secure guest schemes.

A couple of things:

- I am not pushing Claire's patchset to Linus until we have a resolution on this. I hope you all agree that is a sensible way forward, as much as I hate doing that.
- I like Robin's fix as it is the simplest looking. Would love to see if it does fix the problem.
- Christoph - we can always add multiple pools as the next milestone and just focus on getting this feature tested extensively during this release.
- Would it be worth (in future, or maybe in another tiny fix) also adding a printk in swiotlb when we de-allocate the buffer, so that when someone looks through `dmesg` it becomes much easier to diagnose issues?
Re: [PATCH v5 3/5] drm/msm: Improve the a6xx page fault handler
In-Reply-To: <20210610214431.539029-4-robdcl...@gmail.com> On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote: > From: Jordan Crouse > > Use the new adreno-smmu-priv fault info function to get more SMMU > debug registers and print the current TTBR0 to debug per-instance > pagetables and figure out which GPU block generated the request. > > Signed-off-by: Jordan Crouse > Signed-off-by: Rob Clark > --- > drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 +- > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 76 +-- > drivers/gpu/drm/msm/msm_iommu.c | 11 +++- > drivers/gpu/drm/msm/msm_mmu.h | 4 +- > 4 files changed, 87 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > index f46562c12022..eb030b00bff4 100644 > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > @@ -1075,7 +1075,7 @@ bool a5xx_idle(struct msm_gpu *gpu, struct > msm_ringbuffer *ring) > return true; > } > > -static int a5xx_fault_handler(void *arg, unsigned long iova, int flags) > +static int a5xx_fault_handler(void *arg, unsigned long iova, int flags, void > *data) > { > struct msm_gpu *gpu = arg; > pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d > (%u,%u,%u,%u)\n", > @@ -1085,7 +1085,7 @@ static int a5xx_fault_handler(void *arg, unsigned long > iova, int flags) > gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(6)), > gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(7))); > > - return -EFAULT; > + return 0; > } > > static void a5xx_cp_err_irq(struct msm_gpu *gpu) > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > index c7f0ddb12d8f..fc19db10bff1 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > @@ -1032,18 +1032,88 @@ static void a6xx_recover(struct msm_gpu *gpu) > msm_gpu_hw_init(gpu); > } > > -static int a6xx_fault_handler(void *arg, unsigned long iova, int flags) > +static const char *a6xx_uche_fault_block(struct msm_gpu 
*gpu, u32 mid) > +{ > + static const char *uche_clients[7] = { > + "VFD", "SP", "VSC", "VPC", "HLSQ", "PC", "LRZ", > + }; > + u32 val; > + > + if (mid < 1 || mid > 3) > + return "UNKNOWN"; > + > + /* > + * The source of the data depends on the mid ID read from FSYNR1. > + * and the client ID read from the UCHE block > + */ > + val = gpu_read(gpu, REG_A6XX_UCHE_CLIENT_PF); > + > + /* mid = 3 is most precise and refers to only one block per client */ > + if (mid == 3) > + return uche_clients[val & 7]; > + > + /* For mid=2 the source is TP or VFD except when the client id is 0 */ > + if (mid == 2) > + return ((val & 7) == 0) ? "TP" : "TP|VFD"; > + > + /* For mid=1 just return "UCHE" as a catchall for everything else */ > + return "UCHE"; > +} > + > +static const char *a6xx_fault_block(struct msm_gpu *gpu, u32 id) > +{ > + if (id == 0) > + return "CP"; > + else if (id == 4) > + return "CCU"; > + else if (id == 6) > + return "CDP Prefetch"; > + > + return a6xx_uche_fault_block(gpu, id); > +} > + > +#define ARM_SMMU_FSR_TF BIT(1) > +#define ARM_SMMU_FSR_PF BIT(3) > +#define ARM_SMMU_FSR_EF BIT(4) > + > +static int a6xx_fault_handler(void *arg, unsigned long iova, int flags, void > *data) > { > struct msm_gpu *gpu = arg; > + struct adreno_smmu_fault_info *info = data; > + const char *type = "UNKNOWN"; > > - pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d > (%u,%u,%u,%u)\n", > + /* > + * Print a default message if we couldn't get the data from the > + * adreno-smmu-priv > + */ > + if (!info) { > + pr_warn_ratelimited("*** gpu fault: iova=%.16lx flags=%d > (%u,%u,%u,%u)\n", > iova, flags, > gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)), > gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)), > gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)), > gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7))); > > - return -EFAULT; > + return 0; > + } > + > + if (info->fsr & ARM_SMMU_FSR_TF) > + type = "TRANSLATION"; > + else if (info->fsr & ARM_SMMU_FSR_PF) > + type = "PERMISSION"; > + else if (info->fsr & 
ARM_SMMU_FSR_EF) > + type = "EXTERNAL"; > + > + pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s > type=%s source=%s (%u,%u,%u,%u)\n", > + info->ttbr0, iova, > + flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ", type, > + a6xx_fault_block(gpu, info->fsynr1 & 0xff), > + gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4))
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> FWIW I was pondering the question of whether to do something along those
> lines or just scrap the default assignment entirely, so since I hadn't got
> round to saying that I've gone ahead and hacked up the alternative
> (similarly untested) for comparison :)
>
> TBH I'm still not sure which one I prefer...

Claire did implement something like your suggestion originally, but I don't really like it as it doesn't scale for adding multiple global pools, e.g. for the 64-bit addressable one for the various encrypted secure guest schemes.
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On 2021-07-06 14:24, Will Deacon wrote: On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote: On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote: So at this point, the AMD IOMMU driver does: swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; where 'swiotlb' is a global variable indicating whether or not swiotlb is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which will call swiotlb_exit() if 'swiotlb' is false. Now, that used to work fine, because swiotlb_exit() clears 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I think that all the devices which have successfully probed beforehand will have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem' field. Yeah. I don't think we can do that anymore, and I also think it is a bad idea to start with. I've had a crack at reworking things along the following lines: - io_tlb_default_mem now lives in the BSS, the flexible array member is now a pointer and that part is allocated dynamically (downside of this is an extra indirection to get at the slots). - io_tlb_default_mem.nslabs tells you whether the thing is valid - swiotlb_exit() frees the slots array and clears the rest of the structure to 0. I also extended it to free the actual slabs, but I'm not sure why it wasn't doing that before. So a non-NULL dev->dma_io_tlb_mem should always be valid to follow. FWIW I was pondering the question of whether to do something along those lines or just scrap the default assignment entirely, so since I hadn't got round to saying that I've gone ahead and hacked up the alternative (similarly untested) for comparison :) TBH I'm still not sure which one I prefer... Robin. 
->8- diff --git a/drivers/base/core.c b/drivers/base/core.c index ea5b85354526..394abf184c1a 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2847,9 +2847,6 @@ void device_initialize(struct device *dev) defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) dev->dma_coherent = dma_default_coherent; #endif -#ifdef CONFIG_SWIOTLB - dev->dma_io_tlb_mem = io_tlb_default_mem; -#endif } EXPORT_SYMBOL_GPL(device_initialize); diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 39284ff2a6cd..620f16d89a98 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -107,16 +107,21 @@ struct io_tlb_mem { }; extern struct io_tlb_mem *io_tlb_default_mem; +static inline struct io_tlb_mem *dev_iotlb_mem(struct device *dev) +{ + return dev->dma_io_tlb_mem ?: io_tlb_default_mem; +} + static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); return mem && paddr >= mem->start && paddr < mem->end; } static inline bool is_swiotlb_force_bounce(struct device *dev) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); return mem && mem->force_bounce; } @@ -167,7 +172,7 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size); static inline bool is_swiotlb_for_alloc(struct device *dev) { - return dev->dma_io_tlb_mem->for_alloc; + return dev_iotlb_mem(dev)->for_alloc; } #else static inline struct page *swiotlb_alloc(struct device *dev, size_t size) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index b7f76bca89bf..f4942149f87d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -359,7 +359,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size, enum dma_data_direction dir) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); 
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT; phys_addr_t orig_addr = mem->slots[index].orig_addr; size_t alloc_size = mem->slots[index].alloc_size; @@ -440,7 +440,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, size_t alloc_size) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); unsigned long boundary_mask = dma_get_seg_boundary(dev); dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, mem->start) & boundary_mask; @@ -522,7 +522,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, size_t mapping_size, size_t alloc_size, enum dma_data_direction dir, unsigned long attrs) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); unsigned int offset = swiotlb_al
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote: > On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote: > > So at this point, the AMD IOMMU driver does: > > > > swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; > > > > where 'swiotlb' is a global variable indicating whether or not swiotlb > > is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which > > will call swiotlb_exit() if 'swiotlb' is false. > > > > Now, that used to work fine, because swiotlb_exit() clears > > 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I > > think that all the devices which have successfully probed beforehand will > > have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem' > > field. > > Yeah. I don't think we can do that anymore, and I also think it is > a bad idea to start with. I've had a crack at reworking things along the following lines: - io_tlb_default_mem now lives in the BSS, the flexible array member is now a pointer and that part is allocated dynamically (downside of this is an extra indirection to get at the slots). - io_tlb_default_mem.nslabs tells you whether the thing is valid - swiotlb_exit() frees the slots array and clears the rest of the structure to 0. I also extended it to free the actual slabs, but I'm not sure why it wasn't doing that before. So a non-NULL dev->dma_io_tlb_mem should always be valid to follow. Untested diff below... Nathan, it would be ace if you're brave enough to give this a shot. 
Will --->8 diff --git a/drivers/base/core.c b/drivers/base/core.c index bbad7c559901..9e1218f89e4b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2820,7 +2820,7 @@ void device_initialize(struct device *dev) dev->dma_coherent = dma_default_coherent; #endif #ifdef CONFIG_SWIOTLB - dev->dma_io_tlb_mem = io_tlb_default_mem; + dev->dma_io_tlb_mem = &io_tlb_default_mem; #endif } EXPORT_SYMBOL_GPL(device_initialize); diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 785ec7e8be01..f06d9b4f1e0f 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -164,7 +164,7 @@ int __ref xen_swiotlb_init(void) int rc = -ENOMEM; char *start; - if (io_tlb_default_mem != NULL) { + if (io_tlb_default_mem.nslabs) { pr_warn("swiotlb buffer already initialized\n"); return -EEXIST; } @@ -547,7 +547,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl, static int xen_swiotlb_dma_supported(struct device *hwdev, u64 mask) { - return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask; + return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask; } const struct dma_map_ops xen_swiotlb_dma_ops = { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 39284ff2a6cd..b0cb2a9973f4 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -103,9 +103,9 @@ struct io_tlb_mem { phys_addr_t orig_addr; size_t alloc_size; unsigned int list; - } slots[]; + } *slots; }; -extern struct io_tlb_mem *io_tlb_default_mem; +extern struct io_tlb_mem io_tlb_default_mem; static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 0ffbaae9fba2..91cd1d413027 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -70,7 +70,7 @@ enum swiotlb_force swiotlb_force; -struct io_tlb_mem *io_tlb_default_mem; +struct io_tlb_mem io_tlb_default_mem; /* * Max segment that we can provide which (if pages are contingous) will 
@@ -101,7 +101,7 @@ early_param("swiotlb", setup_io_tlb_npages); unsigned int swiotlb_max_segment(void) { - return io_tlb_default_mem ? max_segment : 0; + return io_tlb_default_mem.nslabs ? max_segment : 0; } EXPORT_SYMBOL_GPL(swiotlb_max_segment); @@ -134,9 +134,9 @@ void __init swiotlb_adjust_size(unsigned long size) void swiotlb_print_info(void) { - struct io_tlb_mem *mem = io_tlb_default_mem; + struct io_tlb_mem *mem = &io_tlb_default_mem; - if (!mem) { + if (!mem->nslabs) { pr_warn("No low mem\n"); return; } @@ -163,11 +163,11 @@ static inline unsigned long nr_slots(u64 val) */ void __init swiotlb_update_mem_attributes(void) { - struct io_tlb_mem *mem = io_tlb_default_mem; + struct io_tlb_mem *mem = &io_tlb_default_mem; void *vaddr; unsigned long bytes; - if (!mem || mem->late_alloc) + if (!mem->nslabs || mem->late_alloc) return; vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); @@ -201,25 +201,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, int __init swiotlb_init_with_tbl(ch
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Tue, Jul 06, 2021 at 11:04:18AM +0800, Yongji Xie wrote: > On Mon, Jul 5, 2021 at 8:50 PM Stefan Hajnoczi wrote: > > > > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote: > > > > > > 在 2021/7/4 下午5:49, Yongji Xie 写道: > > > > > > OK, I get you now. Since the VIRTIO specification says "Device > > > > > > configuration space is generally used for rarely-changing or > > > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG > > > > > > ioctl should not be called frequently. > > > > > The spec uses MUST and other terms to define the precise requirements. > > > > > Here the language (especially the word "generally") is weaker and > > > > > means > > > > > there may be exceptions. > > > > > > > > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG > > > > > approach is reads that have side-effects. For example, imagine a field > > > > > containing an error code if the device encounters a problem unrelated > > > > > to > > > > > a specific virtqueue request. Reading from this field resets the error > > > > > code to 0, saving the driver an extra configuration space write access > > > > > and possibly race conditions. It isn't possible to implement those > > > > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it > > > > > makes me think that the interface does not allow full VIRTIO > > > > > semantics. > > > > > > > > > Note that though you're correct, my understanding is that config space is > > > not suitable for this kind of error propagating. And it would be very hard > > > to implement such kind of semantic in some transports. Virtqueue should > > > be > > > much better. As Yong Ji quoted, the config space is used for > > > "rarely-changing or intialization-time parameters". > > > > > > > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. 
And to > > > > handle the message failure, I'm going to add a return value to > > > > virtio_config_ops.get() and virtio_cread_* API so that the error can > > > > be propagated to the virtio device driver. Then the virtio-blk device > > > > driver can be modified to handle that. > > > > > > > > Jason and Stefan, what do you think of this way? > > > > Why does VDUSE_DEV_GET_CONFIG need to support an error return value? > > > > We add a timeout and return error in case userspace never replies to > the message. > > > The VIRTIO spec provides no way for the device to report errors from > > config space accesses. > > > > The QEMU virtio-pci implementation returns -1 from invalid > > virtio_config_read*() and silently discards virtio_config_write*() > > accesses. > > > > VDUSE can take the same approach with > > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG. > > > > I noticed that virtio_config_read*() only returns -1 when we access a > invalid field. But in the VDUSE case, VDUSE_DEV_GET_CONFIG might fail > when we access a valid field. Not sure if it's ok to silently ignore > this kind of error. That's a good point but it's a general VIRTIO issue. Any device implementation (QEMU userspace, hardware vDPA, etc) can fail, so the VIRTIO specification needs to provide a way for the driver to detect this. If userspace violates the contract then VDUSE needs to mark the device broken. QEMU's device emulation does something similar with the vdev->broken flag. The VIRTIO Device Status field DEVICE_NEEDS_RESET bit can be set by vDPA/VDUSE to indicate that the device is not operational and must be reset. The driver code may still process the -1 value read from the configuration space. Hopefully this isn't a problem. There is currently no VIRTIO interface besides DEVICE_NEEDS_RESET to indicate configuration space access failures. On the other hand, drivers need to handle malicious devices so they should be able to cope with the -1 value anyway. 
Stefan
Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Tue, Jul 06, 2021 at 10:34:33AM +0800, Jason Wang wrote: > > 在 2021/7/5 下午8:49, Stefan Hajnoczi 写道: > > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote: > > > 在 2021/7/4 下午5:49, Yongji Xie 写道: > > > > > > OK, I get you now. Since the VIRTIO specification says "Device > > > > > > configuration space is generally used for rarely-changing or > > > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG > > > > > > ioctl should not be called frequently. > > > > > The spec uses MUST and other terms to define the precise requirements. > > > > > Here the language (especially the word "generally") is weaker and > > > > > means > > > > > there may be exceptions. > > > > > > > > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG > > > > > approach is reads that have side-effects. For example, imagine a field > > > > > containing an error code if the device encounters a problem unrelated > > > > > to > > > > > a specific virtqueue request. Reading from this field resets the error > > > > > code to 0, saving the driver an extra configuration space write access > > > > > and possibly race conditions. It isn't possible to implement those > > > > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it > > > > > makes me think that the interface does not allow full VIRTIO > > > > > semantics. > > > > > > Note that though you're correct, my understanding is that config space is > > > not suitable for this kind of error propagating. And it would be very hard > > > to implement such kind of semantic in some transports. Virtqueue should > > > be > > > much better. As Yong Ji quoted, the config space is used for > > > "rarely-changing or intialization-time parameters". > > > > > > > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. 
And to > > > > handle the message failure, I'm going to add a return value to > > > > virtio_config_ops.get() and virtio_cread_* API so that the error can > > > > be propagated to the virtio device driver. Then the virtio-blk device > > > > driver can be modified to handle that. > > > > > > > > Jason and Stefan, what do you think of this way? > > Why does VDUSE_DEV_GET_CONFIG need to support an error return value? > > > > The VIRTIO spec provides no way for the device to report errors from > > config space accesses. > > > > The QEMU virtio-pci implementation returns -1 from invalid > > virtio_config_read*() and silently discards virtio_config_write*() > > accesses. > > > > VDUSE can take the same approach with > > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG. > > > > > I'd like to stick to the current assumption thich get_config won't fail. > > > That is to say, > > > > > > 1) maintain a config in the kernel, make sure the config space read can > > > always succeed > > > 2) introduce an ioctl for the vduse usersapce to update the config space. > > > 3) we can synchronize with the vduse userspace during set_config > > > > > > Does this work? > > I noticed that caching is also allowed by the vhost-user protocol > > messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't > > know whether or not caching is in effect. The interface you outlined > > above requires caching. > > > > Is there a reason why the host kernel vDPA code needs to cache the > > configuration space? > > > Because: > > 1) Kernel can not wait forever in get_config(), this is the major difference > with vhost-user. virtio_cread() can sleep: #define virtio_cread(vdev, structname, member, ptr) \ do {\ typeof(((structname*)0)->member) virtio_cread_v;\ \ might_sleep(); \ ^^ Which code path cannot sleep? > 2) Stick to the current assumption that virtio_cread() should always > succeed. That can be done by reading -1 (like QEMU does) when the read fails. 
Stefan
Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0
On 2021-07-06 07:51, Kai-Heng Feng wrote:
> Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted
> device into core") not only moved the check for untrusted device to
> IOMMU core, it also introduced a behavioral change by returning
> def_domain_type() directly without checking its return value. That
> makes many devices no longer use the default IOMMU setting.
>
> So revert back to the old behavior which defaults to
> iommu_def_domain_type when driver callback returns 0.
>
> Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted device into core")

Are you sure about that? From that same commit:

@@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group,
 	if (group->default_domain)
 		return 0;
 
-	type = iommu_get_def_domain_type(dev);
+	type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
 
 	return iommu_group_alloc_default_domain(dev->bus, group, type);
 }

AFAICS the other two callers should also handle 0 correctly. Have you seen a problem in practice?

Robin.

> Signed-off-by: Kai-Heng Feng
> ---
>  drivers/iommu/iommu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5419c4b9f27a..faac4f795025 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group);
>  static int iommu_get_def_domain_type(struct device *dev)
>  {
>  	const struct iommu_ops *ops = dev->bus->iommu_ops;
> +	unsigned int type = 0;
>
>  	if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)
>  		return IOMMU_DOMAIN_DMA;
>
>  	if (ops->def_domain_type)
> -		return ops->def_domain_type(dev);
> +		type = ops->def_domain_type(dev);
>
> -	return 0;
> +	return (type == 0) ? iommu_def_domain_type : type;
>  }
>
>  static int iommu_group_alloc_default_domain(struct bus_type *bus,
Re: [RFC PATCH 1/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()
On 2021-07-05 19:52, Gerald Schaefer wrote:
> The following warning occurred sporadically on s390:
>
> DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel
> text or rodata [addr=48cc5e2f] [len=131072]
> WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 check_for_illegal_area+0xa8/0x138
>
> It is a false-positive warning, due to broken logic in
> debug_dma_map_sg(). check_for_illegal_area() should check for overlap
> of sg elements with kernel text or rodata. It is called with
> sg_dma_len(s) instead of s->length as parameter. After the call to
> ->map_sg(), sg_dma_len() contains the length of possibly combined sg
> elements in the DMA address space, and not the individual sg element
> length, which would be s->length. The check will then use the kernel
> start address of an sg element, and add the DMA length for the overlap
> check, which can result in the false-positive warning because the DMA
> length can be larger than the actual single sg element length in
> kernel address space.
>
> In addition, the call to check_for_illegal_area() happens in the
> iteration over mapped_ents, which will not include all individual sg
> elements if any of them were combined in ->map_sg().
>
> Fix this by using s->length instead of sg_dma_len(s). Also put the
> call to check_for_illegal_area() in a separate loop, iterating over
> all the individual sg elements ("nents" instead of "mapped_ents").
> Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor")
> Tested-by: Niklas Schnelle
> Signed-off-by: Gerald Schaefer
> ---
>  kernel/dma/debug.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
> index 14de1271463f..d7d44b7fe7e2 100644
> --- a/kernel/dma/debug.c
> +++ b/kernel/dma/debug.c
> @@ -1299,6 +1299,12 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  	if (unlikely(dma_debug_disabled()))
>  		return;
>
> +	for_each_sg(sg, s, nents, i) {
> +		if (!PageHighMem(sg_page(s))) {
> +			check_for_illegal_area(dev, sg_virt(s), s->length);
> +		}
> +	}
> +
>  	for_each_sg(sg, s, mapped_ents, i) {
>  		entry = dma_entry_alloc();
>  		if (!entry)
> @@ -1316,10 +1322,6 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  		check_for_stack(dev, sg_page(s), s->offset);

Strictly this should probably be moved to the new loop as well, as it is similarly concerned with validating the source segments rather than the DMA mappings - I think with virtually-mapped stacks it might technically be possible for a stack page to be physically adjacent to a "valid" page such that it could get merged and overlooked if it were near the end of the list, although in fairness that would probably be indicative of something having gone far more fundamentally wrong.

Otherwise, the overall reasoning looks sound to me.

Robin.

> -		if (!PageHighMem(sg_page(s))) {
> -			check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
> -		}
> -
>  		check_sg_segment(dev, s);
>  		add_dma_entry(entry);