Re: Summary of LPC guest MSI discussion in Santa Fe
Hi Will, On 08/11/2016 20:02, Don Dutile wrote: > On 11/08/2016 12:54 PM, Will Deacon wrote: >> On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote: >>> On 08/11/2016 03:45, Will Deacon wrote: Rather than treat these as separate problems, a better interface is to tell userspace about a set of reserved regions, and have this include the MSI doorbell, irrespective of whether or not it can be remapped. Don suggested that we statically pick an address for the doorbell in a similar way to x86, and have the kernel map it there. We could even pick 0xfee0. If it conflicts with a reserved region on the platform (due to (4)), then we'd obviously have to (deterministically?) allocate it somewhere else, but probably within the bottom 4G. >>> This is tentatively achieved now with >>> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II >>> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html) >>> >> Yup, I saw that fly by. Hopefully some of the internals can be reused >> with the current thinking on user ABI. >> The next question is how to tell userspace about all of the reserved regions. Initially, the idea was to extend VFIO, however Alex pointed out a horrible scenario: 1. QEMU spawns a VM on system 0 2. VM is migrated to system 1 3. QEMU attempts to passthrough a device using PCI hotplug In this scenario, the guest memory map is chosen at step (1), yet there is no VFIO fd available to determine the reserved regions. Furthermore, the reserved regions may vary between system 0 and system 1. This pretty much rules out using VFIO to determine the reserved regions.Alex suggested that the SMMU driver can advertise the regions via /sys/class/iommu/. This would solve part of the problem, but migration between systems with different memory maps can still cause problems if the reserved regions of the new system conflict with the guest memory map chosen by QEMU. >>> >>> OK so I understand we do not want anymore the VFIO chain capability API >>> (patch 5 of above series) but we prefer a sysfs approach instead. >> Right. >> >>> I understand the sysfs approach which allows the userspace to get the >>> info earlier and independently on VFIO. Keeping in mind current QEMU >>> virt - which is not the only userspace - will not do much from this info >>> until we bring upheavals in virt address space management. So if I am >>> not wrong, at the moment the main action to be undertaken is the >>> rejection of the PCI hotplug in case we detect a collision. >> I don't think so; it should be up to userspace to reject the hotplug. >> If userspace doesn't have support for the regions, then that's fine -- >> you just end up in a situation where the CPU page table maps memory >> somewhere that the device can't see. In other words, you'll end up with >> spurious DMA failures, but that's exactly what happens with current >> systems >> if you passthrough an overlapping region (Robin demonstrated this on >> Juno). >> >> Additionally, you can imagine some future support where you can tell the >> guest not to use certain regions of its memory for DMA. In this case, you >> wouldn't want to refuse the hotplug in the case of overlapping regions. >> >> Really, I think the kernel side just needs to enumerate the fixed >> reserved >> regions, place the doorbell at a fixed address and then advertise these >> via sysfs. >> >>> I can respin [1] >>> - studying and taking into account Robin's comments about dm_regions >>> similarities >>> - removing the VFIO capability chain and replacing this by a sysfs API >> Ideally, this would be reusable between different SMMU drivers so the >> sysfs >> entries have the same format etc. >> >>> Would that be OK? >> Sounds good to me. Are you in a position to prototype something on the >> qemu >> side once we've got kernel-side agreement? yes sure. >> >>> What about Alex comments who wanted to report the usable memory ranges >>> instead of unusable memory ranges? >>> >>> Also did you have a chance to discuss the following items: >>> 1) the VFIO irq safety assessment >> The discussion really focussed on system topology, as opposed to >> properties >> of the doorbell. Regardless of how the device talks to the doorbell, if >> the doorbell can't protect against things like MSI spoofing, then it's >> unsafe. My opinion is that we shouldn't allow passthrough by default on >> systems with unsafe doorbells (we could piggyback on >> allow_unsafe_interrupts >> cmdline option to VFIO). OK. >> >> A first step would be making all this opt-in, and only supporting GICv3 >> ITS for now. > You're trying to support a config that is < GICv3 and no ITS ? ... > That would be the equiv. of x86 pre-intr-remap, and that's why > allow_unsafe_interrupts > hook was created ... to enable devel/kick-the-tires. >>> 2) the MSI reserved size computation (is an arbitr
RE: [PATCH V3 0/8] IOMMU probe deferral support
Hi Robin, >On 04/11/16 15:16, Sricharan wrote: >> Hi Robin, >> > Yikes, on second look, that definitely shouldn't be happening. > Everything below is probably the resulting fallout. [ 40.206703] vfio-pci :08:00.0: Failed to setup iommu ops I think the above print which says "failed to setup iommu_ops" because the call ops->add_device failed in of_pci_iommu_configure is the reason for the failure, in my case i simply do not get this even with your scripts. ops->add_device succeeds in the rebind as well. So still checking what could be happening in your case. >>> >>> I was looking at your code base from [1].The ops->add_device >>> callback from of_pci_iommu_configure on the rebind is the >>> one which is causing the failure. But not able to spot out >>>from code which point is causing the failure. It would be very helpful >>> if i can know which is the return value from the add_device callback >>> or point inside add_device callback which fails in your setup. >>> >>> >>> [1] git://linux-arm.org/linux-rm iommu/misc >> >> With little more try, i saw an issue where i had an failure >> similar to what you reported. The issue happens when multiple >> devices fall in to same group due to matching sids. I ended up >> doing a fix like below and it would be nice to verify if it is the same >> that we are seeing in your setup and if the fix makes a difference ? >> >> From: Sricharan R >> Date: Fri, 4 Nov 2016 20:28:49 +0530 >> Subject: [PATCH] iommu/arm-smmu: Fix group's reference counting >> >> iommu_group_get_for_dev which gets called in the add_device >> callback, increases the reference count of the iommu_group, >> so we do an iommu_group_put after that. iommu_group_get_for_dev >> inturn calls device_group callback and in the case of arm-smmu >> we call generic_device_group/pci_device_group which takes >> care of increasing the group's reference. But when we return >> an already existing group(when multiple devices have same group) >> the reference is not incremented, resulting in issues when the >> remove_device callback for the devices is invoked. >> Fixing the same here. > >Bah, yes, this does look like my fault - after flip-flopping between >about 3 different ways to keep refcounts for the S2CR entries, none of >which would quite work, I ripped it all out but apparently still got >things wrong, oh well. Thanks for figuring it out. > >On the probe-deferral angle, whilst it's useful to have uncovered this >bug, I don't think we should actually be calling remove_device() from >DMA teardown. I think it's preferable from a user perspective if group >numbering remains stable, rather than changing depending on the order in >which they unbind/rebind VFIO drivers. I'm really keen to try and get >this in shape for 4.10, so I've taken the liberty of hacking up my own >branch (iommu/defer) based on v3 - would you mind taking a look at the >two "iommu/of:" commits to see what you think? (Ignore the PCI changes >to your later patches - that was an experiment which didn't really work out) Ok, will take a look at this now and respond more on this. > >> Signed-off-by: Sricharan R >> --- >> drivers/iommu/arm-smmu.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c >> index 71ce4b6..a1d0b3c 100644 >> --- a/drivers/iommu/arm-smmu.c >> +++ b/drivers/iommu/arm-smmu.c >> @@ -1516,8 +1516,10 @@ static struct iommu_group >> *arm_smmu_device_group(struct device *dev) >> group = smmu->s2crs[idx].group; >> } >> >> -if (group) >> +if (group) { >> +iommu_group_get_by_id(iommu_group_id(group)); >> return group; > >This might as well just be inline, i.e.: > > return iommu_group_get_by_id(iommu_group_id(group)); > >It's a shame we have to go all round the houses when we have the group >right there, but this is probably the most expedient fix. I guess we can >extend the API with some sort of iommu_group_get(group) overload in >future if we really want to. > ok, i can send this fix separately then. Otherwise, Will was suggesting on the other thread that there should probably be a separate API to increment the group refcount or get the group from the existing aliasing device. As per me adding the api, looks like another option or post the above ? Regards, Sricharan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
The eSDHC of T4240-R1.0-R2.0 has incorrect vender version and spec version. Acturally the right version numbers should be VVN=0x13 and SVN = 0x1. This patch adds the GUTS driver support for eSDHC driver to match SoC. And fix host version to avoid that incorrect version numbers break down the ADMA data transfer. Signed-off-by: Yangbo Lu Acked-by: Ulf Hansson Acked-by: Scott Wood Acked-by: Arnd Bergmann --- Changes for v2: - Got SVR through iomap instead of dts Changes for v3: - Managed GUTS through syscon instead of iomap in eSDHC driver Changes for v4: - Got SVR by GUTS driver instead of SYSCON Changes for v5: - Changed to get SVR through API fsl_guts_get_svr() - Combined patch 4, patch 5 and patch 6 into one Changes for v6: - Added 'Acked-by: Ulf Hansson' Changes for v7: - None Changes for v8: - Added 'Acked-by: Scott Wood' Changes for v9: - None Changes for v10: - None Changes for v11: - Changed to use soc_device_match Changes for v12: - Matched soc through .family field instead of .soc_id Changes for v13: - None Changes for v14: - None Changes for v15: - None Changes for v16: - Added 'Acked-by: Arnd' --- drivers/mmc/host/Kconfig | 1 + drivers/mmc/host/sdhci-of-esdhc.c | 20 2 files changed, 21 insertions(+) diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index 5cf7eba..4128a3c 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -144,6 +144,7 @@ config MMC_SDHCI_OF_ESDHC depends on MMC_SDHCI_PLTFM depends on PPC || ARCH_MXC || ARCH_LAYERSCAPE select MMC_SDHCI_IO_ACCESSORS + select FSL_GUTS help This selects the Freescale eSDHC controller support. diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c index fb71c86..57bdb9e 100644 --- a/drivers/mmc/host/sdhci-of-esdhc.c +++ b/drivers/mmc/host/sdhci-of-esdhc.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "sdhci-pltfm.h" #include "sdhci-esdhc.h" @@ -28,6 +29,7 @@ struct sdhci_esdhc { u8 vendor_ver; u8 spec_ver; + bool quirk_incorrect_hostver; }; /** @@ -73,6 +75,8 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host, static u16 esdhc_readw_fixup(struct sdhci_host *host, int spec_reg, u32 value) { + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + struct sdhci_esdhc *esdhc = sdhci_pltfm_priv(pltfm_host); u16 ret; int shift = (spec_reg & 0x2) * 8; @@ -80,6 +84,12 @@ static u16 esdhc_readw_fixup(struct sdhci_host *host, ret = value & 0x; else ret = (value >> shift) & 0x; + /* Workaround for T4240-R1.0-R2.0 eSDHC which has incorrect +* vendor version and spec version information. +*/ + if ((spec_reg == SDHCI_HOST_VERSION) && + (esdhc->quirk_incorrect_hostver)) + ret = (VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | SDHCI_SPEC_200; return ret; } @@ -558,6 +568,12 @@ static const struct sdhci_pltfm_data sdhci_esdhc_le_pdata = { .ops = &sdhci_esdhc_le_ops, }; +static struct soc_device_attribute soc_incorrect_hostver[] = { + { .family = "QorIQ T4240", .revision = "1.0", }, + { .family = "QorIQ T4240", .revision = "2.0", }, + { }, +}; + static void esdhc_init(struct platform_device *pdev, struct sdhci_host *host) { struct sdhci_pltfm_host *pltfm_host; @@ -571,6 +587,10 @@ static void esdhc_init(struct platform_device *pdev, struct sdhci_host *host) esdhc->vendor_ver = (host_ver & SDHCI_VENDOR_VER_MASK) >> SDHCI_VENDOR_VER_SHIFT; esdhc->spec_ver = host_ver & SDHCI_SPEC_VER_MASK; + if (soc_device_match(soc_incorrect_hostver)) + esdhc->quirk_incorrect_hostver = true; + else + esdhc->quirk_incorrect_hostver = false; } static int sdhci_esdhc_probe(struct platform_device *pdev) -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 3/7] soc: fsl: add GUTS driver for QorIQ platforms
The global utilities block controls power management, I/O device enabling, power-onreset(POR) configuration monitoring, alternate function selection for multiplexed signals,and clock control. This patch adds a driver to manage and access global utilities block. Initially only reading SVR and registering soc device are supported. Other guts accesses, such as reading RCW, should eventually be moved into this driver as well. Signed-off-by: Yangbo Lu Acked-by: Arnd Bergmann --- Changes for v4: - Added this patch Changes for v5: - Modified copyright info - Changed MODULE_LICENSE to GPL - Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL - Made FSL_GUTS user-invisible - Added a complete compatible list for GUTS - Stored guts info in file-scope variable - Added mfspr() getting SVR - Redefined GUTS APIs - Called fsl_guts_init rather than using platform driver - Removed useless parentheses - Removed useless 'extern' key words Changes for v6: - Made guts thread safe in fsl_guts_init Changes for v7: - Removed 'ifdef' for function declaration in guts.h Changes for v8: - Fixes lines longer than 80 characters checkpatch issue - Added 'Acked-by: Scott Wood' Changes for v9: - None Changes for v10: - None Changes for v11: - Changed to platform driver Changes for v12: - Removed "signed-off-by: Scott" - Defined fsl_soc_die_attr struct array instead of soc_device_attribute - Re-designed soc_device_attribute for QorIQ SoC - Other minor fixes Changes for v13: - Rebased - Removed text after 'bool' in Kconfig - Removed ARCH ifdefs - Added more bits for ls1021a mask - Used devm Changes for v14: - Used devm_ioremap_resource Changes for v15: - Fixed error code for devm_ioremap_resource Changes for v16: - Removed header file svr.h and calculated REV_MAJ/MIN in this driver - Added 'Acked-by: Arnd' --- drivers/soc/Kconfig | 3 +- drivers/soc/fsl/Kconfig | 18 drivers/soc/fsl/Makefile | 1 + drivers/soc/fsl/guts.c | 236 +++ include/linux/fsl/guts.h | 125 +++-- 5 files changed, 333 insertions(+), 50 deletions(-) create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 drivers/soc/fsl/guts.c diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig index e6e90e8..f31bceb 100644 --- a/drivers/soc/Kconfig +++ b/drivers/soc/Kconfig @@ -1,8 +1,7 @@ menu "SOC (System On Chip) specific Drivers" source "drivers/soc/bcm/Kconfig" -source "drivers/soc/fsl/qbman/Kconfig" -source "drivers/soc/fsl/qe/Kconfig" +source "drivers/soc/fsl/Kconfig" source "drivers/soc/mediatek/Kconfig" source "drivers/soc/qcom/Kconfig" source "drivers/soc/rockchip/Kconfig" diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig new file mode 100644 index 000..7a9fb9b --- /dev/null +++ b/drivers/soc/fsl/Kconfig @@ -0,0 +1,18 @@ +# +# Freescale SOC drivers +# + +source "drivers/soc/fsl/qbman/Kconfig" +source "drivers/soc/fsl/qe/Kconfig" + +config FSL_GUTS + bool + select SOC_BUS + help + The global utilities block controls power management, I/O device + enabling, power-onreset(POR) configuration monitoring, alternate + function selection for multiplexed signals,and clock control. + This driver is to manage and access global utilities block. + Initially only reading SVR and registering soc device are supported. + Other guts accesses, such as reading RCW, should eventually be moved + into this driver as well. diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile index 75e1f53..44b3beb 100644 --- a/drivers/soc/fsl/Makefile +++ b/drivers/soc/fsl/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_FSL_DPAA) += qbman/ obj-$(CONFIG_QUICC_ENGINE) += qe/ obj-$(CONFIG_CPM) += qe/ +obj-$(CONFIG_FSL_GUTS) += guts.o diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c new file mode 100644 index 000..0ac8826 --- /dev/null +++ b/drivers/soc/fsl/guts.c @@ -0,0 +1,236 @@ +/* + * Freescale QorIQ Platforms GUTS Driver + * + * Copyright (C) 2016 Freescale Semiconductor, Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +struct guts { + struct ccsr_guts __iomem *regs; + bool little_endian; +}; + +struct fsl_soc_die_attr { + char*die; + u32 svr; + u32 mask; +}; + +static struct guts *guts; +static struct soc_device_attribute soc_dev_attr;
[v16, 6/7] base: soc: Check for NULL SoC device attributes
From: Geert Uytterhoeven If soc_device_match() is used to check the value of a specific attribute that is not present for the current SoC, the kernel crashes with a NULL pointer dereference. Fix this by explicitly checking for the absence of a needed property, and considering this a non-match. Signed-off-by: Geert Uytterhoeven Acked-by: Arnd Bergmann --- Changes for v16: - Added this patch --- drivers/base/soc.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/base/soc.c b/drivers/base/soc.c index 0c5cf87..0e701e2 100644 --- a/drivers/base/soc.c +++ b/drivers/base/soc.c @@ -167,19 +167,23 @@ static int soc_device_match_one(struct device *dev, void *arg) const struct soc_device_attribute *match = arg; if (match->machine && - !glob_match(match->machine, soc_dev->attr->machine)) + (!soc_dev->attr->machine || +!glob_match(match->machine, soc_dev->attr->machine))) return 0; if (match->family && - !glob_match(match->family, soc_dev->attr->family)) + (!soc_dev->attr->family || +!glob_match(match->family, soc_dev->attr->family))) return 0; if (match->revision && - !glob_match(match->revision, soc_dev->attr->revision)) + (!soc_dev->attr->revision || +!glob_match(match->revision, soc_dev->attr->revision))) return 0; if (match->soc_id && - !glob_match(match->soc_id, soc_dev->attr->soc_id)) + (!soc_dev->attr->soc_id || +!glob_match(match->soc_id, soc_dev->attr->soc_id))) return 0; return 1; -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 0/7] Fix eSDHC host version register bug
This patchset is used to fix a host version register bug in the T4240-R1.0-R2.0 eSDHC controller. To match the SoC version and revision, 15 previous version patchsets had tried many methods but all of them were rejected by reviewers. Such as - dts compatible method - syscon method - ifdef PPC method - GUTS driver getting SVR method Anrd suggested a soc_device_match method in v10, and this is the only available method left now. This v11 patchset introduces the soc_device_match interface in soc driver. The first four patches of Yangbo are to add the GUTS driver. This is used to register a soc device which contain soc version and revision information. The other three patches introduce the soc_device_match method in soc driver and apply it on esdhc driver to fix this bug. --- Changes for v15: - Dropped patch 'dt: bindings: update Freescale DCFG compatible' since the work had been done by below patch on ShawnGuo's linux tree. 'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A compatible for SCFG and DCFG' - Fixed error code issue in guts driver Changes for v16: - Dropped patch 'powerpc/fsl: move mpc85xx.h to include/linux/fsl' - Added a bug-fix patch from Geert --- Arnd Bergmann (1): base: soc: introduce soc_device_match() interface Geert Uytterhoeven (1): base: soc: Check for NULL SoC device attributes Yangbo Lu (5): ARM64: dts: ls2080a: add device configuration node dt: bindings: move guts devicetree doc out of powerpc directory soc: fsl: add GUTS driver for QorIQ platforms MAINTAINERS: add entry for Freescale SoC drivers mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 .../bindings/{powerpc => soc}/fsl/guts.txt | 3 + MAINTAINERS| 11 +- arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 + drivers/base/Kconfig | 1 + drivers/base/soc.c | 70 ++ drivers/mmc/host/Kconfig | 1 + drivers/mmc/host/sdhci-of-esdhc.c | 20 ++ drivers/soc/Kconfig| 3 +- drivers/soc/fsl/Kconfig| 18 ++ drivers/soc/fsl/Makefile | 1 + drivers/soc/fsl/guts.c | 236 + include/linux/fsl/guts.h | 125 ++- include/linux/sys_soc.h| 3 + 13 files changed, 447 insertions(+), 51 deletions(-) rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 drivers/soc/fsl/guts.c -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 2/7] dt: bindings: move guts devicetree doc out of powerpc directory
Move guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/ since it's used by not only PowerPC but also ARM. And add a specification for 'little-endian' property. Signed-off-by: Yangbo Lu Acked-by: Rob Herring Acked-by: Scott Wood Acked-by: Arnd Bergmann --- Changes for v4: - Added this patch Changes for v5: - Modified the description for little-endian property Changes for v6: - None Changes for v7: - None Changes for v8: - Added 'Acked-by: Scott Wood' - Added 'Acked-by: Rob Herring' Changes for v9: - None Changes for v10: - None Changes for v11: - None Changes for v12: - None Changes for v13: - None Changes for v14: - None Changes for v15: - None Changes for v16: - Added 'Acked-by: Arnd' --- Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++ 1 file changed, 3 insertions(+) rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt b/Documentation/devicetree/bindings/soc/fsl/guts.txt similarity index 91% rename from Documentation/devicetree/bindings/powerpc/fsl/guts.txt rename to Documentation/devicetree/bindings/soc/fsl/guts.txt index b71b203..07adca9 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt +++ b/Documentation/devicetree/bindings/soc/fsl/guts.txt @@ -25,6 +25,9 @@ Recommended properties: - fsl,liodn-bits : Indicates the number of defined bits in the LIODN registers, for those SOCs that have a PAMU device. + - little-endian : Indicates that the global utilities block is little + endian. The default is big endian. + Examples: global-utilities@e {/* global utilities block */ compatible = "fsl,mpc8548-guts"; -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 5/7] base: soc: introduce soc_device_match() interface
From: Arnd Bergmann We keep running into cases where device drivers want to know the exact version of the a SoC they are currently running on. In the past, this has usually been done through a vendor specific API that can be called by a driver, or by directly accessing some kind of version register that is not part of the device itself but that belongs to a global register area of the chip. Common reasons for doing this include: - A machine is not using devicetree or similar for passing data about on-chip devices, but just announces their presence using boot-time platform devices, and the machine code itself does not care about the revision. - There is existing firmware or boot loaders with existing DT binaries with generic compatible strings that do not identify the particular revision of each device, but the driver knows which SoC revisions include which part. - A prerelease version of a chip has some quirks and we are using the same version of the bootloader and the DT blob on both the prerelease and the final version. An update of the DT binding seems inappropriate because that would involve maintaining multiple copies of the dts and/or bootloader. This patch introduces the soc_device_match() interface that is meant to work like of_match_node() but instead of identifying the version of a device, it identifies the SoC itself using a vendor-agnostic interface. Unlike of_match_node(), we do not do an exact string compare but instead use glob_match() to allow wildcards in strings. Signed-off-by: Arnd Bergmann Signed-off-by: Yangbo Lu Acked-by: Greg Kroah-Hartman --- Changes for v11: - Added this patch for soc match Changes for v12: - Corrected the author - Rewrited soc_device_match with while loop Changes for v13: - Added ack from Greg Changes for v14: - None Changes for v15: - None Changes for v16: - None --- drivers/base/Kconfig| 1 + drivers/base/soc.c | 66 + include/linux/sys_soc.h | 3 +++ 3 files changed, 70 insertions(+) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index d02e7c0..2abea87 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -237,6 +237,7 @@ config GENERIC_CPU_AUTOPROBE config SOC_BUS bool + select GLOB source "drivers/base/regmap/Kconfig" diff --git a/drivers/base/soc.c b/drivers/base/soc.c index b63f23e..0c5cf87 100644 --- a/drivers/base/soc.c +++ b/drivers/base/soc.c @@ -13,6 +13,7 @@ #include #include #include +#include static DEFINE_IDA(soc_ida); @@ -159,3 +160,68 @@ static int __init soc_bus_register(void) return bus_register(&soc_bus_type); } core_initcall(soc_bus_register); + +static int soc_device_match_one(struct device *dev, void *arg) +{ + struct soc_device *soc_dev = container_of(dev, struct soc_device, dev); + const struct soc_device_attribute *match = arg; + + if (match->machine && + !glob_match(match->machine, soc_dev->attr->machine)) + return 0; + + if (match->family && + !glob_match(match->family, soc_dev->attr->family)) + return 0; + + if (match->revision && + !glob_match(match->revision, soc_dev->attr->revision)) + return 0; + + if (match->soc_id && + !glob_match(match->soc_id, soc_dev->attr->soc_id)) + return 0; + + return 1; +} + +/* + * soc_device_match - identify the SoC in the machine + * @matches: zero-terminated array of possible matches + * + * returns the first matching entry of the argument array, or NULL + * if none of them match. + * + * This function is meant as a helper in place of of_match_node() + * in cases where either no device tree is available or the information + * in a device node is insufficient to identify a particular variant + * by its compatible strings or other properties. For new devices, + * the DT binding should always provide unique compatible strings + * that allow the use of of_match_node() instead. + * + * The calling function can use the .data entry of the + * soc_device_attribute to pass a structure or function pointer for + * each entry. + */ +const struct soc_device_attribute *soc_device_match( + const struct soc_device_attribute *matches) +{ + int ret = 0; + + if (!matches) + return NULL; + + while (!ret) { + if (!(matches->machine || matches->family || + matches->revision || matches->soc_id)) + break; + ret = bus_for_each_dev(&soc_bus_type, NULL, (void *)matches, + soc_device_match_one); + if (!ret) + matches++; + else + return matches; + } + return NULL; +} +EXPORT_SYMBOL_GPL(soc_device_match); diff --git a/include/linux/sys_soc.h b/include/linux/sys
[v16, 4/7] MAINTAINERS: add entry for Freescale SoC drivers
Add maintainer entry for Freescale SoC drivers including the QE library and the GUTS driver now. Also add maintainer for QE library. Signed-off-by: Yangbo Lu Acked-by: Scott Wood Acked-by: Qiang Zhao Acked-by: Arnd Bergmann --- Changes for v8: - Added this patch Changes for v9: - Added linux-arm mail list - Removed GUTS driver entry Changes for v10: - Changed 'DRIVER' to 'DRIVERS' - Added 'Acked-by' of Scott and Qiang Changes for v11: - None Changes for v12: - None Changes for v13: - None Changes for v14: - None Changes for v15: - None Changes for v16: - Added 'Acked-by: Arnd' --- MAINTAINERS | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 9be761f..e1a8835 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5045,9 +5045,18 @@ S: Maintained F: drivers/net/ethernet/freescale/fman F: Documentation/devicetree/bindings/powerpc/fsl/fman.txt +FREESCALE SOC DRIVERS +M: Scott Wood +L: linuxppc-...@lists.ozlabs.org +L: linux-arm-ker...@lists.infradead.org +S: Maintained +F: drivers/soc/fsl/ +F: include/linux/fsl/ + FREESCALE QUICC ENGINE LIBRARY +M: Qiang Zhao L: linuxppc-...@lists.ozlabs.org -S: Orphan +S: Maintained F: drivers/soc/fsl/qe/ F: include/soc/fsl/*qe*.h F: include/soc/fsl/*ucc*.h -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[v16, 1/7] ARM64: dts: ls2080a: add device configuration node
Add the dts node for device configuration unit that provides general purpose configuration and status for the device. Signed-off-by: Yangbo Lu Acked-by: Scott Wood Acked-by: Arnd Bergmann --- Changes for v5: - Added this patch Changes for v6: - None Changes for v7: - None Changes for v8: - Added 'Acked-by: Scott Wood' Changes for v9: - None Changes for v10: - None Changes for v11: - None Changes for v12: - None Changes for v13: - None Changes for v14: - None Changes for v15: - None Changes for v16: - Added 'Acked-by: Arnd' --- arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi index 7f0dc13..d058e56 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi @@ -216,6 +216,12 @@ clocks = <&sysclk>; }; + dcfg: dcfg@1e0 { + compatible = "fsl,ls2080a-dcfg", "syscon"; + reg = <0x0 0x1e0 0x0 0x1>; + little-endian; + }; + serial0: serial@21c0500 { compatible = "fsl,ns16550", "ns16550a"; reg = <0x0 0x21c0500 0x0 0x100>; -- 2.1.0.27.g96db324 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe
On 11/08/2016 06:35 PM, Alex Williamson wrote: On Tue, 8 Nov 2016 21:29:22 +0100 Christoffer Dall wrote: Hi Will, On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote: Hi all, I figured this was a reasonable post to piggy-back on for the LPC minutes relating to guest MSIs on arm64. On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote: We can always have QEMU reject hot-adding the device if the reserved region overlaps existing guest RAM, but I don't even really see how we advise users to give them a reasonable chance of avoiding that possibility. Apparently there are also ARM platforms where MSI pages cannot be remapped to support the previous programmable user/VM address, is it even worthwhile to support those platforms? Does that decision influence whether user programmable MSI reserved regions are really a second class citizen to fixed reserved regions? I expect we'll be talking about this tomorrow morning, but I certainly haven't come up with any viable solutions to this. Thanks, At LPC last week, we discussed guest MSIs on arm64 as part of the PCI microconference. I presented some slides to illustrate some of the issues we're trying to solve: http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf Punit took some notes (thanks!) on the etherpad here: https://etherpad.openstack.org/p/LPC2016_PCI although the discussion was pretty lively and jumped about, so I've had to go from memory where the notes didn't capture everything that was said. To summarise, arm64 platforms differ in their handling of MSIs when compared to x86: 1. The physical memory map is not standardised (Jon pointed out that this is something that was realised late on) 2. MSIs are usually treated the same as DMA writes, in that they must be mapped by the SMMU page tables so that they target a physical MSI doorbell 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI doorbell built into the PCI RC) 4. Platforms typically have some set of addresses that abort before reaching the SMMU (e.g. because the PCI identifies them as P2P). All of this means that userspace (QEMU) needs to identify the memory regions corresponding to points (3) and (4) and ensure that they are not allocated in the guest physical (IPA) space. For platforms that can remap the MSI doorbell as in (2), then some space also needs to be allocated for that. Rather than treat these as separate problems, a better interface is to tell userspace about a set of reserved regions, and have this include the MSI doorbell, irrespective of whether or not it can be remapped. Is my understanding correct, that you need to tell userspace about the location of the doorbell (in the IOVA space) in case (2), because even though the configuration of the device is handled by the (host) kernel through trapping of the BARs, we have to avoid the VFIO user programming the device to create other DMA transactions to this particular address, since that will obviously conflict and either not produce the desired DMA transactions or result in unintended weird interrupts? Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then it's potentially a DMA target and we'll get bogus data on DMA read from the device, and lose data and potentially trigger spurious interrupts on DMA write from the device. Thanks, Alex That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e., they address match before the SMMU checks are done. if all DMA addrs had to go through SMMU first, then the DMA access could be ignored/rejected. For bare-metal, memory can't be put in the same place as MSI addrs, or DMA could never reach it. So, only a virt issue, unless the VMs mem address range mimic the host layout. - Don ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II))
On Tue, 8 Nov 2016 21:29:22 +0100 Christoffer Dall wrote: > Hi Will, > > On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote: > > Hi all, > > > > I figured this was a reasonable post to piggy-back on for the LPC minutes > > relating to guest MSIs on arm64. > > > > On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote: > > > We can always have QEMU reject hot-adding the device if the reserved > > > region overlaps existing guest RAM, but I don't even really see how we > > > advise users to give them a reasonable chance of avoiding that > > > possibility. Apparently there are also ARM platforms where MSI pages > > > cannot be remapped to support the previous programmable user/VM > > > address, is it even worthwhile to support those platforms? Does that > > > decision influence whether user programmable MSI reserved regions are > > > really a second class citizen to fixed reserved regions? I expect > > > we'll be talking about this tomorrow morning, but I certainly haven't > > > come up with any viable solutions to this. Thanks, > > > > At LPC last week, we discussed guest MSIs on arm64 as part of the PCI > > microconference. I presented some slides to illustrate some of the issues > > we're trying to solve: > > > > http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf > > > > Punit took some notes (thanks!) on the etherpad here: > > > > https://etherpad.openstack.org/p/LPC2016_PCI > > > > although the discussion was pretty lively and jumped about, so I've had > > to go from memory where the notes didn't capture everything that was > > said. > > > > To summarise, arm64 platforms differ in their handling of MSIs when compared > > to x86: > > > > 1. The physical memory map is not standardised (Jon pointed out that > > this is something that was realised late on) > > 2. MSIs are usually treated the same as DMA writes, in that they must be > > mapped by the SMMU page tables so that they target a physical MSI > > doorbell > > 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI > > doorbell built into the PCI RC) > > 4. Platforms typically have some set of addresses that abort before > > reaching the SMMU (e.g. because the PCI identifies them as P2P). > > > > All of this means that userspace (QEMU) needs to identify the memory > > regions corresponding to points (3) and (4) and ensure that they are > > not allocated in the guest physical (IPA) space. For platforms that can > > remap the MSI doorbell as in (2), then some space also needs to be > > allocated for that. > > > > Rather than treat these as separate problems, a better interface is to > > tell userspace about a set of reserved regions, and have this include > > the MSI doorbell, irrespective of whether or not it can be remapped. > > Is my understanding correct, that you need to tell userspace about the > location of the doorbell (in the IOVA space) in case (2), because even > though the configuration of the device is handled by the (host) kernel > through trapping of the BARs, we have to avoid the VFIO user programming > the device to create other DMA transactions to this particular address, > since that will obviously conflict and either not produce the desired > DMA transactions or result in unintended weird interrupts? Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then it's potentially a DMA target and we'll get bogus data on DMA read from the device, and lose data and potentially trigger spurious interrupts on DMA write from the device. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/2] iommu/exynos: Add proper runtime pm support
On Mon, Oct 10, 2016 at 03:32:06PM +0200, Marek Szyprowski wrote: > Hi Luis > > > On 2016-10-06 19:37, Luis R. Rodriguez wrote: > > On Thu, Sep 29, 2016 at 10:12:31AM +0200, Marek Szyprowski wrote: > > > This patch uses recently introduced device links to track the runtime pm > > > state of the master's device. This way each SYSMMU controller is runtime > > > activated when its master's device is active > > instead of? > > instead of keeping SYSMMU controller runtime active all the time. I thought Rafael's work was for suspend/resume, not for runtime suspend. Is it for both ? Because as far as I can tell this was painted to help with suspend/resume ? > > BTW what is the master device of a SYSMMU? I have no clue about these > > IOMMU devices here. > > Here is a more detailed description of IOMMU hardware I wrote a few days ago > for Ulf: > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1231006.html > > In short: there is a SYSMMU controller and its master device - a device, > which performs DMA operations. That SYSMMU sits in between system memory > and the master device, so it performs mapping of DMA addresses to physical > memory addresses on each DMA operation. So you seek a run time power optimization ? Or a fix on suspend? Or both? > > > and can save/restore its state instead of being enabled all the time. > > I take it this means currently even if the master device is disabled > > (whatever that is) all SYSMMU controllers are kept enabled, is that right? > > The issue here is this wastes power? Or what is the issue? > > Yes, the issue here is the fact that SYSMMU is kept active all the time, > what in turn prevent the power domain for turning off even if master device > doesn't do anything and is already suspended. This directly (some clocks > enabled) and in-directly (leakage current) causes power looses. Thanks for the confirmation so really the biggest concern here was run time PM. > > > This way SYSMMU controllers no > > > longer prevents respective power domains to be turned off when master's > > > device is not used. > > So when the master device is idle we want to also remove power from the > > controllers ? How much power does this save on a typical device in the > > market BTW ? > > The main purpose of this patchset is to let power domains to be turned off, > because with the current code all domains are all the time turned on, > because > SYSMMU controllers prevent them from turning them off. I see.. I think the audio folks already addressed this with DAPM, but granted this was for audio. Then I was also referred to the DRM / Audio component framework, has this been looked into? v4l folks have v4l async stuff but its not clear if that help with run time PM. I'm mentioning these given it'd be silly to re-invent the wheel, additionally if we now have a generic solution everyone can jump on board with there is quite a bit of work we can do to dump a lot of old legacy crap. > If you want I can measure the power consumption of the idle board with all > domains enabled and disabled if you want to see the numbers. On the other > board > disabling most power domains in idle state (the clocks were already > disabled) > gave me about 20mA savings (at 3.7V), what is a significant value for the > battery powered device. Thanks, this means nothing to me, however it would be value-add to the commit log as anyone reviewing this can understand what the goal / savings was for exactly. > > > > > Signed-off-by: Marek Szyprowski > > > --- > > > drivers/iommu/exynos-iommu.c | 225 > > > ++- > > > 1 file changed, 94 insertions(+), 131 deletions(-) > > I'm reviewing the device link patches now but since this is a demo of > > use of that I'll note the changes here are pretty large and it makes > > it terribly difficult for review. Is there any way this patch can be split > > up in to logical atomic pieces that only do one task upon change ? > > I will try to split it a bit, but I cannot promise that much can be done > to improve readability for someone not very familiar with the driver > internals. I've heard this before, I don't buy it but lets see! Luis ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II))
Hi Will, On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote: > Hi all, > > I figured this was a reasonable post to piggy-back on for the LPC minutes > relating to guest MSIs on arm64. > > On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote: > > We can always have QEMU reject hot-adding the device if the reserved > > region overlaps existing guest RAM, but I don't even really see how we > > advise users to give them a reasonable chance of avoiding that > > possibility. Apparently there are also ARM platforms where MSI pages > > cannot be remapped to support the previous programmable user/VM > > address, is it even worthwhile to support those platforms? Does that > > decision influence whether user programmable MSI reserved regions are > > really a second class citizen to fixed reserved regions? I expect > > we'll be talking about this tomorrow morning, but I certainly haven't > > come up with any viable solutions to this. Thanks, > > At LPC last week, we discussed guest MSIs on arm64 as part of the PCI > microconference. I presented some slides to illustrate some of the issues > we're trying to solve: > > http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf > > Punit took some notes (thanks!) on the etherpad here: > > https://etherpad.openstack.org/p/LPC2016_PCI > > although the discussion was pretty lively and jumped about, so I've had > to go from memory where the notes didn't capture everything that was > said. > > To summarise, arm64 platforms differ in their handling of MSIs when compared > to x86: > > 1. The physical memory map is not standardised (Jon pointed out that > this is something that was realised late on) > 2. MSIs are usually treated the same as DMA writes, in that they must be > mapped by the SMMU page tables so that they target a physical MSI > doorbell > 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI > doorbell built into the PCI RC) > 4. Platforms typically have some set of addresses that abort before > reaching the SMMU (e.g. because the PCI identifies them as P2P). > > All of this means that userspace (QEMU) needs to identify the memory > regions corresponding to points (3) and (4) and ensure that they are > not allocated in the guest physical (IPA) space. For platforms that can > remap the MSI doorbell as in (2), then some space also needs to be > allocated for that. > > Rather than treat these as separate problems, a better interface is to > tell userspace about a set of reserved regions, and have this include > the MSI doorbell, irrespective of whether or not it can be remapped. Is my understanding correct, that you need to tell userspace about the location of the doorbell (in the IOVA space) in case (2), because even though the configuration of the device is handled by the (host) kernel through trapping of the BARs, we have to avoid the VFIO user programming the device to create other DMA transactions to this particular address, since that will obviously conflict and either not produce the desired DMA transactions or result in unintended weird interrupts? Thanks, Christoffer ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe
On Tue, Nov 08, 2016 at 02:02:39PM -0500, Don Dutile wrote: > On 11/08/2016 12:54 PM, Will Deacon wrote: > >A first step would be making all this opt-in, and only supporting GICv3 > >ITS for now. > You're trying to support a config that is < GICv3 and no ITS ? ... > That would be the equiv. of x86 pre-intr-remap, and that's why > allow_unsafe_interrupts > hook was created ... to enable devel/kick-the-tires. Yup, that's exactly what I was envisaging. For systems that can't do passthrough safely, we'll always have to throw a devel switch. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe
On 11/08/2016 12:54 PM, Will Deacon wrote: On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote: On 08/11/2016 03:45, Will Deacon wrote: Rather than treat these as separate problems, a better interface is to tell userspace about a set of reserved regions, and have this include the MSI doorbell, irrespective of whether or not it can be remapped. Don suggested that we statically pick an address for the doorbell in a similar way to x86, and have the kernel map it there. We could even pick 0xfee0. If it conflicts with a reserved region on the platform (due to (4)), then we'd obviously have to (deterministically?) allocate it somewhere else, but probably within the bottom 4G. This is tentatively achieved now with [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html) Yup, I saw that fly by. Hopefully some of the internals can be reused with the current thinking on user ABI. The next question is how to tell userspace about all of the reserved regions. Initially, the idea was to extend VFIO, however Alex pointed out a horrible scenario: 1. QEMU spawns a VM on system 0 2. VM is migrated to system 1 3. QEMU attempts to passthrough a device using PCI hotplug In this scenario, the guest memory map is chosen at step (1), yet there is no VFIO fd available to determine the reserved regions. Furthermore, the reserved regions may vary between system 0 and system 1. This pretty much rules out using VFIO to determine the reserved regions.Alex suggested that the SMMU driver can advertise the regions via /sys/class/iommu/. This would solve part of the problem, but migration between systems with different memory maps can still cause problems if the reserved regions of the new system conflict with the guest memory map chosen by QEMU. OK so I understand we do not want anymore the VFIO chain capability API (patch 5 of above series) but we prefer a sysfs approach instead. Right. I understand the sysfs approach which allows the userspace to get the info earlier and independently on VFIO. Keeping in mind current QEMU virt - which is not the only userspace - will not do much from this info until we bring upheavals in virt address space management. So if I am not wrong, at the moment the main action to be undertaken is the rejection of the PCI hotplug in case we detect a collision. I don't think so; it should be up to userspace to reject the hotplug. If userspace doesn't have support for the regions, then that's fine -- you just end up in a situation where the CPU page table maps memory somewhere that the device can't see. In other words, you'll end up with spurious DMA failures, but that's exactly what happens with current systems if you passthrough an overlapping region (Robin demonstrated this on Juno). Additionally, you can imagine some future support where you can tell the guest not to use certain regions of its memory for DMA. In this case, you wouldn't want to refuse the hotplug in the case of overlapping regions. Really, I think the kernel side just needs to enumerate the fixed reserved regions, place the doorbell at a fixed address and then advertise these via sysfs. I can respin [1] - studying and taking into account Robin's comments about dm_regions similarities - removing the VFIO capability chain and replacing this by a sysfs API Ideally, this would be reusable between different SMMU drivers so the sysfs entries have the same format etc. Would that be OK? Sounds good to me. Are you in a position to prototype something on the qemu side once we've got kernel-side agreement? What about Alex comments who wanted to report the usable memory ranges instead of unusable memory ranges? Also did you have a chance to discuss the following items: 1) the VFIO irq safety assessment The discussion really focussed on system topology, as opposed to properties of the doorbell. Regardless of how the device talks to the doorbell, if the doorbell can't protect against things like MSI spoofing, then it's unsafe. My opinion is that we shouldn't allow passthrough by default on systems with unsafe doorbells (we could piggyback on allow_unsafe_interrupts cmdline option to VFIO). A first step would be making all this opt-in, and only supporting GICv3 ITS for now. You're trying to support a config that is < GICv3 and no ITS ? ... That would be the equiv. of x86 pre-intr-remap, and that's why allow_unsafe_interrupts hook was created ... to enable devel/kick-the-tires. 2) the MSI reserved size computation (is an arbitrary size OK?) If we fix the base address, we could fix a size too. However, we'd still need to enumerate the doorbells to check that they fit in the region we have. If not, then we can warn during boot and treat it the same way as a resource conflict (that is, reallocate the region in some deterministic way). Will ___ iommu mailing list iommu@lists.linux-foun
Re: Summary of LPC guest MSI discussion in Santa Fe
On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote: > On 08/11/2016 03:45, Will Deacon wrote: > > Rather than treat these as separate problems, a better interface is to > > tell userspace about a set of reserved regions, and have this include > > the MSI doorbell, irrespective of whether or not it can be remapped. > > Don suggested that we statically pick an address for the doorbell in a > > similar way to x86, and have the kernel map it there. We could even pick > > 0xfee0. If it conflicts with a reserved region on the platform (due > > to (4)), then we'd obviously have to (deterministically?) allocate it > > somewhere else, but probably within the bottom 4G. > > This is tentatively achieved now with > [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II > (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html) Yup, I saw that fly by. Hopefully some of the internals can be reused with the current thinking on user ABI. > > The next question is how to tell userspace about all of the reserved > > regions. Initially, the idea was to extend VFIO, however Alex pointed > > out a horrible scenario: > > > > 1. QEMU spawns a VM on system 0 > > 2. VM is migrated to system 1 > > 3. QEMU attempts to passthrough a device using PCI hotplug > > > > In this scenario, the guest memory map is chosen at step (1), yet there > > is no VFIO fd available to determine the reserved regions. Furthermore, > > the reserved regions may vary between system 0 and system 1. This pretty > > much rules out using VFIO to determine the reserved regions.Alex suggested > > that the SMMU driver can advertise the regions via /sys/class/iommu/. This > > would solve part of the problem, but migration between systems with > > different memory maps can still cause problems if the reserved regions > > of the new system conflict with the guest memory map chosen by QEMU. > > > OK so I understand we do not want anymore the VFIO chain capability API > (patch 5 of above series) but we prefer a sysfs approach instead. Right. > I understand the sysfs approach which allows the userspace to get the > info earlier and independently on VFIO. Keeping in mind current QEMU > virt - which is not the only userspace - will not do much from this info > until we bring upheavals in virt address space management. So if I am > not wrong, at the moment the main action to be undertaken is the > rejection of the PCI hotplug in case we detect a collision. I don't think so; it should be up to userspace to reject the hotplug. If userspace doesn't have support for the regions, then that's fine -- you just end up in a situation where the CPU page table maps memory somewhere that the device can't see. In other words, you'll end up with spurious DMA failures, but that's exactly what happens with current systems if you passthrough an overlapping region (Robin demonstrated this on Juno). Additionally, you can imagine some future support where you can tell the guest not to use certain regions of its memory for DMA. In this case, you wouldn't want to refuse the hotplug in the case of overlapping regions. Really, I think the kernel side just needs to enumerate the fixed reserved regions, place the doorbell at a fixed address and then advertise these via sysfs. > I can respin [1] > - studying and taking into account Robin's comments about dm_regions > similarities > - removing the VFIO capability chain and replacing this by a sysfs API Ideally, this would be reusable between different SMMU drivers so the sysfs entries have the same format etc. > Would that be OK? Sounds good to me. Are you in a position to prototype something on the qemu side once we've got kernel-side agreement? > What about Alex comments who wanted to report the usable memory ranges > instead of unusable memory ranges? > > Also did you have a chance to discuss the following items: > 1) the VFIO irq safety assessment The discussion really focussed on system topology, as opposed to properties of the doorbell. Regardless of how the device talks to the doorbell, if the doorbell can't protect against things like MSI spoofing, then it's unsafe. My opinion is that we shouldn't allow passthrough by default on systems with unsafe doorbells (we could piggyback on allow_unsafe_interrupts cmdline option to VFIO). A first step would be making all this opt-in, and only supporting GICv3 ITS for now. > 2) the MSI reserved size computation (is an arbitrary size OK?) If we fix the base address, we could fix a size too. However, we'd still need to enumerate the doorbells to check that they fit in the region we have. If not, then we can warn during boot and treat it the same way as a resource conflict (that is, reallocate the region in some deterministic way). Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe
On 11/07/2016 09:45 PM, Will Deacon wrote: Hi all, I figured this was a reasonable post to piggy-back on for the LPC minutes relating to guest MSIs on arm64. On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote: We can always have QEMU reject hot-adding the device if the reserved region overlaps existing guest RAM, but I don't even really see how we advise users to give them a reasonable chance of avoiding that possibility. Apparently there are also ARM platforms where MSI pages cannot be remapped to support the previous programmable user/VM address, is it even worthwhile to support those platforms? Does that decision influence whether user programmable MSI reserved regions are really a second class citizen to fixed reserved regions? I expect we'll be talking about this tomorrow morning, but I certainly haven't come up with any viable solutions to this. Thanks, At LPC last week, we discussed guest MSIs on arm64 as part of the PCI microconference. I presented some slides to illustrate some of the issues we're trying to solve: http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf Punit took some notes (thanks!) on the etherpad here: https://etherpad.openstack.org/p/LPC2016_PCI although the discussion was pretty lively and jumped about, so I've had to go from memory where the notes didn't capture everything that was said. To summarise, arm64 platforms differ in their handling of MSIs when compared to x86: 1. The physical memory map is not standardised (Jon pointed out that this is something that was realised late on) 2. MSIs are usually treated the same as DMA writes, in that they must be mapped by the SMMU page tables so that they target a physical MSI doorbell 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI doorbell built into the PCI RC) Chaulk this case to 'the learning curve'. Q35 chipset (the one being use for x86-PCIe qemu model) had no intr-remap hw, only DMA addrs destined for real memory. assigned-device intrs had to be caught by kvm & injected into guests, and yes, a DoS was possible... and thus, the intr-remap support being done after initial iommu support. 4. Platforms typically have some set of addresses that abort before reaching the SMMU (e.g. because the PCI identifies them as P2P). ARM platforms that don't implement the equivalent of ACS (in PCI bridges within a PCIe switch) are either not device-assignment capable, or the IOMMU domain expands across the entire peer-to-peer (sub-)tree. ACS(-like) functionality is a fundamental component to the security model, as is the IOMMU itself. Without it, it's equivalent to not having an IOMMU. Dare I ask?: Can these systems, or parts of these systems, just be deemed "incomplete" or "not 100% secure" wrt device assignment, and other systems can or will be ??? Not much different then the first x86 systems that tried to get it right the first few times... :-/ I'm hearing (parts of) systems that are just not properly designed for device-assignment use-case, probably b/c this (system) architecture hasn't been pulled together from the variouis component architectures (CPU, SMMU, IOT, etc.). All of this means that userspace (QEMU) needs to identify the memory regions corresponding to points (3) and (4) and ensure that they are not allocated in the guest physical (IPA) space. For platforms that can remap the MSI doorbell as in (2), then some space also needs to be allocated for that. Again, proper ACS features/control eliminates this need. A (multi-function) device should never be able to perform IO to itself via its PCIe interface. Bridge-ACS pushes everything up to SMMU for desitination resolution. Without ACS, I don't see how a guest is migratible from one system to another, unless the system-migration-group consists of system that are exactly the same (wrt IO) [less/more system memory &/or cpu does not affect VM system map. Again, the initial Linux implementation did not have ACS, but was 'resolved' by the default/common system mapping putting the PCI devices into an area that was blocked from memory use (generally 3G->4G space). ARM may not have that single, simple implementation, but a method to indicated reserved regions, and then a check for matching/non-matching reserved regions for guest migration, is the only way I see to resolve this issue until ACS is sufficiently supported int the hw subsystems to be used for device-assignment. Rather than treat these as separate problems, a better interface is to tell userspace about a set of reserved regions, and have this include the MSI doorbell, irrespective of whether or not it can be remapped. Don suggested that we statically pick an address for the doorbell in a similar way to x86, and have the kernel map it there. We could even pick 0xfee0. I suggest picking a 'relative-fixed' address: the last n-pages of system memory address space, i.e., 0xfff[]fee0.
Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl
On Tuesday, November 8, 2016 6:49:51 AM CET Y.B. Lu wrote: > Hi Arnd, > > > > -Original Message- > > From: Arnd Bergmann [mailto:a...@arndb.de] > > Sent: Tuesday, November 08, 2016 5:20 AM > > To: Y.B. Lu > > Cc: linuxppc-...@lists.ozlabs.org; linux-...@vger.kernel.org; > > ulf.hans...@linaro.org; Scott Wood; Mark Rutland; Greg Kroah-Hartman; X.B. > > Xie; M.H. Lian; linux-...@vger.kernel.org; linux-...@vger.kernel.org; > > Qiang Zhao; Russell King; Bhupesh Sharma; Joerg Roedel; Claudiu Manoil; > > devicet...@vger.kernel.org; Rob Herring; Santosh Shilimkar; linux-arm- > > ker...@lists.infradead.org; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; Leo Li; iommu@lists.linux-foundation.org; Kumar > > Gala > > Subject: Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl > > > > On Monday, October 31, 2016 9:35:33 AM CET Y.B. Lu wrote: > > > > > > > > I don't see any of the contents of this header referenced by the soc > > > > driver any more. I think you can just drop this patch. > > > > > > > > > > [Lu Yangbo-B47093] This header file was included by guts.c. > > > The guts driver used macro SVR_MAJ/SVR_MIN for calculation. > > > > > > This header file was for powerpc arch before. And this patch is to > > > made it as common header file for both ARM and PPC. > > > Sooner or later this is needed. > > > > Let's discuss it once we actually need the header then, ok? > > [Lu Yangbo-B47093] As I said, this header file was included by guts.c in > patch 4. Ah sorry, I misread your earlier reply, thinking you meant a potential future patch. > The guts driver used macro SVR_MAJ/SVR_MIN for calculation which were > defined in this header file. > Did you suggest we dropped this patch and just calculated them in driver? That is probably nicer here: there is not that much value in sharing the two one-line macro definitions, and the driver already hardcodes the numeric per-chip IDs that make up most of the header file. Arnd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[git pull] IOMMU Fixes for Linux v4.9-rc4
Hi Linus, The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911: Linux 4.9-rc4 (2016-11-05 16:23:36 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v4.9-rc4 for you to fetch changes up to bea64033dd7b5fb6296eda8266acab6364ce1554: iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path (2016-11-08 15:08:26 +0100) IOMMU-Fixes for Linux v4.9-rc4 Two places need fixing: * Four patches from Robin Murphy fix several issues with the recently merged generic DT-bindings support for arm-smmu drivers * A fix for a dead-lock issue in the VT-d driver, which shows up on iommu hotplug Joerg Roedel (1): iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path Robin Murphy (4): iommu/arm-smmu: Work around ARM DMA configuration iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s iommu/arm-smmu: Check that iommu_fwspecs are ours iommu/arm-smmu: Fix out-of-bounds dereference drivers/iommu/arm-smmu-v3.c | 25 + drivers/iommu/arm-smmu.c| 16 ++-- drivers/iommu/intel-iommu.c | 14 -- 3 files changed, 43 insertions(+), 12 deletions(-) Please pull. Thanks, Joerg signature.asc Description: Digital signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 7/7] iommu/exynos: Use device dependency links to control runtime pm
On Tue, Nov 08, 2016 at 08:27:12AM +0100, Marek Szyprowski wrote: > On 2016-11-07 22:47, Luis R. Rodriguez wrote: > > Has there been any review of the existing similar solutions out there > > such as the DRM / audio component framework? Would that help ? > > Nope, none of that solution deals with runtime pm. Well, they do. Hybrid graphics laptops often have an HDA controller on the discrete GPU and I assume that's what Luis meant. There's code in drivers/gpu/vga/vga_switcheroo.c to make this (only sort of) work: * When the GPU is powered up/down, the HDA controller's driver is instructed to pm_runtime_get/put the HDA device (see call to set_audio_state() in vga_switcheroo_set_dynamic_switch()). * When a runtime PM ref is acquired on the HDA device, the GPU is powered up (see vga_switcheroo_runtime_resume_hdmi_audio()). Unfortunately this is all fairly broken, e.g.: * If a runtime PM ref on the HDA device is held for more than 5 sec (autosuspend delay of the GPU), the GPU will be powered down and the HDA device will become inaccessible, regardless of the runtime PM ref still being held (because vga_switcheroo_set_dynamic_switch() doesn't check the refcount of the HDA device). * The DRM device is afforded direct-complete but the HDA device is not. If the GPU is handled earlier by dpm_suspend(), then runtime PM will have been disabled on the GPU and thus the HDA device will fail to runtime resume before system sleep. Rafael's series allows representation of such inter-device dependencies in the PM core and can thus replace kludgy and broken "solutions" like the one above. There's one thing that I haven't understood myself though: In an e-mail exchange in September Rafael has argued that the above-mentioned hybrid graphics use case "isn't a good [example] IMO. That clearly is a case when two (or more) devices share power resources controlled by a single on/off switch. Which is a clear use case for a PM domain." The same seems to apply to Marek's SYSMMU use case. When applying device links to SYSMMU or hybrid graphics, we select one of the devices in the PM domain as master and have the other one depend on it as slave, i.e. a synthetic hierarchical relationship is established. I've responded to Rafael on September 18 that this can't be solved with a struct dev_pm_domain, but haven't received a reply since: https://lkml.org/lkml/2016/9/18/103 Thanks, Lukas ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma-iommu: properly respect configured address space size
Hi Robin, On 2016-11-08 15:44, Robin Murphy wrote: On 08/11/16 13:41, Marek Szyprowski wrote: On 2016-11-08 12:37, Robin Murphy wrote: On 07/11/16 13:06, Marek Szyprowski wrote: When one called iommu_dma_init_domain() with size smaller than device's DMA mask, the alloc_iova() will not respect it and always assume that all IOVA addresses will be allocated from the the (base ... dev->dma_mask) range. Is that actually a problem for anything? Yes, I found this issue while working on next version of ARM & ARM64 DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the new IOMMU/DMA-mapping glue. Some Exynos devices (codec and camera isp) operate only on the limited (and really small: 256M for example) DMA window. They use non-standard way of addressing memory: an offset from the firmware base. However they still have 32bit DMA mask, as the firmware can be located basically everywhere in the real DMA address space, but then they can access only next 256M from that firmware base. OK, that's good to know, thanks. However, I think in this case it sounds like it's really the DMA mask that's the underlying problem - if those blocks themselves can only drive 28 address bits, then the struct devices representing them should have 28-bit DMA masks, and the "firmware base" of whoever's driving the upper bits modelled with a dma_pfn_offset. Even if they do have full 32-bit interfaces themselves, but are constrained to segment-offset addressing internally, I still think it would be tidier to represent things that way. At some point in dma-iommu development I did have support for DMA offsets upstream of the IOMMU, and am happy to reinstate it if there's a real use case (assuming you can't simply always set the firmware base to 0 when using the IOMMU). That would indeed look a bit simpler, but I've already tried such approach and the firmware crashes when its base in real DMA address space is set to zero. This patch fixes this issue by taking the configured address space size parameter into account (if it is smaller than the device's dma_mask). TBH I've been pondering ripping the size stuff out of dma-iommu, as it all stems from me originally failing to understand what dma_32bit_pfn is actually for. The truth is that iova_domains just don't have a size or upper limit; however if devices with both large and small DMA masks share a domain, then the top-down nature of the allocator means that allocating for the less-capable devices would involve walking through every out-of-range entry in the tree every time. Having cached32_node based on dma_32bit_pfn just provides an optimised starting point for searching within the smaller mask. Right, this dma_32bit_pfn was really misleading at the first glance, but then I found that this was something like end_pfn in case of dma-iommu code. Yes, that was my incorrect assumption - at the time I interpreted it as a de-facto upper limit which was still possible to allocate above in special circumstances, which turns out to be almost entirely backwards. I'd rather not bake that into the dma-iommu code any further if we can avoid it (I'll try throwing an RFC together to clear up what's there already). Okay. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma-iommu: properly respect configured address space size
On 08/11/16 13:41, Marek Szyprowski wrote: > Hi Robin, > > > On 2016-11-08 12:37, Robin Murphy wrote: >> On 07/11/16 13:06, Marek Szyprowski wrote: >>> When one called iommu_dma_init_domain() with size smaller than device's >>> DMA mask, the alloc_iova() will not respect it and always assume that >>> all >>> IOVA addresses will be allocated from the the (base ... >>> dev->dma_mask) range. >> Is that actually a problem for anything? > > Yes, I found this issue while working on next version of ARM & ARM64 > DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the > new IOMMU/DMA-mapping glue. > > Some Exynos devices (codec and camera isp) operate only on the limited (and > really small: 256M for example) DMA window. They use non-standard way of > addressing memory: an offset from the firmware base. However they still > have > 32bit DMA mask, as the firmware can be located basically everywhere in the > real DMA address space, but then they can access only next 256M from that > firmware base. OK, that's good to know, thanks. However, I think in this case it sounds like it's really the DMA mask that's the underlying problem - if those blocks themselves can only drive 28 address bits, then the struct devices representing them should have 28-bit DMA masks, and the "firmware base" of whoever's driving the upper bits modelled with a dma_pfn_offset. Even if they do have full 32-bit interfaces themselves, but are constrained to segment-offset addressing internally, I still think it would be tidier to represent things that way. At some point in dma-iommu development I did have support for DMA offsets upstream of the IOMMU, and am happy to reinstate it if there's a real use case (assuming you can't simply always set the firmware base to 0 when using the IOMMU). >>> This patch fixes this issue by taking the configured address space size >>> parameter into account (if it is smaller than the device's dma_mask). >> TBH I've been pondering ripping the size stuff out of dma-iommu, as it >> all stems from me originally failing to understand what dma_32bit_pfn is >> actually for. The truth is that iova_domains just don't have a size or >> upper limit; however if devices with both large and small DMA masks >> share a domain, then the top-down nature of the allocator means that >> allocating for the less-capable devices would involve walking through >> every out-of-range entry in the tree every time. Having cached32_node >> based on dma_32bit_pfn just provides an optimised starting point for >> searching within the smaller mask. > > Right, this dma_32bit_pfn was really misleading at the first glance, > but then I found that this was something like end_pfn in case of dma-iommu > code. Yes, that was my incorrect assumption - at the time I interpreted it as a de-facto upper limit which was still possible to allocate above in special circumstances, which turns out to be almost entirely backwards. I'd rather not bake that into the dma-iommu code any further if we can avoid it (I'll try throwing an RFC together to clear up what's there already). Robin. >> Would it hurt any of your use-cases to relax/rework the reinitialisation >> checks in iommu_dma_init_domain()? Alternatively if we really do have a >> case for wanting a hard upper limit, it might make more sense to add an >> end_pfn to the iova_domain and handle it in the allocator itself. > > The proper support for end_pfn would be a better approach probably, > especially if we consider readability of the code. > > Best regards ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Summary of LPC guest MSI discussion in Santa Fe
Hi Will, On 08/11/2016 03:45, Will Deacon wrote: > Hi all, > > I figured this was a reasonable post to piggy-back on for the LPC minutes > relating to guest MSIs on arm64. > > On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote: >> We can always have QEMU reject hot-adding the device if the reserved >> region overlaps existing guest RAM, but I don't even really see how we >> advise users to give them a reasonable chance of avoiding that >> possibility. Apparently there are also ARM platforms where MSI pages >> cannot be remapped to support the previous programmable user/VM >> address, is it even worthwhile to support those platforms? Does that >> decision influence whether user programmable MSI reserved regions are >> really a second class citizen to fixed reserved regions? I expect >> we'll be talking about this tomorrow morning, but I certainly haven't >> come up with any viable solutions to this. Thanks, > > At LPC last week, we discussed guest MSIs on arm64 as part of the PCI > microconference. I presented some slides to illustrate some of the issues > we're trying to solve: > > http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf > > Punit took some notes (thanks!) on the etherpad here: > > https://etherpad.openstack.org/p/LPC2016_PCI Thanks to both of you for the minutes and slides. Unfortunately I could not travel but my ears were burning ;-) > > although the discussion was pretty lively and jumped about, so I've had > to go from memory where the notes didn't capture everything that was > said. > > To summarise, arm64 platforms differ in their handling of MSIs when compared > to x86: > > 1. The physical memory map is not standardised (Jon pointed out that > this is something that was realised late on) > 2. MSIs are usually treated the same as DMA writes, in that they must be > mapped by the SMMU page tables so that they target a physical MSI > doorbell > 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI > doorbell built into the PCI RC) > 4. Platforms typically have some set of addresses that abort before > reaching the SMMU (e.g. because the PCI identifies them as P2P). > > All of this means that userspace (QEMU) needs to identify the memory > regions corresponding to points (3) and (4) and ensure that they are > not allocated in the guest physical (IPA) space. For platforms that can > remap the MSI doorbell as in (2), then some space also needs to be > allocated for that. > > Rather than treat these as separate problems, a better interface is to > tell userspace about a set of reserved regions, and have this include > the MSI doorbell, irrespective of whether or not it can be remapped. > Don suggested that we statically pick an address for the doorbell in a > similar way to x86, and have the kernel map it there. We could even pick > 0xfee0. If it conflicts with a reserved region on the platform (due > to (4)), then we'd obviously have to (deterministically?) allocate it > somewhere else, but probably within the bottom 4G. This is tentatively achieved now with [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html) > > The next question is how to tell userspace about all of the reserved > regions. Initially, the idea was to extend VFIO, however Alex pointed > out a horrible scenario: > > 1. QEMU spawns a VM on system 0 > 2. VM is migrated to system 1 > 3. QEMU attempts to passthrough a device using PCI hotplug > > In this scenario, the guest memory map is chosen at step (1), yet there > is no VFIO fd available to determine the reserved regions. Furthermore, > the reserved regions may vary between system 0 and system 1. This pretty > much rules out using VFIO to determine the reserved regions.Alex suggested > that the SMMU driver can advertise the regions via /sys/class/iommu/. This > would solve part of the problem, but migration between systems with > different memory maps can still cause problems if the reserved regions > of the new system conflict with the guest memory map chosen by QEMU. OK so I understand we do not want anymore the VFIO chain capability API (patch 5 of above series) but we prefer a sysfs approach instead. I understand the sysfs approach which allows the userspace to get the info earlier and independently on VFIO. Keeping in mind current QEMU virt - which is not the only userspace - will not do much from this info until we bring upheavals in virt address space management. So if I am not wrong, at the moment the main action to be undertaken is the rejection of the PCI hotplug in case we detect a collision. I can respin [1] - studying and taking into account Robin's comments about dm_regions similarities - removing the VFIO capability chain and replacing this by a sysfs API Would that be OK? What about Alex comments who wanted to report the usable memory ranges instead of u
[PATCH] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
On Fri, Nov 04, 2016 at 10:38:29AM +0100, Iago Abal wrote: > That patch was actually my first attempt at fixing the problem, but I > ran the tool and I found a second possibility of deadlock: > `domain_exit' calls to `domain_remove_dev_info', which also spin_locks > on `device_domain_lock'. > > Alternatively I could add another `__domain_exit' function that > doesn't take the lock. So that is actually not easy to do, I'd rather go with the simpler solution of dropping the lock for domain_exit() invocation and then re-start walking the list afterwards, like in the below patch: >From bea64033dd7b5fb6296eda8266acab6364ce1554 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 8 Nov 2016 15:08:26 +0100 Subject: [PATCH] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path It turns out that the disable_dmar_iommu() code-path tried to get the device_domain_lock recursivly, which will dead-lock when this code runs on dmar removal. Fix both code-paths that could lead to the dead-lock. Fixes: 55d940430ab9 ('iommu/vt-d: Get rid of domain->iommu_lock') Reported-by: Iago Abal Signed-off-by: Joerg Roedel --- drivers/iommu/intel-iommu.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a4407ea..3965e73 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1711,6 +1711,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu) if (!iommu->domains || !iommu->domain_ids) return; +again: spin_lock_irqsave(&device_domain_lock, flags); list_for_each_entry_safe(info, tmp, &device_domain_list, global) { struct dmar_domain *domain; @@ -1723,10 +1724,19 @@ static void disable_dmar_iommu(struct intel_iommu *iommu) domain = info->domain; - dmar_remove_one_dev_info(domain, info->dev); + __dmar_remove_one_dev_info(info); - if (!domain_type_is_vm_or_si(domain)) + if (!domain_type_is_vm_or_si(domain)) { + /* +* The domain_exit() function can't be called under +* device_domain_lock, as it takes this lock itself. +* So release the lock here and re-run the loop +* afterwards. +*/ + spin_unlock_irqrestore(&device_domain_lock, flags); domain_exit(domain); + goto again; + } } spin_unlock_irqrestore(&device_domain_lock, flags); -- 2.6.6 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma-iommu: properly respect configured address space size
Hi Robin, On 2016-11-08 12:37, Robin Murphy wrote: On 07/11/16 13:06, Marek Szyprowski wrote: When one called iommu_dma_init_domain() with size smaller than device's DMA mask, the alloc_iova() will not respect it and always assume that all IOVA addresses will be allocated from the the (base ... dev->dma_mask) range. Is that actually a problem for anything? Yes, I found this issue while working on next version of ARM & ARM64 DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the new IOMMU/DMA-mapping glue. Some Exynos devices (codec and camera isp) operate only on the limited (and really small: 256M for example) DMA window. They use non-standard way of addressing memory: an offset from the firmware base. However they still have 32bit DMA mask, as the firmware can be located basically everywhere in the real DMA address space, but then they can access only next 256M from that firmware base. This patch fixes this issue by taking the configured address space size parameter into account (if it is smaller than the device's dma_mask). TBH I've been pondering ripping the size stuff out of dma-iommu, as it all stems from me originally failing to understand what dma_32bit_pfn is actually for. The truth is that iova_domains just don't have a size or upper limit; however if devices with both large and small DMA masks share a domain, then the top-down nature of the allocator means that allocating for the less-capable devices would involve walking through every out-of-range entry in the tree every time. Having cached32_node based on dma_32bit_pfn just provides an optimised starting point for searching within the smaller mask. Right, this dma_32bit_pfn was really misleading at the first glance, but then I found that this was something like end_pfn in case of dma-iommu code. Would it hurt any of your use-cases to relax/rework the reinitialisation checks in iommu_dma_init_domain()? Alternatively if we really do have a case for wanting a hard upper limit, it might make more sense to add an end_pfn to the iova_domain and handle it in the allocator itself. The proper support for end_pfn would be a better approach probably, especially if we consider readability of the code. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 2/7] iommu/exynos: Remove dead code
__sysmmu_enable/disable functions were designed to do ref-count based operations, but current code always calls them only once, so the code for checking the conditions and invalid conditions can be simply removed without any influence to the driver operation. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 65 1 file changed, 17 insertions(+), 48 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 8ba0d60..4056228 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -460,9 +460,6 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data) __sysmmu_disable_nocount(data); dev_dbg(data->sysmmu, "Disabled\n"); - } else { - dev_dbg(data->sysmmu, "%d times left to disable\n", - data->activations); } spin_unlock_irqrestore(&data->lock, flags); @@ -508,29 +505,18 @@ static void __sysmmu_enable_nocount(struct sysmmu_drvdata *data) static int __sysmmu_enable(struct sysmmu_drvdata *data, phys_addr_t pgtable, struct exynos_iommu_domain *domain) { - int ret = 0; unsigned long flags; spin_lock_irqsave(&data->lock, flags); if (set_sysmmu_active(data)) { data->pgtable = pgtable; data->domain = domain; - __sysmmu_enable_nocount(data); - dev_dbg(data->sysmmu, "Enabled\n"); - } else { - ret = (pgtable == data->pgtable) ? 1 : -EBUSY; - - dev_dbg(data->sysmmu, "already enabled\n"); } - - if (WARN_ON(ret < 0)) - set_sysmmu_inactive(data); /* decrement count */ - spin_unlock_irqrestore(&data->lock, flags); - return ret; + return 0; } static void sysmmu_tlb_invalidate_flpdcache(struct sysmmu_drvdata *data, @@ -793,8 +779,8 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain) spin_lock_irqsave(&domain->lock, flags); list_for_each_entry_safe(data, next, &domain->clients, domain_node) { - if (__sysmmu_disable(data)) - data->master = NULL; + __sysmmu_disable(data); + data->master = NULL; list_del_init(&data->domain_node); } @@ -829,31 +815,23 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, phys_addr_t pagetable = virt_to_phys(domain->pgtable); struct sysmmu_drvdata *data, *next; unsigned long flags; - bool found = false; if (!has_sysmmu(dev) || owner->domain != iommu_domain) return; spin_lock_irqsave(&domain->lock, flags); list_for_each_entry_safe(data, next, &domain->clients, domain_node) { - if (data->master == dev) { - if (__sysmmu_disable(data)) { - data->master = NULL; - list_del_init(&data->domain_node); - } - pm_runtime_put(data->sysmmu); - found = true; - } + __sysmmu_disable(data); + data->master = NULL; + list_del_init(&data->domain_node); + pm_runtime_put(data->sysmmu); } spin_unlock_irqrestore(&domain->lock, flags); owner->domain = NULL; - if (found) - dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", - __func__, &pagetable); - else - dev_err(dev, "%s: No IOMMU is attached\n", __func__); + dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__, + &pagetable); } static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, @@ -864,7 +842,6 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, struct sysmmu_drvdata *data; phys_addr_t pagetable = virt_to_phys(domain->pgtable); unsigned long flags; - int ret = -ENODEV; if (!has_sysmmu(dev)) return -ENODEV; @@ -874,27 +851,19 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, list_for_each_entry(data, &owner->controllers, owner_node) { pm_runtime_get_sync(data->sysmmu); - ret = __sysmmu_enable(data, pagetable, domain); - if (ret >= 0) { - data->master = dev; + __sysmmu_enable(data, pagetable, domain); + data->master = dev; - spin_lock_irqsave(&domain->lock, flags); - list_add_tail(&data->domain_node, &domain->clients); - spin_unlock_irqrestore(&domain->lock, flags); - } - } - - if (ret < 0) { - dev_err(dev
[PATCH v6 7/7] iommu/exynos: Use device dependency links to control runtime pm
This patch uses recently introduced device dependency links to track the runtime pm state of the master's device. The goal is to let SYSMMU controller device's runtime PM to follow the runtime PM state of the respective master's device. This way each SYSMMU controller is active only when its master's device is active and can properly restore or save its state instead on runtime PM transition of master's device. This approach replaces old behavior, when SYSMMU controller was set to runtime active once after attaching to the master device. In the new approach SYSMMU controllers no longer prevents respective power domains to be turned off when master's device is not being used. The dependency links also enforces proper order of suspending/restoring devices during system sleep transition, so there is no more need to use LATE_SYSTEM_SLEEP_PM_OPS-based workaround for ensuring that SYSMMUs are suspended after their master devices. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 5e6d7bb..4b05a15 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -633,8 +633,8 @@ static int __maybe_unused exynos_sysmmu_resume(struct device *dev) static const struct dev_pm_ops sysmmu_pm_ops = { SET_RUNTIME_PM_OPS(exynos_sysmmu_suspend, exynos_sysmmu_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, -pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static const struct of_device_id sysmmu_of_match[] __initconst = { @@ -781,10 +781,6 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, if (!has_sysmmu(dev) || owner->domain != iommu_domain) return; - list_for_each_entry(data, &owner->controllers, owner_node) { - pm_runtime_put_sync(data->sysmmu); - } - mutex_lock(&owner->rpm_lock); list_for_each_entry(data, &owner->controllers, owner_node) { @@ -848,10 +844,6 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, mutex_unlock(&owner->rpm_lock); - list_for_each_entry(data, &owner->controllers, owner_node) { - pm_runtime_get_sync(data->sysmmu); - } - dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__, &pagetable); @@ -1232,6 +1224,14 @@ static int exynos_iommu_of_xlate(struct device *dev, list_add_tail(&data->owner_node, &owner->controllers); data->master = dev; + + /* +* SYSMMU will be runtime activated via device link (dependency) to its +* master device, so there are no direct calls to pm_runtime_get/put +* in this driver. +*/ + device_link_add(dev, data->sysmmu, DL_FLAG_PM_RUNTIME); + return 0; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 3/7] iommu/exynos: Simplify internal enable/disable functions
Remove remaining leftovers of the ref-count related code in the __sysmmu_enable/disable functions inline __sysmmu_enable/disable_nocount to them. Suspend/resume callbacks now checks if master device is set for given SYSMMU controller instead of relying on the activation count. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 104 --- 1 file changed, 29 insertions(+), 75 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 4056228..f45b274 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -237,8 +237,8 @@ struct sysmmu_drvdata { struct clk *aclk; /* SYSMMU's aclk clock */ struct clk *pclk; /* SYSMMU's pclk clock */ struct clk *clk_master; /* master's device clock */ - int activations;/* number of calls to sysmmu_enable */ spinlock_t lock;/* lock for modyfying state */ + int active; /* current status */ struct exynos_iommu_domain *domain; /* domain we belong to */ struct list_head domain_node; /* node for domain clients list */ struct list_head owner_node;/* node for owner controllers list */ @@ -251,25 +251,6 @@ static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain *dom) return container_of(dom, struct exynos_iommu_domain, domain); } -static bool set_sysmmu_active(struct sysmmu_drvdata *data) -{ - /* return true if the System MMU was not active previously - and it needs to be initialized */ - return ++data->activations == 1; -} - -static bool set_sysmmu_inactive(struct sysmmu_drvdata *data) -{ - /* return true if the System MMU is needed to be disabled */ - BUG_ON(data->activations < 1); - return --data->activations == 0; -} - -static bool is_sysmmu_active(struct sysmmu_drvdata *data) -{ - return data->activations > 0; -} - static void sysmmu_unblock(struct sysmmu_drvdata *data) { writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL); @@ -388,7 +369,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id) unsigned short reg_status, reg_clear; int ret = -ENOSYS; - WARN_ON(!is_sysmmu_active(data)); + WARN_ON(!data->active); if (MMU_MAJ_VER(data->version) < 5) { reg_status = REG_INT_STATUS; @@ -434,37 +415,19 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id) return IRQ_HANDLED; } -static void __sysmmu_disable_nocount(struct sysmmu_drvdata *data) +static void __sysmmu_disable(struct sysmmu_drvdata *data) { + unsigned long flags; + clk_enable(data->clk_master); + spin_lock_irqsave(&data->lock, flags); writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL); writel(0, data->sfrbase + REG_MMU_CFG); - - __sysmmu_disable_clocks(data); -} - -static bool __sysmmu_disable(struct sysmmu_drvdata *data) -{ - bool disabled; - unsigned long flags; - - spin_lock_irqsave(&data->lock, flags); - - disabled = set_sysmmu_inactive(data); - - if (disabled) { - data->pgtable = 0; - data->domain = NULL; - - __sysmmu_disable_nocount(data); - - dev_dbg(data->sysmmu, "Disabled\n"); - } - + data->active = false; spin_unlock_irqrestore(&data->lock, flags); - return disabled; + __sysmmu_disable_clocks(data); } static void __sysmmu_init_config(struct sysmmu_drvdata *data) @@ -481,17 +444,19 @@ static void __sysmmu_init_config(struct sysmmu_drvdata *data) writel(cfg, data->sfrbase + REG_MMU_CFG); } -static void __sysmmu_enable_nocount(struct sysmmu_drvdata *data) +static void __sysmmu_enable(struct sysmmu_drvdata *data) { + unsigned long flags; + __sysmmu_enable_clocks(data); + spin_lock_irqsave(&data->lock, flags); writel(CTRL_BLOCK, data->sfrbase + REG_MMU_CTRL); - __sysmmu_init_config(data); - __sysmmu_set_ptbase(data, data->pgtable); - writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL); + data->active = true; + spin_unlock_irqrestore(&data->lock, flags); /* * SYSMMU driver keeps master's clock enabled only for the short @@ -502,37 +467,18 @@ static void __sysmmu_enable_nocount(struct sysmmu_drvdata *data) clk_disable(data->clk_master); } -static int __sysmmu_enable(struct sysmmu_drvdata *data, phys_addr_t pgtable, - struct exynos_iommu_domain *domain) -{ - unsigned long flags; - - spin_lock_irqsave(&data->lock, flags); - if (set_sysmmu_active(data)) { - data->pgtable = pgtable; - data->domain = domain; - __sysmmu_enable_nocount(data); - dev_dbg(data->sysmmu, "Enabled\n"); - } - spin_unlock_irqrestore(&data-
[PATCH v6 6/7] iommu/exynos: Add runtime pm support
This patch adds runtime pm implementation, which is based on previous suspend/resume code. SYSMMU controller is now being enabled/disabled mainly from the runtime pm callbacks. System sleep callbacks relies on generic pm_runtime_force_suspend/pm_runtime_force_resume helpers. To ensure internal state consistency, additional lock for runtime pm transitions was introduced. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 45 +++- 1 file changed, 36 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index a959443..5e6d7bb 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -206,6 +206,7 @@ struct sysmmu_fault_info { struct exynos_iommu_owner { struct list_head controllers; /* list of sysmmu_drvdata.owner_node */ struct iommu_domain *domain;/* domain this device is attached */ + struct mutex rpm_lock; /* for runtime pm of all sysmmus */ }; /* @@ -594,40 +595,46 @@ static int __init exynos_sysmmu_probe(struct platform_device *pdev) return 0; } -#ifdef CONFIG_PM_SLEEP -static int exynos_sysmmu_suspend(struct device *dev) +static int __maybe_unused exynos_sysmmu_suspend(struct device *dev) { struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; if (master) { - pm_runtime_put(dev); + struct exynos_iommu_owner *owner = master->archdata.iommu; + + mutex_lock(&owner->rpm_lock); if (data->domain) { dev_dbg(data->sysmmu, "saving state\n"); __sysmmu_disable(data); } + mutex_unlock(&owner->rpm_lock); } return 0; } -static int exynos_sysmmu_resume(struct device *dev) +static int __maybe_unused exynos_sysmmu_resume(struct device *dev) { struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; if (master) { - pm_runtime_get_sync(dev); + struct exynos_iommu_owner *owner = master->archdata.iommu; + + mutex_lock(&owner->rpm_lock); if (data->domain) { dev_dbg(data->sysmmu, "restoring state\n"); __sysmmu_enable(data); } + mutex_unlock(&owner->rpm_lock); } return 0; } -#endif static const struct dev_pm_ops sysmmu_pm_ops = { - SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos_sysmmu_suspend, exynos_sysmmu_resume) + SET_RUNTIME_PM_OPS(exynos_sysmmu_suspend, exynos_sysmmu_resume, NULL) + SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, +pm_runtime_force_resume) }; static const struct of_device_id sysmmu_of_match[] __initconst = { @@ -775,7 +782,15 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, return; list_for_each_entry(data, &owner->controllers, owner_node) { - __sysmmu_disable(data); + pm_runtime_put_sync(data->sysmmu); + } + + mutex_lock(&owner->rpm_lock); + + list_for_each_entry(data, &owner->controllers, owner_node) { + pm_runtime_get_noresume(data->sysmmu); + if (pm_runtime_active(data->sysmmu)) + __sysmmu_disable(data); pm_runtime_put(data->sysmmu); } @@ -790,6 +805,7 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, owner->domain = NULL; spin_unlock_irqrestore(&domain->lock, flags); + mutex_unlock(&owner->rpm_lock); dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__, &pagetable); @@ -810,6 +826,8 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, if (owner->domain) exynos_iommu_detach_device(owner->domain, dev); + mutex_lock(&owner->rpm_lock); + spin_lock_irqsave(&domain->lock, flags); list_for_each_entry(data, &owner->controllers, owner_node) { spin_lock(&data->lock); @@ -822,8 +840,16 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, spin_unlock_irqrestore(&domain->lock, flags); list_for_each_entry(data, &owner->controllers, owner_node) { + pm_runtime_get_noresume(data->sysmmu); + if (pm_runtime_active(data->sysmmu)) + __sysmmu_enable(data); + pm_runtime_put(data->sysmmu); + } + + mutex_unlock(&owner->rpm_lock); + + list_for_each_entry(data, &owner->controllers, owner_node) { pm_runtime_get_sync(data->sysmmu); - __sysmmu_enable(data); } dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__, @@ -1200,6 +1226,7 @@ static int
[PATCH v6 4/7] iommu/exynos: Set master device once on boot
To avoid possible races, set master device pointer in each SYSMMU controller once on boot. Suspend/resume callbacks now properly relies on the configured iommu domain to enable or disable SYSMMU controller. While changing the code, also update the sleep debug messages and make them conditional. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index f45b274..28e570b 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -600,10 +600,12 @@ static int exynos_sysmmu_suspend(struct device *dev) struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; - dev_dbg(dev, "suspend\n"); if (master) { - __sysmmu_disable(data); pm_runtime_put(dev); + if (data->domain) { + dev_dbg(data->sysmmu, "saving state\n"); + __sysmmu_disable(data); + } } return 0; } @@ -613,10 +615,12 @@ static int exynos_sysmmu_resume(struct device *dev) struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; - dev_dbg(dev, "resume\n"); if (master) { pm_runtime_get_sync(dev); - __sysmmu_enable(data); + if (data->domain) { + dev_dbg(data->sysmmu, "restoring state\n"); + __sysmmu_enable(data); + } } return 0; } @@ -730,7 +734,6 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain) __sysmmu_disable(data); data->pgtable = 0; data->domain = NULL; - data->master = NULL; list_del_init(&data->domain_node); } @@ -772,7 +775,6 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, spin_lock_irqsave(&domain->lock, flags); list_for_each_entry_safe(data, next, &domain->clients, domain_node) { __sysmmu_disable(data); - data->master = NULL; data->pgtable = 0; data->domain = NULL; list_del_init(&data->domain_node); @@ -806,7 +808,6 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, data->domain = domain; pm_runtime_get_sync(data->sysmmu); __sysmmu_enable(data); - data->master = dev; spin_lock_irqsave(&domain->lock, flags); list_add_tail(&data->domain_node, &domain->clients); @@ -1192,6 +1193,7 @@ static int exynos_iommu_of_xlate(struct device *dev, } list_add_tail(&data->owner_node, &owner->controllers); + data->master = dev; return 0; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 5/7] iommu/exynos: Rework and fix internal locking
This patch reworks locking in the exynos_iommu_attach/detach_device functions to ensure that all entries of the sysmmu_drvdata and exynos_iommu_owner structure are updated under the respective spinlocks, while runtime pm functions are called without any spinlocks held. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 28e570b..a959443 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -731,10 +731,12 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain) spin_lock_irqsave(&domain->lock, flags); list_for_each_entry_safe(data, next, &domain->clients, domain_node) { + spin_lock(&data->lock); __sysmmu_disable(data); data->pgtable = 0; data->domain = NULL; list_del_init(&data->domain_node); + spin_unlock(&data->lock); } spin_unlock_irqrestore(&domain->lock, flags); @@ -772,17 +774,22 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain, if (!has_sysmmu(dev) || owner->domain != iommu_domain) return; + list_for_each_entry(data, &owner->controllers, owner_node) { + __sysmmu_disable(data); + pm_runtime_put(data->sysmmu); + } + spin_lock_irqsave(&domain->lock, flags); list_for_each_entry_safe(data, next, &domain->clients, domain_node) { - __sysmmu_disable(data); + spin_lock(&data->lock); data->pgtable = 0; data->domain = NULL; list_del_init(&data->domain_node); - pm_runtime_put(data->sysmmu); + spin_unlock(&data->lock); } + owner->domain = NULL; spin_unlock_irqrestore(&domain->lock, flags); - owner->domain = NULL; dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__, &pagetable); @@ -803,18 +810,22 @@ static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain, if (owner->domain) exynos_iommu_detach_device(owner->domain, dev); + spin_lock_irqsave(&domain->lock, flags); list_for_each_entry(data, &owner->controllers, owner_node) { + spin_lock(&data->lock); data->pgtable = pagetable; data->domain = domain; + list_add_tail(&data->domain_node, &domain->clients); + spin_unlock(&data->lock); + } + owner->domain = iommu_domain; + spin_unlock_irqrestore(&domain->lock, flags); + + list_for_each_entry(data, &owner->controllers, owner_node) { pm_runtime_get_sync(data->sysmmu); __sysmmu_enable(data); - - spin_lock_irqsave(&domain->lock, flags); - list_add_tail(&data->domain_node, &domain->clients); - spin_unlock_irqrestore(&domain->lock, flags); } - owner->domain = iommu_domain; dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__, &pagetable); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 0/7] Exynos IOMMU: proper runtime PM support (use device dependencies)
Hello, This is another update of the patchset for adding proper runtime PM support to Exynos IOMMU driver. This has been achieved by using recently introduced device links, which lets SYSMMU controller's runtime PM to follow master's device runtime PM state (the device which actually performs DMA transaction). The main idea behind this solution is an observation that any DMA activity from master device can be done only when master device is active, thus when master device is suspended SYSMMU controller device can also be suspended. This patchset solves the problem of all power domains being always enabled. It happened, because all SYSMMU controllers (which belongs to the same domains) are permanently kept active, because existing driver was simplified and kept SYSMMU device active all the time after initialization and attaching to the master device. This patchset is based on sixth version of Rafael's "Functional dependencies between devices" patchset [1], which has been merged to Greg's driver-core-next branch [2] (last patch commit id is baa8809f60971d10220dfe79248f54b2b265f003). As Greg pointed, the branch will not be rebased and can be used as a base for applying my patchset [3]. Joerg: I hope you can merge this version on top of Greg's driver-core-next branch to iommu tree. [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261311.html [2] git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264906.html If one wants to test this patchset, I've provided a branch with all needed patches (a small fix for Exynos4 FIMC-IS DTS is still needed, it is already in samsung tree): https://git.linaro.org/people/marek.szyprowski/linux-srpol.git v4.9-iommu-pm-v6 Best regards Marek Szyprowski Samsung R&D Institute Poland Changelog: v6: - removed LATE_SYSTEM_SLEEP_PM_OPS-based workaround, because it is no longer needed after introducing device links (they also take care of proper system sleep suspend/resume sequence) - updated some comments v5: https://lkml.org/lkml/2016/10/20/70 - split main patch into several small changes for easier review (requested by Luis Rodriquez) - fixed usage of runtime_pm_active, now it is guarded by pm_runtime_get_noresume() and pm_runtime_put() pair v4: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1241601.html - rebased on top of v4 of device dependencies/links patchset, what resolved system hang on reboot v3: http://www.spinics.net/lists/linux-samsung-soc/msg55256.html - rebased on top of latest device dependencies/links patchset - added proper locking between runtime pm, iommu_attach/detach and sysmmu enable/disable(added per iommu owner device's rpm lock) v2: http://www.spinics.net/lists/arm-kernel/msg512082.html - replaced PM notifiers with generic device dependencies/links developed by Rafael J. Wysocki v1: http://www.spinics.net/lists/arm-kernel/msg509600.html - initial version Patch summary: Marek Szyprowski (7): iommu/exynos: Remove excessive, useless debug iommu/exynos: Remove dead code iommu/exynos: Simplify internal enable/disable functions iommu/exynos: Set master device once on boot iommu/exynos: Rework and fix internal locking iommu/exynos: Add runtime pm support iommu/exynos: Use device dependency links to control runtime pm drivers/iommu/exynos-iommu.c | 230 ++- 1 file changed, 95 insertions(+), 135 deletions(-) -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 1/7] iommu/exynos: Remove excessive, useless debug
Remove excessive, useless debug about skipping TLB invalidation, which is a normal situation when more aggressive power management is enabled. Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 30808e9..8ba0d60 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -578,9 +578,6 @@ static void sysmmu_tlb_invalidate_entry(struct sysmmu_drvdata *data, sysmmu_unblock(data); } clk_disable(data->clk_master); - } else { - dev_dbg(data->master, - "disabled. Skipping TLB invalidation @ %#x\n", iova); } spin_unlock_irqrestore(&data->lock, flags); } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma-iommu: properly respect configured address space size
Hi Marek, On 07/11/16 13:06, Marek Szyprowski wrote: > When one called iommu_dma_init_domain() with size smaller than device's > DMA mask, the alloc_iova() will not respect it and always assume that all > IOVA addresses will be allocated from the the (base ... dev->dma_mask) range. Is that actually a problem for anything? > This patch fixes this issue by taking the configured address space size > parameter into account (if it is smaller than the device's dma_mask). TBH I've been pondering ripping the size stuff out of dma-iommu, as it all stems from me originally failing to understand what dma_32bit_pfn is actually for. The truth is that iova_domains just don't have a size or upper limit; however if devices with both large and small DMA masks share a domain, then the top-down nature of the allocator means that allocating for the less-capable devices would involve walking through every out-of-range entry in the tree every time. Having cached32_node based on dma_32bit_pfn just provides an optimised starting point for searching within the smaller mask. Would it hurt any of your use-cases to relax/rework the reinitialisation checks in iommu_dma_init_domain()? Alternatively if we really do have a case for wanting a hard upper limit, it might make more sense to add an end_pfn to the iova_domain and handle it in the allocator itself. Robin. > Signed-off-by: Marek Szyprowski > --- > drivers/iommu/dma-iommu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index c5ab8667e6f2..8b4b72654359 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -212,11 +212,13 @@ static struct iova *__alloc_iova(struct iommu_domain > *domain, size_t size, > > if (domain->geometry.force_aperture) > dma_limit = min(dma_limit, domain->geometry.aperture_end); > + > + dma_limit = min(dma_limit >> shift, (dma_addr_t)iovad->dma_32bit_pfn); > /* >* Enforce size-alignment to be safe - there could perhaps be an >* attribute to control this per-device, or at least per-domain... >*/ > - return alloc_iova(iovad, length, dma_limit >> shift, true); > + return alloc_iova(iovad, length, dma_limit, true); > } > > /* The IOVA allocator knows what we mapped, so just unmap whatever that was > */ > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu