Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On 5/6/24 12:43 PM, Jason Gunthorpe wrote: > On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote: >> We have legacy workloads using VFIO in userspace/kvm guests running >> on downstream distro kernels. We want these workloads to be able to >> continue running on our arch. > > It has been broken since 2018, I don't find this reasoning entirely > reasonable :\ > Raptor is currently working on an automated test runner setup to exercise the VFIO subsystem on PowerNV and (to a lesser extent) pSeries, so breakages like this going forward will hopefully be caught much more quickly. >> I firmly believe the refactoring in this patch series is a step in >> that direction. > > But fine, as long as we are going to fix it. PPC really needs this to > be resolved to keep working. > Agreed. Modernizing/de-cluttering PPC's IOMMU code in general is another task that we're working towards. As mentioned previously on the list, we're working towards a more standard IOMMU driver for PPC that can be used with dma_iommu, which I think will be a good step towards cleaning this up. Initially PowerNV is going to be our target, but to the extent that it is possible and useful, pSeries support could be brought in later. > Jason Thanks, Shawn
Re: [PATCH v3 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
Hi Krishna, On 6/24/24 7:09 AM, Krishna Kumar wrote: > Description of the problem: The hotplug driver for powerpc > (pci/hotplug/pnv_php.c) gives kernel crash when we try to > hot-unplug/disable the PCIe switch/bridge from the PHB. > > Root Cause of Crash: The crash is due to the reason that, though the msi > data structure has been released during disable/hot-unplug path and it > has been assigned with NULL, still during unregistartion the code was > again trying to explicitly disable the msi which causes the Null pointer > dereference and kernel crash. > > Proposed Fix : The fix is to correct the check during unregistration path > so that the code should not try to invoke pci_disable_msi/msix() if its > data structure is already freed. > > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: "Aneesh Kumar K.V" > Cc: Bjorn Helgaas > Cc: Gaurav Batra > Cc: Nathan Lynch > Cc: Brian King > > Signed-off-by: Krishna Kumar As with v1, I can confirm that this patch solves the panic encountered when hotplugging PCIe bridges on POWER9. Tested-by: Shawn Anastasio Thanks, Shawn
Re: [PATCH v3 2/2] powerpc: hotplug driver bridge support
Hi Krishna, On 6/24/24 7:09 AM, Krishna Kumar wrote: > There is an issue with the hotplug operation when it's done on the > bridge/switch slot. The bridge-port and devices behind the bridge, which > become offline by hot-unplug operation, don't get hot-plugged/enabled by > doing hot-plug operation on that slot. Only the first port of the bridge > gets enabled and the remaining port/devices remain unplugged. The hot > plug/unplug operation is done by the hotplug driver > (drivers/pci/hotplug/pnv_php.c). > > Root Cause Analysis: This behavior is due to missing code for the > switch/bridge. The existing driver depends on pci_hp_add_devices() > function for device enablement. This function calls pci_scan_slot() on > only one device-node/port of the bridge, not on all the siblings' > device-node/port. > > The missing code needs to be added which will find all the sibling > device-nodes/bridge-ports and will run explicit pci_scan_slot() on > those. A new function has been added for this purpose which gets > invoked from pci_hp_add_devices(). This new function > pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling > bridge-ports by traversal and explicitly invokes pci_scan_slot on them. > > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: "Aneesh Kumar K.V" > Cc: Bjorn Helgaas > Cc: Gaurav Batra > Cc: Nathan Lynch > Cc: Brian King > > Signed-off-by: Krishna Kumar Other than the case with NVMe devices failing that we discussed in v1's thread, I can confirm that this patch resolves many of the issues we've encountered with PCIe hotplug on POWER9. Tested-by: Shawn Anastasio Thanks, Shawn
Re: [PATCH 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
Hi Krishna, On 5/9/24 7:05 AM, Krishna Kumar wrote: > Description of the problem: The hotplug driver for powerpc > (pci/hotplug/pnv_php.c) gives kernel crash when we try to > hot-unplug/disable the PCIe switch/bridge from the PHB. > > > Root Cause of Crash: The crash is due to the reason that, though the msi > data structure has been released during disable/hot-unplug path and it > has been assigned with NULL, still during unregistartion the code was > again trying to explicitly disable the msi which causes the Null pointer > dereference and kernel crash. > > > Proposed Fix : The fix is to correct the check during unregistration path > so that the code should not try to invoke pci_disable_msi/msix() if its > data structure is already freed. > > > Signed-off-by: Krishna Kumar I've tested this on a POWER9 box and can confirm that it fixes the panics when hotplugging PCIe bridges. Tested-by: Shawn Anastasio Thanks, Shawn
Re: [PATCH v3 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
Hi Michael, On 6/27/24 11:48 PM, Michael Ellerman wrote: > Was the panic reported anywhere? So we can link to the report in the > commit. It was indeed -- here is the link to the thread from last December: https://lists.ozlabs.org/pipermail/linuxppc-dev/2023-December/267034.html > cheers Thanks, Shawn
[PATCH] powerpc/dma: Fix invalid DMA mmap behavior
The refactor of powerpc DMA functions in commit cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla). This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204145 Fixes: cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") Signed-off-by: Shawn Anastasio --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/Makefile | 3 ++- arch/powerpc/kernel/dma-common.c | 17 + 3 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/dma-common.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index d8dcd8820369..77f6ebf97113 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -121,6 +121,7 @@ config PPC select ARCH_32BIT_OFF_T if PPC32 select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEVMEM_IS_ALLOWED + select ARCH_HAS_DMA_MMAP_PGPROT select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 56dfa7a2a6f2..ea0c69236789 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,7 +49,8 @@ obj-y := cputable.o ptrace.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o + of_platform.o prom_parse.o \ + dma-common.o obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o diff --git a/arch/powerpc/kernel/dma-common.c b/arch/powerpc/kernel/dma-common.c new file mode 100644 index ..5a15f99f4199 --- /dev/null +++ b/arch/powerpc/kernel/dma-common.c @@ -0,0 +1,17 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Contains common dma routines for all powerpc platforms. + * + * Copyright (C) 2019 Shawn Anastasio (sh...@anastas.io) + */ + +#include +#include + +pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot, + unsigned long attrs) +{ + if (!dev_is_dma_coherent(dev)) + return pgprot_noncached(prot); + return prot; +} -- 2.22.0
Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior
On 7/17/19 9:59 PM, Alexey Kardashevskiy wrote: On 18/07/2019 09:54, Shawn Anastasio wrote: The refactor of powerpc DMA functions in commit cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla). This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204145 Fixes: cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") Signed-off-by: Shawn Anastasio Is this the default one? include/linux/dma-noncoherent.h # define arch_dma_mmap_pgprot(dev, prot, attrs) pgprot_noncached(prot) Yep, that's the one. Out of curiosity - do not we want to fix this one for everyone? Other than m68k, mips, and arm64, everybody else that doesn't have ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so I assume this behavior is acceptable on those architectures. Either way, the patch is correct. I'm glad to know it was not my " powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59" which broke it :) Yeah, turns out it was just bad luck that I happened to run into these crashes right after deciding to try out your patch :) Reviewed-by: Alexey Kardashevskiy Thanks! --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/Makefile | 3 ++- arch/powerpc/kernel/dma-common.c | 17 + 3 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/dma-common.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index d8dcd8820369..77f6ebf97113 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -121,6 +121,7 @@ config PPC select ARCH_32BIT_OFF_T if PPC32 select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEVMEM_IS_ALLOWED + select ARCH_HAS_DMA_MMAP_PGPROT select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 56dfa7a2a6f2..ea0c69236789 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,7 +49,8 @@ obj-y := cputable.o ptrace.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o + of_platform.o prom_parse.o \ + dma-common.o obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o diff --git a/arch/powerpc/kernel/dma-common.c b/arch/powerpc/kernel/dma-common.c new file mode 100644 index ..5a15f99f4199 --- /dev/null +++ b/arch/powerpc/kernel/dma-common.c @@ -0,0 +1,17 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Contains common dma routines for all powerpc platforms. + * + * Copyright (C) 2019 Shawn Anastasio (sh...@anastas.io) + */ + +#include +#include + +pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot, + unsigned long attrs) +{ + if (!dev_is_dma_coherent(dev)) + return pgprot_noncached(prot); + return prot; +}
Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior
On 7/18/19 4:52 AM, Christoph Hellwig wrote: On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote: On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote: Other than m68k, mips, and arm64, everybody else that doesn't have ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so I assume this behavior is acceptable on those architectures. It might be acceptable, but there's no reason to use pgport_noncached if the platform supports cache-coherent DMA. Christoph (+cc) made the change so maybe he saw something we're missing. I always found the forcing of noncached access even for coherent devices a little odd, but this was inherited from the previous implementation, which surprised me a bit as the different attributes are usually problematic even on x86. Let me dig into the history a bit more, but I suspect the righ fix is to default to cached mappings for coherent devices. Ok, some history: The generic dma mmap implementation, which we are effectively still using today was added by: commit 64ccc9c033c6089b2d426dad3c56477ab066c999 Author: Marek Szyprowski Date: Thu Jun 14 13:03:04 2012 +0200 common: dma-mapping: add support for generic dma_mmap_* calls and unconditionally uses pgprot_noncached in dma_common_mmap, which is then used as the fallback by dma_mmap_attrs if no ->mmap method is present. At that point we already had the powerpc implementation that only uses pgprot_noncached for non-coherent mappings, and the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE is set and otherwise pgprot_dmacoherent, which seems to be uncached. Arm did support coherent platforms at that time, but they might have been an afterthought and not handled properly. So it migt have been that we were all wrong for that time and might have to fix it up. Personally, I'm not a huge fan of an implicit default for something inherently architecture-dependent like this at all. What I'd like to see is a mechanism that forces architecture code to explicitly opt in to the default pgprot settings if they don't provide an implementation of arch_dma_mmap_pgprot. This could perhaps be done by reversing ARCH_HAS_DMA_MMAP_PGPROT to something like ARCH_USE_DEFAULT_DMA_MMAP_PGPROT. This way as more systems are moved to use the common mmap code instead of their ops->mmap, the people doing the refactoring have to make an explicit decision about the pgprot settings to use. Such a configuration would have likely prevented this situation with powerpc from happening. That being said, if the default behavior doesn't make sense in the general case it should probably be fixed as well. Curious to hear some thoughts on this.
Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior
On 7/19/19 2:06 AM, Christoph Hellwig wrote: > What is inherently architecture specific here over the fact that > the pgprot_* expand to architecture specific bits? What I meant is that different architectures seem to have different criteria for setting the different pgprot_ bits. i.e. ppc checks for cache coherency, arm64 checks for cache coherency and writecombine, mips just checks for writecombine, etc. That being said, I'm no expert here and there is probably some behavior here that would make for a much more sane default. > I'd rather not create boilerplate code where we don't have to it. Note > that arch_dma_mmap_pgprot is a little misnamed now, as we also use it > for the in-kernel remapping in kernel/dma/remap.c, which I'm slowly > switching a lot of architectures to. powerpc will follow soon once > I get the ppc44x that was given to me to actually boot with a recent > kernel (not that I've tried much so far). Fair enough. I didn't realize that most of the other architectures don't use the common code anyways as you mention below. > Every arch except for arm32 now uses dma direct for the direct mapping, > and thus the common code without the indeed odd default. I also have > a series to remove the default fallback, which is inherently a bad idea: > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-no-defaults Awesome. This is great to hear. > I think your patch that started this thread is fine for 5.3 and -stable: > > Reviewed-by: Christoph Hellwig Thanks! > But going forward I'd rather have a sane default. Agreed.
Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior
On 7/22/19 7:16 AM, Michael Ellerman wrote: Arnd Bergmann writes: On Thu, Jul 18, 2019 at 11:52 AM Christoph Hellwig wrote: On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote: On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote: Other than m68k, mips, and arm64, everybody else that doesn't have ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so I assume this behavior is acceptable on those architectures. It might be acceptable, but there's no reason to use pgport_noncached if the platform supports cache-coherent DMA. Christoph (+cc) made the change so maybe he saw something we're missing. I always found the forcing of noncached access even for coherent devices a little odd, but this was inherited from the previous implementation, which surprised me a bit as the different attributes are usually problematic even on x86. Let me dig into the history a bit more, but I suspect the righ fix is to default to cached mappings for coherent devices. Ok, some history: The generic dma mmap implementation, which we are effectively still using today was added by: commit 64ccc9c033c6089b2d426dad3c56477ab066c999 Author: Marek Szyprowski Date: Thu Jun 14 13:03:04 2012 +0200 common: dma-mapping: add support for generic dma_mmap_* calls and unconditionally uses pgprot_noncached in dma_common_mmap, which is then used as the fallback by dma_mmap_attrs if no ->mmap method is present. At that point we already had the powerpc implementation that only uses pgprot_noncached for non-coherent mappings, and the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE is set and otherwise pgprot_dmacoherent, which seems to be uncached. Arm did support coherent platforms at that time, but they might have been an afterthought and not handled properly. Cache-coherent devices are still very rare on 32-bit ARM. Among the callers of dma_mmap_coherent(), almost all are in platform specific device drivers that only ever run on noncoherent ARM SoCs, which explains why nobody would have noticed problems. There is also a difference in behavior between ARM and PowerPC when dealing with mismatched cacheability attributes: If the same page is mapped as both cached and uncached to, this may cause silent undefined behavior on ARM, while PowerPC should enter a checkstop as soon as it notices. On newer Power CPUs it's actually more like the ARM behaviour. I don't know for sure that it will *never* checkstop but there are at least cases where it won't. There's some (not much) detail in the Power8/9 user manuals. The issue was discovered due to sporadic checkstops on P9, so it seems like it will happen at least sometimes. cheers
Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*
On 8/5/19 10:01 AM, Christoph Hellwig wrote: diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index 3813211a9aad..9ae5cee543c4 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -42,13 +42,8 @@ void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs); long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, dma_addr_t dma_addr); - -#ifdef CONFIG_ARCH_HAS_DMA_MMAP_PGPROT pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs); -#else -# define arch_dma_mmap_pgprot(dev, prot, attrs)pgprot_noncached(prot) -#endif Nit, but maybe the prototype should still be ifdef'd here? It at least could prevent a reader from incorrectly thinking that the function is always present. Also, like Will mentioned earlier, the function name isn't entirely accurate anymore. I second the suggestion of using something like arch_dma_noncoherent_pgprot(). As for your idea of defining pgprot_dmacoherent for all architectures as #ifndef pgprot_dmacoherent #define pgprot_dmacoherent pgprot_noncached #endif I think that the name here is kind of misleading too, since this definition will only be used when there is no support for proper DMA coherency.
Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*
On 8/7/19 8:04 AM, Christoph Hellwig wrote: Actually it is typical modern Linux style to just provide a prototype and then use "if (IS_ENABLED(CONFIG_FOO))" to guard the call(s) to it. I see. Also, like Will mentioned earlier, the function name isn't entirely accurate anymore. I second the suggestion of using something like arch_dma_noncoherent_pgprot(). As mentioned I plan to remove arch_dma_mmap_pgprot for 5.4, so I'd rather avoid churn for the short period of time. Yeah, fair enough. As for your idea of defining pgprot_dmacoherent for all architectures as #ifndef pgprot_dmacoherent #define pgprot_dmacoherent pgprot_noncached #endif I think that the name here is kind of misleading too, since this definition will only be used when there is no support for proper DMA coherency. Do you have a suggestion for a better name? I'm pretty bad at naming, so just reusing the arm name seemed like a good way to avoid having to make naming decisions myself. Good question. Perhaps something like `pgprot_dmacoherent_fallback` would better convey that this is only used for devices that don't support DMA coherency? Or maybe `pgprot_dma_noncoherent`?
[PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries
On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently broken for all but the first device on a given bus. The culprit is an ordering issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU group assigment occuring before device registration in sysfs. This triggers the following check in arch/powerpc/kernel/iommu.c: /* * The sysfs entries should be populated before * binding IOMMU group. If sysfs entries isn't * ready, we simply bail. */ if (!device_is_registered(dev)) return -ENOENT; This fails for hotplugged devices since the pcibios_add_device() call in the pseries hotplug path (in pci_device_add()) occurs before device_add(). Since the IOMMU groups are set up in pcibios_add_device(), this means that a sysfs entry will not yet be present and it will fail. There is a special case that allows the first hotplugged device on a bus to succeed, though. The powerpc pcibios_add_device() implementation will skip initializing the device if bus setup is not yet complete. Later, the pci core will call pcibios_fixup_bus() which will perform setup for the first (and only) device on the bus and since it has already been registered in sysfs, the IOMMU setup will succeed. My current solution is to introduce another pcibios function, pcibios_fixup_dev, which is called after device_add() in pci_device_add(). Then in powerpc code, pcibios_setup_device() was moved from pcibios_add_device() to this new function which will occur after sysfs registration so IOMMU assignment will succeed. I added a new pcibios function rather than moving the pcibios_add_device() call to after the device_add() call in pci_add_device() because there are other architectures that use it and it wasn't immediately clear to me whether moving it would break them. If anybody has more insight or a better way to fix this, please let me know. Shawn Anastasio (2): PCI: Introduce pcibios_fixup_dev() powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries arch/powerpc/kernel/pci-common.c | 13 ++--- drivers/pci/probe.c | 14 ++ include/linux/pci.h | 1 + 3 files changed, 21 insertions(+), 7 deletions(-) -- 2.20.1
[PATCH 1/2] PCI: Introduce pcibios_fixup_dev()
Introduce pcibios_fixup_dev to allow platform-specific code to perform final setup of a PCI device after it has been registered in sysfs. The default implementation is a no-op. Signed-off-by: Shawn Anastasio --- drivers/pci/probe.c | 14 ++ include/linux/pci.h | 1 + 2 files changed, 15 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index a3c7338fad86..14eb7ee38794 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2652,6 +2652,17 @@ static void pci_set_msi_domain(struct pci_dev *dev) dev_set_msi_domain(&dev->dev, d); } +/** + * pcibios_fixup_dev - Platform-specific device setup + * @dev: Device to set up + * + * Default empty implementation. Replace with an architecture-specific + * setup routine, if necessary. + */ +void __weak pcibios_fixup_dev(struct pci_dev *dev) +{ +} + void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) { int ret; @@ -2699,6 +2710,9 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) dev->match_driver = false; ret = device_add(&dev->dev); WARN_ON(ret < 0); + + /* Allow platform-specific code to perform final setup of device */ + pcibios_fixup_dev(dev); } struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) diff --git a/include/linux/pci.h b/include/linux/pci.h index 82e4cd1b7ac3..83eb0e241137 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -960,6 +960,7 @@ void pcibios_bus_add_device(struct pci_dev *pdev); void pcibios_add_bus(struct pci_bus *bus); void pcibios_remove_bus(struct pci_bus *bus); void pcibios_fixup_bus(struct pci_bus *); +void pcibios_fixup_dev(struct pci_dev *); int __must_check pcibios_enable_device(struct pci_dev *, int mask); /* Architecture-specific versions may override this (weak) */ char *pcibios_setup(char *str); -- 2.20.1
[PATCH 2/2] powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries
Move PCI device setup from pcibios_add_device() to pcibios_fixup_dev(). This ensures that platform-specific DMA and IOMMU setup occurs after the device has been registered in sysfs, which is a requirement for IOMMU group assignment to work. This fixes IOMMU group assignment for hotplugged devices on pseries, where the existing behavior results in IOMMU assignment before registration. Signed-off-by: Shawn Anastasio --- arch/powerpc/kernel/pci-common.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index f627e15bb43c..21b4761bb0ed 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -987,15 +987,14 @@ static void pcibios_setup_device(struct pci_dev *dev) ppc_md.pci_irq_fixup(dev); } -int pcibios_add_device(struct pci_dev *dev) +void pcibios_fixup_dev(struct pci_dev *dev) { - /* -* We can only call pcibios_setup_device() after bus setup is complete, -* since some of the platform specific DMA setup code depends on it. -*/ - if (dev->bus->is_added) - pcibios_setup_device(dev); + /* Device is registered in sysfs and ready to be set up */ + pcibios_setup_device(dev); +} +int pcibios_add_device(struct pci_dev *dev) +{ #ifdef CONFIG_PCI_IOV if (ppc_md.pcibios_fixup_sriov) ppc_md.pcibios_fixup_sriov(dev); -- 2.20.1
Re: [PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries
On 9/5/19 4:08 AM, Alexey Kardashevskiy wrot> I just tried hotplugging 3 virtio-net devices into a guest system with v5.2 kernel and it seems working (i.e. BARs mapped, a driver is bound): root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)' 00:00.0 Ethernet controller: Red Hat, Inc Virtio network device Memory at 20008004 (32-bit, non-prefetchable) [size=4K] Memory at 2100 (64-bit, prefetchable) [size=16K] Kernel driver in use: virtio-pci 00:01.0 Ethernet controller: Red Hat, Inc Virtio network device Memory at 200080041000 (32-bit, non-prefetchable) [size=4K] Memory at 21004000 (64-bit, prefetchable) [size=16K] Kernel driver in use: virtio-pci 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device Memory at 200080042000 (32-bit, non-prefetchable) [size=4K] Memory at 21008000 (64-bit, prefetchable) [size=16K] Kernel driver in use: virtio-pci Can you explain in detail what you are doing exactly and what is failing and what qemu/guest kernel/guest distro is used? Thanks, Sure. I'm on host kernel 5.2.8, guest on 5.3-rc7 (also tested on 5.1.16) and I'm hotplugging ivshmem devices to a separate spapr-pci-host-bridge defined as follows: -device spapr-pci-host-bridge,index=1,id=pci.1 Device hotplug and BAR assignment does work, but IOMMU group assignment seems to fail. This is evidenced by the kernel log which shows the following message for the first device but not the second: [ 136.849448] pci 0001:00:00.0: Adding to iommu group 1 Trying to bind the second device to vfio-pci as a result of this fails: [ 471.691948] vfio-pci: probe of 0001:00:01.0 failed with error -22 I traced that failure to a call to iommu_group_get() which returns NULL for the second device. I then traced that back to the ordering issue I described. For your second and third virtio-net devices, was the "Adding to iommu group N" kernel message printed?
Re: [PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries
On 9/5/19 4:38 AM, Lukas Wunner wrote: On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote: If anybody has more insight or a better way to fix this, please let me know. Have you considered moving the invocation of pcibios_setup_device() to pcibios_bus_add_device()? The latter is called from pci_bus_add_device() in drivers/pci/bus.c. At this point device_add() has been called, so the device exists in sysfs. Basically when adding a PCI device, the order is: * pci_device_add() populates struct pci_dev, calls device_add(), binding the device to a driver is prevented * after pci_device_add() has been called for all discovered devices, resources are allocated * pci_bus_add_device() is called for each device, calls pcibios_bus_add_device() and binds the device to a driver Thank you, this is exactly what I was looking for! Just tested and this seems to work perfectly. I'll go ahead and submit a v2 that does this instead. Thanks again, Shawn
[PATCH v2 0/1] Fix IOMMU setup for hotplugged devices on pseries
Changes from v2: - Remove pcibios_fixup_dev() - Remove pcibios_setup_bus_device() call in pcibios_fixup_bus() since all device setup will now be handled in pcibios_bus_add_device() On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently broken for all but the first device on a given bus. The culprit is an ordering issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU group assigment occuring before device registration in sysfs. This triggers the following check in arch/powerpc/kernel/iommu.c: /* * The sysfs entries should be populated before * binding IOMMU group. If sysfs entries isn't * ready, we simply bail. */ if (!device_is_registered(dev)) return -ENOENT; This fails for hotplugged devices since the pcibios_add_device() call in the pseries hotplug path (in pci_device_add()) occurs before device_add(). Since the IOMMU groups are set up in pcibios_add_device(), this means that a sysfs entry will not yet be present and it will fail. There is a special case that allows the first hotplugged device on a bus to succeed, though. The powerpc pcibios_add_device() implementation will skip initializing the device if bus setup is not yet complete. Later, the pci core will call pcibios_fixup_bus() which will perform setup for the first (and only) device on the bus and since it has already been registered in sysfs, the IOMMU setup will succeed. The current solution is to move all device setup to pcibios_bus_add_device() which will occur after all devices have been registered. Shawn Anastasio (1): powerpc/pci: Fix pcibios_setup_device() ordering arch/powerpc/kernel/pci-common.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) -- 2.20.1
[PATCH v2 1/1] powerpc/pci: Fix pcibios_setup_device() ordering
Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU setup occurs after the device has been registered in sysfs, which is a requirement for IOMMU group assignment to work This fixes IOMMU group assignment for hotplugged devices on pseries, where the existing behavior results in IOMMU assignment before registration. Thanks to Lukas Wunner for the suggestion. Signed-off-by: Shawn Anastasio --- arch/powerpc/kernel/pci-common.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index f627e15bb43c..d119c77efb69 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -261,12 +261,6 @@ int pcibios_sriov_disable(struct pci_dev *pdev) #endif /* CONFIG_PCI_IOV */ -void pcibios_bus_add_device(struct pci_dev *pdev) -{ - if (ppc_md.pcibios_bus_add_device) - ppc_md.pcibios_bus_add_device(pdev); -} - static resource_size_t pcibios_io_size(const struct pci_controller *hose) { #ifdef CONFIG_PPC64 @@ -987,15 +981,17 @@ static void pcibios_setup_device(struct pci_dev *dev) ppc_md.pci_irq_fixup(dev); } -int pcibios_add_device(struct pci_dev *dev) +void pcibios_bus_add_device(struct pci_dev *pdev) { - /* -* We can only call pcibios_setup_device() after bus setup is complete, -* since some of the platform specific DMA setup code depends on it. -*/ - if (dev->bus->is_added) - pcibios_setup_device(dev); + /* Perform platform-specific device setup */ + pcibios_setup_device(pdev); + + if (ppc_md.pcibios_bus_add_device) + ppc_md.pcibios_bus_add_device(pdev); +} +int pcibios_add_device(struct pci_dev *dev) +{ #ifdef CONFIG_PCI_IOV if (ppc_md.pcibios_fixup_sriov) ppc_md.pcibios_fixup_sriov(dev); @@ -1037,9 +1033,6 @@ void pcibios_fixup_bus(struct pci_bus *bus) /* Now fixup the bus bus */ pcibios_setup_bus_self(bus); - - /* Now fixup devices on that bus */ - pcibios_setup_bus_devices(bus); } EXPORT_SYMBOL(pcibios_fixup_bus); -- 2.20.1
Re: [PATCH v2 1/1] powerpc/pci: Fix pcibios_setup_device() ordering
On 9/9/19 2:59 AM, Alexey Kardashevskiy wrote: On 06/09/2019 05:13, Shawn Anastasio wrote: Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU setup occurs after the device has been registered in sysfs, which is a requirement for IOMMU group assignment to work This fixes IOMMU group assignment for hotplugged devices on pseries, where the existing behavior results in IOMMU assignment before registration. Although this is a correct approach which we should proceed with, this breaks adding of SRIOV VFs from pnv_tce_iommu_bus_notifier (and possibly the bare metal PCI hotplug), I am trying to fix that now... Were you able to make any progress? I can think of a couple of ways to fix SRIOV, but they're not particularly elegant and involve duplication.
Re: IOMMU group creation for pseries hotplug, and powernv VFs
On 9/29/19 9:08 PM, Oliver O'Halloran wrote: A couple of extra patches on top of Shawn's existing re-ordering patch. This seems to fix the problem Alexey noted with Shawn's change causing VFs to lose their IOMMU group. I've tried pretty hard to make this a minimal fix it's still a bit large. If mpe is happy to take this as a fix for 5.4 then I'll leave it, otherwise we might want to look at different approaches. Thanks for fixing this Oliver! Reviewed-by: Shawn Anastasio
Re: [PATCH v2 1/3] powernv/iov: Ensure the pdn for VFs always contains a valid PE number
On 10/28/19 3:54 AM, Oliver O'Halloran wrote: On pseries there is a bug with adding hotplugged devices to an IOMMU group. For a number of dumb reasons fixing that bug first requires re-working how VFs are configured on PowerNV. Are these patches expected to land soon, or is there another fix in the pipeline? As far as I can tell this is still an issue.
[PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
On pseries, custom PCI resource alignment specified with the commandline argument pci=resource_alignment is disabled due to PCI resources being managed by the firmware. However, in the case of PCI hotplug the resources are managed by the kernel, so custom alignments should be honored in these cases. This is done by only honoring custom alignments after initial PCI initialization is done, to ensure that all devices managed by the firmware are excluded. Without this ability, sub-page BARs sometimes get mapped in between page boundaries for hotplugged devices and are therefore unusable with the VFIO framework. This change allows users to request page alignment for devices they wish to access via VFIO using the pci=resource_alignment commandline argument. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned") Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +++ arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/platforms/pseries/setup.c | 22 ++ 3 files changed, 34 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2fbfaa9176ed..46eb62c0954e 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -179,6 +179,9 @@ struct machdep_calls { resource_size_t (*pcibios_default_alignment)(void); + /* Called when determining PCI resource alignment */ + int (*pcibios_ignore_alignment_request)(void); + #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_dev *pdev); resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index ff4b7539cbdf..1a6ded45a701 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void) return 0; } +resource_size_t pcibios_ignore_alignment_request(void) +{ + if (ppc_md.pcibios_ignore_alignment_request) + return ppc_md.pcibios_ignore_alignment_request(); + + /* Fall back to default method of checking PCI_PROBE_ONLY */ + return pci_has_flag(PCI_PROBE_ONLY); +} + #ifdef CONFIG_PCI_IOV resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) { diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index e4f0dfd4ae33..c6af2ed8ee0f 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize); int fwnmi_active; /* TRUE if an FWNMI handler is present */ +int initial_pci_init_done; /* TRUE if initial pcibios init has completed */ + static void pSeries_show_cpuinfo(struct seq_file *m) { struct device_node *root; @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev, } #endif +static void pseries_after_init(void) +{ + initial_pci_init_done = 1; +} + +static int pseries_ignore_alignment_request(void) +{ + if (initial_pci_init_done) + /* +* Allow custom alignments after init for things +* like PCI hotplugging. +*/ + return 0; + + return pci_has_flag(PCI_PROBE_ONLY); +} + static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void) } ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare; + ppc_md.pcibios_after_init = pseries_after_init; + ppc_md.pcibios_ignore_alignment_request = + pseries_ignore_alignment_request; } static void pseries_panic(char *str) -- 2.20.1
[PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request
Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 766f5779db92..e7a925f2188e 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5896,6 +5896,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5920,9 +5925,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } -- 2.20.1
[PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
Enable the pcibios_after_init hook on all powerpc platforms. This hook is executed at the end of pcibios_init and was previously only available on CONFIG_PPC32. Since it is useful and not inherently limited to 32-bit mode, remove the limitation and allow it on all powerpc platforms. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +-- arch/powerpc/kernel/pci_64.c | 4 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2f0ca6560e47..2fbfaa9176ed 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -150,6 +150,7 @@ struct machdep_calls { void(*init)(void); void(*kgdb_map_scc)(void); +#endif /* CONFIG_PPC32 */ /* * optional PCI "hooks" @@ -157,8 +158,6 @@ struct machdep_calls { /* Called at then very end of pcibios_init() */ void (*pcibios_after_init)(void); -#endif /* CONFIG_PPC32 */ - /* Called in indirect_* to avoid touching devices */ int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index 9d8c10d55407..fba7fe6e4a50 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -68,6 +68,10 @@ static int __init pcibios_init(void) printk(KERN_DEBUG "PCI: Probing PCI hardware done\n"); + /* Call machine dependent post-init code */ + if (ppc_md.pcibios_after_init) + ppc_md.pcibios_after_init(); + return 0; } -- 2.20.1
[PATCH 0/3] Allow custom PCI resource alignment on pseries
(Resent to include relevant mailing lists - sorry about that!) Hello all, This patch set implements support for user-specified PCI resource alignment on the pseries platform for hotplugged PCI devices. Currently on pseries, PCI resource alignments specified with the pci=resource_alignment commandline argument are ignored, since the firmware is in charge of managing the PCI resources. In the case of hotplugged devices, though, the kernel is in charge of configuring the resources and should obey alignment requirements. The current behavior of ignoring the alignment for hotplugged devices results in sub-page BARs landing between page boundaries and becoming un-mappable from userspace via the VFIO framework. This issue was observed on a pseries KVM guest with hotplugged ivshmem devices. With these changes, users can specify an appropriate pci=resource_alignment argument on boot for devices they wish to use with VFIO. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned"). Feedback is appreciated. Thanks, Shawn Shawn Anastasio (3): PCI: Introduce pcibios_ignore_alignment_request powerpc/64: Enable pcibios_after_init hook on ppc64 powerpc/pseries: Allow user-specified PCI resource alignment after init arch/powerpc/include/asm/machdep.h | 6 -- arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/kernel/pci_64.c | 4 arch/powerpc/platforms/pseries/setup.c | 22 ++ drivers/pci/pci.c | 9 +++-- 5 files changed, 46 insertions(+), 4 deletions(-) -- 2.20.1
[RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries
Hello all, This patch set implements support for user-specified PCI resource alignment on the pseries platform for hotplugged PCI devices. Currently on pseries, PCI resource alignments specified with the pci=resource_alignment commandline argument are ignored, since the firmware is in charge of managing the PCI resources. In the case of hotplugged devices, though, the kernel is in charge of configuring the resources and should obey alignment requirements. The current behavior of ignoring the alignment for hotplugged devices results in sub-page BARs landing between page boundaries and becoming un-mappable from userspace via the VFIO framework. This issue was observed on a pseries KVM guest with hotplugged ivshmem devices. With these changes, users can specify an appropriate pci=resource_alignment argument on boot for devices they wish to use with VFIO. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned"). Feedback is appreciated. Thanks, Shawn Shawn Anastasio (3): PCI: Introduce pcibios_ignore_alignment_request powerpc/64: Enable pcibios_after_init hook on ppc64 powerpc/pseries: Allow user-specified PCI resource alignment after init arch/powerpc/include/asm/machdep.h | 6 -- arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/kernel/pci_64.c | 4 arch/powerpc/platforms/pseries/setup.c | 22 ++ drivers/pci/pci.c | 9 +++-- 5 files changed, 46 insertions(+), 4 deletions(-) -- 2.20.1
[RESEND PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
Enable the pcibios_after_init hook on all powerpc platforms. This hook is executed at the end of pcibios_init and was previously only available on CONFIG_PPC32. Since it is useful and not inherently limited to 32-bit mode, remove the limitation and allow it on all powerpc platforms. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +-- arch/powerpc/kernel/pci_64.c | 4 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2f0ca6560e47..2fbfaa9176ed 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -150,6 +150,7 @@ struct machdep_calls { void(*init)(void); void(*kgdb_map_scc)(void); +#endif /* CONFIG_PPC32 */ /* * optional PCI "hooks" @@ -157,8 +158,6 @@ struct machdep_calls { /* Called at then very end of pcibios_init() */ void (*pcibios_after_init)(void); -#endif /* CONFIG_PPC32 */ - /* Called in indirect_* to avoid touching devices */ int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index 9d8c10d55407..fba7fe6e4a50 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -68,6 +68,10 @@ static int __init pcibios_init(void) printk(KERN_DEBUG "PCI: Probing PCI hardware done\n"); + /* Call machine dependent post-init code */ + if (ppc_md.pcibios_after_init) + ppc_md.pcibios_after_init(); + return 0; } -- 2.20.1
[RESEND PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
On pseries, custom PCI resource alignment specified with the commandline argument pci=resource_alignment is disabled due to PCI resources being managed by the firmware. However, in the case of PCI hotplug the resources are managed by the kernel, so custom alignments should be honored in these cases. This is done by only honoring custom alignments after initial PCI initialization is done, to ensure that all devices managed by the firmware are excluded. Without this ability, sub-page BARs sometimes get mapped in between page boundaries for hotplugged devices and are therefore unusable with the VFIO framework. This change allows users to request page alignment for devices they wish to access via VFIO using the pci=resource_alignment commandline argument. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned") Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +++ arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/platforms/pseries/setup.c | 22 ++ 3 files changed, 34 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2fbfaa9176ed..46eb62c0954e 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -179,6 +179,9 @@ struct machdep_calls { resource_size_t (*pcibios_default_alignment)(void); + /* Called when determining PCI resource alignment */ + int (*pcibios_ignore_alignment_request)(void); + #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_dev *pdev); resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index ff4b7539cbdf..1a6ded45a701 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void) return 0; } +resource_size_t pcibios_ignore_alignment_request(void) +{ + if (ppc_md.pcibios_ignore_alignment_request) + return ppc_md.pcibios_ignore_alignment_request(); + + /* Fall back to default method of checking PCI_PROBE_ONLY */ + return pci_has_flag(PCI_PROBE_ONLY); +} + #ifdef CONFIG_PCI_IOV resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) { diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index e4f0dfd4ae33..c6af2ed8ee0f 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize); int fwnmi_active; /* TRUE if an FWNMI handler is present */ +int initial_pci_init_done; /* TRUE if initial pcibios init has completed */ + static void pSeries_show_cpuinfo(struct seq_file *m) { struct device_node *root; @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev, } #endif +static void pseries_after_init(void) +{ + initial_pci_init_done = 1; +} + +static int pseries_ignore_alignment_request(void) +{ + if (initial_pci_init_done) + /* +* Allow custom alignments after init for things +* like PCI hotplugging. +*/ + return 0; + + return pci_has_flag(PCI_PROBE_ONLY); +} + static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void) } ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare; + ppc_md.pcibios_after_init = pseries_after_init; + ppc_md.pcibios_ignore_alignment_request = + pseries_ignore_alignment_request; } static void pseries_panic(char *str) -- 2.20.1
[RESEND PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request
Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } -- 2.20.1
[PATCH v2 1/3] PCI: Introduce pcibios_ignore_alignment_request
Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
[PATCH v2 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
On pseries, custom PCI resource alignment specified with the commandline argument pci=resource_alignment is disabled due to PCI resources being managed by the firmware. However, in the case of PCI hotplug the resources are managed by the kernel, so custom alignments should be honored in these cases. This is done by only honoring custom alignments after initial PCI initialization is done, to ensure that all devices managed by the firmware are excluded. Without this ability, sub-page BARs sometimes get mapped in between page boundaries for hotplugged devices and are therefore unusable with the VFIO framework. This change allows users to request page alignment for devices they wish to access via VFIO using the pci=resource_alignment commandline argument. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned") Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +++ arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/platforms/pseries/setup.c | 22 ++ 3 files changed, 34 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2fbfaa9176ed..46eb62c0954e 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -179,6 +179,9 @@ struct machdep_calls { resource_size_t (*pcibios_default_alignment)(void); + /* Called when determining PCI resource alignment */ + int (*pcibios_ignore_alignment_request)(void); + #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_dev *pdev); resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index ff4b7539cbdf..1a6ded45a701 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void) return 0; } +resource_size_t pcibios_ignore_alignment_request(void) +{ + if (ppc_md.pcibios_ignore_alignment_request) + return ppc_md.pcibios_ignore_alignment_request(); + + /* Fall back to default method of checking PCI_PROBE_ONLY */ + return pci_has_flag(PCI_PROBE_ONLY); +} + #ifdef CONFIG_PCI_IOV resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) { diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index e4f0dfd4ae33..07f03be02afe 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize); int fwnmi_active; /* TRUE if an FWNMI handler is present */ +static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */ + static void pSeries_show_cpuinfo(struct seq_file *m) { struct device_node *root; @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev, } #endif +static void pseries_after_init(void) +{ + initial_pci_init_done = 1; +} + +static int pseries_ignore_alignment_request(void) +{ + if (initial_pci_init_done) + /* +* Allow custom alignments after init for things +* like PCI hotplugging. +*/ + return 0; + + return pci_has_flag(PCI_PROBE_ONLY); +} + static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void) } ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare; + ppc_md.pcibios_after_init = pseries_after_init; + ppc_md.pcibios_ignore_alignment_request = + pseries_ignore_alignment_request; } static void pseries_panic(char *str) -- 2.20.1
[PATCH v2 0/3] Allow custom PCI resource alignment on pseries
Changes from v1 to v2: - Fix function declaration warnings caught by sparse Hello all, This patch set implements support for user-specified PCI resource alignment on the pseries platform for hotplugged PCI devices. Currently on pseries, PCI resource alignments specified with the pci=resource_alignment commandline argument are ignored, since the firmware is in charge of managing the PCI resources. In the case of hotplugged devices, though, the kernel is in charge of configuring the resources and should obey alignment requirements. The current behavior of ignoring the alignment for hotplugged devices results in sub-page BARs landing between page boundaries and becoming un-mappable from userspace via the VFIO framework. This issue was observed on a pseries KVM guest with hotplugged ivshmem devices. With these changes, users can specify an appropriate pci=resource_alignment argument on boot for devices they wish to use with VFIO. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned"). Feedback is appreciated. Thanks, Shawn Shawn Anastasio (3): PCI: Introduce pcibios_ignore_alignment_request powerpc/64: Enable pcibios_after_init hook on ppc64 powerpc/pseries: Allow user-specified PCI resource alignment after init arch/powerpc/include/asm/machdep.h | 6 -- arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/kernel/pci_64.c | 4 arch/powerpc/platforms/pseries/setup.c | 22 ++ drivers/pci/pci.c | 9 +++-- include/linux/pci.h| 1 + 6 files changed, 47 insertions(+), 4 deletions(-) -- 2.20.1
[PATCH v2 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
Enable the pcibios_after_init hook on all powerpc platforms. This hook is executed at the end of pcibios_init and was previously only available on CONFIG_PPC32. Since it is useful and not inherently limited to 32-bit mode, remove the limitation and allow it on all powerpc platforms. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +-- arch/powerpc/kernel/pci_64.c | 4 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2f0ca6560e47..2fbfaa9176ed 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -150,6 +150,7 @@ struct machdep_calls { void(*init)(void); void(*kgdb_map_scc)(void); +#endif /* CONFIG_PPC32 */ /* * optional PCI "hooks" @@ -157,8 +158,6 @@ struct machdep_calls { /* Called at then very end of pcibios_init() */ void (*pcibios_after_init)(void); -#endif /* CONFIG_PPC32 */ - /* Called in indirect_* to avoid touching devices */ int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index 9d8c10d55407..fba7fe6e4a50 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -68,6 +68,10 @@ static int __init pcibios_init(void) printk(KERN_DEBUG "PCI: Probing PCI hardware done\n"); + /* Call machine dependent post-init code */ + if (ppc_md.pcibios_after_init) + ppc_md.pcibios_after_init(); + return 0; } -- 2.20.1
[PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
On pseries, custom PCI resource alignment specified with the commandline argument pci=resource_alignment is disabled due to PCI resources being managed by the firmware. However, in the case of PCI hotplug the resources are managed by the kernel, so custom alignments should be honored in these cases. This is done by only honoring custom alignments after initial PCI initialization is done, to ensure that all devices managed by the firmware are excluded. Without this ability, sub-page BARs sometimes get mapped in between page boundaries for hotplugged devices and are therefore unusable with the VFIO framework. This change allows users to request page alignment for devices they wish to access via VFIO using the pci=resource_alignment commandline argument. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned") Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +++ arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/platforms/pseries/setup.c | 22 ++ 3 files changed, 34 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2fbfaa9176ed..46eb62c0954e 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -179,6 +179,9 @@ struct machdep_calls { resource_size_t (*pcibios_default_alignment)(void); + /* Called when determining PCI resource alignment */ + int (*pcibios_ignore_alignment_request)(void); + #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_dev *pdev); resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index ff4b7539cbdf..8e0d73b4c188 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void) return 0; } +int pcibios_ignore_alignment_request(void) +{ + if (ppc_md.pcibios_ignore_alignment_request) + return ppc_md.pcibios_ignore_alignment_request(); + + /* Fall back to default method of checking PCI_PROBE_ONLY */ + return pci_has_flag(PCI_PROBE_ONLY); +} + #ifdef CONFIG_PCI_IOV resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) { diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index e4f0dfd4ae33..07f03be02afe 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize); int fwnmi_active; /* TRUE if an FWNMI handler is present */ +static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */ + static void pSeries_show_cpuinfo(struct seq_file *m) { struct device_node *root; @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev, } #endif +static void pseries_after_init(void) +{ + initial_pci_init_done = 1; +} + +static int pseries_ignore_alignment_request(void) +{ + if (initial_pci_init_done) + /* +* Allow custom alignments after init for things +* like PCI hotplugging. +*/ + return 0; + + return pci_has_flag(PCI_PROBE_ONLY); +} + static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void) } ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare; + ppc_md.pcibios_after_init = pseries_after_init; + ppc_md.pcibios_ignore_alignment_request = + pseries_ignore_alignment_request; } static void pseries_panic(char *str) -- 2.20.1
[PATCH v3 0/3] Allow custom PCI resource alignment on pseries
Changes from v2 to v3: - Fix wrong return type of ppc pcibios_ignore_alignment_request (Not sure how my local compile didn't catch that!) Hello all, This patch set implements support for user-specified PCI resource alignment on the pseries platform for hotplugged PCI devices. Currently on pseries, PCI resource alignments specified with the pci=resource_alignment commandline argument are ignored, since the firmware is in charge of managing the PCI resources. In the case of hotplugged devices, though, the kernel is in charge of configuring the resources and should obey alignment requirements. The current behavior of ignoring the alignment for hotplugged devices results in sub-page BARs landing between page boundaries and becoming un-mappable from userspace via the VFIO framework. This issue was observed on a pseries KVM guest with hotplugged ivshmem devices. With these changes, users can specify an appropriate pci=resource_alignment argument on boot for devices they wish to use with VFIO. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned"). Feedback is appreciated. Thanks, Shawn Shawn Anastasio (3): PCI: Introduce pcibios_ignore_alignment_request powerpc/64: Enable pcibios_after_init hook on ppc64 powerpc/pseries: Allow user-specified PCI resource alignment after init arch/powerpc/include/asm/machdep.h | 6 -- arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/kernel/pci_64.c | 4 arch/powerpc/platforms/pseries/setup.c | 22 ++ drivers/pci/pci.c | 9 +++-- include/linux/pci.h| 1 + 6 files changed, 47 insertions(+), 4 deletions(-) -- 2.20.1
[PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
[PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
Enable the pcibios_after_init hook on all powerpc platforms. This hook is executed at the end of pcibios_init and was previously only available on CONFIG_PPC32. Since it is useful and not inherently limited to 32-bit mode, remove the limitation and allow it on all powerpc platforms. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/machdep.h | 3 +-- arch/powerpc/kernel/pci_64.c | 4 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2f0ca6560e47..2fbfaa9176ed 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -150,6 +150,7 @@ struct machdep_calls { void(*init)(void); void(*kgdb_map_scc)(void); +#endif /* CONFIG_PPC32 */ /* * optional PCI "hooks" @@ -157,8 +158,6 @@ struct machdep_calls { /* Called at then very end of pcibios_init() */ void (*pcibios_after_init)(void); -#endif /* CONFIG_PPC32 */ - /* Called in indirect_* to avoid touching devices */ int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index 9d8c10d55407..fba7fe6e4a50 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -68,6 +68,10 @@ static int __init pcibios_init(void) printk(KERN_DEBUG "PCI: Probing PCI hardware done\n"); + /* Call machine dependent post-init code */ + if (ppc_md.pcibios_after_init) + ppc_md.pcibios_after_init(); + return 0; } -- 2.20.1
Re: [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries
On 5/27/19 11:01 PM, Oliver wrote: On Tue, May 28, 2019 at 8:56 AM Shawn Anastasio wrote: Hello all, This patch set implements support for user-specified PCI resource alignment on the pseries platform for hotplugged PCI devices. Currently on pseries, PCI resource alignments specified with the pci=resource_alignment commandline argument are ignored, since the firmware is in charge of managing the PCI resources. In the case of hotplugged devices, though, the kernel is in charge of configuring the resources and should obey alignment requirements. Are you using hotplug to work around SLOF (the OF we use under qemu) not aligning BARs to 64K? It looks like there is a commit in SLOF to fix that (https://git.qemu.org/?p=SLOF.git;a=commit;f=board-qemu/slof/pci-phb.fs;h=1903174472f8800caf50c959b304501b4c01153c). No, my application actually requires PCI hotplug at run-time. The current behavior of ignoring the alignment for hotplugged devices results in sub-page BARs landing between page boundaries and becoming un-mappable from userspace via the VFIO framework. This issue was observed on a pseries KVM guest with hotplugged ivshmem devices. With these changes, users can specify an appropriate pci=resource_alignment argument on boot for devices they wish to use with VFIO. In the future, this could be extended to provide page-aligned resources by default for hotplugged devices, similar to what is done on powernv by commit 382746376993 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned"). Can we make aligning the BARs to PAGE_SIZE the default behaviour? The BAR assignment process is complex enough as-is so I'd rather we didn't add another platform hack into the mix. Absolutely. This will still require the existing changes so that the custom alignment isn't flat-out ignored on pseries, but I can set it to default to PAGE_SIZE as well, similar to how it's done on PowerNV. I've just pushed a v3 to fix a typo and I'll incorporate this change in v4. Feedback is appreciated. Thanks, Shawn Shawn Anastasio (3): PCI: Introduce pcibios_ignore_alignment_request powerpc/64: Enable pcibios_after_init hook on ppc64 powerpc/pseries: Allow user-specified PCI resource alignment after init arch/powerpc/include/asm/machdep.h | 6 -- arch/powerpc/kernel/pci-common.c | 9 + arch/powerpc/kernel/pci_64.c | 4 arch/powerpc/platforms/pseries/setup.c | 22 ++ drivers/pci/pci.c | 9 +++-- 5 files changed, 46 insertions(+), 4 deletions(-) -- 2.20.1
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/28/19 12:36 AM, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... Interesting, I wasn't aware that hotplugged devices are configured by the hypervisor on PowerVM. That at least means that this patch is wrong as-is since it won't handle that properly. Definitely seems like there will need to be different behavior here depending on the hypervisor. That being said, wouldn't PCI_PROBE_ONLY still be set on pseries/kvm (at least for initial boot) to observe SLOF's original BAR assignments? Perhaps it should be un-set after initial PCI init? IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living Good to know. I'll await his comments before continuing here. diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 [c6e6fa10] [c05f8b54] assign_requested_resources_sorted+0x84/0x110 [c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750 [c6e6fb40] [c05fb2e0] __pci_bus_assign_resources+0x80/0x280 [c6e6fc00] [c05fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0 [c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60 [c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80 IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/30/19 1:55 AM, Sam Bobroff wrote: On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living There seems to be some code already in the kernel that will disable PCI_PROBE_ONLY based on a device tree property, so I did a quick test today and it seems to work. Only a trivial tweak is needed in QEMU to do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only" with a value of 0), and that would allow us to set it only for QEMU (and not PowerVM) if that's what we want to do. Is that useful? (I haven't done any real testing yet but the guest booted up OK.) It was my understanding that PCI_PROBE_ONLY should actually be set initially so that Linux uses SLOF's BAR assignments. The issue here is that PCI_PROBE_ONLY shouldn't be honored after initial bringup on KVM so that hotplugged PCI devices can have custom BAR alignments. Of course, if there's no need to honor SLOF's initial assignments, I assume disabling PCI_PROBE_ONLY would work fine. In fact, I'm not entirely sure why it's done in the first place. Does anybody know? If there is actually a valid reason for preserving SLOF's initial assignments, then it seems like the correct solution is to disable PCI_PROBE_ONLY after initial PCI bringup or ignore it in pci_specified_resource_alignment() like I do in this patch set. Bjorn Helgaas also suggested marking individual resources provided by SLOF/PHYP with IORESOURCE_PCI_FIXED which would remove the need to use PCI_PROBE_ONLY altogether. Any thoughts? - Shawn
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? If not, the device tree modification that Sam found would work fine here. Otherwise, we need a way to honor the initial assignments from SLOF while still allowing custom alignments for hotplugged devices, either by deferring to the platform code like I do here, unsetting PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED like Bjorn suggested. [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 [c6e6fa10] [c05f8b54] assign_requested_resources_sorted+0x84/0x110 [c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750 [c6e6fb40] [c05fb2e0] __pci_bus_assign_resources+0x80/0x280 [c6e6fc00] [c05fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_default_alignment which simply returns PAGE_SIZE, my VM fails to boot and many errors from the virtio disk driver are printed to the console. After some investigation
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote: On 03/06/2019 15:02, Alexey Kardashevskiy wrote: On 03/06/2019 12:23, Shawn Anastasio wrote: On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_defau
Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs
On 6/4/19 10:38 PM, Alexey Kardashevskiy wrote: When the firmware does PCI BAR resource allocation, it passes the assigned addresses and flags (prefetch/64bit/...) via the "reg" property of a PCI device device tree node so the kernel does not need to do resource allocation. The flags are stored in resource::flags - the lower byte stores PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc. Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated, such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc. When parsing the "reg" property, we copy the prefetch flag but we skip on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync. The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions: 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch() or by passing "/chosen/linux,pci-probe-only"); 2. we request resource alignment (by passing pci=resource_alignment= via the kernel cmd line to request PAGE_SIZE alignment or defining ppc_md.pcibios_default_alignment which returns anything but 0). Note that the alignment requests are ignored if PCI_PROBE_ONLY is enabled. With 1) and 2), the generic PCI code in the kernel unconditionally decides to: - reassign the BARs in pci_specified_resource_alignment() (works fine) - write new BARs to the device - this fails for 64bit BARs as the generic code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping in the hypervisor. This fixes the issue by copying the flag. This is useful if we want to enforce certain BAR alignment per platform as handling subpage sized BARs is proven to cause problems with hotplug (SLOF already aligns BARs to 64k). Signed-off-by: Alexey Kardashevskiy --- This code is there for ages (from 200x) hence no "Fixes:". Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as at the moment: - pci=resource_alignment= alone does not do anything; - /chosen/linux,pci-probe-only alone does not cause the kernel to reassign resources; - pci=resource_alignment= with /chosen/linux,pci-probe-only is broken anyway. --- arch/powerpc/kernel/pci_of_scan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c index 24191ea2d9a7..64ad92016b63 100644 --- a/arch/powerpc/kernel/pci_of_scan.c +++ b/arch/powerpc/kernel/pci_of_scan.c @@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge) if (addr0 & 0x0200) { flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY; flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64; + if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64) + flags |= IORESOURCE_MEM_64; flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M; if (addr0 & 0x4000) flags |= IORESOURCE_PREFETCH I have confirmed that this fixes the case with PCI_PROBE_ONLY disabled and a ppc_md.pcibios_default_alignment implementation that returns PAGE_SIZE. Reviewed-by: Shawn Anastasio
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Perhaps some subtle corruption is occurring?
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/12/19 1:16 AM, Oliver O'Halloran wrote: On Wed, Jun 12, 2019 at 3:06 PM Shawn Anastasio wrote: On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Any specific workload? Checkstops are harder to debug without a system in the failed state so we'd need to replicate that locally to get a decent idea what's up. I haven't been able to pinpoint the exact cause. The first time it happened was after about 4 days of uptime while playing a 1080p video in mpv. The second time was about 5 minutes after booting up while restoring a firefox session.
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote: On 12/06/2019 15:05, Shawn Anastasio wrote: On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Perhaps some subtle corruption is occurring? Have you tried this? https://patchwork.ozlabs.org/patch/1113506/ I have not. I'll give it a shot and try it out for a few days to see if I'm able to reproduce the crashes.
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/12/19 2:15 PM, Shawn Anastasio wrote: On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote: On 12/06/2019 15:05, Shawn Anastasio wrote: On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Perhaps some subtle corruption is occurring? Have you tried this? https://patchwork.ozlabs.org/patch/1113506/ I have not. I'll give it a shot and try it out for a few days to see if I'm able to reproduce the crashes. A few days later and I was able to reproduce the checkstop while watching a video in mpv. At this point the system had ~4 day uptime and this wasn't the first video I watched during that time. This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/18/19 1:39 AM, Alexey Kardashevskiy wrote: On 18/06/2019 14:26, Shawn Anastasio wrote: On 6/12/19 2:15 PM, Shawn Anastasio wrote: On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote: On 12/06/2019 15:05, Shawn Anastasio wrote: On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Perhaps some subtle corruption is occurring? Have you tried this? https://patchwork.ozlabs.org/patch/1113506/ I have not. I'll give it a shot and try it out for a few days to see if I'm able to reproduce the crashes. A few days later and I was able to reproduce the checkstop while watching a video in mpv. At this point the system had ~4 day uptime and this wasn't the first video I watched during that time. This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too. Any logs left? What was the reason for the checkstop and what is the hardware? "lscpu" and "lspci -vv" for the starter would help. Thanks, The machine is a Talos II with 2x 8 core DD2.2 Sforza modules. I've added the output of lscpu and lspci below. As for logs, it doesn't seem there are any kernel logs of the event. The opal-gard utility shows some error records which I have also included below. opal-gard: ``` $ sudo ./opal-gard show 1 Record ID:0x0001 Error ID: 0x900b Error Type: Fatal (0xe3) Path Type: physical >Sys, Instance #0 >Node, Instance #0 >Proc, Instance #1 >EQ, Instance #0 >EX, Instance #0 $ sudo ./opal-gard show 2 Record ID:0x0002 Error ID: 0x9021 Error Type: Fatal (0xe3) Path Type: physical >Sys, Instance #0 >Node, Instance #0 >Proc, Instance #1 >EQ, Instance #2 >EX, Instance #1 ``` lscpu: ``` Architecture:ppc64le Byte Order: Little Endian CPU(s): 52 On-line CPU(s) list: 0-3,8-31,36-47,52-63 Thread(s) per core: 4 Core(s) per socket: 6 Socket(s): 2 NUMA node(s):2 Model: 2.2 (pvr 004e 1202) Model name: POWER9, altivec supported CPU max MHz: 3800. CPU min MHz: 2154. L1d cache: 32K L1i cache: 32K L2 cache:512K L3 cache:10240K NUMA node0 CPU(s): 0-3,8-31 NUMA node8 CPU(s): 36-47,52-63 ``` lspci -vv: Output at: https://upaste.anastas.io/IwVQzt
Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
I'm a bit concerned about the removal of PROT_SAO. From what I can see, a feature like this would be extremely useful for emulating architectures with stronger memory models. QEMU's multi- threaded TCG project in particular looks like it would be a good candidate, since as far as I'm aware it is currently completely unable to perform strong-on-weak emulation. Without hardware support like SAO provides, the only way I could see to achieve this would be by emitting tons of unnecessary and costly memory barrier instructions. I understand that ISA 3.1 and POWER10 have dropped SAO, but as a POWER9 user it seems a bit silly to have a potentially useful feature dropped from the kernel just because a future processor doesn't support it. Curious to hear more thoughts on this. Regards, Shawn On 6/7/20 7:02 AM, Nicholas Piggin wrote: ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We rather remove it than keep support for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++-- arch/powerpc/include/asm/cputable.h | 9 ++-- arch/powerpc/include/asm/kvm_book3s_64.h | 3 +- arch/powerpc/include/asm/mman.h | 24 +++ arch/powerpc/include/asm/nohash/64/pgtable.h | 2 - arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 - include/linux/mm.h| 2 - include/trace/events/mmflags.h| 2 - mm/ksm.c | 4 -- tools/testing/selftests/powerpc/mm/.gitignore | 1 - tools/testing/selftests/powerpc/mm/Makefile | 4 +- tools/testing/selftests/powerpc/mm/prot_sao.c | 42 --- 13 files changed, 18 insertions(+), 87 deletions(-) delete mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index f17442c3a092..d9e92586f8dc 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -20,9 +20,13 @@ #define _PAGE_RW (_PAGE_READ | _PAGE_WRITE) #define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC) #define _PAGE_PRIVILEGED 0x8 /* kernel access only */ -#define _PAGE_SAO 0x00010 /* Strong access order */ + +#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache modes */ + /* No bits set is normal cacheable memory */ + /* 0x00010 unused, is SAO bit on radix POWER9 */ #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */ #define _PAGE_TOLERANT0x00030 /* tolerant memory, cache inhibited */ + #define _PAGE_DIRTY 0x00080 /* C: page changed */ #define _PAGE_ACCESSED0x00100 /* R: page referenced */ /* @@ -825,8 +829,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, return hash__set_pte_at(mm, addr, ptep, pte, percpu); } -#define _PAGE_CACHE_CTL (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT) - #define pgprot_noncached pgprot_noncached static inline pgprot_t pgprot_noncached(pgprot_t prot) { diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index bac2252c839e..c7e923ba 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_SPURR LONG_ASM_CONST(0x0100) #define CPU_FTR_DSCR LONG_ASM_CONST(0x0200) #define CPU_FTR_VSX LONG_ASM_CONST(0x0400) -#define CPU_FTR_SAOLONG_ASM_CONST(0x0800) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x1000) #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x2000) #define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x4000) @@ -435,7 +434,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \ + CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | \ CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY) @@ -444,7 +443,7 @@ static inline void cpu_feature_
Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
On 8/18/20 2:11 AM, Nicholas Piggin wrote> Very reasonable point. The problem we're trying to get a handle on is live partition migration where a running guest might be using SAO then get migrated to a P10. I don't think we have a good way to handle this case. Potentially the hypervisor could revoke the page tables if the guest is running in hash mode and the guest kernel could be taught about that and sigbus the process, but in radix the guest controls those page tables and the SAO state and I don't think there's a way to cause it to take a fault. I also don't know what the proprietary hypervisor does here. We could add it back, default to n, or make it bare metal only, or somehow try to block live migration to a later CPU without the faciliy. I wouldn't be against that. Admittedly I'm not too familiar with the specifics of live migration or guest memory management, but restoring the functionality and adding a way to prevent migration of SAO-using guests seems like a reasonable choice to me. Would this be done with help from the guest using some sort of infrastructure to signal to the hypervisor that SAO is in use, or entirely on the hypervisor by e.g. scanning the through the process table for SAO pages? It would be very interesting to know how it performs in such a "real" situation. I don't know how well POWER9 has optimised it -- it's possible that it's not much better than putting lwsync after every load or store. This is definitely worth investigating in depth. That said, even if the performance on P9 isn't super great, I think the feature could still be useful, since it would offer more granularity than the sledgehammer approach of emitting lwsync everywhere. I'd be happy to put in some of the work required to get this to a point where it can be reintroduced without breaking guest migration - I'd just need some pointers on getting started with whatever approach is decided on. Thanks, Shawn
[PATCH 0/2] Reintroduce PROT_SAO
This set re-introduces the PROT_SAO prot flag removed in Commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support"). To address concerns regarding live migration of guests using SAO to P10 hosts without SAO support, the flag is disabled by default in LPARs. A new config option, PPC_PROT_SAO_LPAR was added to allow users to explicitly enable it if they will not be running in an environment where this is a conern. Shawn Anastasio (2): Revert "powerpc/64s: Remove PROT_SAO support" powerpc/64s: Disallow PROT_SAO in LPARs by default arch/powerpc/Kconfig | 12 ++ arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++-- arch/powerpc/include/asm/cputable.h | 10 ++--- arch/powerpc/include/asm/mman.h | 31 -- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 + arch/powerpc/include/uapi/asm/mman.h | 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 + include/linux/mm.h| 2 + include/trace/events/mmflags.h| 2 + mm/ksm.c | 4 ++ tools/testing/selftests/powerpc/mm/.gitignore | 1 + tools/testing/selftests/powerpc/mm/Makefile | 4 +- tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++ 14 files changed, 107 insertions(+), 17 deletions(-) create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c -- 2.28.0
[PATCH 1/2] Revert "powerpc/64s: Remove PROT_SAO support"
This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05. Since PROT_SAO can still be useful for certain classes of software, reintroduce it. Concerns about guest migration for LPARs using SAO will be addressed next. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++-- arch/powerpc/include/asm/cputable.h | 10 ++--- arch/powerpc/include/asm/mman.h | 26 ++-- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 + arch/powerpc/include/uapi/asm/mman.h | 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 + include/linux/mm.h| 2 + include/trace/events/mmflags.h| 2 + mm/ksm.c | 4 ++ tools/testing/selftests/powerpc/mm/.gitignore | 1 + tools/testing/selftests/powerpc/mm/Makefile | 4 +- tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++ 13 files changed, 90 insertions(+), 17 deletions(-) create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 6de56c3b33c4..495fc0ccb453 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -20,13 +20,9 @@ #define _PAGE_RW (_PAGE_READ | _PAGE_WRITE) #define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC) #define _PAGE_PRIVILEGED 0x8 /* kernel access only */ - -#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache modes */ - /* No bits set is normal cacheable memory */ - /* 0x00010 unused, is SAO bit on radix POWER9 */ +#define _PAGE_SAO 0x00010 /* Strong access order */ #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */ #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */ - #define _PAGE_DIRTY0x00080 /* C: page changed */ #define _PAGE_ACCESSED 0x00100 /* R: page referenced */ /* @@ -828,6 +824,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, return hash__set_pte_at(mm, addr, ptep, pte, percpu); } +#define _PAGE_CACHE_CTL(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT) + #define pgprot_noncached pgprot_noncached static inline pgprot_t pgprot_noncached(pgprot_t prot) { diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index fdddb822d564..f89205eff691 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -191,7 +191,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_SPURR LONG_ASM_CONST(0x0100) #define CPU_FTR_DSCR LONG_ASM_CONST(0x0200) #define CPU_FTR_VSXLONG_ASM_CONST(0x0400) -// Free LONG_ASM_CONST(0x0800) +#define CPU_FTR_SAOLONG_ASM_CONST(0x0800) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x1000) #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x2000) #define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x4000) @@ -436,7 +436,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \ + CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | \ CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX ) @@ -445,7 +445,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \ @@ -456,7 +456,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ @@ -474,7 +474,7 @@ static inline void
[PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default
Since migration of guests using SAO to ISA 3.1 hosts may cause issues, disable PROT_SAO in LPARs by default and introduce a new Kconfig option PPC_PROT_SAO_LPAR to allow users to enable it if desired. Signed-off-by: Shawn Anastasio --- arch/powerpc/Kconfig| 12 arch/powerpc/include/asm/mman.h | 9 +++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1f48bbfb3ce9..65bed1fdeaad 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT If unsure, say N here. +config PPC_PROT_SAO_LPAR + bool "Support PROT_SAO mappings in LPARs" + depends on PPC_BOOK3S_64 + help + This option adds support for PROT_SAO mappings from userspace + inside LPARs on supported CPUs. + + This may cause issues when performing guest migration from + a CPU that supports SAO to one that does not. + + If unsure, say N here. + config PPC_COPRO_BASE bool diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h index 4ba303ea27f5..7cb6d18f5cd6 100644 --- a/arch/powerpc/include/asm/mman.h +++ b/arch/powerpc/include/asm/mman.h @@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) { if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO)) return false; - if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO)) - return false; + if (prot & PROT_SAO) { + if (!cpu_has_feature(CPU_FTR_SAO)) + return false; + if (firmware_has_feature(FW_FEATURE_LPAR) && + !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR)) + return false; + } return true; } #define arch_validate_prot arch_validate_prot -- 2.28.0
Re: [PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default
On 8/21/20 5:37 AM, Nicholas Piggin wrote:> I think this should be okay. Could you also update the selftest to skip if we have PPC_FEATURE2_ARCH_3_1 set? Sure. I'll send out a v2 shortly with another patch for this. Thanks, Nick Thanks, Shawn
[PATCH v2 0/3] Reintroduce PROT_SAO
Changes in v2: - Update prot_sao selftest to skip ISA 3.1 This set re-introduces the PROT_SAO prot flag removed in Commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support"). To address concerns regarding live migration of guests using SAO to P10 hosts without SAO support, the flag is disabled by default in LPARs. A new config option, PPC_PROT_SAO_LPAR was added to allow users to explicitly enable it if they will not be running in an environment where this is a conern. Shawn Anastasio (3): Revert "powerpc/64s: Remove PROT_SAO support" powerpc/64s: Disallow PROT_SAO in LPARs by default selftests/powerpc: Update PROT_SAO test to skip ISA 3.1 arch/powerpc/Kconfig | 12 ++ arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++-- arch/powerpc/include/asm/cputable.h | 10 ++--- arch/powerpc/include/asm/mman.h | 31 +++-- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 + arch/powerpc/include/uapi/asm/mman.h | 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 + include/linux/mm.h| 2 + include/trace/events/mmflags.h| 2 + mm/ksm.c | 4 ++ tools/testing/selftests/powerpc/mm/.gitignore | 1 + tools/testing/selftests/powerpc/mm/Makefile | 4 +- tools/testing/selftests/powerpc/mm/prot_sao.c | 43 +++ 14 files changed, 108 insertions(+), 17 deletions(-) create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c -- 2.28.0
[PATCH v2 2/3] powerpc/64s: Disallow PROT_SAO in LPARs by default
Since migration of guests using SAO to ISA 3.1 hosts may cause issues, disable PROT_SAO in LPARs by default and introduce a new Kconfig option PPC_PROT_SAO_LPAR to allow users to enable it if desired. Signed-off-by: Shawn Anastasio --- arch/powerpc/Kconfig| 12 arch/powerpc/include/asm/mman.h | 9 +++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1f48bbfb3ce9..65bed1fdeaad 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT If unsure, say N here. +config PPC_PROT_SAO_LPAR + bool "Support PROT_SAO mappings in LPARs" + depends on PPC_BOOK3S_64 + help + This option adds support for PROT_SAO mappings from userspace + inside LPARs on supported CPUs. + + This may cause issues when performing guest migration from + a CPU that supports SAO to one that does not. + + If unsure, say N here. + config PPC_COPRO_BASE bool diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h index 4ba303ea27f5..7cb6d18f5cd6 100644 --- a/arch/powerpc/include/asm/mman.h +++ b/arch/powerpc/include/asm/mman.h @@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) { if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO)) return false; - if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO)) - return false; + if (prot & PROT_SAO) { + if (!cpu_has_feature(CPU_FTR_SAO)) + return false; + if (firmware_has_feature(FW_FEATURE_LPAR) && + !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR)) + return false; + } return true; } #define arch_validate_prot arch_validate_prot -- 2.28.0
[PATCH v2 3/3] selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
Since SAO support was removed from ISA 3.1, skip the prot_sao test if PPC_FEATURE2_ARCH_3_1 is set. Signed-off-by: Shawn Anastasio --- tools/testing/selftests/powerpc/mm/prot_sao.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/mm/prot_sao.c b/tools/testing/selftests/powerpc/mm/prot_sao.c index e2eed65b7735..e0cf8ebbf8cd 100644 --- a/tools/testing/selftests/powerpc/mm/prot_sao.c +++ b/tools/testing/selftests/powerpc/mm/prot_sao.c @@ -18,8 +18,9 @@ int test_prot_sao(void) { char *p; - /* 2.06 or later should support SAO */ - SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06)); + /* SAO was introduced in 2.06 and removed in 3.1 */ + SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06) || + have_hwcap2(PPC_FEATURE2_ARCH_3_1)); /* * Ensure we can ask for PROT_SAO. -- 2.28.0
[PATCH v2 1/3] Revert "powerpc/64s: Remove PROT_SAO support"
This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05. Since PROT_SAO can still be useful for certain classes of software, reintroduce it. Concerns about guest migration for LPARs using SAO will be addressed next. Signed-off-by: Shawn Anastasio --- arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++-- arch/powerpc/include/asm/cputable.h | 10 ++--- arch/powerpc/include/asm/mman.h | 26 ++-- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 + arch/powerpc/include/uapi/asm/mman.h | 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 + include/linux/mm.h| 2 + include/trace/events/mmflags.h| 2 + mm/ksm.c | 4 ++ tools/testing/selftests/powerpc/mm/.gitignore | 1 + tools/testing/selftests/powerpc/mm/Makefile | 4 +- tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++ 13 files changed, 90 insertions(+), 17 deletions(-) create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 6de56c3b33c4..495fc0ccb453 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -20,13 +20,9 @@ #define _PAGE_RW (_PAGE_READ | _PAGE_WRITE) #define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC) #define _PAGE_PRIVILEGED 0x8 /* kernel access only */ - -#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache modes */ - /* No bits set is normal cacheable memory */ - /* 0x00010 unused, is SAO bit on radix POWER9 */ +#define _PAGE_SAO 0x00010 /* Strong access order */ #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */ #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */ - #define _PAGE_DIRTY0x00080 /* C: page changed */ #define _PAGE_ACCESSED 0x00100 /* R: page referenced */ /* @@ -828,6 +824,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, return hash__set_pte_at(mm, addr, ptep, pte, percpu); } +#define _PAGE_CACHE_CTL(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT) + #define pgprot_noncached pgprot_noncached static inline pgprot_t pgprot_noncached(pgprot_t prot) { diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index fdddb822d564..f89205eff691 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -191,7 +191,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_SPURR LONG_ASM_CONST(0x0100) #define CPU_FTR_DSCR LONG_ASM_CONST(0x0200) #define CPU_FTR_VSXLONG_ASM_CONST(0x0400) -// Free LONG_ASM_CONST(0x0800) +#define CPU_FTR_SAOLONG_ASM_CONST(0x0800) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x1000) #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x2000) #define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x4000) @@ -436,7 +436,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \ + CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | \ CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX ) @@ -445,7 +445,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \ @@ -456,7 +456,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ @@ -474,7 +474,7 @@ static inline void
[PATCH] powerpc/pseries: Add pcibios_default_alignment implementation
Implement pcibios_default_alignment for pseries so that resources are page-aligned. The main benefit of this is being able to map any resource from userspace via mechanisms like VFIO. This is identical to powernv's implementation. Signed-off-by: Shawn Anastasio --- arch/powerpc/platforms/pseries/pci.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c index 911534b89c85..6d922c096354 100644 --- a/arch/powerpc/platforms/pseries/pci.c +++ b/arch/powerpc/platforms/pseries/pci.c @@ -210,6 +210,11 @@ int pseries_pcibios_sriov_disable(struct pci_dev *pdev) } #endif +static resource_size_t pseries_pcibios_default_alignment(void) +{ + return PAGE_SIZE; +} + static void __init pSeries_request_regions(void) { if (!isa_io_base) @@ -231,6 +236,8 @@ void __init pSeries_final_fixup(void) eeh_show_enabled(); + ppc_md.pcibios_default_alignment = pseries_pcibios_default_alignment; + #ifdef CONFIG_PCI_IOV ppc_md.pcibios_sriov_enable = pseries_pcibios_sriov_enable; ppc_md.pcibios_sriov_disable = pseries_pcibios_sriov_disable; -- 2.28.0