from:"Shawn Anastasio"

Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-05-10 Thread Shawn Anastasio

On 5/6/24 12:43 PM, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running
>> on downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
> 
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\
>

Raptor is currently working on an automated test runner setup to
exercise the VFIO subsystem on PowerNV and (to a lesser extent) pSeries,
so breakages like this going forward will hopefully be caught much more
quickly.

>> I firmly believe the refactoring in this patch series is a step in
>> that direction.
> 
> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.
>

Agreed. Modernizing/de-cluttering PPC's IOMMU code in general is another
task that we're working towards. As mentioned previously on the list,
we're working towards a more standard IOMMU driver for PPC that can be
used with dma_iommu, which I think will be a good step towards cleaning
this up. Initially PowerNV is going to be our target, but to the extent
that it is possible and useful, pSeries support could be brought in
later.

> Jason

Thanks,
Shawn

Re: [PATCH v3 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv

2024-06-27 Thread Shawn Anastasio

Hi Krishna,

On 6/24/24 7:09 AM, Krishna Kumar wrote:
> Description of the problem: The hotplug driver for powerpc
> (pci/hotplug/pnv_php.c) gives kernel crash when we try to
> hot-unplug/disable the PCIe switch/bridge from the PHB.
> 
> Root Cause of Crash: The crash is due to the reason that, though the msi
> data structure has been released during disable/hot-unplug path and it
> has been assigned with NULL, still during unregistartion the code was
> again trying to explicitly disable the msi which causes the Null pointer
> dereference and kernel crash.
> 
> Proposed Fix : The fix is to correct the check during unregistration path
> so that the code should not  try to invoke pci_disable_msi/msix() if its
> data structure is already freed.
> 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: "Aneesh Kumar K.V" 
> Cc: Bjorn Helgaas 
> Cc: Gaurav Batra 
> Cc: Nathan Lynch 
> Cc: Brian King 
> 
> Signed-off-by: Krishna Kumar 

As with v1, I can confirm that this patch solves the panic encountered
when hotplugging PCIe bridges on POWER9.

Tested-by: Shawn Anastasio 

Thanks,
Shawn

Re: [PATCH v3 2/2] powerpc: hotplug driver bridge support

2024-06-27 Thread Shawn Anastasio

Hi Krishna,

On 6/24/24 7:09 AM, Krishna Kumar wrote:
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
> 
> Root Cause Analysis: This behavior is due to missing code for the
> switch/bridge. The existing driver depends on pci_hp_add_devices()
> function for device enablement. This function calls pci_scan_slot() on
> only one device-node/port of the bridge, not on all the siblings'
> device-node/port.
> 
> The missing code needs to be added which will find all the sibling
> device-nodes/bridge-ports and will run explicit pci_scan_slot() on
> those.  A new function has been added for this purpose which gets
> invoked from pci_hp_add_devices(). This new function
> pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
> bridge-ports by traversal and explicitly invokes pci_scan_slot on them.
> 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: "Aneesh Kumar K.V" 
> Cc: Bjorn Helgaas 
> Cc: Gaurav Batra 
> Cc: Nathan Lynch 
> Cc: Brian King 
> 
> Signed-off-by: Krishna Kumar 

Other than the case with NVMe devices failing that we discussed in v1's
thread, I can confirm that this patch resolves many of the issues we've
encountered with PCIe hotplug on POWER9.

Tested-by: Shawn Anastasio 

Thanks,
Shawn

Re: [PATCH 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv

2024-06-27 Thread Shawn Anastasio

Hi Krishna,

On 5/9/24 7:05 AM, Krishna Kumar wrote:
> Description of the problem: The hotplug driver for powerpc
> (pci/hotplug/pnv_php.c) gives kernel crash when we try to
> hot-unplug/disable the PCIe switch/bridge from the PHB.
> 
> 
> Root Cause of Crash: The crash is due to the reason that, though the msi
> data structure has been released during disable/hot-unplug path and it
> has been assigned with NULL, still during unregistartion the code was
> again trying to explicitly disable the msi which causes the Null pointer
> dereference and kernel crash.
> 
> 
> Proposed Fix : The fix is to correct the check during unregistration path
> so that the code should not  try to invoke pci_disable_msi/msix() if its
> data structure is already freed.
> 
> 
> Signed-off-by: Krishna Kumar 

I've tested this on a POWER9 box and can confirm that it fixes the panics
when hotplugging PCIe bridges.

Tested-by: Shawn Anastasio 

Thanks,
Shawn

Re: [PATCH v3 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv

2024-06-28 Thread Shawn Anastasio

Hi Michael,

On 6/27/24 11:48 PM, Michael Ellerman wrote:
> Was the panic reported anywhere? So we can link to the report in the
> commit.

It was indeed -- here is the link to the thread from last December:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2023-December/267034.html

> cheers

Thanks,
Shawn

[PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-17 Thread Shawn Anastasio

The refactor of powerpc DMA functions in commit cc17d780
("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly
changes the way DMA mappings are handled on powerpc.
Since this change, all mapped pages are marked as cache-inhibited
through the default implementation of arch_dma_mmap_pgprot.
This differs from the previous behavior of only marking pages
in noncoherent mappings as cache-inhibited and has resulted in
sporadic system crashes in certain hardware configurations and
workloads (see Bugzilla).

This commit restores the previous correct behavior by providing
an implementation of arch_dma_mmap_pgprot that only marks
pages in noncoherent mappings as cache-inhibited. As this behavior
should be universal for all powerpc platforms a new file,
dma-generic.c, was created to store it.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204145
Fixes: cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent")
Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/kernel/Makefile |  3 ++-
 arch/powerpc/kernel/dma-common.c | 17 +
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/dma-common.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d8dcd8820369..77f6ebf97113 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -121,6 +121,7 @@ config PPC
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
+   select ARCH_HAS_DMA_MMAP_PGPROT
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 56dfa7a2a6f2..ea0c69236789 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -49,7 +49,8 @@ obj-y := cputable.o ptrace.o 
syscalls.o \
   signal.o sysfs.o cacheinfo.o time.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
-  of_platform.o prom_parse.o
+  of_platform.o prom_parse.o \
+  dma-common.o
 obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
   signal_64.o ptrace32.o \
   paca.o nvram_64.o firmware.o
diff --git a/arch/powerpc/kernel/dma-common.c b/arch/powerpc/kernel/dma-common.c
new file mode 100644
index ..5a15f99f4199
--- /dev/null
+++ b/arch/powerpc/kernel/dma-common.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Contains common dma routines for all powerpc platforms.
+ *
+ * Copyright (C) 2019 Shawn Anastasio (sh...@anastas.io)
+ */
+
+#include 
+#include 
+
+pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
+   unsigned long attrs)
+{
+   if (!dev_is_dma_coherent(dev))
+   return pgprot_noncached(prot);
+   return prot;
+}
-- 
2.22.0

Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-17 Thread Shawn Anastasio


On 7/17/19 9:59 PM, Alexey Kardashevskiy wrote:



On 18/07/2019 09:54, Shawn Anastasio wrote:

The refactor of powerpc DMA functions in commit cc17d780
("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly
changes the way DMA mappings are handled on powerpc.
Since this change, all mapped pages are marked as cache-inhibited
through the default implementation of arch_dma_mmap_pgprot.
This differs from the previous behavior of only marking pages
in noncoherent mappings as cache-inhibited and has resulted in
sporadic system crashes in certain hardware configurations and
workloads (see Bugzilla).

This commit restores the previous correct behavior by providing
an implementation of arch_dma_mmap_pgprot that only marks
pages in noncoherent mappings as cache-inhibited. As this behavior
should be universal for all powerpc platforms a new file,
dma-generic.c, was created to store it.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204145
Fixes: cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent")
Signed-off-by: Shawn Anastasio 



Is this the default one?

include/linux/dma-noncoherent.h
# define arch_dma_mmap_pgprot(dev, prot, attrs) pgprot_noncached(prot)


Yep, that's the one.


Out of curiosity - do not we want to fix this one for everyone?


Other than m68k, mips, and arm64, everybody else that doesn't have
ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
I assume this behavior is acceptable on those architectures.

Either way, the patch is correct. I'm glad to know it was not my " 
powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59" 
which broke it :)


Yeah, turns out it was just bad luck that I happened to run into these
crashes right after deciding to try out your patch :)


Reviewed-by: Alexey Kardashevskiy 


Thanks!






---
  arch/powerpc/Kconfig |  1 +
  arch/powerpc/kernel/Makefile |  3 ++-
  arch/powerpc/kernel/dma-common.c | 17 +
  3 files changed, 20 insertions(+), 1 deletion(-)
  create mode 100644 arch/powerpc/kernel/dma-common.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d8dcd8820369..77f6ebf97113 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -121,6 +121,7 @@ config PPC
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
  select ARCH_HAS_DEVMEM_IS_ALLOWED
+    select ARCH_HAS_DMA_MMAP_PGPROT
  select ARCH_HAS_ELF_RANDOMIZE
  select ARCH_HAS_FORTIFY_SOURCE
  select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 56dfa7a2a6f2..ea0c69236789 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -49,7 +49,8 @@ obj-y    := cputable.o ptrace.o 
syscalls.o \

 signal.o sysfs.o cacheinfo.o time.o \
 prom.o traps.o setup-common.o \
 udbg.o misc.o io.o misc_$(BITS).o \
-   of_platform.o prom_parse.o
+   of_platform.o prom_parse.o \
+   dma-common.o
  obj-$(CONFIG_PPC64)    += setup_64.o sys_ppc32.o \
 signal_64.o ptrace32.o \
 paca.o nvram_64.o firmware.o
diff --git a/arch/powerpc/kernel/dma-common.c 
b/arch/powerpc/kernel/dma-common.c

new file mode 100644
index ..5a15f99f4199
--- /dev/null
+++ b/arch/powerpc/kernel/dma-common.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Contains common dma routines for all powerpc platforms.
+ *
+ * Copyright (C) 2019 Shawn Anastasio (sh...@anastas.io)
+ */
+
+#include 
+#include 
+
+pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
+    unsigned long attrs)
+{
+    if (!dev_is_dma_coherent(dev))
+    return pgprot_noncached(prot);
+    return prot;
+}

Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-18 Thread Shawn Anastasio


On 7/18/19 4:52 AM, Christoph Hellwig wrote:

On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote:

On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote:

Other than m68k, mips, and arm64, everybody else that doesn't have
ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
I assume this behavior is acceptable on those architectures.


It might be acceptable, but there's no reason to use pgport_noncached
if the platform supports cache-coherent DMA.

Christoph (+cc) made the change so maybe he saw something we're missing.


I always found the forcing of noncached access even for coherent
devices a little odd, but this was inherited from the previous
implementation, which surprised me a bit as the different attributes
are usually problematic even on x86.  Let me dig into the history a
bit more, but I suspect the righ fix is to default to cached mappings
for coherent devices.


Ok, some history:

The generic dma mmap implementation, which we are effectively still
using today was added by:

commit 64ccc9c033c6089b2d426dad3c56477ab066c999
Author: Marek Szyprowski 
Date:   Thu Jun 14 13:03:04 2012 +0200

 common: dma-mapping: add support for generic dma_mmap_* calls

and unconditionally uses pgprot_noncached in dma_common_mmap, which is
then used as the fallback by dma_mmap_attrs if no ->mmap method is
present.  At that point we already had the powerpc implementation
that only uses pgprot_noncached for non-coherent mappings, and
the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE
is set and otherwise pgprot_dmacoherent, which seems to be uncached.
Arm did support coherent platforms at that time, but they might have
been an afterthought and not handled properly.

So it migt have been that we were all wrong for that time and might
have to fix it up.


Personally, I'm not a huge fan of an implicit default for something
inherently architecture-dependent like this at all. What I'd like to
see is a mechanism that forces architecture code to explicitly
opt in to the default pgprot settings if they don't provide an
implementation of arch_dma_mmap_pgprot. This could perhaps be done
by reversing ARCH_HAS_DMA_MMAP_PGPROT to something like
ARCH_USE_DEFAULT_DMA_MMAP_PGPROT.

This way as more systems are moved to use the common mmap code instead
of their ops->mmap, the people doing the refactoring have to make an
explicit decision about the pgprot settings to use. Such a configuration
would have likely prevented this situation with powerpc from happening.

That being said, if the default behavior doesn't make sense in the
general case it should probably be fixed as well.

Curious to hear some thoughts on this.

Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-19 Thread Shawn Anastasio

On 7/19/19 2:06 AM, Christoph Hellwig wrote:
> What is inherently architecture specific here over the fact that
> the pgprot_* expand to architecture specific bits?

What I meant is that different architectures seem to have different
criteria for setting the different pgprot_ bits. i.e. ppc checks
for cache coherency, arm64 checks for cache coherency and
writecombine, mips just checks for writecombine, etc.

That being said, I'm no expert here and there is probably some behavior
here that would make for a much more sane default.

> I'd rather not create boilerplate code where we don't have to it. Note
> that arch_dma_mmap_pgprot is a little misnamed now, as we also use it
> for the in-kernel remapping in kernel/dma/remap.c, which I'm slowly
> switching a lot of architectures to.  powerpc will follow soon once
> I get the ppc44x that was given to me to actually boot with a recent
> kernel (not that I've tried much so far).

Fair enough. I didn't realize that most of the other architectures
don't use the common code anyways as you mention below.

> Every arch except for arm32 now uses dma direct for the direct mapping,
> and thus the common code without the indeed odd default.  I also have
> a series to remove the default fallback, which is inherently a bad idea:
>
> 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-no-defaults

Awesome. This is great to hear.

> I think your patch that started this thread is fine for 5.3 and -stable:
>
> Reviewed-by: Christoph Hellwig 

Thanks!

> But going forward I'd rather have a sane default.

Agreed.

Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-22 Thread Shawn Anastasio





On 7/22/19 7:16 AM, Michael Ellerman wrote:

Arnd Bergmann  writes:

On Thu, Jul 18, 2019 at 11:52 AM Christoph Hellwig  wrote:

On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote:

On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote:

Other than m68k, mips, and arm64, everybody else that doesn't have
ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
I assume this behavior is acceptable on those architectures.


It might be acceptable, but there's no reason to use pgport_noncached
if the platform supports cache-coherent DMA.

Christoph (+cc) made the change so maybe he saw something we're missing.


I always found the forcing of noncached access even for coherent
devices a little odd, but this was inherited from the previous
implementation, which surprised me a bit as the different attributes
are usually problematic even on x86.  Let me dig into the history a
bit more, but I suspect the righ fix is to default to cached mappings
for coherent devices.


Ok, some history:

The generic dma mmap implementation, which we are effectively still
using today was added by:

commit 64ccc9c033c6089b2d426dad3c56477ab066c999
Author: Marek Szyprowski 
Date:   Thu Jun 14 13:03:04 2012 +0200

 common: dma-mapping: add support for generic dma_mmap_* calls

and unconditionally uses pgprot_noncached in dma_common_mmap, which is
then used as the fallback by dma_mmap_attrs if no ->mmap method is
present.  At that point we already had the powerpc implementation
that only uses pgprot_noncached for non-coherent mappings, and
the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE
is set and otherwise pgprot_dmacoherent, which seems to be uncached.
Arm did support coherent platforms at that time, but they might have
been an afterthought and not handled properly.


Cache-coherent devices are still very rare on 32-bit ARM.

Among the callers of dma_mmap_coherent(), almost all are in platform
specific device drivers that only ever run on noncoherent ARM SoCs,
which explains why nobody would have noticed problems.

There is also a difference in behavior between ARM and PowerPC
when dealing with mismatched cacheability attributes: If the same
page is mapped as both cached and uncached to, this may
cause silent undefined behavior on ARM, while PowerPC should
enter a checkstop as soon as it notices.


On newer Power CPUs it's actually more like the ARM behaviour.

I don't know for sure that it will *never* checkstop but there are at
least cases where it won't. There's some (not much) detail in the
Power8/9 user manuals.


The issue was discovered due to sporadic checkstops on P9, so it
seems like it will happen at least sometimes.


cheers

Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*

2019-08-06 Thread Shawn Anastasio


On 8/5/19 10:01 AM, Christoph Hellwig wrote:

diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 3813211a9aad..9ae5cee543c4 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -42,13 +42,8 @@ void arch_dma_free(struct device *dev, size_t size, void 
*cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
  long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
dma_addr_t dma_addr);
-
-#ifdef CONFIG_ARCH_HAS_DMA_MMAP_PGPROT
  pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
unsigned long attrs);
-#else
-# define arch_dma_mmap_pgprot(dev, prot, attrs)pgprot_noncached(prot)
-#endif


Nit, but maybe the prototype should still be ifdef'd here? It at least
could prevent a reader from incorrectly thinking that the function is
always present.

Also, like Will mentioned earlier, the function name isn't entirely
accurate anymore. I second the suggestion of using something like
arch_dma_noncoherent_pgprot(). As for your idea of defining
pgprot_dmacoherent for all architectures as

#ifndef pgprot_dmacoherent
#define pgprot_dmacoherent pgprot_noncached
#endif

I think that the name here is kind of misleading too, since this
definition will only be used when there is no support for proper
DMA coherency.

Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*

2019-08-07 Thread Shawn Anastasio


On 8/7/19 8:04 AM, Christoph Hellwig wrote:

Actually it is typical modern Linux style to just provide a prototype
and then use "if (IS_ENABLED(CONFIG_FOO))" to guard the call(s) to it.


I see.


Also, like Will mentioned earlier, the function name isn't entirely
accurate anymore. I second the suggestion of using something like
arch_dma_noncoherent_pgprot().


As mentioned I plan to remove arch_dma_mmap_pgprot for 5.4, so I'd
rather avoid churn for the short period of time.


Yeah, fair enough.


As for your idea of defining
pgprot_dmacoherent for all architectures as

#ifndef pgprot_dmacoherent
#define pgprot_dmacoherent pgprot_noncached
#endif

I think that the name here is kind of misleading too, since this
definition will only be used when there is no support for proper
DMA coherency.


Do you have a suggestion for a better name?  I'm pretty bad at naming,
so just reusing the arm name seemed like a good way to avoid having
to make naming decisions myself.


Good question. Perhaps something like `pgprot_dmacoherent_fallback`
would better convey that this is only used for devices that don't
support DMA coherency? Or maybe `pgprot_dma_noncoherent`?

[PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries

2019-09-04 Thread Shawn Anastasio

On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently
broken for all but the first device on a given bus. The culprit is an ordering
issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU
group assigment occuring before device registration in sysfs. This triggers
the following check in arch/powerpc/kernel/iommu.c:

/*
 * The sysfs entries should be populated before
 * binding IOMMU group. If sysfs entries isn't
 * ready, we simply bail.
 */
if (!device_is_registered(dev))
return -ENOENT;

This fails for hotplugged devices since the pcibios_add_device() call in the
pseries hotplug path (in pci_device_add()) occurs before device_add().
Since the IOMMU groups are set up in pcibios_add_device(), this means that a
sysfs entry will not yet be present and it will fail.

There is a special case that allows the first hotplugged device on a bus to
succeed, though. The powerpc pcibios_add_device() implementation will skip
initializing the device if bus setup is not yet complete.
Later, the pci core will call pcibios_fixup_bus() which will perform setup
for the first (and only) device on the bus and since it has already been
registered in sysfs, the IOMMU setup will succeed.

My current solution is to introduce another pcibios function, pcibios_fixup_dev,
which is called after device_add() in pci_device_add(). Then in powerpc code,
pcibios_setup_device() was moved from pcibios_add_device() to this new function
which will occur after sysfs registration so IOMMU assignment will succeed.

I added a new pcibios function rather than moving the pcibios_add_device() call
to after the device_add() call in pci_add_device() because there are other
architectures that use it and it wasn't immediately clear to me whether moving
it would break them.

If anybody has more insight or a better way to fix this, please let me know.

Shawn Anastasio (2):
  PCI: Introduce pcibios_fixup_dev()
  powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries

 arch/powerpc/kernel/pci-common.c | 13 ++---
 drivers/pci/probe.c  | 14 ++
 include/linux/pci.h  |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)

-- 
2.20.1

[PATCH 1/2] PCI: Introduce pcibios_fixup_dev()

2019-09-04 Thread Shawn Anastasio

Introduce pcibios_fixup_dev to allow platform-specific code to perform
final setup of a PCI device after it has been registered in sysfs.

The default implementation is a no-op.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/probe.c | 14 ++
 include/linux/pci.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3c7338fad86..14eb7ee38794 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2652,6 +2652,17 @@ static void pci_set_msi_domain(struct pci_dev *dev)
dev_set_msi_domain(&dev->dev, d);
 }
 
+/**
+ * pcibios_fixup_dev - Platform-specific device setup
+ * @dev: Device to set up
+ *
+ * Default empty implementation. Replace with an architecture-specific
+ * setup routine, if necessary.
+ */
+void __weak pcibios_fixup_dev(struct pci_dev *dev)
+{
+}
+
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
int ret;
@@ -2699,6 +2710,9 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus 
*bus)
dev->match_driver = false;
ret = device_add(&dev->dev);
WARN_ON(ret < 0);
+
+   /* Allow platform-specific code to perform final setup of device */
+   pcibios_fixup_dev(dev);
 }
 
 struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 82e4cd1b7ac3..83eb0e241137 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -960,6 +960,7 @@ void pcibios_bus_add_device(struct pci_dev *pdev);
 void pcibios_add_bus(struct pci_bus *bus);
 void pcibios_remove_bus(struct pci_bus *bus);
 void pcibios_fixup_bus(struct pci_bus *);
+void pcibios_fixup_dev(struct pci_dev *);
 int __must_check pcibios_enable_device(struct pci_dev *, int mask);
 /* Architecture-specific versions may override this (weak) */
 char *pcibios_setup(char *str);
-- 
2.20.1

[PATCH 2/2] powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries

2019-09-04 Thread Shawn Anastasio

Move PCI device setup from pcibios_add_device() to pcibios_fixup_dev().
This ensures that platform-specific DMA and IOMMU setup occurs after the
device has been registered in sysfs, which is a requirement for IOMMU group
assignment to work.

This fixes IOMMU group assignment for hotplugged devices on pseries, where
the existing behavior results in IOMMU assignment before registration.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/kernel/pci-common.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f627e15bb43c..21b4761bb0ed 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -987,15 +987,14 @@ static void pcibios_setup_device(struct pci_dev *dev)
ppc_md.pci_irq_fixup(dev);
 }
 
-int pcibios_add_device(struct pci_dev *dev)
+void pcibios_fixup_dev(struct pci_dev *dev)
 {
-   /*
-* We can only call pcibios_setup_device() after bus setup is complete,
-* since some of the platform specific DMA setup code depends on it.
-*/
-   if (dev->bus->is_added)
-   pcibios_setup_device(dev);
+   /* Device is registered in sysfs and ready to be set up */
+   pcibios_setup_device(dev);
+}
 
+int pcibios_add_device(struct pci_dev *dev)
+{
 #ifdef CONFIG_PCI_IOV
if (ppc_md.pcibios_fixup_sriov)
ppc_md.pcibios_fixup_sriov(dev);
-- 
2.20.1

Re: [PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries

2019-09-05 Thread Shawn Anastasio


On 9/5/19 4:08 AM, Alexey Kardashevskiy wrot>
I just tried hotplugging 3 virtio-net devices into a guest system with 
v5.2 kernel and it seems working (i.e. BARs mapped, a driver is bound):



root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)'
00:00.0 Ethernet controller: Red Hat, Inc Virtio network device
     Memory at 20008004 (32-bit, non-prefetchable) [size=4K]
     Memory at 2100 (64-bit, prefetchable) [size=16K]
     Kernel driver in use: virtio-pci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
     Memory at 200080041000 (32-bit, non-prefetchable) [size=4K]
     Memory at 21004000 (64-bit, prefetchable) [size=16K]
     Kernel driver in use: virtio-pci
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
     Memory at 200080042000 (32-bit, non-prefetchable) [size=4K]
     Memory at 21008000 (64-bit, prefetchable) [size=16K]
     Kernel driver in use: virtio-pci

Can you explain in detail what you are doing exactly and what is failing 
and what qemu/guest kernel/guest distro is used? Thanks,


Sure. I'm on host kernel 5.2.8, guest on 5.3-rc7 (also tested on 5.1.16)
and I'm hotplugging ivshmem devices to a separate spapr-pci-host-bridge
defined as follows:

-device spapr-pci-host-bridge,index=1,id=pci.1

Device hotplug and BAR assignment does work, but IOMMU group assignment
seems to fail. This is evidenced by the kernel log which shows the
following message for the first device but not the second:

[  136.849448] pci 0001:00:00.0: Adding to iommu group 1

Trying to bind the second device to vfio-pci as a result of this
fails:

[  471.691948] vfio-pci: probe of 0001:00:01.0 failed with error -22

I traced that failure to a call to iommu_group_get() which returns
NULL for the second device. I then traced that back to the ordering
issue I described.

For your second and third virtio-net devices, was the
"Adding to iommu group N" kernel message printed?

Re: [PATCH 0/2] Fix IOMMU setup for hotplugged devices on pseries

2019-09-05 Thread Shawn Anastasio


On 9/5/19 4:38 AM, Lukas Wunner wrote:

On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote:

If anybody has more insight or a better way to fix this, please let me know.


Have you considered moving the invocation of pcibios_setup_device()
to pcibios_bus_add_device()?

The latter is called from pci_bus_add_device() in drivers/pci/bus.c.
At this point device_add() has been called, so the device exists in
sysfs.

Basically when adding a PCI device, the order is:

* pci_device_add() populates struct pci_dev, calls device_add(),
   binding the device to a driver is prevented
* after pci_device_add() has been called for all discovered devices,
   resources are allocated
* pci_bus_add_device() is called for each device,
   calls pcibios_bus_add_device() and binds the device to a driver


Thank you, this is exactly what I was looking for! Just tested and
this seems to work perfectly. I'll go ahead and submit a v2 that
does this instead.

Thanks again,
Shawn

[PATCH v2 0/1] Fix IOMMU setup for hotplugged devices on pseries

2019-09-05 Thread Shawn Anastasio

Changes from v2:
  - Remove pcibios_fixup_dev()
  - Remove pcibios_setup_bus_device() call in pcibios_fixup_bus() since
all device setup will now be handled in pcibios_bus_add_device()

On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently
broken for all but the first device on a given bus. The culprit is an ordering
issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU
group assigment occuring before device registration in sysfs. This triggers
the following check in arch/powerpc/kernel/iommu.c:

/*
 * The sysfs entries should be populated before
 * binding IOMMU group. If sysfs entries isn't
 * ready, we simply bail.
 */
if (!device_is_registered(dev))
return -ENOENT;

This fails for hotplugged devices since the pcibios_add_device() call in the
pseries hotplug path (in pci_device_add()) occurs before device_add().
Since the IOMMU groups are set up in pcibios_add_device(), this means that a
sysfs entry will not yet be present and it will fail.

There is a special case that allows the first hotplugged device on a bus to
succeed, though. The powerpc pcibios_add_device() implementation will skip
initializing the device if bus setup is not yet complete.
Later, the pci core will call pcibios_fixup_bus() which will perform setup
for the first (and only) device on the bus and since it has already been
registered in sysfs, the IOMMU setup will succeed.

The current solution is to move all device setup to pcibios_bus_add_device()
which will occur after all devices have been registered.

Shawn Anastasio (1):
  powerpc/pci: Fix pcibios_setup_device() ordering

 arch/powerpc/kernel/pci-common.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

-- 
2.20.1

[PATCH v2 1/1] powerpc/pci: Fix pcibios_setup_device() ordering

2019-09-05 Thread Shawn Anastasio

Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to
pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU
setup occurs after the device has been registered in sysfs, which is a
requirement for IOMMU group assignment to work

This fixes IOMMU group assignment for hotplugged devices on pseries, where
the existing behavior results in IOMMU assignment before registration.

Thanks to Lukas Wunner  for the suggestion.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/kernel/pci-common.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f627e15bb43c..d119c77efb69 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -261,12 +261,6 @@ int pcibios_sriov_disable(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_IOV */
 
-void pcibios_bus_add_device(struct pci_dev *pdev)
-{
-   if (ppc_md.pcibios_bus_add_device)
-   ppc_md.pcibios_bus_add_device(pdev);
-}
-
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
@@ -987,15 +981,17 @@ static void pcibios_setup_device(struct pci_dev *dev)
ppc_md.pci_irq_fixup(dev);
 }
 
-int pcibios_add_device(struct pci_dev *dev)
+void pcibios_bus_add_device(struct pci_dev *pdev)
 {
-   /*
-* We can only call pcibios_setup_device() after bus setup is complete,
-* since some of the platform specific DMA setup code depends on it.
-*/
-   if (dev->bus->is_added)
-   pcibios_setup_device(dev);
+   /* Perform platform-specific device setup */
+   pcibios_setup_device(pdev);
+
+   if (ppc_md.pcibios_bus_add_device)
+   ppc_md.pcibios_bus_add_device(pdev);
+}
 
+int pcibios_add_device(struct pci_dev *dev)
+{
 #ifdef CONFIG_PCI_IOV
if (ppc_md.pcibios_fixup_sriov)
ppc_md.pcibios_fixup_sriov(dev);
@@ -1037,9 +1033,6 @@ void pcibios_fixup_bus(struct pci_bus *bus)
 
/* Now fixup the bus bus */
pcibios_setup_bus_self(bus);
-
-   /* Now fixup devices on that bus */
-   pcibios_setup_bus_devices(bus);
 }
 EXPORT_SYMBOL(pcibios_fixup_bus);
 
-- 
2.20.1

Re: [PATCH v2 1/1] powerpc/pci: Fix pcibios_setup_device() ordering

2019-09-27 Thread Shawn Anastasio


On 9/9/19 2:59 AM, Alexey Kardashevskiy wrote:



On 06/09/2019 05:13, Shawn Anastasio wrote:

Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to
pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU
setup occurs after the device has been registered in sysfs, which is a
requirement for IOMMU group assignment to work

This fixes IOMMU group assignment for hotplugged devices on pseries, where
the existing behavior results in IOMMU assignment before registration.



Although this is a correct approach which we should proceed with, this
breaks adding of SRIOV VFs from pnv_tce_iommu_bus_notifier (and possibly
the bare metal PCI hotplug), I am trying to fix that now...


Were you able to make any progress? I can think of a couple of ways
to fix SRIOV, but they're not particularly elegant and involve
duplication.

Re: IOMMU group creation for pseries hotplug, and powernv VFs

2019-10-01 Thread Shawn Anastasio


On 9/29/19 9:08 PM, Oliver O'Halloran wrote:

A couple of extra patches on top of Shawn's existing re-ordering patch.
This seems to fix the problem Alexey noted with Shawn's change causing
VFs to lose their IOMMU group. I've tried pretty hard to make this a
minimal fix it's still a bit large.

If mpe is happy to take this as a fix for 5.4 then I'll leave it,
otherwise we might want to look at different approaches.



Thanks for fixing this Oliver!

Reviewed-by: Shawn Anastasio

Re: [PATCH v2 1/3] powernv/iov: Ensure the pdn for VFs always contains a valid PE number

2019-12-03 Thread Shawn Anastasio


On 10/28/19 3:54 AM, Oliver O'Halloran wrote:

On pseries there is a bug with adding hotplugged devices to an IOMMU group.
For a number of dumb reasons fixing that bug first requires re-working how
VFs are configured on PowerNV.


Are these patches expected to land soon, or is there another fix in the
pipeline? As far as I can tell this is still an issue.

[PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-07 Thread Shawn Anastasio

On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+resource_size_t pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..c6af2ed8ee0f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t 
pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1

[PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-07 Thread Shawn Anastasio

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 766f5779db92..e7a925f2188e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5896,6 +5896,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5920,9 +5925,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
-- 
2.20.1

[PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-07 Thread Shawn Anastasio

Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, 
unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1

[PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-07 Thread Shawn Anastasio

(Resent to include relevant mailing lists - sorry about that!)

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 5 files changed, 46 insertions(+), 4 deletions(-)

-- 
2.20.1

[RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 5 files changed, 46 insertions(+), 4 deletions(-)

-- 
2.20.1

[RESEND PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio

Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, 
unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1

[RESEND PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio

On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+resource_size_t pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..c6af2ed8ee0f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t 
pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1

[RESEND PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
-- 
2.20.1

[PATCH v2 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c   | 9 +++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int 
active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
 
 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
-- 
2.20.1

[PATCH v2 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio

On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+resource_size_t pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+static int initial_pci_init_done; /* TRUE if initial pcibios init has 
completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t 
pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1

[PATCH v2 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio

Changes from v1 to v2:
  - Fix function declaration warnings caught by sparse

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 include/linux/pci.h|  1 +
 6 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1

[PATCH v2 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio

Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, 
unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1

[PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio

On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..8e0d73b4c188 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+int pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+static int initial_pci_init_done; /* TRUE if initial pcibios init has 
completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t 
pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1

[PATCH v3 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio

Changes from v2 to v3:
  - Fix wrong return type of ppc pcibios_ignore_alignment_request
(Not sure how my local compile didn't catch that!)

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 include/linux/pci.h|  1 +
 6 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1

[PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c   | 9 +++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int 
active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
 
 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
-- 
2.20.1

[PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio

Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, 
unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1

Re: [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio

On 5/27/19 11:01 PM, Oliver wrote:

On Tue, May 28, 2019 at 8:56 AM Shawn Anastasio wrote:

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of
configuring the resources and should obey alignment requirements.

Are you using hotplug to work around SLOF (the OF we use under qemu)
not aligning BARs to 64K? It looks like there is a commit in SLOF to
fix that
(https://git.qemu.org/?p=SLOF.git;a=commit;f=board-qemu/slof/pci-phb.fs;h=1903174472f8800caf50c959b304501b4c01153c).

No, my application actually requires PCI hotplug at run-time.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.

With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Can we make aligning the BARs to PAGE_SIZE the default behaviour? The
BAR assignment process is complex enough as-is so I'd rather we didn't
add another platform hack into the mix.

Absolutely. This will still require the existing changes so that the
custom alignment isn't flat-out ignored on pseries, but I can set

it to default to PAGE_SIZE as well, similar to how it's done on PowerNV.
I've just pushed a v3 to fix a typo and I'll incorporate this change
in v4.

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
PCI: Introduce pcibios_ignore_alignment_request
powerpc/64: Enable pcibios_after_init hook on ppc64
powerpc/pseries: Allow user-specified PCI resource alignment after
init

--
2.20.1

Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio





On 5/28/19 12:36 AM, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
  drivers/pci/pci.c   | 9 +++--
  include/linux/pci.h | 1 +
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
 return 0;
  }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
  static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
 p = resource_alignment_param;
 if (!*p && !align)
 goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
 align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
 goto out;
 }


I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...

Interesting, I wasn't aware that hotplugged devices are configured
by the hypervisor on PowerVM. That at least means that this patch is
wrong as-is since it won't handle that properly. Definitely seems like
there will need to be different behavior here depending on the hypervisor.

That being said, wouldn't PCI_PROBE_ONLY still be set on pseries/kvm
(at least for initial boot) to observe SLOF's original BAR assignments?
Perhaps it should be un-set after initial PCI init?



IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living


Good to know. I'll await his comments before continuing here.


diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int 
active) {}
  int pcibios_alloc_irq(struct pci_dev *dev);
  void pcibios_free_irq(struct pci_dev *dev);
  resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);

  #ifdef CONFIG_HIBERNATE_CALLBACKS
  extern struct dev_pm_ops pcibios_pm_ops;
--
2.20.1

Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-28 Thread Shawn Anastasio





On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:



On 28/05/2019 15:36, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
  drivers/pci/pci.c   | 9 +++--
  include/linux/pci.h | 1 +
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
 return 0;
  }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
  static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
 p = resource_alignment_param;
 if (!*p && !align)
 goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
 align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
 goto out;
 }


I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...



The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand it -
they come with BARs from phyp but without from qemu) and writes "1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:


Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource? If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.



[c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360

[c6e6fa10] [c05f8b54]
assign_requested_resources_sorted+0x84/0x110
[c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750
[c6e6fb40] [c05fb2e0]
__pci_bus_assign_resources+0x80/0x280
[c6e6fc00] [c05fb95c]
pci_assign_unassigned_bus_resources+0xbc/0x100
[c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60

[c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0

[c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60

[c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80







IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living


diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int 
active) {}
  int pcibios_alloc_irq(struct pci_dev *dev);
  void pcibios_free_irq(struct pci_dev *dev);
  resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);

  #ifdef CONFIG_HIBERNATE_CALLBACKS
  extern struct dev_pm_ops pcibios_pm_ops;
--
2.20.1

Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-30 Thread Shawn Anastasio





On 5/30/19 1:55 AM, Sam Bobroff wrote:

On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
  drivers/pci/pci.c   | 9 +++--
  include/linux/pci.h | 1 +
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
 return 0;
  }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
  static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
 p = resource_alignment_param;
 if (!*p && !align)
 goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
 align = 0;
-   pr_info_once("PCI: Ignoring requested alignments 
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
 goto out;
 }


I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...

IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living


There seems to be some code already in the kernel that will disable
PCI_PROBE_ONLY based on a device tree property, so I did a quick test
today and it seems to work. Only a trivial tweak is needed in QEMU to
do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only"
with a value of 0), and that would allow us to set it only for QEMU (and
not PowerVM) if that's what we want to do. Is that useful?

(I haven't done any real testing yet but the guest booted up OK.)


It was my understanding that PCI_PROBE_ONLY should actually be set
initially so that Linux uses SLOF's BAR assignments. The issue here
is that PCI_PROBE_ONLY shouldn't be honored after initial bringup
on KVM so that hotplugged PCI devices can have custom BAR alignments.

Of course, if there's no need to honor SLOF's initial assignments,
I assume disabling PCI_PROBE_ONLY would work fine. In fact, I'm
not entirely sure why it's done in the first place. Does anybody
know?

If there is actually a valid reason for preserving SLOF's initial
assignments, then it seems like the correct solution is to disable
PCI_PROBE_ONLY after initial PCI bringup or ignore it in
pci_specified_resource_alignment() like I do in this patch set.

Bjorn Helgaas also suggested marking individual resources provided
by SLOF/PHYP with IORESOURCE_PCI_FIXED which would remove the need
to use PCI_PROBE_ONLY altogether.

Any thoughts?

- Shawn

Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-30 Thread Shawn Anastasio


On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:



On 28/05/2019 17:39, Shawn Anastasio wrote:



On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:



On 28/05/2019 15:36, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio 
wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI
resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
   drivers/pci/pci.c   | 9 +++--
   include/linux/pci.h | 1 +
   2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak
pcibios_default_alignment(void)
  return 0;
   }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
   #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
   static char
resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
   static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t
pci_specified_resource_alignment(struct pci_dev *dev,
  p = resource_alignment_param;
  if (!*p && !align)
  goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
  align = 0;
-   pr_info_once("PCI: Ignoring requested alignments
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
  goto out;
  }


I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...



The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand it -
they come with BARs from phyp but without from qemu) and writes "1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:


Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource?



I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later reassignment.



If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.



We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.


If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).

I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments? If not, the device tree
modification that Sam found would work fine here. Otherwise,
we need a way to honor the initial assignments from SLOF while
still allowing custom alignments for hotplugged devices, either
by deferring to the platform code like I do here, unsetting
PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED
like Bjorn suggested.





[c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360

[c6e6fa10] [c05f8b54]
assign_requested_resources_sorted+0x84/0x110
[c6e6fa60] [c05f9540]
__assign_resources_sorted+0xd0/0x750
[c6e6fb40] [c05fb2e0]
__pci_bus_assign_resources+0x80/0x280
[c6e6fc00] [c05fb95c]
pci_assign_unassigned_bus_resources+0xbc/0x100
[c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60

[c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0

Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-06-02 Thread Shawn Anastasio





On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:



On 31/05/2019 08:49, Shawn Anastasio wrote:

On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:



On 28/05/2019 17:39, Shawn Anastasio wrote:



On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:



On 28/05/2019 15:36, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio 
wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI
resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
    drivers/pci/pci.c   | 9 +++--
    include/linux/pci.h | 1 +
    2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak
pcibios_default_alignment(void)
   return 0;
    }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
    #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
    static char
resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
    static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t
pci_specified_resource_alignment(struct pci_dev *dev,
   p = resource_alignment_param;
   if (!*p && !align)
   goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
   align = 0;
-   pr_info_once("PCI: Ignoring requested alignments
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
   goto out;
   }


I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...



The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand it -
they come with BARs from phyp but without from qemu) and writes "1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:


Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource?



I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later
reassignment.



If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.



We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.


If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).



I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
tried booting with and without pci=resource_alignment= and I can see no
difference - BARs are still aligned to 64K as programmed in SLOF; if I
hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
them unchanged.



I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments?


It isn't if it's SLOF but it is if it's phyp. It used to not
allow/support BAR reassignment and even if it does not, I'd rather avoid
touching them.


A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
worked, but if I add an implementation of pcibios_default_alignment
which simply returns PAGE_SIZE, my VM fails to boot and many errors
from the virtio disk driver are printed to the console.

After some investigation

Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-06-03 Thread Shawn Anastasio





On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote:



On 03/06/2019 15:02, Alexey Kardashevskiy wrote:



On 03/06/2019 12:23, Shawn Anastasio wrote:



On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:



On 31/05/2019 08:49, Shawn Anastasio wrote:

On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:



On 28/05/2019 17:39, Shawn Anastasio wrote:



On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:



On 28/05/2019 15:36, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio 
wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI
resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
     drivers/pci/pci.c   | 9 +++--
     include/linux/pci.h | 1 +
     2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak
pcibios_default_alignment(void)
    return 0;
     }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
     static char
resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
     static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t
pci_specified_resource_alignment(struct pci_dev *dev,
    p = resource_alignment_param;
    if (!*p && !align)
    goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
    align = 0;
-   pr_info_once("PCI: Ignoring requested alignments
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested
alignments\n");
    goto out;
    }


I think the logic here is questionable to begin with. If the user
has
explicitly requested re-aligning a resource via the command line
then
we should probably do it even if PCI_PROBE_ONLY is set. When it
breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor
(PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on
that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do
it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...



The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand
it -
they come with BARs from phyp but without from qemu) and writes
"1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:


Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource?



I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later
reassignment.



If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.



We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.


If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).



I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
tried booting with and without pci=resource_alignment= and I can see no
difference - BARs are still aligned to 64K as programmed in SLOF; if I
hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
them unchanged.



I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments?


It isn't if it's SLOF but it is if it's phyp. It used to not
allow/support BAR reassignment and even if it does not, I'd rather avoid
touching them.


A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
worked, but if I add an implementation of pcibios_defau

Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-06-05 Thread Shawn Anastasio


On 6/4/19 10:38 PM, Alexey Kardashevskiy wrote:

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
---

This code is there for ages (from 200x) hence no "Fixes:".

Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
at the moment:
- pci=resource_alignment= alone does not do anything;
- /chosen/linux,pci-probe-only alone does not cause the kernel to
reassign resources;
- pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
anyway.

---
  arch/powerpc/kernel/pci_of_scan.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c 
b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
if (addr0 & 0x0200) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
if (addr0 & 0x4000)
flags |= IORESOURCE_PREFETCH



I have confirmed that this fixes the case with PCI_PROBE_ONLY
disabled and a ppc_md.pcibios_default_alignment implementation that
returns PAGE_SIZE.

Reviewed-by: Shawn Anastasio

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-05 Thread Shawn Anastasio


On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-11 Thread Shawn Anastasio


On 6/5/19 11:11 PM, Shawn Anastasio wrote:

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[    0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 


After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.

Perhaps some subtle corruption is occurring?

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-12 Thread Shawn Anastasio


On 6/12/19 1:16 AM, Oliver O'Halloran wrote:

On Wed, Jun 12, 2019 at 3:06 PM Shawn Anastasio  wrote:


On 6/5/19 11:11 PM, Shawn Anastasio wrote:

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 


After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.


Any specific workload? Checkstops are harder to debug without a system
in the failed state so we'd need to replicate that locally to get a
decent idea what's up.


I haven't been able to pinpoint the exact cause. The first time it
happened was after about 4 days of uptime while playing a 1080p
video in mpv. The second time was about 5 minutes after booting up
while restoring a firefox session.

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-12 Thread Shawn Anastasio


On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:



On 12/06/2019 15:05, Shawn Anastasio wrote:

On 6/5/19 11:11 PM, Shawn Anastasio wrote:

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[    0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 


After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.

Perhaps some subtle corruption is occurring?


Have you tried this?

https://patchwork.ozlabs.org/patch/1113506/


I have not. I'll give it a shot and try it out for a few days to see
if I'm able to reproduce the crashes.

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-17 Thread Shawn Anastasio


On 6/12/19 2:15 PM, Shawn Anastasio wrote:

On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:



On 12/06/2019 15:05, Shawn Anastasio wrote:

On 6/5/19 11:11 PM, Shawn Anastasio wrote:

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
This is an attempt to allow DMA masks between 32..59 which are not 
large

enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[    0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 


After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.

Perhaps some subtle corruption is occurring?


Have you tried this?

https://patchwork.ozlabs.org/patch/1113506/


I have not. I'll give it a shot and try it out for a few days to see
if I'm able to reproduce the crashes.


A few days later and I was able to reproduce the checkstop while
watching a video in mpv. At this point the system had ~4 day
uptime and this wasn't the first video I watched during that time.

This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-18 Thread Shawn Anastasio

On 6/18/19 1:39 AM, Alexey Kardashevskiy wrote:

On 18/06/2019 14:26, Shawn Anastasio wrote:

On 6/12/19 2:15 PM, Shawn Anastasio wrote:

On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:

On 12/06/2019 15:05, Shawn Anastasio wrote:

On 6/5/19 11:11 PM, Shawn Anastasio wrote:

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not
large
enough to use either a PHB3 bypass mode or a sketchy bypass.
Depending
on the max order, up to 40 is usually available.

This is based on v5.2-rc2.

Please comment. Thanks.

I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[    0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 

After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.

Perhaps some subtle corruption is occurring?

Have you tried this?

https://patchwork.ozlabs.org/patch/1113506/

I have not. I'll give it a shot and try it out for a few days to see
if I'm able to reproduce the crashes.

A few days later and I was able to reproduce the checkstop while
watching a video in mpv. At this point the system had ~4 day
uptime and this wasn't the first video I watched during that time.

This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.

Any logs left? What was the reason for the checkstop and what is the
hardware? "lscpu" and "lspci -vv" for the starter would help. Thanks,

The machine is a Talos II with 2x 8 core DD2.2 Sforza modules.
I've added the output of lscpu and lspci below. As for logs,
it doesn't seem there are any kernel logs of the event.
The opal-gard utility shows some error records which I have
also included below.

opal-gard:
```
$ sudo ./opal-gard show 1
Record ID:0x0001

Error ID: 0x900b
Error Type:   Fatal (0xe3)
Path Type: physical
>Sys, Instance #0
 >Node, Instance #0
  >Proc, Instance #1
   >EQ, Instance #0
>EX, Instance #0

$ sudo ./opal-gard show 2
Record ID:0x0002

Error ID: 0x9021
Error Type:   Fatal (0xe3)
Path Type: physical
>Sys, Instance #0
 >Node, Instance #0
  >Proc, Instance #1
   >EQ, Instance #2
>EX, Instance #1

```

lscpu:
```
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  52
On-line CPU(s) list: 0-3,8-31,36-47,52-63
Thread(s) per core:  4
Core(s) per socket:  6
Socket(s):   2
NUMA node(s):2
Model:   2.2 (pvr 004e 1202)
Model name:  POWER9, altivec supported
CPU max MHz: 3800.
CPU min MHz: 2154.
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-3,8-31
NUMA node8 CPU(s):   36-47,52-63

```

lspci -vv:
Output at: https://upaste.anastas.io/IwVQzt

Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support

2020-08-17 Thread Shawn Anastasio


I'm a bit concerned about the removal of PROT_SAO.

From what I can see, a feature like this would be extremely useful for
emulating architectures with stronger memory models. QEMU's multi-
threaded TCG project in particular looks like it would be a good
candidate, since as far as I'm aware it is currently completely
unable to perform strong-on-weak emulation.

Without hardware support like SAO provides, the only way I could see
to achieve this would be by emitting tons of unnecessary and costly
memory barrier instructions.

I understand that ISA 3.1 and POWER10 have dropped SAO, but as a POWER9
user it seems a bit silly to have a potentially useful feature dropped
from the kernel just because a future processor doesn't support it.

Curious to hear more thoughts on this.

Regards,
Shawn

On 6/7/20 7:02 AM, Nicholas Piggin wrote:

ISA v3.1 does not support the SAO storage control attribute required to
implement PROT_SAO. PROT_SAO was used by specialised system software
(Lx86) that has been discontinued for about 7 years, and is not thought
to be used elsewhere, so removal should not cause problems.

We rather remove it than keep support for older processors, because
live migrating guest partitions to newer processors may not be possible
if SAO is in use.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
  arch/powerpc/include/asm/cputable.h   |  9 ++--
  arch/powerpc/include/asm/kvm_book3s_64.h  |  3 +-
  arch/powerpc/include/asm/mman.h   | 24 +++
  arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 -
  arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
  arch/powerpc/mm/book3s64/hash_utils.c |  2 -
  include/linux/mm.h|  2 -
  include/trace/events/mmflags.h|  2 -
  mm/ksm.c  |  4 --
  tools/testing/selftests/powerpc/mm/.gitignore |  1 -
  tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
  tools/testing/selftests/powerpc/mm/prot_sao.c | 42 ---
  13 files changed, 18 insertions(+), 87 deletions(-)
  delete mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f17442c3a092..d9e92586f8dc 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -20,9 +20,13 @@
  #define _PAGE_RW  (_PAGE_READ | _PAGE_WRITE)
  #define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
  #define _PAGE_PRIVILEGED  0x8 /* kernel access only */
-#define _PAGE_SAO  0x00010 /* Strong access order */
+
+#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache 
modes */
+   /*  No bits set is normal cacheable memory */
+   /*  0x00010 unused, is SAO bit on radix POWER9 */
  #define _PAGE_NON_IDEMPOTENT  0x00020 /* non idempotent memory */
  #define _PAGE_TOLERANT0x00030 /* tolerant memory, cache 
inhibited */
+
  #define _PAGE_DIRTY   0x00080 /* C: page changed */
  #define _PAGE_ACCESSED0x00100 /* R: page referenced */
  /*
@@ -825,8 +829,6 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
  }
  
-#define _PAGE_CACHE_CTL	(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)

-
  #define pgprot_noncached pgprot_noncached
  static inline pgprot_t pgprot_noncached(pgprot_t prot)
  {
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index bac2252c839e..c7e923ba 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTR_SPURR LONG_ASM_CONST(0x0100)
  #define CPU_FTR_DSCR  LONG_ASM_CONST(0x0200)
  #define CPU_FTR_VSX   LONG_ASM_CONST(0x0400)
-#define CPU_FTR_SAOLONG_ASM_CONST(0x0800)
  #define CPU_FTR_CP_USE_DCBTZ  LONG_ASM_CONST(0x1000)
  #define CPU_FTR_UNALIGNED_LD_STD  LONG_ASM_CONST(0x2000)
  #define CPU_FTR_ASYM_SMT  LONG_ASM_CONST(0x4000)
@@ -435,7 +434,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
+   CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
@@ -444,7 +443,7 @@ static inline void cpu_feature_

Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support

2020-08-18 Thread Shawn Anastasio


On 8/18/20 2:11 AM, Nicholas Piggin wrote> Very reasonable point.


The problem we're trying to get a handle on is live partition migration
where a running guest might be using SAO then get migrated to a P10. I
don't think we have a good way to handle this case. Potentially the
hypervisor could revoke the page tables if the guest is running in hash
mode and the guest kernel could be taught about that and sigbus the
process, but in radix the guest controls those page tables and the SAO
state and I don't think there's a way to cause it to take a fault.

I also don't know what the proprietary hypervisor does here.

We could add it back, default to n, or make it bare metal only, or
somehow try to block live migration to a later CPU without the faciliy.
I wouldn't be against that.



Admittedly I'm not too familiar with the specifics of live migration
or guest memory management, but restoring the functionality and adding
a way to prevent migration of SAO-using guests seems like a reasonable
choice to me. Would this be done with help from the guest using some
sort of infrastructure to signal to the hypervisor that SAO is in use,
or entirely on the hypervisor by e.g. scanning the through the process
table for SAO pages?


It would be very interesting to know how it performs in such a "real"
situation. I don't know how well POWER9 has optimised it -- it's
possible that it's not much better than putting lwsync after every load
or store.



This is definitely worth investigating in depth. That said, even if the
performance on P9 isn't super great, I think the feature could still be
useful, since it would offer more granularity than the sledgehammer
approach of emitting lwsync everywhere.

I'd be happy to put in some of the work required to get this to a point
where it can be reintroduced without breaking guest migration - I'd just
need some pointers on getting started with whatever approach is decided on.

Thanks,
Shawn

[PATCH 0/2] Reintroduce PROT_SAO

2020-08-20 Thread Shawn Anastasio

This set re-introduces the PROT_SAO prot flag removed in
Commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support").

To address concerns regarding live migration of guests using SAO
to P10 hosts without SAO support, the flag is disabled by default
in LPARs. A new config option, PPC_PROT_SAO_LPAR was added to
allow users to explicitly enable it if they will not be running
in an environment where this is a conern.

Shawn Anastasio (2):
  Revert "powerpc/64s: Remove PROT_SAO support"
  powerpc/64s: Disallow PROT_SAO in LPARs by default

 arch/powerpc/Kconfig  | 12 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 31 --
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++
 14 files changed, 107 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

-- 
2.28.0

[PATCH 1/2] Revert "powerpc/64s: Remove PROT_SAO support"

2020-08-20 Thread Shawn Anastasio

This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05.

Since PROT_SAO can still be useful for certain classes of software,
reintroduce it. Concerns about guest migration for LPARs using SAO
will be addressed next.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 26 ++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++
 13 files changed, 90 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..495fc0ccb453 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -20,13 +20,9 @@
 #define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
 #define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_PRIVILEGED   0x8 /* kernel access only */
-
-#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache 
modes */
-   /*  No bits set is normal cacheable memory */
-   /*  0x00010 unused, is SAO bit on radix POWER9 */
+#define _PAGE_SAO  0x00010 /* Strong access order */
 #define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
 #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */
-
 #define _PAGE_DIRTY0x00080 /* C: page changed */
 #define _PAGE_ACCESSED 0x00100 /* R: page referenced */
 /*
@@ -828,6 +824,8 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
 }
 
+#define _PAGE_CACHE_CTL(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | 
_PAGE_TOLERANT)
+
 #define pgprot_noncached pgprot_noncached
 static inline pgprot_t pgprot_noncached(pgprot_t prot)
 {
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index fdddb822d564..f89205eff691 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -191,7 +191,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_SPURR  LONG_ASM_CONST(0x0100)
 #define CPU_FTR_DSCR   LONG_ASM_CONST(0x0200)
 #define CPU_FTR_VSXLONG_ASM_CONST(0x0400)
-// Free
LONG_ASM_CONST(0x0800)
+#define CPU_FTR_SAOLONG_ASM_CONST(0x0800)
 #define CPU_FTR_CP_USE_DCBTZ   LONG_ASM_CONST(0x1000)
 #define CPU_FTR_UNALIGNED_LD_STD   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_ASYM_SMT   LONG_ASM_CONST(0x4000)
@@ -436,7 +436,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX )
@@ -445,7 +445,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
@@ -456,7 +456,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
@@ -474,7 +474,7 @@ static inline void

[PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-20 Thread Shawn Anastasio

Since migration of guests using SAO to ISA 3.1 hosts may cause issues,
disable PROT_SAO in LPARs by default and introduce a new Kconfig option
PPC_PROT_SAO_LPAR to allow users to enable it if desired.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/Kconfig| 12 
 arch/powerpc/include/asm/mman.h |  9 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..65bed1fdeaad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT
 
  If unsure, say N here.
 
+config PPC_PROT_SAO_LPAR
+   bool "Support PROT_SAO mappings in LPARs"
+   depends on PPC_BOOK3S_64
+   help
+ This option adds support for PROT_SAO mappings from userspace
+ inside LPARs on supported CPUs.
+
+ This may cause issues when performing guest migration from
+ a CPU that supports SAO to one that does not.
+
+ If unsure, say N here.
+
 config PPC_COPRO_BASE
bool
 
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 4ba303ea27f5..7cb6d18f5cd6 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, 
unsigned long addr)
 {
if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
return false;
-   if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-   return false;
+   if (prot & PROT_SAO) {
+   if (!cpu_has_feature(CPU_FTR_SAO))
+   return false;
+   if (firmware_has_feature(FW_FEATURE_LPAR) &&
+   !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR))
+   return false;
+   }
return true;
 }
 #define arch_validate_prot arch_validate_prot
-- 
2.28.0

Re: [PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-21 Thread Shawn Anastasio

On 8/21/20 5:37 AM, Nicholas Piggin wrote:> I think this should be okay. 
Could you also update the selftest to skip

if we have PPC_FEATURE2_ARCH_3_1 set?


Sure. I'll send out a v2 shortly with another patch for this.


Thanks,
Nick


Thanks,
Shawn

[PATCH v2 0/3] Reintroduce PROT_SAO

2020-08-21 Thread Shawn Anastasio

Changes in v2:
- Update prot_sao selftest to skip ISA 3.1

This set re-introduces the PROT_SAO prot flag removed in
Commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support").

To address concerns regarding live migration of guests using SAO
to P10 hosts without SAO support, the flag is disabled by default
in LPARs. A new config option, PPC_PROT_SAO_LPAR was added to
allow users to explicitly enable it if they will not be running
in an environment where this is a conern.

Shawn Anastasio (3):
  Revert "powerpc/64s: Remove PROT_SAO support"
  powerpc/64s: Disallow PROT_SAO in LPARs by default
  selftests/powerpc: Update PROT_SAO test to skip ISA 3.1

 arch/powerpc/Kconfig  | 12 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 31 +++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 43 +++
 14 files changed, 108 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

-- 
2.28.0

[PATCH v2 2/3] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-21 Thread Shawn Anastasio

Since migration of guests using SAO to ISA 3.1 hosts may cause issues,
disable PROT_SAO in LPARs by default and introduce a new Kconfig option
PPC_PROT_SAO_LPAR to allow users to enable it if desired.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/Kconfig| 12 
 arch/powerpc/include/asm/mman.h |  9 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..65bed1fdeaad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT
 
  If unsure, say N here.
 
+config PPC_PROT_SAO_LPAR
+   bool "Support PROT_SAO mappings in LPARs"
+   depends on PPC_BOOK3S_64
+   help
+ This option adds support for PROT_SAO mappings from userspace
+ inside LPARs on supported CPUs.
+
+ This may cause issues when performing guest migration from
+ a CPU that supports SAO to one that does not.
+
+ If unsure, say N here.
+
 config PPC_COPRO_BASE
bool
 
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 4ba303ea27f5..7cb6d18f5cd6 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, 
unsigned long addr)
 {
if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
return false;
-   if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-   return false;
+   if (prot & PROT_SAO) {
+   if (!cpu_has_feature(CPU_FTR_SAO))
+   return false;
+   if (firmware_has_feature(FW_FEATURE_LPAR) &&
+   !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR))
+   return false;
+   }
return true;
 }
 #define arch_validate_prot arch_validate_prot
-- 
2.28.0

[PATCH v2 3/3] selftests/powerpc: Update PROT_SAO test to skip ISA 3.1

2020-08-21 Thread Shawn Anastasio

Since SAO support was removed from ISA 3.1, skip the
prot_sao test if PPC_FEATURE2_ARCH_3_1 is set.

Signed-off-by: Shawn Anastasio 
---
 tools/testing/selftests/powerpc/mm/prot_sao.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/mm/prot_sao.c 
b/tools/testing/selftests/powerpc/mm/prot_sao.c
index e2eed65b7735..e0cf8ebbf8cd 100644
--- a/tools/testing/selftests/powerpc/mm/prot_sao.c
+++ b/tools/testing/selftests/powerpc/mm/prot_sao.c
@@ -18,8 +18,9 @@ int test_prot_sao(void)
 {
char *p;
 
-   /* 2.06 or later should support SAO */
-   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+   /* SAO was introduced in 2.06 and removed in 3.1 */
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06) ||
+   have_hwcap2(PPC_FEATURE2_ARCH_3_1));
 
/*
 * Ensure we can ask for PROT_SAO.
-- 
2.28.0

[PATCH v2 1/3] Revert "powerpc/64s: Remove PROT_SAO support"

2020-08-21 Thread Shawn Anastasio

This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05.

Since PROT_SAO can still be useful for certain classes of software,
reintroduce it. Concerns about guest migration for LPARs using SAO
will be addressed next.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 26 ++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++
 13 files changed, 90 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..495fc0ccb453 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -20,13 +20,9 @@
 #define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
 #define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_PRIVILEGED   0x8 /* kernel access only */
-
-#define _PAGE_CACHE_CTL0x00030 /* Bits for the folowing cache 
modes */
-   /*  No bits set is normal cacheable memory */
-   /*  0x00010 unused, is SAO bit on radix POWER9 */
+#define _PAGE_SAO  0x00010 /* Strong access order */
 #define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
 #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */
-
 #define _PAGE_DIRTY0x00080 /* C: page changed */
 #define _PAGE_ACCESSED 0x00100 /* R: page referenced */
 /*
@@ -828,6 +824,8 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
 }
 
+#define _PAGE_CACHE_CTL(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | 
_PAGE_TOLERANT)
+
 #define pgprot_noncached pgprot_noncached
 static inline pgprot_t pgprot_noncached(pgprot_t prot)
 {
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index fdddb822d564..f89205eff691 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -191,7 +191,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_SPURR  LONG_ASM_CONST(0x0100)
 #define CPU_FTR_DSCR   LONG_ASM_CONST(0x0200)
 #define CPU_FTR_VSXLONG_ASM_CONST(0x0400)
-// Free
LONG_ASM_CONST(0x0800)
+#define CPU_FTR_SAOLONG_ASM_CONST(0x0800)
 #define CPU_FTR_CP_USE_DCBTZ   LONG_ASM_CONST(0x1000)
 #define CPU_FTR_UNALIGNED_LD_STD   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_ASYM_SMT   LONG_ASM_CONST(0x4000)
@@ -436,7 +436,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX )
@@ -445,7 +445,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
@@ -456,7 +456,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
@@ -474,7 +474,7 @@ static inline void

[PATCH] powerpc/pseries: Add pcibios_default_alignment implementation

2020-08-21 Thread Shawn Anastasio

Implement pcibios_default_alignment for pseries so that
resources are page-aligned. The main benefit of this is being
able to map any resource from userspace via mechanisms like VFIO.

This is identical to powernv's implementation.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/platforms/pseries/pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pci.c 
b/arch/powerpc/platforms/pseries/pci.c
index 911534b89c85..6d922c096354 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -210,6 +210,11 @@ int pseries_pcibios_sriov_disable(struct pci_dev *pdev)
 }
 #endif
 
+static resource_size_t pseries_pcibios_default_alignment(void)
+{
+   return PAGE_SIZE;
+}
+
 static void __init pSeries_request_regions(void)
 {
if (!isa_io_base)
@@ -231,6 +236,8 @@ void __init pSeries_final_fixup(void)
 
eeh_show_enabled();
 
+   ppc_md.pcibios_default_alignment = pseries_pcibios_default_alignment;
+
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_sriov_enable = pseries_pcibios_sriov_enable;
ppc_md.pcibios_sriov_disable = pseries_pcibios_sriov_disable;
-- 
2.28.0

63 matches

Mail list logo