Re: NXP P50XX/e5500: SMP doesn't work anymore with the latest Git kernel

2018-11-06 Thread Christian Zigotzky

On 07 November 2018 at 03:11AM, Scott Wood wrote:

On Sun, 2018-10-28 at 17:35 +0100, Christian Zigotzky wrote:

Hello,

SMP doesn't work anymore with the latest Git kernel (28/10/18 11:12AM
GMT) on my P5020 board and on virtual e5500 QEMU machines.

Board with P5020 dual core CPU:

[0.00] -
[0.00] phys_mem_size = 0x2
[0.00] dcache_bsize  = 0x40
[0.00] icache_bsize  = 0x40
[0.00] cpu_features  = 0x0003008003b4
[0.00]   possible= 0x0003009003b4
[0.00]   always  = 0x0003008003b4
[0.00] cpu_user_features = 0xcc008000 0x0800
[0.00] mmu_features  = 0x000a0010
[0.00] firmware_features = 0x
[0.00] -
[0.00] CoreNet Generic board

  ...

[0.002161] smp: Bringing up secondary CPUs ...
[0.002339] No cpu-release-addr for cpu 1
[0.002347] smp: failed starting cpu 1 (rc -2)
[0.002401] smp: Brought up 1 node, 1 CPU

Virtual e5500 quad core QEMU machine:

[0.026394] smp: Bringing up secondary CPUs ...
[0.027831] No cpu-release-addr for cpu 1
[0.027989] smp: failed starting cpu 1 (rc -2)
[0.030143] No cpu-release-addr for cpu 2
[0.030304] smp: failed starting cpu 2 (rc -2)
[0.032400] No cpu-release-addr for cpu 3
[0.032533] smp: failed starting cpu 3 (rc -2)
[0.033117] smp: Brought up 1 node, 1 CPU

QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048 -kernel
/home/christian/Downloads/vmlinux-4.20-alpha4-
AmigaOne_X1000_X5000/X5000_and_QEMU_e5500/uImage-4.20
-drive format=raw,file=/home/christian/Downloads/MATE_PowerPC_Remix_2017_0.9.img,index=0,if=virtio
-nic user,model=e1000 -append "rw root=/dev/vda" -device virtio-vga
-device virtio-mouse-pci -device virtio-keyboard-pci -usb -soundhw
es1370 -smp 4

.config:

...
CONFIG_SMP=y
CONFIG_NR_CPUS=4
...

Please test the latest Git kernel on your NXP P50XX boards.

I wasn't able to reproduce this with top-of-tree (commit 8053e5b93eca9b) with
either qemu (e5500 on ppce500) or a hardware t4240.

-Scott



Hi Scott,


Thanks for testing, but it came too late: Rob Herring already fixed 
this issue last week. [1]



Cheers,

Christian


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c961cb3be9064d1097ccc019390f8b5739daafc6




[PATCH] powerpc/io: Fix the IO workarounds code to work with Radix

2018-11-06 Thread Michael Ellerman
Back in 2006 Ben added some workarounds for a misbehaviour in the
Spider IO bridge used on early Cell machines, see commit
014da7ff47b5 ("[POWERPC] Cell "Spider" MMIO workarounds"). Later these
were made to be generic, ie. not tied specifically to Spider.

The code stashes a token in the high bits (59-48) of virtual addresses
used for IO (eg. returned from ioremap()). This works fine when using
the Hash MMU, but when we're using the Radix MMU the bits used for the
token overlap with some of the bits of the virtual address.

This is because the maximum virtual address is larger with Radix, up
to c00fffffffffffff, and in fact we use that high part of the address
range for ioremap(), see RADIX_KERN_IO_START.

As it happens the bits that are used overlap with the bits that
differentiate an IO address vs a linear map address. If the resulting
address lies outside the linear mapping we will crash (see below), if
not we just corrupt memory.

  virtio-pci :00:00.0: Using 64-bit direct DMA at offset 800
  Unable to handle kernel paging request for data at address 0xc0008014
  ...
  CFAR: c0626b98 DAR: c0008014 DSISR: 4200 IRQMASK: 0
  GPR00: c06c54fc c0003e523378 c16de600 
  GPR04: c00c8014 0007 0fff000a 0030
 
  ...
  NIP [c0626c5c] .iowrite8+0xec/0x100
  LR [c06c992c] .vp_reset+0x2c/0x90
  Call Trace:
.pci_bus_read_config_dword+0xc4/0x120 (unreliable)
.register_virtio_device+0x13c/0x1c0
.virtio_pci_probe+0x148/0x1f0
.local_pci_probe+0x68/0x140
.pci_device_probe+0x164/0x220
.really_probe+0x274/0x3b0
.driver_probe_device+0x80/0x170
.__driver_attach+0x14c/0x150
.bus_for_each_dev+0xb8/0x130
.driver_attach+0x34/0x50
.bus_add_driver+0x178/0x2f0
.driver_register+0x90/0x1a0
.__pci_register_driver+0x6c/0x90
.virtio_pci_driver_init+0x2c/0x40
.do_one_initcall+0x64/0x280
.kernel_init_freeable+0x36c/0x474
.kernel_init+0x24/0x160
.ret_from_kernel_thread+0x58/0x7c

This hasn't been a problem because CONFIG_PPC_IO_WORKAROUNDS which
enables this code is usually not enabled. It is only enabled when it's
selected by PPC_CELL_NATIVE which is only selected by
PPC_IBM_CELL_BLADE and that in turn depends on BIG_ENDIAN. So in order
to hit the bug you need to build a big endian kernel, with IBM Cell
Blade support enabled, as well as Radix MMU support, and then boot
that on Power9 using Radix MMU.

Still we can fix the bug, so let's do that. We simply use fewer bits
for the token; taking the union of the restrictions on the address
from both Hash and Radix, we end up with 8 bits we can use for the
token. The only user of the token is iowa_mem_find_bus() which only
supports 8 token values, so 8 bits is plenty for that.
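
To make the new layout concrete, here is an illustrative sketch of how
an 8-bit token at bits 59-52 is applied and stripped (helper names are
invented here; the real code uses the PCI_FIX_ADDR()/PCI_GET_ADDR_TOKEN()
macros shown in the diff below):

  #define IO_TOKEN_SHIFT  52
  #define IO_TOKEN_MASK   (0xfful << IO_TOKEN_SHIFT)

  /* Stash a token in bits 59-52 of an IO virtual address */
  static inline unsigned long io_addr_set_token(unsigned long addr,
                                                unsigned long token)
  {
          return addr | ((token << IO_TOKEN_SHIFT) & IO_TOKEN_MASK);
  }

  /* Recover the token, e.g. for iowa_mem_find_bus() */
  static inline unsigned long io_addr_get_token(unsigned long addr)
  {
          return (addr & IO_TOKEN_MASK) >> IO_TOKEN_SHIFT;
  }

  /* Strip the token before doing the actual access */
  static inline unsigned long io_addr_fix(unsigned long addr)
  {
          return addr & ~IO_TOKEN_MASK;
  }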

Fixes: 566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/io.h | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 3ef40b703c4a..e746becd9d6f 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -268,19 +268,13 @@ extern void _memcpy_toio(volatile void __iomem *dest, const void *src,
  * their hooks, a bitfield is reserved for use by the platform near the
  * top of MMIO addresses (not PIO, those have to cope the hard way).
  *
- * This bit field is 12 bits and is at the top of the IO virtual
- * addresses PCI_IO_INDIRECT_TOKEN_MASK.
+ * The highest address in the kernel virtual space are:
  *
- * The kernel virtual space is thus:
+ *  d0003fffffffffff   # with Hash MMU
+ *  c00fffffffffffff   # with Radix MMU
  *
- *  0xD000 : vmalloc
- *  0xD800 : PCI PHB IO space
- *  0xD8008000 : ioremap
- *  0xDfff : end of ioremap region
- *
- * Since the top 4 bits are reserved as the region ID, we use thus
- * the next 12 bits and keep 4 bits available for the future if the
- * virtual address space is ever to be extended.
+ * The top 4 bits are reserved as the region ID on hash, leaving us 8 bits
+ * that can be used for the field.
  *
  * The direct IO mapping operations will then mask off those bits
  * before doing the actual access, though that only happen when
@@ -292,8 +286,8 @@ extern void _memcpy_toio(volatile void __iomem *dest, const void *src,
  */
 
 #ifdef CONFIG_PPC_INDIRECT_MMIO
-#define PCI_IO_IND_TOKEN_MASK  0x0fff000000000000ul
-#define PCI_IO_IND_TOKEN_SHIFT 48
+#define PCI_IO_IND_TOKEN_SHIFT 52
+#define PCI_IO_IND_TOKEN_MASK  (0xfful << PCI_IO_IND_TOKEN_SHIFT)
 #define PCI_FIX_ADDR(addr) \
((PCI_IO_ADDR)(((unsigned long)(addr)) & ~PCI_IO_IND_TOKEN_MASK))
 #define PCI_GET_ADDR_TOKEN(addr)   \
-- 
2.17.2



Re: [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb

2018-11-06 Thread David Gibson
On Mon, Oct 15, 2018 at 08:32:59PM +1100, Alexey Kardashevskiy wrote:
> The powernv PCI code stores NPU data in the pnv_phb struct. The latter
> is referenced by pci_controller::private_data. We are going to have NPU2
> support in the pseries platform as well but it does not store any
> private_data in the pci_controller struct; and even if it did,
> it would be a different data structure.
> 
> This adds a global list of NPUs so each platform can register and use
> these in the same fashion.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/platforms/powernv/pci.h | 16 ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 71 
> +---
>  2 files changed, 57 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.h 
> b/arch/powerpc/platforms/powernv/pci.h
> index 8b37b28..3b7617d 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -8,9 +8,6 @@
>  
>  struct pci_dn;
>  
> -/* Maximum possible number of ATSD MMIO registers per NPU */
> -#define NV_NMMU_ATSD_REGS 8
> -
>  enum pnv_phb_type {
>   PNV_PHB_IODA1   = 0,
>   PNV_PHB_IODA2   = 1,
> @@ -180,19 +177,6 @@ struct pnv_phb {
>   unsigned intdiag_data_size;
>   u8  *diag_data;
>  
> - /* Nvlink2 data */
> - struct npu {
> - int index;
> - __be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> - unsigned int mmio_atsd_count;
> -
> - /* Bitmask for MMIO register usage */
> - unsigned long mmio_atsd_usage;
> -
> - /* Do we need to explicitly flush the nest mmu? */
> - bool nmmu_flush;
> - } npu;
> -
>   int p2p_target_count;
>  };
>  
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 01402f9..cb2b4f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -378,6 +378,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct 
> pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +/* Maximum possible number of ATSD MMIO registers per NPU */
> +#define NV_NMMU_ATSD_REGS 8
> +
> +struct npu {
> + int index;
> + __be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> + unsigned int mmio_atsd_count;
> +
> + /* Bitmask for MMIO register usage */
> + unsigned long mmio_atsd_usage;
> +
> + /* Do we need to explicitly flush the nest mmu? */
> + bool nmmu_flush;
> +
> + struct list_head next;
> +
> + struct pci_controller *hose;
> +};
> +
>  static struct {
>   /*
>* spinlock to protect initialisation of an npu_context for
> @@ -396,22 +415,27 @@ static struct {
>   uint64_t atsd_threshold;
>   struct dentry *atsd_threshold_dentry;
>  
> + struct list_head npu_list;
>  } npu2_devices;
>  
>  void pnv_npu2_devices_init(void)
>  {
>   memset(&npu2_devices, 0, sizeof(npu2_devices));
> + INIT_LIST_HEAD(&npu2_devices.npu_list);
>   spin_lock_init(&npu2_devices.context_lock);
>   npu2_devices.atsd_threshold = 2 * 1024 * 1024;
>  }
>  
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
> - struct pnv_phb *nphb;
> + struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> + struct npu *npu;
>  
> - nphb = pci_bus_to_host(npdev->bus)->private_data;
> + list_for_each_entry(npu, &npu2_devices.npu_list, next)
> + if (hose == npu->hose)
> + return npu;
>  
> - return &nphb->npu;
> + return NULL;

If you're introducing the possibility this can fail, should you also
be introducing checks for NULL in the callers?
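
For instance, a hypothetical guard in each caller could look like:

  npu = npdev_to_npu(npdev);
  if (WARN_ON(!npu))
          return -ENODEV;  /* or skip this device, as appropriate */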

>  }
>  
>  /* Maximum number of nvlinks per npu */
> @@ -843,7 +867,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>*/
>   WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
>  
> - if (!nphb->npu.nmmu_flush) {
> + if (!npu->nmmu_flush) {

Looks like this hunk belongs in the patch which introduced npdev_to_npu().

>   /*
>* If we're not explicitly flushing ourselves we need to mark
>* the thread for global flushes
> @@ -967,6 +991,13 @@ int pnv_npu2_init(struct pnv_phb *phb)
>   struct pci_dev *gpdev;
>   static int npu_index;
>   uint64_t rc = 0;
> + struct pci_controller *hose = phb->hose;
> + struct npu *npu;
> + int ret;
> +
> + npu = kzalloc(sizeof(*npu), GFP_KERNEL);
> + if (!npu)
> + return -ENOMEM;
>  
>   if (!npu2_devices.atsd_threshold_dentry) {
>   npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
> @@ -974,8 +1005,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
>   &npu2_devices.atsd_threshold);
>   }
>  
> - phb->npu.nmmu_flush =
> - of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
> + npu->nmmu_flush = of_property_read_bool(hose->dn, 

Re: [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct

2018-11-06 Thread David Gibson
On Mon, Oct 15, 2018 at 08:32:58PM +1100, Alexey Kardashevskiy wrote:
> We are going to add a global list of NPUs in the system which is going
> to be yet another static symbol. Let's reorganise the code and put all
> static symbols into one struct for better tracking what is really needed
> for NPU (this might become a driver data some day).
> 
> Signed-off-by: Alexey Kardashevskiy 

I'm not entirely convinced this is worthwhile, but maybe it'll become
clearer with the later patches.  There are some nits though.

> ---
>  arch/powerpc/include/asm/pci.h|  1 +
>  arch/powerpc/platforms/powernv/npu-dma.c  | 77 
> ++-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  2 +
>  3 files changed, 47 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 2af9ded..1a96075 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -129,5 +129,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>  
>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
> +extern void pnv_npu2_devices_init(void);
>  
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 13e5153..01402f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -22,20 +22,6 @@
>  #include "pci.h"
>  
>  /*
> - * spinlock to protect initialisation of an npu_context for a particular
> - * mm_struct.
> - */
> -static DEFINE_SPINLOCK(npu_context_lock);
> -
> -/*
> - * When an address shootdown range exceeds this threshold we invalidate the
> - * entire TLB on the GPU for the given PID rather than each specific address 
> in
> - * the range.
> - */
> -static uint64_t atsd_threshold = 2 * 1024 * 1024;
> -static struct dentry *atsd_threshold_dentry;
> -
> -/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -392,6 +378,33 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct 
> pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +static struct {
> + /*
> +  * spinlock to protect initialisation of an npu_context for
> +  * a particular mm_struct.
> +  */
> + spinlock_t context_lock;
> +
> + /* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> + int max_index;
> +
> + /*
> +  * When an address shootdown range exceeds this threshold we invalidate 
> the
> +  * entire TLB on the GPU for the given PID rather than each specific 
> address in
> +  * the range.
> +  */
> + uint64_t atsd_threshold;
> + struct dentry *atsd_threshold_dentry;
> +
> +} npu2_devices;

Even as a struct, it should be possible to statically initialize this.
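
E.g. something like this should do (an untested sketch, using the
fields from the patch above):

  static struct {
          spinlock_t context_lock;
          int max_index;
          uint64_t atsd_threshold;
          struct dentry *atsd_threshold_dentry;
  } npu2_devices = {
          .context_lock   = __SPIN_LOCK_UNLOCKED(npu2_devices.context_lock),
          .atsd_threshold = 2 * 1024 * 1024,
  };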

> +
> +void pnv_npu2_devices_init(void)
> +{
> + memset(&npu2_devices, 0, sizeof(npu2_devices));

The memset() is unnecessary.  The static structure lives in the BSS,
which means it is already initialized to zeroes.

> + spin_lock_init(&npu2_devices.context_lock);
> + npu2_devices.atsd_threshold = 2 * 1024 * 1024;
> +}
> +
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
>   struct pnv_phb *nphb;
> @@ -404,9 +417,6 @@ static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> -/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> -static int max_npu2_index;
> -
>  struct npu_context {
>   struct mm_struct *mm;
>   struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
> @@ -472,7 +482,7 @@ static void mmio_invalidate_pid(struct mmio_atsd_reg 
> mmio_atsd_reg[NV_MAX_NPUS],
>   int i;
>   unsigned long launch;
>  
> - for (i = 0; i <= max_npu2_index; i++) {
> + for (i = 0; i <= npu2_devices.max_index; i++) {
>   if (mmio_atsd_reg[i].reg < 0)
>   continue;
>  
> @@ -503,7 +513,7 @@ static void mmio_invalidate_va(struct mmio_atsd_reg 
> mmio_atsd_reg[NV_MAX_NPUS],
>   int i;
>   unsigned long launch;
>  
> - for (i = 0; i <= max_npu2_index; i++) {
> + for (i = 0; i <= npu2_devices.max_index; i++) {
>   if (mmio_atsd_reg[i].reg < 0)
>   continue;
>  
> @@ -536,7 +546,7 @@ static void mmio_invalidate_wait(
>   int i, reg;
>  
>   /* Wait for all invalidations to complete */
> - for (i = 0; i <= max_npu2_index; i++) {
> + for (i = 0; i <= npu2_devices.max_index; i++) {
>   if (mmio_atsd_reg[i].reg < 0)
>   continue;
>  
> @@ -559,7 +569,7 @@ static void acquire_atsd_reg(struct npu_context 
> *npu_context,
>   struct npu *npu;
>   struct pci_dev *npdev;
>  
> - for (i = 0; i <= max_npu2_index; i++) {
> + for (i = 0; i <= npu2_devices.max_index; i++) {
>   

Re: [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device

2018-11-06 Thread David Gibson
On Mon, Oct 15, 2018 at 08:32:57PM +1100, Alexey Kardashevskiy wrote:
> This step is to help removing the npu struct from pnv_phb so it
> can be used by pseries as well.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 3a5c4ed..13e5153 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -389,6 +389,18 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct 
> pnv_ioda_pe *npe)
>   return gpe;
>  }
>  
> +/*
> + * NPU2 ATS
> + */
> +static struct npu *npdev_to_npu(struct pci_dev *npdev)
> +{
> + struct pnv_phb *nphb;
> +
> + nphb = pci_bus_to_host(npdev->bus)->private_data;
> +
> + return &nphb->npu;
> +}
> +
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> @@ -546,7 +558,6 @@ static void acquire_atsd_reg(struct npu_context 
> *npu_context,
>   int i, j;
>   struct npu *npu;
>   struct pci_dev *npdev;
> - struct pnv_phb *nphb;
>  
>   for (i = 0; i <= max_npu2_index; i++) {
>   mmio_atsd_reg[i].reg = -1;
> @@ -561,8 +572,7 @@ static void acquire_atsd_reg(struct npu_context 
> *npu_context,
>   if (!npdev)
>   continue;
>  
> - nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> + npu = npdev_to_npu(npdev);
>   mmio_atsd_reg[i].npu = npu;
>   mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
>   while (mmio_atsd_reg[i].reg < 0) {
> @@ -749,7 +759,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>   }
>  
>   nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> + npu = npdev_to_npu(npdev);
>  
>   /*
>* Setup the NPU context table for a particular GPU. These need to be
> @@ -869,7 +879,7 @@ void pnv_npu2_destroy_context(struct npu_context 
> *npu_context,
>   return;
>  
>   nphb = pci_bus_to_host(npdev->bus)->private_data;
> - npu = &nphb->npu;
> + npu = npdev_to_npu(npdev);
>   nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>   if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
> &nvlink_index)))

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: NXP P50XX/e5500: SMP doesn't work anymore with the latest Git kernel

2018-11-06 Thread Scott Wood
On Sun, 2018-10-28 at 17:35 +0100, Christian Zigotzky wrote:
> Hello,
> 
> SMP doesn't work anymore with the latest Git kernel (28/10/18 11:12AM 
> GMT) on my P5020 board and on virtual e5500 QEMU machines.
> 
> Board with P5020 dual core CPU:
> 
> [0.00] -
> [0.00] phys_mem_size = 0x2
> [0.00] dcache_bsize  = 0x40
> [0.00] icache_bsize  = 0x40
> [0.00] cpu_features  = 0x0003008003b4
> [0.00]   possible= 0x0003009003b4
> [0.00]   always  = 0x0003008003b4
> [0.00] cpu_user_features = 0xcc008000 0x0800
> [0.00] mmu_features  = 0x000a0010
> [0.00] firmware_features = 0x
> [0.00] -
> [0.00] CoreNet Generic board
> 
>  ...
> 
> [0.002161] smp: Bringing up secondary CPUs ...
> [0.002339] No cpu-release-addr for cpu 1
> [0.002347] smp: failed starting cpu 1 (rc -2)
> [0.002401] smp: Brought up 1 node, 1 CPU
> 
> Virtual e5500 quad core QEMU machine:
> 
> [0.026394] smp: Bringing up secondary CPUs ...
> [0.027831] No cpu-release-addr for cpu 1
> [0.027989] smp: failed starting cpu 1 (rc -2)
> [0.030143] No cpu-release-addr for cpu 2
> [0.030304] smp: failed starting cpu 2 (rc -2)
> [0.032400] No cpu-release-addr for cpu 3
> [0.032533] smp: failed starting cpu 3 (rc -2)
> [0.033117] smp: Brought up 1 node, 1 CPU
> 
> QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048 -kernel 
> /home/christian/Downloads/vmlinux-4.20-alpha4-
> AmigaOne_X1000_X5000/X5000_and_QEMU_e5500/uImage-4.20 
> -drive 
> format=raw,file=/home/christian/Downloads/MATE_PowerPC_Remix_2017_0.9.img,index=0,if=virtio 
> -nic user,model=e1000 -append "rw root=/dev/vda" -device virtio-vga 
> -device virtio-mouse-pci -device virtio-keyboard-pci -usb -soundhw 
> es1370 -smp 4
> 
> .config:
> 
> ...
> CONFIG_SMP=y
> CONFIG_NR_CPUS=4
> ...
> 
> Please test the latest Git kernel on your NXP P50XX boards.

I wasn't able to reproduce this with top-of-tree (commit 8053e5b93eca9b) with
either qemu (e5500 on ppce500) or a hardware t4240.

-Scott



[PATCH] KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE

2018-11-06 Thread Scott Wood
TRACE_INCLUDE_PATH and TRACE_INCLUDE_FILE are used by
, so like that #include, they should
be outside #ifdef protection.

They also need to be #undefed before defining, in case multiple trace
headers are included by the same C file.  This became the case on
book3e after commit cf4a6085151a ("powerpc/mm: Add missing tracepoint for
tlbie"), leading to the following build error:

   CC  arch/powerpc/kvm/powerpc.o
In file included from arch/powerpc/kvm/powerpc.c:51:0:
arch/powerpc/kvm/trace.h:9:0: error: "TRACE_INCLUDE_PATH" redefined
[-Werror]
  #define TRACE_INCLUDE_PATH .
  ^
In file included from arch/powerpc/kvm/../mm/mmu_decl.h:25:0,
  from arch/powerpc/kvm/powerpc.c:48:
./arch/powerpc/include/asm/trace.h:224:0: note: this is the location of
the previous definition
  #define TRACE_INCLUDE_PATH asm
  ^
cc1: all warnings being treated as errors
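
For context, the canonical trace-header layout this patch restores
looks roughly like the skeleton below (illustrative only, not a file
from the tree):

  /* trace_foo.h -- skeleton of a tracepoint header */
  #if !defined(_TRACE_FOO_H) || defined(TRACE_HEADER_MULTI_READ)
  #define _TRACE_FOO_H

  #include <linux/tracepoint.h>

  #undef TRACE_SYSTEM
  #define TRACE_SYSTEM foo

  /* ... TRACE_EVENT() definitions ... */

  #endif /* _TRACE_FOO_H */

  /* This part must be outside protection.  Undef first, so a C file
   * that includes several trace headers doesn't hit the redefinition
   * error quoted above. */
  #undef TRACE_INCLUDE_PATH
  #undef TRACE_INCLUDE_FILE
  #define TRACE_INCLUDE_PATH .
  #define TRACE_INCLUDE_FILE trace_foo
  #include <trace/define_trace.h>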

Reported-by: Christian Zigotzky 
Signed-off-by: Scott Wood 
---
 arch/powerpc/kvm/trace.h   | 8 ++--
 arch/powerpc/kvm/trace_booke.h | 9 +++--
 arch/powerpc/kvm/trace_hv.h| 9 +++--
 arch/powerpc/kvm/trace_pr.h| 9 +++--
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 491b0f715d6b..ea1d7c808319 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -6,8 +6,6 @@
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
-#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE trace
 
 /*
  * Tracepoint for guest mode entry.
@@ -120,4 +118,10 @@ TRACE_EVENT(kvm_check_requests,
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+
 #include <trace/define_trace.h>
diff --git a/arch/powerpc/kvm/trace_booke.h b/arch/powerpc/kvm/trace_booke.h
index ac640e81fdc5..3837842986aa 100644
--- a/arch/powerpc/kvm/trace_booke.h
+++ b/arch/powerpc/kvm/trace_booke.h
@@ -6,8 +6,6 @@
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm_booke
-#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE trace_booke
 
 #define kvm_trace_symbol_exit \
{0, "CRITICAL"}, \
@@ -218,4 +216,11 @@ TRACE_EVENT(kvm_booke_queue_irqprio,
 #endif
 
 /* This part must be outside protection */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace_booke
+
 #include <trace/define_trace.h>
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index bcfe8a987f6a..8a1e3b0047f1 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -9,8 +9,6 @@
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm_hv
-#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE trace_hv
 
 #define kvm_trace_symbol_hcall \
{H_REMOVE,  "H_REMOVE"}, \
@@ -497,4 +495,11 @@ TRACE_EVENT(kvmppc_run_vcpu_exit,
 #endif /* _TRACE_KVM_HV_H */
 
 /* This part must be outside protection */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace_hv
+
 #include <trace/define_trace.h>
diff --git a/arch/powerpc/kvm/trace_pr.h b/arch/powerpc/kvm/trace_pr.h
index 2f9a8829552b..46a46d328fbf 100644
--- a/arch/powerpc/kvm/trace_pr.h
+++ b/arch/powerpc/kvm/trace_pr.h
@@ -8,8 +8,6 @@
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm_pr
-#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE trace_pr
 
 TRACE_EVENT(kvm_book3s_reenter,
TP_PROTO(int r, struct kvm_vcpu *vcpu),
@@ -257,4 +255,11 @@ TRACE_EVENT(kvm_exit,
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace_pr
+
 #include <trace/define_trace.h>
-- 
2.17.1



Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-06 Thread Doug Anderson
Hi,

On Sat, Nov 3, 2018 at 3:45 AM Daniel Thompson
 wrote:
>
> On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote:
> > > As mentioned in another part of the thread we can also add robustness
> > > by skipping a cpu where csd->flags != 0 (and adding an appropriately
> > > large comment regarding why). Doing the check directly is abusing
> > > internal knowledge that smp.c normally keeps to itself so an accessor
> > > of some kind would be needed.
> >
> > Sure.  I could add smp_async_func_finished() that just looked like:
> >
> > int smp_async_func_finished(call_single_data_t *csd)
> > {
> >   return !(csd->flags & CSD_FLAG_LOCK);
> > }
> >
> > My understanding of all the mutual exclusion / memory barrier concepts
> > employed by smp.c is pretty weak, though.  I'm hoping that it's safe
> > to just access the structure and check the bit directly.
> >
> > ...but do you think adding a generic accessor like this is better than
> > just keeping track of this in kgdb directly?  I could avoid the
> > accessor by adding a "rounding_up" member to "struct
> > debuggerinfo_struct" and doing something like this in roundup:
> >
> >   /* If it didn't round up last time, don't try again */
> >   if (kgdb_info[cpu].rounding_up)
> > continue
> >
> >   kgdb_info[cpu].rounding_up = true
> >   smp_call_function_single_async(cpu, csd);
> >
> > ...and then in kgdb_nmicallback() I could just add:
> >
> >   kgdb_info[cpu].rounding_up = false
> >
> > In that case we're not adding a generic accessor to smp.c that most
> > people should never use.
>
> Whilst it is very tempting to make a sarcastic reply here ("Of course! What
> kgdb really needs is yet another set of condition variables") I can't
> because I actually agree with the proposal. I don't really want kgdb to
> be too "special" especially when it doesn't need to be.
>
> Only thing to note is that rounding_up will not be manipulated within a
> common spin lock so you might have to invest a bit of thought to make
> sure any races between the master and slave as the slave CPU clears the
> flag are benign.

OK, so I've hopefully got all this thought through and posted v3.
I've put most of my thoughts in the patch descriptions themselves so
let's continue the discussion there.

-Doug


[PATCH v3 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-06 Thread Douglas Anderson
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system.  Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)

Specifically it looked like this:
  sysrq: SysRq : DEBUG
  [ cut here ]
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs.  Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.
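
Roughly, that weak default looks like this (a sketch based on the
description above; the per-cpu csd name is illustrative):

  #ifdef CONFIG_SMP
  static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);

  void __weak kgdb_roundup_cpus(void)
  {
          call_single_data_t *csd;
          int this_cpu = raw_smp_processor_id();
          int cpu;

          for_each_online_cpu(cpu) {
                  /* No point in rounding up the calling CPU */
                  if (cpu == this_cpu)
                          continue;

                  csd = &per_cpu(kgdb_roundup_csd, cpu);
                  csd->func = kgdb_call_nmi_hook;
                  smp_call_function_single_async(cpu, csd);
          }
  }
  #endif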

Looking at all the people who previously had copies of this code,
there were a few variants.  I've attempted to keep the variants
working like they used to.  Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
  get_irq_regs().
* For arch/mips there was a bit of extra code around
  kgdb_nmicallback()

NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before.  We'll try to round it up
again and potentially hang when we try to grab the csd lock.  That's
not new behavior but we'll still try to do better in a future patch.

Suggested-by: Daniel Thompson 
Signed-off-by: Douglas Anderson 
---

Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

 arch/arc/kernel/kgdb.c | 10 ++
 arch/arm/kernel/kgdb.c | 12 
 arch/arm64/kernel/kgdb.c   | 12 
 arch/hexagon/kernel/kgdb.c | 27 ---
 arch/mips/kernel/kgdb.c|  9 +
 arch/powerpc/kernel/kgdb.c |  4 ++--
 arch/sh/kernel/kgdb.c  | 12 
 include/linux/kgdb.h   | 15 +--
 kernel/debug/debug_core.c  | 35 +++
 9 files changed, 53 insertions(+), 83 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 0932851028e0..68d9fe4b5aa7 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -192,18 +192,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)
instruction_pointer(regs) = ip;
 }
 
-static void kgdb_call_nmi_hook(void *ignored)
+void kgdb_call_nmi_hook(void *ignored)
 {
+   /* Default implementation passes get_irq_regs() but we don't */
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   local_irq_disable();
-}
-
 struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
 #ifdef CONFIG_CPU_BIG_ENDIAN
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index f21077b077be..d9a69e941463 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -170,18 +170,6 @@ static struct undef_hook kgdb_compiled_brkpt_hook = {
.fn = kgdb_compiled_brk_fn
 };
 
-static void kgdb_call_nmi_hook(void *ignored)
-{
-   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   local_irq_disable();
-}
-
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
 {
struct pt_regs *regs = args->regs;
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 

[PATCH v3 1/4] kgdb: Remove irq flags from roundup

2018-11-06 Thread Douglas Anderson
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson 
---

Changes in v3: None
Changes in v2:
- Removing irq flags separated from fixing lockdep splat.

 arch/arc/kernel/kgdb.c | 2 +-
 arch/arm/kernel/kgdb.c | 2 +-
 arch/arm64/kernel/kgdb.c   | 2 +-
 arch/hexagon/kernel/kgdb.c | 9 ++---
 arch/mips/kernel/kgdb.c| 2 +-
 arch/powerpc/kernel/kgdb.c | 2 +-
 arch/sh/kernel/kgdb.c  | 2 +-
 arch/sparc/kernel/smp_64.c | 2 +-
 arch/x86/kernel/kgdb.c | 9 ++---
 include/linux/kgdb.h   | 9 ++---
 kernel/debug/debug_core.c  | 2 +-
 11 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..0932851028e0 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..f21077b077be 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..12c339ff6e75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..012e0e230ac2 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
 
 /**
  * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
  *
  * On SMP systems, we need to get the attention of the other CPUs
  * and get them be in a known state.  This should do what is needed
  * to get the other CPUs to call kgdb_wait(). Note that on some arches,
  * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  * On non-SMP systems, this is not called.
  */
@@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..2b05effc17b4 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
smp_send_debugger_break();
 }
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..cc57630f6bf2 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,7 +319,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();

[PATCH v3 0/4] kgdb: Fix kgdb_roundup_cpus()

2018-11-06 Thread Douglas Anderson
This series was originally part of the series ("serial: Finish kgdb on
qcom_geni; fix many lockdep splats w/ kgdb") but it made sense to
split it up.

It's believed that dropping into kgdb should be more robust once these
patches are applied.

Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.
- Patch #3 and #4 are new.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

Douglas Anderson (4):
  kgdb: Remove irq flags from roundup
  kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
  kgdb: Don't round up a CPU that failed rounding up before
  kdb: Don't back trace on a cpu that didn't round up

 arch/arc/kernel/kgdb.c | 10 ++-
 arch/arm/kernel/kgdb.c | 12 -
 arch/arm64/kernel/kgdb.c   | 12 -
 arch/hexagon/kernel/kgdb.c | 32 --
 arch/mips/kernel/kgdb.c|  9 +--
 arch/powerpc/kernel/kgdb.c |  6 ++---
 arch/sh/kernel/kgdb.c  | 12 -
 arch/sparc/kernel/smp_64.c |  2 +-
 arch/x86/kernel/kgdb.c |  9 ++-
 include/linux/kgdb.h   | 22 +--
 kernel/debug/debug_core.c  | 55 +-
 kernel/debug/debug_core.h  |  1 +
 kernel/debug/kdb/kdb_bt.c  | 11 +++-
 13 files changed, 88 insertions(+), 105 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog



Re: System not booting since dm changes? (was Linux 4.20-rc1)

2018-11-06 Thread Michael Ellerman
Mike Snitzer  writes:
> On Mon, Nov 05 2018 at  5:25am -0500,
> Michael Ellerman  wrote:
>
>> Linus Torvalds  writes:
>> ...
>> > Mike Snitzer (1):
>> > device mapper updates
>> 
>> Hi Mike,
>> 
>> Replying here because I can't find the device-mapper pull or the patch
>> in question on LKML. I guess I should be subscribed to dm-devel.
>> 
>> We have a box that doesn't boot any more, bisect points at one of:
>> 
>>   cef6f55a9fb4 Mike Snitzer   dm table: require that request-based DM be 
>> layered on blk-mq devices 
>>   953923c09fe8 Mike Snitzer   dm: rename DM_TYPE_MQ_REQUEST_BASED to 
>> DM_TYPE_REQUEST_BASED 
>>   6a23e05c2fe3 Jens Axboe dm: remove legacy request-based IO path 
>> 
>> 
>> It's a Power8 system running Rawhide, it does have multipath, but I'm
>> told it was setup by the Fedora installer, ie. nothing fancy.
>> 
>> The symptom is the system can't find its root filesystem and drops into
>> the initramfs shell. The dmesg includes a bunch of errors like below:
>> 
>>   [   43.263460] localhost multipathd[1344]: sdb: fail to get serial
>>   [   43.268762] localhost multipathd[1344]: mpatha: failed in domap for 
>> addition of new path sdb
>>   [   43.268762] localhost multipathd[1344]: uevent trigger error
>>   [   43.282065] localhost kernel: device-mapper: table: table load 
>> rejected: not all devices are blk-mq request-stackable
> ...
>>
>> Any ideas what's going wrong here?
>
> "table load rejected: not all devices are blk-mq request-stackable"
> speaks to the fact that you aren't using blk-mq for scsi (aka scsi-mq).
>
> You need to use scsi_mod.use_blk_mq=Y on the kernel commandline (or set
> CONFIG_SCSI_MQ_DEFAULT in your kernel config)

Thanks.

Looks like CONFIG_SCSI_MQ_DEFAULT is default y, so new configs should
pick that up by default. We must have had an old .config that didn't get
that update.
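
For the record, either of the fixes Mike mentions is a one-liner
(shown here for illustration):

  # on the kernel command line
  scsi_mod.use_blk_mq=Y

  # or in the kernel .config
  CONFIG_SCSI_MQ_DEFAULT=y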

cheers


Re: [RFC PATCH v2 00/14] New TM Model

2018-11-06 Thread Michael Neuling


> In fact, I was the one that identified this performance degradation issue,
> and reported to Adhemerval who kindly fixed it with
> f0458cf4f9ff3d870c43b624e6dccaaf657d5e83.
> 
> Anyway, I think we are safe here.

FWIW Agreed. PPC_FEATURE2_HTM_NOSC should be preserved by this series.

Mikey


Re: [PATCH 3/3] powerpc/mm/64s: Only use slbfee on CPUs that support it

2018-11-06 Thread Nicholas Piggin
On Tue,  6 Nov 2018 23:37:09 +1100
Michael Ellerman  wrote:

> The slbfee instruction was only added in ISA 2.05 (Power6), it's not
> supported on older CPUs. We don't have a CPU feature for that ISA
> version though, so just use the ISA 2.06 feature flag.
> 
> Fixes: e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests")
> Signed-off-by: Michael Ellerman 

Ugh, thanks for fixing all that up. Looks good.

Thanks,
Nick

> ---
>  arch/powerpc/mm/slb.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 457fd29448b1..b663a36f9ada 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -66,6 +66,9 @@ static void assert_slb_presence(bool present, unsigned long 
> ea)
>  
>   WARN_ON_ONCE(mfmsr() & MSR_EE);
>  
> + if (!cpu_has_feature(CPU_FTR_ARCH_206))
> + return;
> +
>   asm volatile(__PPC_SLBFEE_DOT(%0, %1) : "=r"(tmp) : "r"(ea) : "cr0");
>  
>   WARN_ON(present == (tmp == 0));
> -- 
> 2.17.2
> 



Re: [alsa-devel] Build regressions/improvements in v4.20-rc1 (sound/pci/hda/patch_ca0132.c)

2018-11-06 Thread Pierre-Louis Bossart

*** ERRORS ***

   + /kisskb/src/sound/pci/hda/patch_ca0132.c: error: implicit declaration of function 'pci_iomap' [-Werror=implicit-function-declaration]:  => 8799:3

sh4-all{mod,yes}config

Looks like d9b84a15892c0233 ("ALSA: hda: Fix implicit definition of
pci_iomap() on SH")
is not sufficient?

Different problem.  This is about "select":

config SND_SOC_ALL_CODECS
 tristate "Build all ASoC CODEC drivers"

That enables (sets):
 select SND_SOC_HDAC_HDA
which selects SND_HDA even though CONFIG_PCI is not enabled.

After SND_HDA is selected (above), the Kconfig symbols in
sound/pci/hda/Kconfig are available for enabling, so
SND_HDA_CODEC_CA0132 is enabled but will not build.

Thanks for looking into this!


One simple solution (but possibly too naive) is:

---
  sound/soc/codecs/Kconfig |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-420-rc1.orig/sound/soc/codecs/Kconfig
+++ lnx-420-rc1/sound/soc/codecs/Kconfig
@@ -82,7 +82,7 @@ config SND_SOC_ALL_CODECS
 select SND_SOC_ES7241
 select SND_SOC_GTM601
 select SND_SOC_HDAC_HDMI
-   select SND_SOC_HDAC_HDA
+   select SND_SOC_HDAC_HDA if PCI
 select SND_SOC_ICS43432
 select SND_SOC_INNO_RK3036
 select SND_SOC_ISABELLE if I2C

I guess that will work. There are already plenty of "select foo if bar" lines.
However, looking at what else can enable SND_HDA, I think it should be

 select SND_SOC_HDAC_HDA if SND_PCI || ARCH_TEGRA


This codec can only be used by the Skylake driver (and the upcoming SOF 
one). For Tegra this module will never be used unless they follow the 
same path of enabling ASoC to deal with the HDaudio codecs instead of 
the legacy.


Likewise HDAC_HDMI will only work on Intel platforms for now.



That still leaves the issue that pci_iomap() on SH should be an empty stub if
PCI is not available, like on other architectures.


I thought Mark Brown provided a fix to SH maintainers?




Re: [PATCH v11 10/26] mm: protect VMA modifications using VMA sequence count

2018-11-06 Thread Vinayak Menon
On 11/5/2018 11:52 PM, Laurent Dufour wrote:
> On 05/11/2018 at 08:04, vinayak menon wrote:
>> Hi Laurent,
>>
>> On Thu, May 17, 2018 at 4:37 PM Laurent Dufour
>>  wrote:
>>>
>>> The VMA sequence count has been introduced to allow fast detection of
>>> VMA modification when running a page fault handler without holding
>>> the mmap_sem.
>>>
>>> This patch provides protection against the VMA modification done in :
>>>  - madvise()
>>>  - mpol_rebind_policy()
>>>  - vma_replace_policy()
>>>  - change_prot_numa()
>>>  - mlock(), munlock()
>>>  - mprotect()
>>>  - mmap_region()
>>>  - collapse_huge_page()
>>>  - userfaultd registering services
>>>
>>> In addition, VMA fields which will be read during the speculative fault
>>> path needs to be written using WRITE_ONCE to prevent write to be split
>>> and intermediate values to be pushed to other CPUs.
>>>
>>> Signed-off-by: Laurent Dufour 
>>> ---
>>>   fs/proc/task_mmu.c |  5 -
>>>   fs/userfaultfd.c   | 17 +
>>>   mm/khugepaged.c    |  3 +++
>>>   mm/madvise.c   |  6 +-
>>>   mm/mempolicy.c | 51 
>>> ++-
>>>   mm/mlock.c | 13 -
>>>   mm/mmap.c  | 22 +-
>>>   mm/mprotect.c  |  4 +++-
>>>   mm/swap_state.c    |  8 ++--
>>>   9 files changed, 89 insertions(+), 40 deletions(-)
>>>
>>>   struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>>>  struct vm_fault *vmf)
>>> @@ -665,9 +669,9 @@ static inline void swap_ra_clamp_pfn(struct 
>>> vm_area_struct *vma,
>>>   unsigned long *start,
>>>   unsigned long *end)
>>>   {
>>> -   *start = max3(lpfn, PFN_DOWN(vma->vm_start),
>>> +   *start = max3(lpfn, PFN_DOWN(READ_ONCE(vma->vm_start)),
>>>    PFN_DOWN(faddr & PMD_MASK));
>>> -   *end = min3(rpfn, PFN_DOWN(vma->vm_end),
>>> +   *end = min3(rpfn, PFN_DOWN(READ_ONCE(vma->vm_end)),
>>>  PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE));
>>>   }
>>>
>>> -- 
>>> 2.7.4
>>>
>>
>> I have got a crash on 4.14 kernel with speculative page faults enabled
>> and here is my analysis of the problem.
>> The issue was reported only once.
>
> Hi Vinayak,
>
> Thanks for reporting this.
>
>>
>> [23409.303395]  el1_da+0x24/0x84
>> [23409.303400]  __radix_tree_lookup+0x8/0x90
>> [23409.303407]  find_get_entry+0x64/0x14c
>> [23409.303410]  pagecache_get_page+0x5c/0x27c
>> [23409.303416]  __read_swap_cache_async+0x80/0x260
>> [23409.303420]  swap_vma_readahead+0x264/0x37c
>> [23409.303423]  swapin_readahead+0x5c/0x6c
>> [23409.303428]  do_swap_page+0x128/0x6e4
>> [23409.303431]  handle_pte_fault+0x230/0xca4
>> [23409.303435]  __handle_speculative_fault+0x57c/0x7c8
>> [23409.303438]  do_page_fault+0x228/0x3e8
>> [23409.303442]  do_translation_fault+0x50/0x6c
>> [23409.303445]  do_mem_abort+0x5c/0xe0
>> [23409.303447]  el0_da+0x20/0x24
>>
>> Process A accesses address ADDR (part of VMA A) and that results in a
>> translation fault.
>> Kernel enters __handle_speculative_fault to fix the fault.
>> Process A enters do_swap_page->swapin_readahead->swap_vma_readahead
>> from speculative path.
>> During this time, another process B which shares the same mm, does a
>> mprotect from another CPU which follows
>> mprotect_fixup->__split_vma, and it splits VMA A into VMAs A and B.
>> After the split, ADDR falls into VMA B, but process A is still using
>> VMA A.
>> Now ADDR is greater than VMA_A->vm_start and VMA_A->vm_end.
>> swap_vma_readahead->swap_ra_info uses start and end of vma to
>> calculate ptes and nr_pte, which goes wrong due to this and finally
>> resulting in wrong "entry" passed to
>> swap_vma_readahead->__read_swap_cache_async, and in turn causing
>> invalid swapper_space
>> being passed to __read_swap_cache_async->find_get_page, causing an abort.
>>
>> The fix I have tried is to cache vm_start and vm_end also in vmf and
>> use it in swap_ra_clamp_pfn. Let me know your thoughts on this. I can
>> send
>> the patch I am using if you feel that is the right thing to do.
>
> I think the best would be not to do swap readahead during the speculative
> page fault. If the page is found in the swap cache, that's fine, but
> otherwise, we should fall back to the regular page fault.
>
> The attached -untested- patch is doing this, if you want to give it a try. 
> I'll review that for the next series.
>

Thanks Laurent. I am going to try this patch.

With this patch, since all non-SWP_SYNCHRONOUS_IO swapins result in a
non-speculative fault and a retry, wouldn't this have an impact on some
perf numbers? If so, would caching start and end be a better option?

Also, would it make sense to move the FAULT_FLAG_SPECULATIVE check inside
swapin_readahead, in a way that swap_cluster_readahead can take the
speculative path 
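
For illustration, the caching idea could look roughly like this (the
vma_start/vma_end snapshot fields and the changed signature are
hypothetical, not from either patch):

  /*
   * Hypothetical: snapshot the VMA bounds into the fault structure when
   * the speculative walk validates the VMA under the sequence count, then
   * clamp against the snapshot instead of rereading vma->vm_start/vm_end,
   * which can change if the VMA is split concurrently.
   */
  static inline void swap_ra_clamp_pfn(struct vm_fault *vmf,
                                       unsigned long faddr,
                                       unsigned long lpfn, unsigned long rpfn,
                                       unsigned long *start, unsigned long *end)
  {
          *start = max3(lpfn, PFN_DOWN(vmf->vma_start),
                        PFN_DOWN(faddr & PMD_MASK));
          *end = min3(rpfn, PFN_DOWN(vmf->vma_end),
                      PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE));
  }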

Re: [RFC PATCH v2 00/14] New TM Model

2018-11-06 Thread Breno Leitao
hi Florian,

On 11/06/2018 04:32 PM, Florian Weimer wrote:
> * Breno Leitao:
> 
>> This  patchset for the hardware transactional memory (TM) subsystem
>> aims to avoid spending a lot of time on TM suspended mode in kernel
>> space.  It basically changes where the reclaim/recheckpoint will be
>> executed.
> 
> I assumed that we want to abort on every system call these days?
> 
> We have this commit in glibc:
> 
> commit f0458cf4f9ff3d870c43b624e6dccaaf657d5e83
> Author: Adhemerval Zanella 
> Date:   Mon Aug 27 09:42:50 2018 -0300
> 
> powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
> 
> Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
> instead it suspend and resume it when leaving the kernel.  The
> side-effects of the syscall will always remain visible, even if the
> transaction is aborted.  This is an issue when transaction is used along
> with futex syscall, on pthread_cond_wait for instance, where the futex
> call might succeed but the transaction is rolled back leading the
> pthread_cond object in an inconsistent state.
> 
> Glibc used to prevent it by always aborting a transaction before issuing
> a syscall.  Linux 4.2 also decided to abort active transaction in
> syscalls which makes the glibc workaround superfluous.  Worse, glibc
> transaction abortion leads to a performance issue on recent kernels
> where the HTM state is saved/restore lazily (v4.9).  By aborting a
> transaction on every syscalls, regardless whether a transaction has being
> initiated before, GLIBS makes the kernel always save/restore HTM state
> (it can not even lazily disable it after a certain number of syscall
> iterations).
> 
> Because of this shortcoming, Transactional Lock Elision is just enabled
> when it has been explicitly set (either by tunables of by a configure
> switch) and if kernel aborts HTM transactions on syscalls
> (PPC_FEATURE2_HTM_NOSC).  It is reported that using simple benchmark [1],
> the context-switch is about 5% faster by not issuing a tabort in every
> syscall in newer kernels.
> 
> I wonder how the new TM model interacts with the assumption we currently
> have in glibc.

This new TM model is almost transparent to userspace. My patchset basically
affects where recheckpoint and reclaim happen inside kernel space, and
should not change userspace behavior.

I say "almost transparent" because it might cause some very specific
transactions to have a higher doom rate; see patch 14/14 for more detailed
information, and also a reference for glibc's "tabort prior to system calls"
behavior.

Regarding Adhemerval's patch, it is unaffected by this new model. Prior to
kernel 4.2, the kernel executed syscalls independently of the TM state,
which caused undesired side effects, hence glibc's decision to abort a
transaction prior to calling a syscall.

Later, the kernel system call mechanism became aware of the TM state, and
this glibc workaround was no longer necessary.

More than that, this workaround started to cause performance degradation on
context switches, mainly when the TM facility became lazily enabled, i.e.,
the TM facility would only be enabled on demand (when a task uses TM
explicitly). This happens because the "abort prior to every system call"
workaround caused the TM facility to be enabled for every task that makes
system calls.

In fact, I was the one that identified this performance degradation issue,
and reported to Adhemerval who kindly fixed it with
f0458cf4f9ff3d870c43b624e6dccaaf657d5e83.

Anyway, I think we are safe here.

Thanks for bringing this up.
Breno


Re: [RFC PATCH v2 00/14] New TM Model

2018-11-06 Thread Florian Weimer
* Breno Leitao:

> This  patchset for the hardware transactional memory (TM) subsystem
> aims to avoid spending a lot of time on TM suspended mode in kernel
> space.  It basically changes where the reclaim/recheckpoint will be
> executed.

I assumed that we want to abort on every system call these days?

We have this commit in glibc:

commit f0458cf4f9ff3d870c43b624e6dccaaf657d5e83
Author: Adhemerval Zanella 
Date:   Mon Aug 27 09:42:50 2018 -0300

powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC

Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
instead it suspend and resume it when leaving the kernel.  The
side-effects of the syscall will always remain visible, even if the
transaction is aborted.  This is an issue when transaction is used along
with futex syscall, on pthread_cond_wait for instance, where the futex
call might succeed but the transaction is rolled back leading the
pthread_cond object in an inconsistent state.

Glibc used to prevent it by always aborting a transaction before issuing
a syscall.  Linux 4.2 also decided to abort active transaction in
syscalls which makes the glibc workaround superfluous.  Worse, glibc
transaction abortion leads to a performance issue on recent kernels
where the HTM state is saved/restore lazily (v4.9).  By aborting a
transaction on every syscalls, regardless whether a transaction has being
initiated before, GLIBS makes the kernel always save/restore HTM state
(it can not even lazily disable it after a certain number of syscall
iterations).

Because of this shortcoming, Transactional Lock Elision is just enabled
when it has been explicitly set (either by tunables of by a configure
switch) and if kernel aborts HTM transactions on syscalls
(PPC_FEATURE2_HTM_NOSC).  It is reported that using simple benchmark [1],
the context-switch is about 5% faster by not issuing a tabort in every
syscall in newer kernels.

I wonder how the new TM model interacts with the assumption we currently
have in glibc.

Thanks,
Florian


Re: [alsa-devel] Build regressions/improvements in v4.20-rc1 (sound/pci/hda/patch_ca0132.c)

2018-11-06 Thread Mark Brown
On Tue, Nov 06, 2018 at 08:56:39AM -0600, Pierre-Louis Bossart wrote:

> > However, looking at what else can enable SND_HDA, I think it should be

> >  select SND_SOC_HDAC_HDA if SND_PCI || ARCH_TEGRA

> This codec can only be used by the Skylake driver (and the upcoming SOF
> one). For Tegra this module will never be used unless they follow the same
> path of enabling ASoC to deal with the HDaudio codecs instead of the legacy.

> Likewise HDAC_HDMI will only work on Intel platforms for now.

_ALL_CODECS is a build coverage option, not a useful configuration
option.  We should be building things in as many different
configurations as possible so that it's easier for people to get build
coverage when making subsystem wide changes.




Re: [alsa-devel] Build regressions/improvements in v4.20-rc1 (sound/pci/hda/patch_ca0132.c)

2018-11-06 Thread Takashi Iwai
On Tue, 06 Nov 2018 02:04:47 +0100,
Randy Dunlap wrote:
> 
> On 11/5/18 2:12 PM, Geert Uytterhoeven wrote:
> > On Mon, Nov 5, 2018 at 11:07 PM Geert Uytterhoeven  
> > wrote:
> >> Below is the list of build error/warning regressions/improvements in
> >> v4.20-rc1[1] compared to v4.19[2].
> >>
> >> Summarized:
> >>   - build errors: +3/-0
> >>   - build warnings: +449/-2712
> >>
> >> Happy fixing! ;-)
> >>
> >> Thanks to the linux-next team for providing the build service.
> >>
> >> [1] 
> >> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/651022382c7f8da46cb4872a545ee1da6d097d2a/
> >>  (all 240 configs)
> >> [2] 
> >> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d/
> >>  (all 240 configs)
> >>
> >>
> >> *** ERRORS ***
> >>
> >>   + /kisskb/src/sound/pci/hda/patch_ca0132.c: error: implicit declaration 
> >> of function 'pci_iomap' [-Werror=implicit-function-declaration]:  => 8799:3
> > 
> > sh4-all{mod,yes}config
> > 
> > Looks like d9b84a15892c0233 ("ALSA: hda: Fix implicit definition of
> > pci_iomap() on SH")
> > is not sufficient?
> 
> Different problem.  This is about "select":
> 
> config SND_SOC_ALL_CODECS
>   tristate "Build all ASoC CODEC drivers"
> 
> That enables (sets):
>   select SND_SOC_HDAC_HDA
> which selects SND_HDA even though CONFIG_PCI is not enabled.

Actually it is OK to enable CONFIG_SND_HDA_CODEC_CA0132 without
CONFIG_PCI.  IIRC, there was a system like that, too.
The commit above should have covered the build failure on SH, but
apparently it isn't enough for some arch setups.

The cause is clear now: pci_iomap() is defined in
asm-generic/pci_iomap.h only when CONFIG_GENERIC_PCI_IOMAP is
defined.  Including asm/io.h doesn't help unless CONFIG_PCI is set.
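
For reference, the declaration logic in include/asm-generic/pci_iomap.h is
roughly shaped like this (simplified and paraphrased, not the verbatim
header):

#ifdef CONFIG_PCI
/* Real declaration; implemented by lib/pci_iomap.c or by the arch. */
extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
#elif defined(CONFIG_GENERIC_PCI_IOMAP)
/* NULL stub, only present when the arch opts in to the generic helpers. */
static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar,
				      unsigned long max)
{
	return NULL;
}
#endif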

Below is a quick fix for this.


thanks,

Takashi

-- 8< --

From: Takashi Iwai 
Subject: [PATCH] ALSA: hda/ca0132 - Yet more fix on build breakage without PCI
 support

The recent change in CA0132 codec driver for supporting more
Creative boards includes the pci_iomap() call to get the extra
register accesses.  This is supposed to work on all archs and setups,
by the implicit assumption that every arch would provide a dummy
function returning NULL when no PCI is available.  But the reality
bites, of course; as Geert's regular build tests show, some configs
(at least SH4 without CONFIG_PCI) lead to a build error due to the
implicit function declaration.

So this is another attempt to fix the issue: now we add an ifdef
CONFIG_PCI line, so that pci_iomap() won't be called unless PCI is
really usable.  This should fall back to the standard quirk again with
a warning.

Fixes: d9b84a15892c0233 ("ALSA: hda: Fix implicit definition of pci_iomap() on SH")
Reported-by: Geert Uytterhoeven 
Cc: 
Signed-off-by: Takashi Iwai 
---
 sound/pci/hda/patch_ca0132.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c
index 0a24037184c3..9ed808b45e75 100644
--- a/sound/pci/hda/patch_ca0132.c
+++ b/sound/pci/hda/patch_ca0132.c
@@ -8796,7 +8796,13 @@ static int patch_ca0132(struct hda_codec *codec)
}
 
if (spec->use_pci_mmio) {
+   /*
+* ifdef below needed due to lack of pci_iomap() declaration
+* for some archs when no PCI is defined
+*/
+#ifdef CONFIG_PCI
spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
+#endif
if (spec->mem_base == NULL) {
codec_warn(codec, "pci_iomap failed! Setting quirk to QUIRK_NONE.");
spec->quirk = QUIRK_NONE;
-- 
2.19.1



Re: [alsa-devel] Build regressions/improvements in v4.20-rc1 (sound/pci/hda/patch_ca0132.c)

2018-11-06 Thread Geert Uytterhoeven
Hi Pierre,

On Tue, Nov 6, 2018 at 3:56 PM Pierre-Louis Bossart
 wrote:
> 
>  *** ERRORS ***
> 
> + /kisskb/src/sound/pci/hda/patch_ca0132.c: error: implicit 
>  declaration of function 'pci_iomap' 
>  [-Werror=implicit-function-declaration]:  => 8799:3
> >>> sh4-all{mod,yes}config
> >>>
> >>> Looks like d9b84a15892c0233 ("ALSA: hda: Fix implicit definition of
> >>> pci_iomap() on SH")
> >>> is not sufficient?
> >> Different problem.  This is about "select":
> >>
> >> config SND_SOC_ALL_CODECS
> >>  tristate "Build all ASoC CODEC drivers"
> >>
> >> That enables (sets):
> >>  select SND_SOC_HDAC_HDA
> >> which selects SND_HDA even though CONFIG_PCI is not enabled.
> >>
> >> After SND_HDA is selected (above), the Kconfig symbols in
> >> sound/pci/hda/Kconfig are available for enabling, so
> >> SND_HDA_CODEC_CA0132 is enabled but will not build.
> > Thanks for looking into this!
> >
> >> One simple solution (but possibly too naive) is:
> >>
> >> ---
> >>   sound/soc/codecs/Kconfig |2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> --- lnx-420-rc1.orig/sound/soc/codecs/Kconfig
> >> +++ lnx-420-rc1/sound/soc/codecs/Kconfig
> >> @@ -82,7 +82,7 @@ config SND_SOC_ALL_CODECS
> >>  select SND_SOC_ES7241
> >>  select SND_SOC_GTM601
> >>  select SND_SOC_HDAC_HDMI
> >> -   select SND_SOC_HDAC_HDA
> >> +   select SND_SOC_HDAC_HDA if PCI
> >>  select SND_SOC_ICS43432
> >>  select SND_SOC_INNO_RK3036
> >>  select SND_SOC_ISABELLE if I2C
> > I guess that will work. There are already plenty of "select foo if bar" 
> > lines.
> > However, looking at what else can enable SND_HDA, I think it should be
> >
> >  select SND_SOC_HDAC_HDA if SND_PCI || ARCH_TEGRA
>
> This codec can only be used by the Skylake driver (and the upcoming SOF
> one). For Tegra this module will never be used unless they follow the
> same path of enabling ASoC to deal with the HDaudio codecs instead of
> the legacy.
>
> Likewise HDAC_HDMI will only work on Intel platforms for now.
>
> >
> > That still leaves the issue that pci_iomap() on SH should be an empty stub 
> > if
> > PCI is not available, like on other architectures.
>
> I thought Mark Brown provided a fix to SH maintainers?

Indeed, https://patchwork.kernel.org/patch/10597409/

As usual, no response from the J^HSH maintainers :-(

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] powerpc: mark 64-bit PD_HUGE constant as unsigned long

2018-11-06 Thread Daniel Axtens
When compiled for 64-bit, the PD_HUGE constant is a 64-bit integer.
Mark it as an unsigned long.

This squashes over a thousand sparse warnings on my minimal T4240RDB
(e6500, ppc64be) config, of the following 2 forms:

arch/powerpc/include/asm/hugetlb.h:52:49: warning: constant
0x8000000000000000 is so big it is unsigned long
arch/powerpc/include/asm/nohash/pgtable.h:269:49: warning: constant
0x8000000000000000 is so big it is unsigned long
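
For anyone unfamiliar with this sparse warning, a minimal illustration
(not kernel code): an unsuffixed hex constant that does not fit in int or
long silently takes the type unsigned long, and sparse flags that implicit
choice; the UL suffix states the type explicitly and silences it.

#define IMPLICIT 0x8000000000000000	/* compiler picks unsigned long; sparse warns */
#define EXPLICIT 0x8000000000000000UL	/* type stated explicitly; no warning */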

Signed-off-by: Daniel Axtens 

---

Boot tested on a T4240RDB.
---
 arch/powerpc/include/asm/page.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f6a1265face2..9ea903221a9f 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -289,7 +289,7 @@ static inline bool pfn_valid(unsigned long pfn)
  * page tables at arbitrary addresses, this breaks and will have to change.
  */
 #ifdef CONFIG_PPC64
-#define PD_HUGE 0x8000000000000000
+#define PD_HUGE 0x8000000000000000UL
#else
#define PD_HUGE 0x80000000
 #endif
-- 
2.17.1



[PATCH] powerpc/vas: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-06 Thread Yangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
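
For reference, this is roughly what the macro expands to (simplified from
include/linux/seq_file.h), which is why the open()/fops boilerplate below
can be deleted:

#define DEFINE_SHOW_ATTRIBUTE(__name)					\
static int __name ## _open(struct inode *inode, struct file *file)	\
{									\
	return single_open(file, __name ## _show, inode->i_private);	\
}									\
									\
static const struct file_operations __name ## _fops = {		\
	.owner		= THIS_MODULE,					\
	.open		= __name ## _open,				\
	.read		= seq_read,					\
	.llseek		= seq_lseek,					\
	.release	= single_release,				\
}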

Signed-off-by: Yangtao Li 
---
 arch/powerpc/platforms/powernv/vas-debug.c | 28 --
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c 
b/arch/powerpc/platforms/powernv/vas-debug.c
index 4f7276ebdf9c..4d3929fbc08f 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -30,7 +30,7 @@ static char *cop_to_str(int cop)
}
 }
 
-static int info_dbg_show(struct seq_file *s, void *private)
+static int info_show(struct seq_file *s, void *private)
 {
struct vas_window *window = s->private;
 
@@ -49,17 +49,7 @@ static int info_dbg_show(struct seq_file *s, void *private)
return 0;
 }
 
-static int info_dbg_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, info_dbg_show, inode->i_private);
-}
-
-static const struct file_operations info_fops = {
-   .open   = info_dbg_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(info);
 
 static inline void print_reg(struct seq_file *s, struct vas_window *win,
char *name, u32 reg)
@@ -67,7 +57,7 @@ static inline void print_reg(struct seq_file *s, struct 
vas_window *win,
seq_printf(s, "0x%016llx %s\n", read_hvwc_reg(win, name, reg), name);
 }
 
-static int hvwc_dbg_show(struct seq_file *s, void *private)
+static int hvwc_show(struct seq_file *s, void *private)
 {
struct vas_window *window = s->private;
 
@@ -115,17 +105,7 @@ static int hvwc_dbg_show(struct seq_file *s, void *private)
return 0;
 }
 
-static int hvwc_dbg_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, hvwc_dbg_show, inode->i_private);
-}
-
-static const struct file_operations hvwc_fops = {
-   .open   = hvwc_dbg_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(hvwc);
 
 void vas_window_free_dbgdir(struct vas_window *window)
 {
-- 
2.17.0



[RFC PATCH 14/14] selftests/powerpc: Adapt tm-syscall test to no suspend

2018-11-06 Thread Breno Leitao
The Documentation/powerpc/transactional_memory.txt says:

 "Syscalls made from within a suspended transaction are performed as normal
  and the transaction is not explicitly doomed by the kernel.  However,
  what the kernel does to perform the syscall may result in the transaction
  being doomed by the hardware."

With this new TM mechanism, the syscall will continue to be executed if it
happens inside a suspended transaction, but the transaction will now always
be doomed, because it will be reclaimed and recheckpointed, and that dooms
it.

This test expects that the transaction is not doomed, calling
getppid_tm_suspended(), which has the following body:

FUNC_START(getppid_tm_suspended)
tbegin.
beq 1f
li  r0, __NR_getppid
tsuspend.
sc
tresume.
tend.
blr
1:
li  r3, -1
blr

This will never succeed in returning the syscall output, because tresume
will abort the transaction and jump to the failure handler, returning
r3 := -1.

This patch updates the test case to no longer assume that a syscall inside
a suspended transaction will not be doomed, because kernel entry will doom
any suspended transaction from now on.

Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/tm/tm-syscall.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index 454b965a2db3..1439a87eba3a 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -78,12 +78,6 @@ int tm_syscall(void)
timeradd(, , );
 
for (count = 0; timercmp(, , <); count++) {
-   /*
-* Test a syscall within a suspended transaction and verify
-* that it succeeds.
-*/
-   FAIL_IF(getppid_tm(true) == -1); /* Should succeed. */
-
/*
 * Test a syscall within an active transaction and verify that
 * it fails with the correct failure code.
-- 
2.19.0



[RFC PATCH 13/14] powerpc/tm: Do not restore TM without SPRs

2018-11-06 Thread Breno Leitao
Currently the signal context restore code enables the TM bit in the MSR
without restoring the TM SPRs, which can cause undesired side effects.

This is not correct, because TM enabled in the MSR means the TM SPR
registers are valid and up to date, which is not the case here. In fact,
the live registers may contain the previous thread's SPRs.

Functions check whether the register values are valid by looking at
whether the facility is enabled in the MSR, since MSR[TM] set means that
the TM SPRs are hot.

Just enabling MSR[TM] without updating the live SPRs can cause a crash,
since the TM SPRs of the previous thread would then be saved on the
current thread, and might not have TEXASR[FS] set, for example.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/signal_64.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 487c3b6aa2e3..156b90e8ee78 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -478,8 +478,18 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 * happened whilst in the signal handler and load_tm overflowed,
 * disabling the TM bit. In either case we can end up with an illegal
 * TM state leading to a TM Bad Thing when we return to userspace.
+*
+* Every time MSR_TM is enabled, mainly for the b) case, the TM SPRs
+* must be restored in the live registers, since the live registers
+* could contain garbage and later we want to read from live, since
+* MSR_TM is enabled, and MSR[TM] is what is used to check if the
+* TM SPRs live registers are valid or not.
 */
-   regs->msr |= MSR_TM;
+   if ((regs->msr & MSR_TM) == 0) {
+   regs->msr |= MSR_TM;
+   tm_enable();
+   tm_restore_sprs(>thread);
+   }
 
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
-- 
2.19.0



[RFC PATCH 12/14] powerpc/tm: Restore transactional SPRs

2018-11-06 Thread Breno Leitao
In the previous TM code, trecheckpoint was executed in the middle of an
exception; thus, SPRs were being restored to the default kernel SPR values
(as with DSCR) after trecheckpoint was done.

With the current patchset, trecheckpoint is executed just before returning
to userspace, at ret_from_except_lite, for example. Thus, the SPRs need to
be restored to their transactional values.

In order to do so, a function called restore_sprs_after_recheckpoint() was
created to restore the SPRs after recheckpoint. This function should be
called after trecheckpoint to restore the transactional SPRs, otherwise the
SPRs will remain clobbered (with checkpointed values) after recheckpoint. I
understand it is preferred to have this done as a C function instead of
embedding it in the ASM code.

This patch also changes tm_reclaim(), which now always returns with the
transactional SPR values (instead of the checkpointed values), as it
already does for the NVGPRs. Previously tm_reclaim() was returning with
transactional NVGPRs and checkpointed SPRs, intermixing them.  With the
current patch, tm_reclaim() just fills out the thread->ck and thread->tm
values, and returns with transactional values in a uniform way (for SPRs
and NVGPRs). In this case, a later save_sprs() at switch_to() will not
overwrite thread->sprs with checkpointed values, but with transactional
values.

Facility registers, such as VEC and FP, continue to be clobbered after
tm_reclaim(), though.

These are the SPRs that are checkpointed by the hardware: CR fields other
than CR0, LR, CTR, FPSCR, AMR, PPR, VRSAVE, VSCR, DSCR, and TAR.

This patch only cares about PPR, TAR and DSCR, because the other SPRs are
either volatile, restored as part of a facility, or not currently handled,
as is the case for AMR/CRs.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/asm-offsets.c |  4 
 arch/powerpc/kernel/process.c | 19 +++
 arch/powerpc/kernel/tm.S  | 19 ---
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..93def2e23e68 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -109,6 +109,9 @@ int main(void)
OFFSET(THREAD_FPSAVEAREA, thread_struct, fp_save_area);
OFFSET(FPSTATE_FPSCR, thread_fp_state, fpscr);
OFFSET(THREAD_LOAD_FP, thread_struct, load_fp);
+   OFFSET(THREAD_DSCR, thread_struct, dscr);
+   OFFSET(THREAD_TAR, thread_struct, tar);
+
 #ifdef CONFIG_ALTIVEC
OFFSET(THREAD_VRSTATE, thread_struct, vr_state.vr);
OFFSET(THREAD_VRSAVEAREA, thread_struct, vr_save_area);
@@ -311,6 +314,7 @@ int main(void)
STACK_PT_REGS_OFFSET(ORIG_GPR3, orig_gpr3);
STACK_PT_REGS_OFFSET(RESULT, result);
STACK_PT_REGS_OFFSET(_TRAP, trap);
+   STACK_PT_REGS_OFFSET(_PPR, ppr);
 #ifndef CONFIG_PPC64
/*
 * The PowerPC 400-class & Book-E processors have neither the DAR
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8a9c298928f9..3937408ff339 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -79,6 +79,9 @@
 #endif
 
 extern unsigned long _get_SP(void);
+static inline void save_sprs(struct thread_struct *t);
+static inline void restore_sprs_after_recheckpoint(struct thread_struct
+  *thread);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 /*
@@ -883,6 +886,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, 
uint8_t cause)
thr->regs->msr);
giveup_all(container_of(thr, struct task_struct, thread));
 
+   /* Save SPRS before reclaim */
+   save_sprs(thr);
tm_reclaim(thr, cause);
 
/* Tag it so restore_tm_state will pay attention to this task */
@@ -939,6 +944,7 @@ void tm_recheckpoint(struct thread_struct *thread)
 
__tm_recheckpoint(thread);
 
+   restore_sprs_after_recheckpoint(thread);
local_irq_restore(flags);
 }
 
@@ -1166,6 +1172,19 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
thread_pkey_regs_restore(new_thread, old_thread);
 }
 
+static inline void restore_sprs_after_recheckpoint(struct thread_struct 
*thread)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (cpu_has_feature(CPU_FTR_DSCR))
+   mtspr(SPRN_DSCR, thread->dscr);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   mtspr(SPRN_TAR, thread->tar);
+   mtspr(SPRN_FSCR, thread->fscr);
+   }
+#endif
+}
+
 #ifdef CONFIG_PPC_BOOK3S_64
 #define CP_SIZE 128
 static const u8 dummy_copy_buffer[CP_SIZE] __attribute__((aligned(CP_SIZE)));
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 9fabdce255cd..e3090502061e 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -326,9 +326,18 @@ _GLOBAL(tm_reclaim)
mtlrr0
ld  r2, STK_GOT(r1)
 
-   /* Load CPU's default DSCR */
-   ld  r0, PACA_DSCR_DEFAULT(r13)
-   mtspr   SPRN_DSCR, 

[RFC PATCH 11/14] powerpc/tm: Save MSR to PACA before RFID

2018-11-06 Thread Breno Leitao
As at other exit points, move SRR1 (MSR) into paca->tm_scratch, so that if
there is a TM Bad Thing at RFID, it is easy to understand what SRR1 value
was being used.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/entry_64.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index a86619edf29d..19e460b893e8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -296,6 +296,10 @@ BEGIN_FTR_SECTION
HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+   std r8, PACATMSCRATCH(r13)
+#endif
+
ld  r13,GPR13(r1)   /* only restore r13 if returning to usermode */
ld  r2,GPR2(r1)
ld  r1,GPR1(r1)
-- 
2.19.0



[RFC PATCH 10/14] powerpc/tm: Improve TM debug information

2018-11-06 Thread Breno Leitao
Add some debug information to the TM subsystem. When enabled, it now
prints when there is a reclaim, a recheckpoint or a lazy TM disable.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/process.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 849591bf0881..8a9c298928f9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -879,6 +879,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, 
uint8_t cause)
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   TM_DEBUG("TM reclaim thread at 0x%lx, MSR=%lx\n", thr->regs->nip,
+   thr->regs->msr);
giveup_all(container_of(thr, struct task_struct, thread));
 
tm_reclaim(thr, cause);
@@ -921,6 +923,8 @@ void tm_recheckpoint(struct thread_struct *thread)
if (!(thread->regs->msr & MSR_TM))
return;
 
+   TM_DEBUG("TM recheckpoint at 0x%lx, MSR=%lx\n", thread->regs->nip,
+   thread->regs->msr);
/* We really can't be interrupted here as the TEXASR registers can't
 * change and later in the trecheckpoint code, we have a userspace R1.
 * So let's hard disable over this region.
@@ -1001,8 +1005,11 @@ static inline void __switch_to_tm(struct task_struct 
*prev,
 * that disables the TM and reenables the laziness
 * save/restore
 */
-   if (prev->thread.load_tm == 0)
+   if (prev->thread.load_tm == 0) {
prev->thread.regs->msr &= ~MSR_TM;
+   TM_DEBUG("Disabling TM facility for process %s 
(%lx)\n",
+prev->comm, prev->pid);
+   }
}
}
 
@@ -1052,6 +1059,7 @@ void restore_tm_state(struct pt_regs *regs)
if (!MSR_TM_ACTIVE(regs->msr))
return;
 
+   TM_DEBUG("Restore TM state at 0x%lx, MSR=%lx\n", regs->nip, regs->msr);
tm_enable();
/* The only place we recheckpoint */
tm_recheckpoint(>thread);
-- 
2.19.0



[RFC PATCH 09/14] powerpc/tm: Warn if state is transactional

2018-11-06 Thread Breno Leitao
Since every kernel entry point calls TM_KERNEL_ENTRY, it is not
expected to arrive at this point with a suspended transaction.

If that is the case, emit a warning and reclaim the current thread in
order to avoid a TM Bad Thing.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/process.c | 7 +++
 arch/powerpc/kernel/signal.c  | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 73872f751b33..849591bf0881 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1752,11 +1752,10 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
-* Clear any transactional state, we're exec()ing. The cause is
-* not important as there will never be a recheckpoint so it's not
-* user visible.
+* It is a bug if the transaction was not reclaimed until this
+* point. Warn us and try to workaround it calling tm_reclaim().
 */
-   if (MSR_TM_SUSPENDED(mfmsr()))
+   if (WARN_ON(MSR_TM_SUSPENDED(mfmsr(
tm_reclaim_current(0);
 #endif
 
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index b3e8db376ecd..cbaccc2be0fb 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -203,7 +203,7 @@ unsigned long get_tm_stackpointer(struct task_struct *tsk)
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BUG_ON(tsk != current);
 
-   if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
+   if (WARN_ON(MSR_TM_ACTIVE(mfmsr( {
tm_reclaim_current(TM_CAUSE_SIGNAL);
if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr))
return tsk->thread.ckpt_regs.gpr[1];
-- 
2.19.0



[RFC PATCH 08/14] powerpc/tm: Recheckpoint at exit path

2018-11-06 Thread Breno Leitao
In the past, TIF_RESTORE_TM was handled with the rest of the TIF workers,
but that was too early, and could cause some IRQs to be replayed in
suspended state (after recheckpoint).

This patch moves the TIF_RESTORE_TM handler to as late as possible. It also
forces IRQs to be disabled, and they will stay disabled until RFID, so no
IRQ will be replayed at all. I.e., if trecheckpoint happens, we will RFID
straight to userspace.

This makes TIF_RESTORE_TM a special case that should not be handled
similarly to the _TIF_USER_WORK_MASK.

Since _TIF_RESTORE_TM is not part of _TIF_USER_WORK_MASK anymore, we
need to force system_call_exit to continue to leave through
fast_exception_return, so we add the flag together with
_TIF_USER_WORK_MASK at the system_call_exit path.

If this flag is set at system_call_exit, it means that a recheckpoint
will be called, and doing it through fast_exception_return is the only
way to do so.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/include/asm/thread_info.h |  2 +-
 arch/powerpc/kernel/entry_64.S | 23 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 544cac0474cb..2835d60bc9ef 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -139,7 +139,7 @@ void arch_setup_new_exec(void);
 
 #define _TIF_USER_WORK_MASK(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
-_TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
+_TIF_PATCH_PENDING | \
 _TIF_FSCHECK)
 #define _TIF_PERSYSCALL_MASK   (_TIF_RESTOREALL|_TIF_NOERROR)
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 17484ebda66c..a86619edf29d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -255,7 +255,7 @@ system_call_exit:
 
ld  r9,TI_FLAGS(r12)
li  r11,-MAX_ERRNO
-   andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
+   andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK|_TIF_RESTORE_TM)
bne-.Lsyscall_exit_work
 
andi.   r0,r8,MSR_FP
@@ -784,14 +784,6 @@ _GLOBAL(ret_from_except_lite)
SCHEDULE_USER
b   ret_from_except_lite
 2:
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-   andi.   r0,r4,_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM
-   bne 3f  /* only restore TM if nothing else to do */
-   addir3,r1,STACK_FRAME_OVERHEAD
-   bl  restore_tm_state
-   b   restore
-3:
-#endif
bl  save_nvgprs
/*
 * Use a non volatile GPR to save and restore our thread_info flags
@@ -938,6 +930,19 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 */
.globl  fast_exception_return
 fast_exception_return:
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+   CURRENT_THREAD_INFO(r4, r1)
+   ld  r4,TI_FLAGS(r4)
+   andi.   r0,r4,_TIF_RESTORE_TM
+   beq 22f
+   ld  r4,_MSR(r1) /* TODO: MSR[!PR] shouldn't be here */
+   andi.   r0,r4,MSR_PR
+   beq 22f  /* Skip if Kernel thread */
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  restore_tm_state
+22:
+#endif
ld  r3,_MSR(r1)
ld  r4,_CTR(r1)
ld  r0,_LINK(r1)
-- 
2.19.0



[RFC PATCH 07/14] powerpc/tm: Do not reclaim on ptrace

2018-11-06 Thread Breno Leitao
Make sure that we are not suspended on ptrace and that the registers were
already reclaimed.

Since the data was already reclaimed, there is nothing to be done here
except to restore the SPRs.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/ptrace.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index afb819f4ca68..4faf0612d58c 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -136,9 +136,19 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current))
return;
 
-   if (MSR_TM_SUSPENDED(mfmsr())) {
-   tm_reclaim_current(TM_CAUSE_SIGNAL);
-   } else {
+   if (WARN_ON(MSR_TM_SUSPENDED(mfmsr( {
+   /*
+* We should never be here in suspended state. This is a
+* bug!
+*/
+   tm_reclaim_current(0x99);
+   }
+   if (tsk->thread.regs->msr & MSR_TM) {
+   /*
+* Only flush TM SPRs to the thread if TM was enabled,
+* otherwise (TM lazily disabled), the thread already
+* contains the latest SPR value
+*/
tm_enable();
tm_save_sprs(&(tsk->thread));
}
-- 
2.19.0



[RFC PATCH 05/14] powerpc/tm: Refactor the __switch_to_tm code

2018-11-06 Thread Breno Leitao
__switch_to_tm is the function that switches between two tasks which might
have TM enabled. This function is clearly split in two parts, the task that
is leaving the CPU, known as 'prev' and the task that is being scheduled,
known as 'new'.

It starts by checking whether the previous task had TM enabled; if so, it
increments load_tm (this is the only place where load_tm is incremented).
It also saves the TM SPRs here.

If the previous task was scheduled out with a transaction active, the
failure cause needs to be updated, since it might contain the failure cause
that caused the exception, such as TM_CAUSE_MISC. In this case, since there
was a context switch, overwrite the failure cause.

If the previous task has overflowed load_tm, disable TM, putting the
facility save/restore lazy mechanism at lazy mode.

Regarding the 'new' task being scheduled, restoring TM SPRs is enough if
the task had TM enabled when it was de-scheduled. (Checking if a
recheckpoint would be required will be done later, at restore_tm_state()
stage.)

On top of that, the tm_reclaim_task() and tm_recheckpoint_new_task()
functions are no longer used, so remove them.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/process.c | 167 --
 1 file changed, 78 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1842fd96b123..73872f751b33 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -912,48 +912,6 @@ void tm_reclaim_current(uint8_t cause)
tm_reclaim_thread(>thread, cause);
 }
 
-static inline void tm_reclaim_task(struct task_struct *tsk)
-{
-   /* We have to work out if we're switching from/to a task that's in the
-* middle of a transaction.
-*
-* In switching we need to maintain a 2nd register state as
-* oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
-* checkpointed (tbegin) state in ckpt_regs, ckfp_state and
-* ckvr_state
-*
-* We also context switch (save) TFHAR/TEXASR/TFIAR in here.
-*/
-   struct thread_struct *thr = >thread;
-
-   if (!thr->regs)
-   return;
-
-   if (!MSR_TM_ACTIVE(thr->regs->msr))
-   goto out_and_saveregs;
-
-   WARN_ON(tm_suspend_disabled);
-
-   TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
-"ccr=%lx, msr=%lx, trap=%lx)\n",
-tsk->pid, thr->regs->nip,
-thr->regs->ccr, thr->regs->msr,
-thr->regs->trap);
-
-   tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
-
-   TM_DEBUG("--- tm_reclaim on pid %d complete\n",
-tsk->pid);
-
-out_and_saveregs:
-   /* Always save the regs here, even if a transaction's not active.
-* This context-switches a thread's TM info SPRs.  We do it here to
-* be consistent with the restore path (in recheckpoint) which
-* cannot happen later in _switch().
-*/
-   tm_save_sprs(thr);
-}
-
 extern void __tm_recheckpoint(struct thread_struct *thread);
 
 void tm_recheckpoint(struct thread_struct *thread)
@@ -980,59 +938,91 @@ void tm_recheckpoint(struct thread_struct *thread)
local_irq_restore(flags);
 }
 
-static inline void tm_recheckpoint_new_task(struct task_struct *new)
+static void tm_change_failure_cause(struct task_struct *task, uint8_t cause)
 {
-   if (!cpu_has_feature(CPU_FTR_TM))
-   return;
-
-   /* Recheckpoint the registers of the thread we're about to switch to.
-*
-* If the task was using FP, we non-lazily reload both the original and
-* the speculative FP register states.  This is because the kernel
-* doesn't see if/when a TM rollback occurs, so if we take an FP
-* unavailable later, we are unable to determine which set of FP regs
-* need to be restored.
-*/
-   if (!tm_enabled(new))
-   return;
-
-   if (!MSR_TM_ACTIVE(new->thread.regs->msr)){
-   tm_restore_sprs(>thread);
-   return;
-   }
-   /* Recheckpoint to restore original checkpointed register state. */
-   TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n",
-new->pid, new->thread.regs->msr);
-
-   tm_recheckpoint(>thread);
-
-   /*
-* The checkpointed state has been restored but the live state has
-* not, ensure all the math functionality is turned off to trigger
-* restore_math() to reload.
-*/
-   new->thread.regs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);
-
-   TM_DEBUG("*** tm_recheckpoint of pid %d complete "
-"(kernel msr 0x%lx)\n",
-new->pid, mfmsr());
+   task->thread.tm_texasr &= ~TEXASR_FC;
+   task->thread.tm_texasr |= (unsigned long) cause << TEXASR_FC_LG;
 }
 
 static inline void __switch_to_tm(struct task_struct *prev,
struct task_struct *new)
 {
-   if 

[RFC PATCH 06/14] powerpc/tm: Do not recheckpoint at sigreturn

2018-11-06 Thread Breno Leitao
Do not recheckpoint at signal code return. Just make sure TIF_RESTORE_TM is
set, so the restore happens on exit to userspace via restore_tm_state.

All the lazy FP and VEC restore was already done by tm_reclaim_current(),
which checked whether FP/VEC was set and filled out the ckfp and ckvr
register areas with the expected values.

load_fp_state() and load_vr_state() were being called to restore the
clobbered registers after tm_recheckpoint(). This is not necessary anymore,
since restore_tm_state() already does it: the restore is only required
after trecheckpoint(), and the only place calling that is
restore_tm_state().

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/signal_32.c | 38 +
 arch/powerpc/kernel/signal_64.c | 30 --
 2 files changed, 19 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index e6474a45cef5..046030d7921e 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -850,28 +850,11 @@ static long restore_tm_user_regs(struct pt_regs *regs,
return 1;
/* Pull in the MSR TM bits from the user context */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
-   /* Now, recheckpoint.  This loads up all of the checkpointed (older)
-* registers, including FP and V[S]Rs.  After recheckpointing, the
-* transactional versions should be loaded.
-*/
-   tm_enable();
+
/* Make sure the transaction is marked as failed */
current->thread.tm_texasr |= TEXASR_FS;
-   /* This loads the checkpointed FP/VEC state, if used */
-   tm_recheckpoint(>thread);
-
-   /* This loads the speculative FP/VEC state, if used */
-   msr_check_and_set(msr & (MSR_FP | MSR_VEC));
-   if (msr & MSR_FP) {
-   load_fp_state(>thread.fp_state);
-   regs->msr |= (MSR_FP | current->thread.fpexc_mode);
-   }
-#ifdef CONFIG_ALTIVEC
-   if (msr & MSR_VEC) {
-   load_vr_state(>thread.vr_state);
-   regs->msr |= MSR_VEC;
-   }
-#endif
+   /* Make sure restore_tm_state will be called */
+   set_thread_flag(TIF_RESTORE_TM);
 
return 0;
 }
@@ -1156,16 +1139,15 @@ SYSCALL_DEFINE0(rt_sigreturn)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
-* If there is a transactional state then throw it away.
-* The purpose of a sigreturn is to destroy all traces of the
-* signal frame, this includes any transactional state created
-* within in. We only check for suspended as we can never be
-* active in the kernel, we are active, there is nothing better to
-* do than go ahead and Bad Thing later.
-* The cause is not important as there will never be a
+* If the transaction is active at this point, it means that
+* TM_KERNEL_ENTRY was not invoked properly and it is a bug.
+* If this is the case, print a warning and try to work around,
+* calling tm_reclaim_current() to discard the footprint.
+*
+* The failure cause is not important as there will never be a
 * recheckpoint so it's not user visible.
 */
-   if (MSR_TM_SUSPENDED(mfmsr()))
+   if (WARN_ON(MSR_TM_SUSPENDED(mfmsr(
tm_reclaim_current(0);
 
if (__get_user(tmp, _sf->uc.uc_link))
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..487c3b6aa2e3 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -569,21 +569,10 @@ static long restore_tm_sigcontexts(struct task_struct 
*tsk,
}
}
 #endif
-   tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
-   /* This loads the checkpointed FP/VEC state, if used */
-   tm_recheckpoint(>thread);
-
-   msr_check_and_set(msr & (MSR_FP | MSR_VEC));
-   if (msr & MSR_FP) {
-   load_fp_state(>thread.fp_state);
-   regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
-   }
-   if (msr & MSR_VEC) {
-   load_vr_state(>thread.vr_state);
-   regs->msr |= MSR_VEC;
-   }
+   /* Guarantee that restore_tm_state() will be called */
+   set_thread_flag(TIF_RESTORE_TM);
 
return err;
 }
@@ -717,16 +706,15 @@ SYSCALL_DEFINE0(rt_sigreturn)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
-* If there is a transactional state then throw it away.
-* The purpose of a sigreturn is to destroy all traces of the
-* signal frame, this includes any transactional state created
-* within in. We only check for suspended as we can never be
-* active in the kernel, we are active, there is nothing better to
-* do than go ahead and Bad Thing later.
-* The cause is not important as there will never 

[RFC PATCH 04/14] powerpc/tm: Always set TIF_RESTORE_TM on reclaim

2018-11-06 Thread Breno Leitao
If the task data was reclaimed (through TM_KERNEL_ENTRY), then it needs to
be recheckpointed later, when exiting to userspace. The recheckpoint is
done by the restore_tm_state() function, which is called on our way to
userspace if the task has the TIF_RESTORE_TM flag set.

This patch makes sure the task has the TIF_RESTORE_TM flag set every time
there is a reclaim, so a recheckpoint will be executed later.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/process.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index c7e758a42b8f..1842fd96b123 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -883,6 +883,9 @@ static void tm_reclaim_thread(struct thread_struct *thr, 
uint8_t cause)
 
tm_reclaim(thr, cause);
 
+   /* Tag it so restore_tm_state will pay attention to this task */
+   set_thread_flag(TIF_RESTORE_TM);
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
-- 
2.19.0



[RFC PATCH 03/14] powerpc/tm: Recheckpoint when exiting from kernel

2018-11-06 Thread Breno Leitao
This is now the only place where we recheckpoint. The task needs to have
the TIF_RESTORE_TM flag set, which will get it into restore_tm_state() at
the exception exit path, where the recheckpoint is executed depending on
the MSR.

Every time a task is required to recheckpoint, or just to have its TM SPRs
restored, the TIF_RESTORE_TM flag should be set and the task MSR should
properly be in a transactional state, which will be checked by
restore_tm_state().

After the facility registers are recheckpointed, they are clobbered with
the values that were recheckpointed (and are now also in the checkpoint
area).

If a facility is enabled in the MSR being returned to userspace, then the
facility registers need to be restored, otherwise userspace will see
invalid values.

This patch simplifies restore_tm_state() to just restore the facility
registers that are enabled when returning to userspace, i.e. the MSR will
be the same as what will be put into SRR1, which will be the MSR after RFID.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/process.c | 38 ---
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4d5322cfad25..c7e758a42b8f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1049,8 +1049,6 @@ static inline void __switch_to_tm(struct task_struct 
*prev,
  */
 void restore_tm_state(struct pt_regs *regs)
 {
-   unsigned long msr_diff;
-
/*
 * This is the only moment we should clear TIF_RESTORE_TM as
 * it is here that ckpt_regs.msr and pt_regs.msr become the same
@@ -1061,19 +1059,35 @@ void restore_tm_state(struct pt_regs *regs)
if (!MSR_TM_ACTIVE(regs->msr))
return;
 
-   msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
-   msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
+   tm_enable();
+   /* The only place we recheckpoint */
+   tm_recheckpoint(>thread);
 
-   /* Ensure that restore_math() will restore */
-   if (msr_diff & MSR_FP)
-   current->thread.load_fp = 1;
+   /*
+* Restore the facility registers that were clobbered during
+* recheckpoint.
+*/
+   if (regs->msr & MSR_FP) {
+   /*
+* Using load_fp_state() instead of restore_fp() because we
+* want to force the restore, independent of
+* tsk->thread.load_fp. Same for other cases below.
+*/
+   load_fp_state(>thread.fp_state);
+   }
 #ifdef CONFIG_ALTIVEC
-   if (cpu_has_feature(CPU_FTR_ALTIVEC) && msr_diff & MSR_VEC)
-   current->thread.load_vec = 1;
+   if (cpu_has_feature(CPU_FTR_ALTIVEC) && regs->msr & MSR_VEC)
+   load_vr_state(>thread.vr_state);
+#endif
+#ifdef CONFIG_VSX
+   if (cpu_has_feature(CPU_FTR_VSX) && regs->msr & MSR_VSX) {
+   /*
+* If VSX is enabled, it is expected that VEC and FP are
+* also enabled and already restored the full register set.
+* Cause a warning if that is not the case.
+*/
+   WARN_ON(!(regs->msr & MSR_VEC) || !(regs->msr & MSR_FP));
+   }
 #endif
-   restore_math(regs);
-
-   regs->msr |= msr_diff;
 }
 
 #else
-- 
2.19.0



[RFC PATCH 02/14] powerpc/tm: Reclaim on unavailable exception

2018-11-06 Thread Breno Leitao
If there is an FP/VEC/Altivec touch inside a transaction and the facility is
disabled, then a facility unavailable exception is raised and ends up
calling {fp,vec,vsx}_unavailable_tm, which used to reclaim and
recheckpoint.

This is not required anymore, since the checkpointed state was reclaimed on
exception entry, and it will be recheckpointed by restore_tm_state later.

Add a WARN_ON() in case we hit _unavailable_tm() in suspended mode, i.e.
the reclaim was somehow not executed at the trap entrance, which is a bug.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/include/asm/exception-64s.h |  4 
 arch/powerpc/kernel/exceptions-64s.S |  3 +++
 arch/powerpc/kernel/traps.c  | 22 --
 3 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 931a74ba037b..80f01d5683c3 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -711,6 +711,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 * Soft disable the IRQs, otherwise it might cause a CPU hang.  \
 */ \
RECONCILE_IRQ_STATE(r10, r11);  \
+   /*  \
+* Although this cause will be set initially, it might be   \
+* updated later, once the exception is better understood   \
+*/ \
li  r3, cause;  \
bl  tm_reclaim_current; \
li  r3, 1;  /* Reclaim happened */  \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 5c685a46202d..47e05b09eed6 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -786,6 +786,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2: /* User process was in a transaction */
bl  save_nvgprs
+   TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV)
RECONCILE_IRQ_STATE(r10, r11)
addir3,r1,STACK_FRAME_OVERHEAD
bl  fp_unavailable_tm
@@ -1128,6 +1129,7 @@ BEGIN_FTR_SECTION
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2: /* User process was in a transaction */
bl  save_nvgprs
+   TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV)
RECONCILE_IRQ_STATE(r10, r11)
addir3,r1,STACK_FRAME_OVERHEAD
bl  altivec_unavailable_tm
@@ -1164,6 +1166,7 @@ BEGIN_FTR_SECTION
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2: /* User process was in a transaction */
bl  save_nvgprs
+   TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV)
RECONCILE_IRQ_STATE(r10, r11)
addir3,r1,STACK_FRAME_OVERHEAD
bl  vsx_unavailable_tm
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9a86572db1ef..e74b735e974c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1742,23 +1742,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
  * transaction, and probably retry but now with FP enabled.  So the
  * checkpointed FP registers need to be loaded.
 */
-   tm_reclaim_current(TM_CAUSE_FAC_UNAV);
-
-   /*
-* Reclaim initially saved out bogus (lazy) FPRs to ckfp_state, and
-* then it was overwrite by the thr->fp_state by tm_reclaim_thread().
-*
-* At this point, ck{fp,vr}_state contains the exact values we want to
-* recheckpoint.
-*/
+   WARN_ON(MSR_TM_SUSPENDED(mfmsr()));
 
/* Enable FP for the task: */
current->thread.load_fp = 1;
-
-   /*
-* Recheckpoint all the checkpointed ckpt, ck{fp, vr}_state registers.
-*/
-   tm_recheckpoint(>thread);
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
@@ -1770,10 +1757,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
TM_DEBUG("Vector Unavailable trap whilst transactional at 0x%lx,"
 "MSR=%lx\n",
 regs->nip, regs->msr);
-   tm_reclaim_current(TM_CAUSE_FAC_UNAV);
+   WARN_ON(MSR_TM_SUSPENDED(mfmsr()));
current->thread.load_vec = 1;
-   tm_recheckpoint(>thread);
current->thread.used_vr = 1;
+
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1792,12 +1779,11 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.used_vsr = 1;
 
/* This reclaims FP and/or VR regs if they're already enabled */
-   tm_reclaim_current(TM_CAUSE_FAC_UNAV);
+   WARN_ON(MSR_TM_SUSPENDED(mfmsr()));
 
current->thread.load_vec = 1;
current->thread.load_fp = 1;
 
-   tm_recheckpoint(>thread);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.19.0



[RFC PATCH 01/14] powerpc/tm: Reclaim transaction on kernel entry

2018-11-06 Thread Breno Leitao
This patch creates a macro that will be invoked on every entry to the
kernel, so that in kernel space the transaction is completely reclaimed
and no longer left suspended.

The macro checks if we are coming from PR; if not, the reclaim is skipped.
This is useful when irq_replay() is called after recheckpoint, when IRQs
are re-enabled. In this case, we do not want to re-reclaim and
re-recheckpoint, thus, if not coming from PR, skip it completely.

This macro does not touch the TM SPRs either; from now on they will only
be saved and restored in the context switch code.

This macro returns 0 or 1 in the r3 register, to indicate whether a
reclaim was executed or not.

This patchset is based on initial work done by Cyril:
https://patchwork.ozlabs.org/cover/875341/

Signed-off-by: Breno Leitao 
---
 arch/powerpc/include/asm/exception-64s.h | 46 
 arch/powerpc/kernel/entry_64.S   | 10 ++
 arch/powerpc/kernel/exceptions-64s.S | 12 +--
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..931a74ba037b 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -36,6 +36,7 @@
  */
 #include 
 #include 
+#include 
 
 /* PACA save area offsets (exgen, exmc, etc) */
 #define EX_R9  0
@@ -677,10 +678,54 @@ BEGIN_FTR_SECTION \
beqlppc64_runlatch_on_trampoline;   \
 END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+
+/*
+ * This macro will reclaim a transaction if called when coming from userspace
+ * (MSR.PR = 1) and if the transaction state is active or suspended.
+ *
+ * Since we don't want to reclaim when coming from kernel, for instance after
+ * a trechkpt. or an IRQ replay, the live MSR is not useful; instead, the MSR
+ * from the thread stack is used to check the MSR.PR bit.
+ * This macro has one argument, which is the cause that will be used by
+ * treclaim, and returns in r3 '1' if the reclaim happened or '0' if it
+ * didn't, which is useful to know which registers were clobbered.
+ *
+ * NOTE: If additional registers are clobbered here, make sure the callee
+ * function restores them before proceeding.
+ */
+#define TM_KERNEL_ENTRY(cause) \
+   ld  r3, _MSR(r1);   \
+   andi.   r0, r3, MSR_PR; /* Coming from userspace? */\
+   beq 1f; /* Skip reclaim if MSR.PR != 1 */   \
+   rldicl. r0, r3, (64-MSR_TM_LG), 63; /* Is TM enabled? */\
+   beq 1f; /* Skip reclaim if TM is off */ \
+   rldicl. r0, r3, (64-MSR_TS_LG), 62; /* Is active */ \
+   beq 1f; /* Skip reclaim if neither */   \
+   /*  \
+* If there is a transaction active or suspended, save the  \
+* non-volatile GPRs if they are not already saved. \
+*/ \
+   bl  save_nvgprs;\
+   /*  \
+* Soft disable the IRQs, otherwise it might cause a CPU hang.  \
+*/ \
+   RECONCILE_IRQ_STATE(r10, r11);  \
+   li  r3, cause;  \
+   bl  tm_reclaim_current; \
+   li  r3, 1;  /* Reclaim happened */  \
+   b   2f; \
+1: li  r3, 0;  /* Reclaim didn't happen */ \
+2:
+#else
+#define TM_KERNEL_ENTRY(cause)
+#endif
+
 #define EXCEPTION_COMMON(area, trap, label, hdlr, ret, additions) \
EXCEPTION_PROLOG_COMMON(trap, area);\
/* Volatile regs are potentially clobbered here */  \
additions;  \
+   TM_KERNEL_ENTRY(TM_CAUSE_MISC); \
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr;   \
b   ret
@@ -695,6 +740,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
EXCEPTION_PROLOG_COMMON_3(trap);\
/* Volatile regs are potentially clobbered here */  \
additions;  \
+   TM_KERNEL_ENTRY(TM_CAUSE_MISC); \
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 

[RFC PATCH v2 00/14] New TM Model

2018-11-06 Thread Breno Leitao
This patchset for the hardware transactional memory (TM) subsystem
aims to avoid spending a lot of time in TM suspended mode in kernel
space.  It basically changes where the reclaim/recheckpoint will be
executed.

The hardware is designed so once a CPU enters in transactional state it
uses a footprint area to track down the loads/stores performed in
transaction so it can be verified later to decide if a conflict happened
due to some change done in that state by another thread.  If a transaction
is active in userspace and there is an exception that takes the CPU to the
kernel space, the CPU moves the transaction state to suspended state but
does not discard the registers (GPR,VEC,VSX,FP) from footprint area,
although the memory footprint might be discarded.

POWER9 has a known problem [1, 2] and does not have enough room in the
footprint area for several transactions to be suspended at the same time
on various CPUs, leading to CPU stalls.

This new model, together with a future 'fake userspace suspended'
implementation may workaround POWER9 hardware issue.

This patchset aims to reclaim the checkpointed registers as soon as the
kernel is invoked, in the beginning of the exception handlers, thus freeing
room to other CPUs enter in suspended mode for a short period of time as
soon as possible, avoiding too many CPUs in suspended state that can cause
the CPUs to stall. The same mechanism is done on kernel exit, doing a
recheckpoint as late as possible (which will reload the checkpointed
registers into CPU's checkpoint area) at the exception return path.

The way to achieve this goal is creating a macro (TM_KERNEL_ENTRY) which
will check if userspace was in an active transaction just after getting
into kernel space and reclaim the transaction if that's the case. Thus all
exception handlers will call this macro as soon as possible.

All exceptions should reclaim (if necessary) at this stage and only
recheckpoint if the task is tagged as TIF_RESTORE_TM (i.e. was in
transactional state before being interrupted), which will be done at
restore_tm_state().
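
In rough pseudo-C, a sketch of the flow described above (this is not the
actual kernel code; came_from_userspace() stands in for the MSR.PR check):

/* On every exception/syscall entry (the TM_KERNEL_ENTRY macro): */
if (came_from_userspace(regs) && MSR_TM_ACTIVE(regs->msr)) {
	tm_reclaim_current(cause);	/* free the CPU checkpoint area */
	set_thread_flag(TIF_RESTORE_TM);
}

/* As late as possible on the way back to userspace: */
if (test_thread_flag(TIF_RESTORE_TM)) {
	restore_tm_state(regs);	/* the only recheckpoint site: runs
				 * tm_recheckpoint() and reloads
				 * FP/VEC/VSX according to regs->msr */
}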

Hence, by allowing the CPU to be in suspended state for only a brief period
it's possible to create the initial infrastructure that copes with the TM
hardware limitations.

This patchset was tested in different scenarios using different test
suites, as the kernel selftests, OpenJDK TM tests, and htm-torture [3], in the
following configuration:

 * POWER8/pseries LE and BE
 * POWER8/powernv LE
 * POWER9/pseries LE 
 * POWER8/powernv LE hosting KVM guests running TM tests

This patchset is based on initial work done by Cyril Bur:
https://patchwork.ozlabs.org/cover/875341/

V1 patchset URL: https://patchwork.ozlabs.org/cover/969173/

Major Change from v1:
 
 * restore_tm_state() is called later, at the kernel exit path, so there is
   no way for any IRQ to be replayed with TM in suspended state.
   This is mostly described in the 'Recheckpoint at exit path' patch.

 * No need to force the TEXASR[FS] bit explicitly. This was required because
   in a very specific case, the TEXASR SPR was not being restored properly
   but MSR[TM] was set. Fixed in patch 'Do not restore TM without SPRs'.

 * All treclaim/trechkpoint uses have a WARN_ON() if not called on the
   kernel entry or exit path. tm_reclaim() is only called by TM_KERNEL_ENTRY
   and tm_recheckpoint() is only called by restore_tm_state(). Everything
   else causes a warning.
 
Regards,
Breno

[1] Documentation/powerpc/transactional_memory.txt
[2] commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
[3] https://github.com/leitao/htm_torture/

Breno Leitao (14):
  powerpc/tm: Reclaim transaction on kernel entry
  powerpc/tm: Reclaim on unavailable exception
  powerpc/tm: Recheckpoint when exiting from kernel
  powerpc/tm: Always set TIF_RESTORE_TM on reclaim
  powerpc/tm: Refactor the __switch_to_tm code
  powerpc/tm: Do not recheckpoint at sigreturn
  powerpc/tm: Do not reclaim on ptrace
  powerpc/tm: Recheckpoint at exit path
  powerpc/tm: Warn if state is transactional
  powerpc/tm: Improve TM debug information
  powerpc/tm: Save MSR to PACA before RFID
  powerpc/tm: Restore transactional SPRs
  powerpc/tm: Do not restore TM without SPRs
  selftests/powerpc: Adapt tm-syscall test to no suspend

 arch/powerpc/include/asm/exception-64s.h  |  50 
 arch/powerpc/include/asm/thread_info.h|   2 +-
 arch/powerpc/kernel/asm-offsets.c |   4 +
 arch/powerpc/kernel/entry_64.S|  37 ++-
 arch/powerpc/kernel/exceptions-64s.S  |  15 +-
 arch/powerpc/kernel/process.c | 242 ++
 arch/powerpc/kernel/ptrace.c  |  16 +-
 arch/powerpc/kernel/signal.c  |   2 +-
 arch/powerpc/kernel/signal_32.c   |  38 +--
 arch/powerpc/kernel/signal_64.c   |  42 ++-
 arch/powerpc/kernel/tm.S  |  19 +-
 arch/powerpc/kernel/traps.c   |  22 +-
 .../testing/selftests/powerpc/tm/tm-syscall.c 

[PATCH 3/3] powerpc/mm/64s: Only use slbfee on CPUs that support it

2018-11-06 Thread Michael Ellerman
The slbfee instruction was only added in ISA 2.05 (Power6); it's not
supported on older CPUs. We don't have a CPU feature for that ISA
version though, so just use the ISA 2.06 feature flag.

Fixes: e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/slb.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 457fd29448b1..b663a36f9ada 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -66,6 +66,9 @@ static void assert_slb_presence(bool present, unsigned long 
ea)
 
WARN_ON_ONCE(mfmsr() & MSR_EE);
 
+   if (!cpu_has_feature(CPU_FTR_ARCH_206))
+   return;
+
asm volatile(__PPC_SLBFEE_DOT(%0, %1) : "=r"(tmp) : "r"(ea) : "cr0");
 
WARN_ON(present == (tmp == 0));
-- 
2.17.2



[PATCH 2/3] powerpc/mm/64s: Use PPC_SLBFEE macro

2018-11-06 Thread Michael Ellerman
Old toolchains don't know about slbfee and break the build, e.g.:
  {standard input}:37: Error: Unrecognized opcode: `slbfee.'

Fix it by using the macro version. We need to add an underscore
version that takes raw register numbers from the inline asm, rather
than our Rx macros.

Fixes: e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 arch/powerpc/mm/slb.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 6093bc8f74e5..a6e9e314c707 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -493,6 +493,8 @@
__PPC_RS(t) | __PPC_RA0(a) | 
__PPC_RB(b))
 #define PPC_SLBFEE_DOT(t, b)   stringify_in_c(.long PPC_INST_SLBFEE | \
__PPC_RT(t) | __PPC_RB(b))
+#define __PPC_SLBFEE_DOT(t, b) stringify_in_c(.long PPC_INST_SLBFEE |  \
+  ___PPC_RT(t) | ___PPC_RB(b))
 #define PPC_ICBT(c,a,b)stringify_in_c(.long PPC_INST_ICBT | \
   __PPC_CT(c) | __PPC_RA0(a) | __PPC_RB(b))
 /* PASemi instructions */
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index f3e002ee457b..457fd29448b1 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,7 +66,7 @@ static void assert_slb_presence(bool present, unsigned long 
ea)
 
WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-   asm volatile("slbfee. %0, %1" : "=r"(tmp) : "r"(ea) : "cr0");
+   asm volatile(__PPC_SLBFEE_DOT(%0, %1) : "=r"(tmp) : "r"(ea) : "cr0");
 
WARN_ON(present == (tmp == 0));
 #endif
-- 
2.17.2



[PATCH 1/3] powerpc/mm/64s: Consolidate SLB assertions

2018-11-06 Thread Michael Ellerman
The code for assert_slb_exists() and assert_slb_notexists() is almost
identical, except for the polarity of the WARN_ON(). In a future patch
we'll need to modify this code, so consolidate it now into a single
function.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/slb.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index c3fdf2969d9f..f3e002ee457b 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -58,7 +58,7 @@ static inline unsigned long mk_vsid_data(unsigned long ea, 
int ssize,
return __mk_vsid_data(get_kernel_vsid(ea, ssize), ssize, flags);
 }
 
-static void assert_slb_exists(unsigned long ea)
+static void assert_slb_presence(bool present, unsigned long ea)
 {
 #ifdef CONFIG_DEBUG_VM
unsigned long tmp;
@@ -66,19 +66,8 @@ static void assert_slb_exists(unsigned long ea)
WARN_ON_ONCE(mfmsr() & MSR_EE);
 
asm volatile("slbfee. %0, %1" : "=r"(tmp) : "r"(ea) : "cr0");
-   WARN_ON(tmp == 0);
-#endif
-}
 
-static void assert_slb_notexists(unsigned long ea)
-{
-#ifdef CONFIG_DEBUG_VM
-   unsigned long tmp;
-
-   WARN_ON_ONCE(mfmsr() & MSR_EE);
-
-   asm volatile("slbfee. %0, %1" : "=r"(tmp) : "r"(ea) : "cr0");
-   WARN_ON(tmp != 0);
+   WARN_ON(present == (tmp == 0));
 #endif
 }
 
@@ -114,7 +103,7 @@ static inline void create_shadowed_slbe(unsigned long ea, 
int ssize,
 */
slb_shadow_update(ea, ssize, flags, index);
 
-   assert_slb_notexists(ea);
+   assert_slb_presence(false, ea);
asm volatile("slbmte  %0,%1" :
 : "r" (mk_vsid_data(ea, ssize, flags)),
   "r" (mk_esid_data(ea, ssize, index))
@@ -137,7 +126,7 @@ void __slb_restore_bolted_realmode(void)
   "r" (be64_to_cpu(p->save_area[index].esid)));
}
 
-   assert_slb_exists(local_paca->kstack);
+   assert_slb_presence(true, local_paca->kstack);
 }
 
 /*
@@ -185,7 +174,7 @@ void slb_flush_and_restore_bolted(void)
 :: "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].vsid)),
"r" (be64_to_cpu(p->save_area[KSTACK_INDEX].esid))
 : "memory");
-   assert_slb_exists(get_paca()->kstack);
+   assert_slb_presence(true, get_paca()->kstack);
 
get_paca()->slb_cache_ptr = 0;
 
@@ -443,9 +432,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
ea = (unsigned long)
get_paca()->slb_cache[i] << SID_SHIFT;
/*
-* Could assert_slb_exists here, but hypervisor
-* or machine check could have come in and
-* removed the entry at this point.
+* Could assert_slb_presence(true) here, but
+* hypervisor or machine check could have come
+* in and removed the entry at this point.
 */
 
slbie_data = ea;
@@ -676,7 +665,7 @@ static long slb_insert_entry(unsigned long ea, unsigned 
long context,
 * User preloads should add isync afterwards in case the kernel
 * accesses user memory before it returns to userspace with rfid.
 */
-   assert_slb_notexists(ea);
+   assert_slb_presence(false, ea);
asm volatile("slbmte %0, %1" : : "r" (vsid_data), "r" (esid_data));
 
barrier();
-- 
2.17.2



Re: Build regressions/improvements in v4.20-rc1 (sound/pci/hda/patch_ca0132.c)

2018-11-06 Thread Geert Uytterhoeven
Hi Randy,

On Tue, Nov 6, 2018 at 2:06 AM Randy Dunlap  wrote:
> On 11/5/18 2:12 PM, Geert Uytterhoeven wrote:
> > On Mon, Nov 5, 2018 at 11:07 PM Geert Uytterhoeven  
> > wrote:
> >> Below is the list of build error/warning regressions/improvements in
> >> v4.20-rc1[1] compared to v4.19[2].
> >>
> >> Summarized:
> >>   - build errors: +3/-0
> >>   - build warnings: +449/-2712
> >>
> >> Happy fixing! ;-)
> >>
> >> Thanks to the linux-next team for providing the build service.
> >>
> >> [1] 
> >> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/651022382c7f8da46cb4872a545ee1da6d097d2a/
> >>  (all 240 configs)
> >> [2] 
> >> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d/
> >>  (all 240 configs)
> >>
> >>
> >> *** ERRORS ***
> >>
> >>   + /kisskb/src/sound/pci/hda/patch_ca0132.c: error: implicit declaration 
> >> of function 'pci_iomap' [-Werror=implicit-function-declaration]:  => 8799:3
> >
> > sh4-all{mod,yes}config
> >
> > Looks like d9b84a15892c0233 ("ALSA: hda: Fix implicit definition of
> > pci_iomap() on SH")
> > is not sufficient?
>
> Different problem.  This is about "select":
>
> config SND_SOC_ALL_CODECS
> tristate "Build all ASoC CODEC drivers"
>
> That enables (sets):
> select SND_SOC_HDAC_HDA
> which selects SND_HDA even though CONFIG_PCI is not enabled.
>
> After SND_HDA is selected (above), the Kconfig symbols in
> sound/pci/hda/Kconfig are available for enabling, so
> SND_HDA_CODEC_CA0132 is enabled but will not build.

Thanks for looking into this!

> One simple solution (but possibly too naive) is:
>
> ---
>  sound/soc/codecs/Kconfig |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- lnx-420-rc1.orig/sound/soc/codecs/Kconfig
> +++ lnx-420-rc1/sound/soc/codecs/Kconfig
> @@ -82,7 +82,7 @@ config SND_SOC_ALL_CODECS
> select SND_SOC_ES7241
> select SND_SOC_GTM601
> select SND_SOC_HDAC_HDMI
> -   select SND_SOC_HDAC_HDA
> +   select SND_SOC_HDAC_HDA if PCI
> select SND_SOC_ICS43432
> select SND_SOC_INNO_RK3036
> select SND_SOC_ISABELLE if I2C

I guess that will work. There are already plenty of "select foo if bar" lines.
However, looking at what else can enable SND_HDA, I think it should be

select SND_SOC_HDAC_HDA if SND_PCI || ARCH_TEGRA

That still leaves the issue that pci_iomap() on SH should be an empty stub if
PCI is not available, like on other architectures.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds