Re: [PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update

2018-11-28 Thread Aneesh Kumar K.V

On 11/29/18 3:40 AM, Andrew Morton wrote:

On Wed, 28 Nov 2018 20:04:37 +0530 "Aneesh Kumar K.V" wrote:


Signed-off-by: Aneesh Kumar K.V 


Some explanation of the motivation would be useful.


I will update the commit message.





  include/linux/hugetlb.h | 18 ++
  mm/hugetlb.c|  8 +---
  2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 087fd5f48c91..e2a3b0c854eb 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 	set_huge_pte_at(mm, addr, ptep, pte);
 }
 #endif
+
+#ifndef huge_ptep_modify_prot_start
+#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
+static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+					unsigned long addr, pte_t *ptep)
+{
+	return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+}
+#endif
+
+#ifndef huge_ptep_modify_prot_commit
+#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
+static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+					unsigned long addr, pte_t *ptep,
+					pte_t old_pte, pte_t pte)
+{
+	set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+#endif
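For context, a minimal sketch of how a protection-change path is expected
to use this pair (modelled on the mm/hugetlb.c hunk of this patch; the pte
helpers shown are the existing hugetlb ones):

	pte_t old_pte, new_pte;

	/* Start: atomically clear the PTE and return its old value. */
	old_pte = huge_ptep_modify_prot_start(vma, addr, ptep);
	/* Compute the new protection bits while the PTE is clear. */
	new_pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
	/* Commit: install the new PTE; old_pte lets an arch override
	 * optimize the update. */
	huge_ptep_modify_prot_commit(vma, addr, ptep, old_pte, new_pte);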




Will update.

-aneesh



[PATCH AUTOSEL 4.14 30/35] ibmvnic: Fix RX queue buffer cleanup

2018-11-28 Thread Sasha Levin
From: Thomas Falcon 

[ Upstream commit b7cdec3d699db2e5985ad39de0f25d3b6111928e ]

The wrong index is used when cleaning up RX buffer objects during release
of RX queues. Update to use the correct index counter.

Signed-off-by: Thomas Falcon 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 5c7134ccc1fd..14c53ed5cca6 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -457,8 +457,8 @@ static void release_rx_pools(struct ibmvnic_adapter 
*adapter)
 
for (j = 0; j < rx_pool->size; j++) {
if (rx_pool->rx_buff[j].skb) {
-   dev_kfree_skb_any(rx_pool->rx_buff[i].skb);
-   rx_pool->rx_buff[i].skb = NULL;
+   dev_kfree_skb_any(rx_pool->rx_buff[j].skb);
+   rx_pool->rx_buff[j].skb = NULL;
}
}
 
-- 
2.17.1



[PATCH AUTOSEL 4.19 62/68] ibmvnic: Update driver queues after change in ring size support

2018-11-28 Thread Sasha Levin
From: Thomas Falcon 

[ Upstream commit 5bf032ef08e6a110edc1e3bfb3c66a208fb55125 ]

During device reset, queue memory is not being updated to accommodate
changes in ring buffer sizes supported by backing hardware. Track
any differences in ring buffer sizes following the reset and update
queue memory when possible.

Signed-off-by: Thomas Falcon 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index f1d4d7a1278b..5ab21a1b5444 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1737,6 +1737,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
struct ibmvnic_rwi *rwi, u32 reset_state)
 {
u64 old_num_rx_queues, old_num_tx_queues;
+   u64 old_num_rx_slots, old_num_tx_slots;
struct net_device *netdev = adapter->netdev;
int i, rc;
 
@@ -1748,6 +1749,8 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 
old_num_rx_queues = adapter->req_rx_queues;
old_num_tx_queues = adapter->req_tx_queues;
+   old_num_rx_slots = adapter->req_rx_add_entries_per_subcrq;
+   old_num_tx_slots = adapter->req_tx_entries_per_subcrq;
 
ibmvnic_cleanup(netdev);
 
@@ -1810,7 +1813,11 @@ static int do_reset(struct ibmvnic_adapter *adapter,
if (rc)
return rc;
} else if (adapter->req_rx_queues != old_num_rx_queues ||
-  adapter->req_tx_queues != old_num_tx_queues) {
+  adapter->req_tx_queues != old_num_tx_queues ||
+  adapter->req_rx_add_entries_per_subcrq !=
+   old_num_rx_slots ||
+  adapter->req_tx_entries_per_subcrq !=
+   old_num_tx_slots) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
-- 
2.17.1



[PATCH AUTOSEL 4.19 61/68] ibmvnic: Fix RX queue buffer cleanup

2018-11-28 Thread Sasha Levin
From: Thomas Falcon 

[ Upstream commit b7cdec3d699db2e5985ad39de0f25d3b6111928e ]

The wrong index is used when cleaning up RX buffer objects during release
of RX queues. Update to use the correct index counter.

Signed-off-by: Thomas Falcon 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index a646de07cbdc..f1d4d7a1278b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -485,8 +485,8 @@ static void release_rx_pools(struct ibmvnic_adapter 
*adapter)
 
for (j = 0; j < rx_pool->size; j++) {
if (rx_pool->rx_buff[j].skb) {
-   dev_kfree_skb_any(rx_pool->rx_buff[i].skb);
-   rx_pool->rx_buff[i].skb = NULL;
+   dev_kfree_skb_any(rx_pool->rx_buff[j].skb);
+   rx_pool->rx_buff[j].skb = NULL;
}
}
 
-- 
2.17.1



[PATCH AUTOSEL 4.19 49/68] net/ibmnvic: Fix deadlock problem in reset

2018-11-28 Thread Sasha Levin
From: Juliet Kim 

[ Upstream commit a5681e20b541a507c7d4fd48ae4a4040d32ee1ef ]

This patch changes the driver to take rtnl_lock only during a reset, to
avoid a deadlock that could occur when a thread closing the device holds
rtnl_lock and waits for reset_lock, while another thread holds reset_lock
and waits for rtnl_lock in order to set the number of tx/rx queues during
a reset.

Also, we now set the number of tx/rx queues during a soft reset for
failover or LPM events.
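To make the lock inversion concrete, here is a minimal illustrative
sketch (not verbatim kernel code) of the two threads described above:

	/* Thread A: close path */
	rtnl_lock();                       /* taken by the core before close */
	mutex_lock(&adapter->reset_lock);  /* blocks: held by thread B */

	/* Thread B: reset path */
	mutex_lock(&adapter->reset_lock);
	rtnl_lock();                       /* blocks: held by thread A */

This is a classic ABBA deadlock; the patch below removes reset_lock from
the open/close paths and takes rtnl_lock in the reset worker instead.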

Signed-off-by: Juliet Kim 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 59 +++---
 drivers/net/ethernet/ibm/ibmvnic.h |  2 +-
 2 files changed, 22 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 7661064c815b..a646de07cbdc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1103,20 +1103,15 @@ static int ibmvnic_open(struct net_device *netdev)
return 0;
}
 
-   mutex_lock(&adapter->reset_lock);
-
if (adapter->state != VNIC_CLOSED) {
rc = ibmvnic_login(netdev);
-   if (rc) {
-   mutex_unlock(&adapter->reset_lock);
+   if (rc)
return rc;
-   }
 
rc = init_resources(adapter);
if (rc) {
netdev_err(netdev, "failed to initialize resources\n");
release_resources(adapter);
-   mutex_unlock(&adapter->reset_lock);
return rc;
}
}
@@ -1124,8 +1119,6 @@ static int ibmvnic_open(struct net_device *netdev)
rc = __ibmvnic_open(netdev);
netif_carrier_on(netdev);
 
-   mutex_unlock(&adapter->reset_lock);
-
return rc;
 }
 
@@ -1269,10 +1262,8 @@ static int ibmvnic_close(struct net_device *netdev)
return 0;
}
 
-   mutex_lock(&adapter->reset_lock);
rc = __ibmvnic_close(netdev);
ibmvnic_cleanup(netdev);
-   mutex_unlock(&adapter->reset_lock);
 
return rc;
 }
@@ -1820,20 +1811,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
return rc;
} else if (adapter->req_rx_queues != old_num_rx_queues ||
   adapter->req_tx_queues != old_num_tx_queues) {
-   adapter->map_id = 1;
release_rx_pools(adapter);
release_tx_pools(adapter);
-   rc = init_rx_pools(netdev);
-   if (rc)
-   return rc;
-   rc = init_tx_pools(netdev);
-   if (rc)
-   return rc;
-
release_napi(adapter);
-   rc = init_napi(adapter);
+   release_vpd_data(adapter);
+
+   rc = init_resources(adapter);
if (rc)
return rc;
+
} else {
rc = reset_tx_pools(adapter);
if (rc)
@@ -1917,17 +1903,8 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
adapter->state = VNIC_PROBED;
return 0;
}
-   /* netif_set_real_num_xx_queues needs to take rtnl lock here
-* unless wait_for_reset is set, in which case the rtnl lock
-* has already been taken before initializing the reset
-*/
-   if (!adapter->wait_for_reset) {
-   rtnl_lock();
-   rc = init_resources(adapter);
-   rtnl_unlock();
-   } else {
-   rc = init_resources(adapter);
-   }
+
+   rc = init_resources(adapter);
if (rc)
return rc;
 
@@ -1986,13 +1963,21 @@ static void __ibmvnic_reset(struct work_struct *work)
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
struct net_device *netdev;
+   bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
 
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
netdev = adapter->netdev;
 
-   mutex_lock(&adapter->reset_lock);
+   /* netif_set_real_num_xx_queues needs to take rtnl lock here
+* unless wait_for_reset is set, in which case the rtnl lock
+* has already been taken before initializing the reset
+*/
+   if (!adapter->wait_for_reset) {
+   rtnl_lock();
+   we_lock_rtnl = true;
+   }
reset_state = adapter->state;
 
rwi = get_next_rwi(adapter);
@@ -2020,12 +2005,11 @@ static void __ibmvnic_reset(struct work_struct *work)
if (rc) {
netdev_dbg(adapter->netdev, "Reset failed\n");
free_all_rwi(adapter);
-   mutex_unlock(&adapter->reset_lock);
-   return;
}
 
adapter->resetting = false;

Re: [PATCH] powerpc/boot: Copy serial.c in Makefile

2018-11-28 Thread Daniel Axtens
Right, so as both 0-day and snowpatch tell me, this patch is wrong.

It turns out that this:
>  $(obj)/serial.c: $(obj)/autoconf.h
> + $(Q)cp $< $@
is identical to:
cp arch/powerpc/boot/autoconf.h arch/powerpc/boot/serial.c

(In a make recipe, $< expands to the first prerequisite, which here is
autoconf.h rather than the intended serial.c source. Clearly my make
mastery is inadequate.)

Amusingly, this works for my 64e uImage, but obviously not for
anything that actually needs code from serial.c.

Further analysis suggests that building with -j1 triggers the issue,
but everything works with -j2 and above. That would fit the timeline of
when I discovered the issue, because I had changed my build script to
not build in parallel.

Regards,
Daniel




[PATCH v3 4/4] powerpc: generate uapi header and system call table files

2018-11-28 Thread Firoz Khan
The system call table generation script must be run to generate the
unistd_32/64.h and syscall_table_32/64/c32/spu.h files. This patch
contains the changes which invoke the script.

This patch generates the unistd_32/64.h and syscall_table_32/64/c32/spu.h
files with the syscall table generation script invoked by
powerpc/Makefile, and the generated files must be identical to the
removed ones.

The generated uapi header file will be included in uapi/asm/unistd.h,
and the generated system call table header file will be included by the
kernel/systbl.S file.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/systbl.h   | 395 
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 392 +--
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/systbl.S|  36 +--
 arch/powerpc/kernel/systbl_chk.c|  60 -
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 9 files changed, 27 insertions(+), 892 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8a2ce14..34897191 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -402,6 +402,9 @@ archclean:
 
 archprepare: checkbin
 
+archheaders:
+   $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
+
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d22..77ff7fb 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -1,3 +1,7 @@
+generated-y += syscall_table_32.h
+generated-y += syscall_table_64.h
+generated-y += syscall_table_c32.h
+generated-y += syscall_table_spu.h
 generic-y += div64.h
 generic-y += export.h
 generic-y += irq_regs.h
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
deleted file mode 100644
index c4321b9..000
--- a/arch/powerpc/include/asm/systbl.h
+++ /dev/null
@@ -1,395 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * List of powerpc syscalls. For the meaning of the _SPU suffix see
- * arch/powerpc/platforms/cell/spu_callbacks.c
- */
-
-SYSCALL(restart_syscall)
-SYSCALL(exit)
-PPC_SYS(fork)
-SYSCALL_SPU(read)
-SYSCALL_SPU(write)
-COMPAT_SYS_SPU(open)
-SYSCALL_SPU(close)
-SYSCALL_SPU(waitpid)
-SYSCALL_SPU(creat)
-SYSCALL_SPU(link)
-SYSCALL_SPU(unlink)
-COMPAT_SYS(execve)
-SYSCALL_SPU(chdir)
-COMPAT_SYS_SPU(time)
-SYSCALL_SPU(mknod)
-SYSCALL_SPU(chmod)
-SYSCALL_SPU(lchown)
-SYSCALL(ni_syscall)
-OLDSYS(stat)
-COMPAT_SYS_SPU(lseek)
-SYSCALL_SPU(getpid)
-COMPAT_SYS(mount)
-SYSX(sys_ni_syscall,sys_oldumount,sys_oldumount)
-SYSCALL_SPU(setuid)
-SYSCALL_SPU(getuid)
-COMPAT_SYS_SPU(stime)
-COMPAT_SYS(ptrace)
-SYSCALL_SPU(alarm)
-OLDSYS(fstat)
-SYSCALL(pause)
-COMPAT_SYS(utime)
-SYSCALL(ni_syscall)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(access)
-SYSCALL_SPU(nice)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(sync)
-SYSCALL_SPU(kill)
-SYSCALL_SPU(rename)
-SYSCALL_SPU(mkdir)
-SYSCALL_SPU(rmdir)
-SYSCALL_SPU(dup)
-SYSCALL_SPU(pipe)
-COMPAT_SYS_SPU(times)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(brk)
-SYSCALL_SPU(setgid)
-SYSCALL_SPU(getgid)
-SYSCALL(signal)
-SYSCALL_SPU(geteuid)
-SYSCALL_SPU(getegid)
-SYSCALL(acct)
-SYSCALL(umount)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(ioctl)
-COMPAT_SYS_SPU(fcntl)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(setpgid)
-SYSCALL(ni_syscall)
-SYSX(sys_ni_syscall,sys_olduname,sys_olduname)
-SYSCALL_SPU(umask)
-SYSCALL_SPU(chroot)
-COMPAT_SYS(ustat)
-SYSCALL_SPU(dup2)
-SYSCALL_SPU(getppid)
-SYSCALL_SPU(getpgrp)
-SYSCALL_SPU(setsid)
-SYS32ONLY(sigaction)
-SYSCALL_SPU(sgetmask)
-SYSCALL_SPU(ssetmask)
-SYSCALL_SPU(setreuid)
-SYSCALL_SPU(setregid)
-SYS32ONLY(sigsuspend)
-SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
-SYSCALL_SPU(sethostname)
-COMPAT_SYS_SPU(setrlimit)
-SYSX(sys_ni_syscall,compat_sys_old_getrlimit,sys_old_getrlimit)
-COMPAT_SYS_SPU(getrusage)
-COMPAT_SYS_SPU(gettimeofday)
-COMPAT_SYS_SPU(settimeofday)
-SYSCALL_SPU(getgroups)
-SYSCALL_SPU(setgroups)
-SYSX(sys_ni_syscall,sys_ni_syscall,ppc_select)
-SYSCALL_SPU(symlink)
-OLDSYS(lstat)
-SYSCALL_SPU(readlink)
-SYSCALL(uselib)
-SYSCALL(swapon)
-SYSCALL(reboot)
-SYSX(sys_ni_syscall,compat_sys_old_readdir,sys_old_readdir)
-SYSCALL_SPU(mmap)
-SYSCALL_SPU(munmap)
-COMPAT_SYS_SPU(truncate)
-COMPAT_SYS_SPU(ftruncate)
-SYSCALL_SPU(fchmod)
-SYSCALL_SPU(fchown)
-SYSCALL_SPU(getpriority)
-SYSCALL_SPU(setpriority)
-SYSCALL(ni_syscall)
-COMPAT_SYS(statfs)
-COMPAT_SYS(fstatfs)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(socketcall)
-SYSCALL_SPU(syslog)
-COMPAT_SYS_SPU(setitimer)
-COMPAT_SYS_SPU(getitimer)
-COMPAT_SYS_SPU(newstat)
-COMPAT_SYS_SPU(newlstat)
-COMPAT_SYS_SPU(newfstat)
-SYSX(sys_ni_syscall,sys_uname,sys_uname)

[PATCH v3 3/4] powerpc: add system call table generation support

2018-11-28 Thread Firoz Khan
The system call tables are in a different format in each
architecture, and it is difficult to manually add or
modify the system calls in the respective files. Keeping
a script that generates the uapi header and syscall table
files makes this easy. This change will also help to unify
the implementation across all architectures.

The system call table generation script is added in the
syscalls directory, which contains the scripts to generate
both the uapi header file and the system call table files.
The syscall.tbl file is the input for the scripts.

syscall.tbl contains the list of available system calls
along with the system call number and corresponding entry
point. Adding a new system call in this architecture is
then just a matter of adding a new entry in the
syscall.tbl file.

Adding a new table entry consists of (see the example after this list):
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.
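For example, the first entry in the syscall.tbl added by this patch is:

	0	nospu	restart_syscall		sys_restart_syscall

that is, number 0, ABI "nospu", name "restart_syscall", and entry point
"sys_restart_syscall"; no compat entry is needed for this call.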

syscallhdr.sh and syscalltbl.sh will generate the uapi header
unistd_32/64.h and the syscall_table_32/64/c32/spu.h files
respectively. The syscall_table_32/64/c32/spu.h files are
included by systbl.S - the real system call table. Both *.sh
files parse the content of syscall.tbl to generate the
header and table files.

The ARM, s390 and x86 architectures have similar support.
I leveraged their implementation to come up with a generic
solution.
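As an illustration, the generated uapi header is expected to carry the
familiar numbering (matching the handwritten header this series removes);
the exact formatting of the generated file may differ:

	#define __NR_restart_syscall	0
	#define __NR_exit		1
	#define __NR_fork		2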

Signed-off-by: Firoz Khan 
---
 arch/powerpc/kernel/syscalls/Makefile  |  63 +
 arch/powerpc/kernel/syscalls/syscall.tbl   | 427 +
 arch/powerpc/kernel/syscalls/syscallhdr.sh |  36 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh |  36 +++
 4 files changed, 562 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile 
b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..27b4895
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: GPL-2.0
+kapi := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')  \
+ $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)')
+
+syscall := $(srctree)/$(src)/syscall.tbl
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'   \
+  '$(syshdr_abis_$(basetarget))'   \
+  '$(syshdr_pfx_$(basetarget))'\
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'   \
+  '$(systbl_abis_$(basetarget))'   \
+  '$(systbl_abi_$(basetarget))'\
+  '$(systbl_offset_$(basetarget))'
+
+syshdr_abis_unistd_32 := common,nospu,32
+$(uapi)/unistd_32.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+syshdr_abis_unistd_64 := common,nospu,64
+$(uapi)/unistd_64.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abis_syscall_table_32 := common,nospu,32
+systbl_abi_syscall_table_32 := 32
+$(kapi)/syscall_table_32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_64 := common,nospu,64
+systbl_abi_syscall_table_64 := 64
+$(kapi)/syscall_table_64.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_c32 := common,nospu,32
+systbl_abi_syscall_table_c32 := c32
+$(kapi)/syscall_table_c32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_spu := common,spu
+systbl_abi_syscall_table_spu := spu
+$(kapi)/syscall_table_spu.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+kapisyshdr-y   += syscall_table_32.h   \
+  syscall_table_64.h   \
+  syscall_table_c32.h  \
+  syscall_table_spu.h
+
+targets+= $(uapisyshdr-y) $(kapisyshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(kapi)/,$(kapisyshdr-y))
+   @:
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
new file mode 100644
index 000..db3bbb8
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -0,0 +1,427 @@
+# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#
+# system call numbers and entry vectors for powerpc
+#
+# The format is:
+# <number> <abi> <name> <entry point> <compat entry point>
+#
+# The <abi> can be common, spu, nospu, 64, or 32 for this file.
+#
+0  nospu   restart_syscall sys_restart_syscall

[PATCH v3 2/4] powerpc: move macro definition from asm/systbl.h

2018-11-28 Thread Firoz Khan
Move the macro definition for compat_sys_sigsuspend from
asm/systbl.h to the file in which it is included.

One of the patches in this series generates the uapi
header and system call table files. In order to come up
with a common implementation across all architectures, we
need this change.

This change will simplify the implementation of the system
call table generation script and help us arrive at a common
implementation across all architectures.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/systbl.h | 1 -
 arch/powerpc/kernel/systbl.S  | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 01b5171..c4321b9 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -76,7 +76,6 @@
 SYSCALL_SPU(ssetmask)
 SYSCALL_SPU(setreuid)
 SYSCALL_SPU(setregid)
-#define compat_sys_sigsuspend sys_sigsuspend
 SYS32ONLY(sigsuspend)
 SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
 SYSCALL_SPU(sethostname)
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 919a327..9ff1913 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -47,4 +47,5 @@
 .globl sys_call_table
 sys_call_table:
 
+#define compat_sys_sigsuspend  sys_sigsuspend
 #include 
-- 
1.9.1



[PATCH v3 1/4] powerpc: add __NR_syscalls along with NR_syscalls

2018-11-28 Thread Firoz Khan
The NR_syscalls macro holds the number of system calls that
exist in the powerpc architecture. We have to change the
value of NR_syscalls whenever we add or delete a system call.

One of the patches in this series has a script which will
generate a uapi header based on the syscall.tbl file. The
syscall.tbl file contains the system call information,
including the numbers. So we have two options for updating
the NR_syscalls value:

1. Update NR_syscalls in asm/unistd.h manually by counting
   the number of system calls. There is no need to update
   NR_syscalls until we either add a new system call or
   delete an existing one.

2. Keep this feature in the above-mentioned script, which
   will count the number of syscalls and keep it in a
   generated file. In this case we don't need to explicitly
   update NR_syscalls in the asm/unistd.h file.

The second option is the recommended one. For that, I added
the __NR_syscalls macro in uapi/asm/unistd.h along with
NR_syscalls in asm/unistd.h. The __NR_syscalls macro is
also added to keep the naming convention the same across
all architectures. While __NR_syscalls isn't strictly part
of the uapi, having it as part of the generated header
simplifies the implementation. We also need to enclose
this macro in #ifdef __KERNEL__ to avoid side effects.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/unistd.h  | 3 +--
 arch/powerpc/include/uapi/asm/unistd.h | 5 -
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index b0de85b..a3c35e6 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -11,8 +11,7 @@
 
 #include 
 
-
-#define NR_syscalls389
+#define NR_syscalls__NR_syscalls
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d..7195868 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -10,7 +10,6 @@
 #ifndef _UAPI_ASM_POWERPC_UNISTD_H_
 #define _UAPI_ASM_POWERPC_UNISTD_H_
 
-
 #define __NR_restart_syscall 0
 #define __NR_exit1
 #define __NR_fork2
@@ -401,4 +400,8 @@
 #define __NR_rseq  387
 #define __NR_io_pgetevents 388
 
+#ifdef __KERNEL__
+#define __NR_syscalls  389
+#endif
+
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1



[PATCH v3 0/4] powerpc: system call table generation support

2018-11-28 Thread Firoz Khan
The purpose of this patch series is to make it easy to
add, modify, or delete system call table support by
changing an entry in the syscall.tbl file instead of
manually changing many files. The other goal is to unify
the system call table generation support implementation
across all the architectures.

The system call tables are in a different format in each
architecture, and it is difficult to manually add, modify,
or delete the system calls in the respective files. Keeping
a script that generates the uapi header file and the
syscall table file makes this easy.

syscall.tbl contains the list of available system calls
along with the system call number and corresponding entry
point. Adding a new system call in this architecture is
then just a matter of adding a new entry in the
syscall.tbl file.

Adding a new table entry consists of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.
- spu entry name, if required.

The ARM, s390 and x86 architectures have similar support.
I leveraged their implementation to come up with a generic
solution.

I have done the same work for alpha, ia64, m68k,
microblaze, mips, parisc, sh, sparc, and xtensa. The git
repository below contains more details about the workflow.

https://github.com/frzkhn/system_call_table_generator/

Finally, this is the groundwork for solving the Y2038
issue, for which we need to add about two dozen system
calls. This patch series will make it easy to add those
new system calls as new entries in syscall.tbl.

changes since v2:
 - modified/optimized the syscall.tbl to avoid duplicates
   for the spu entries.
 - updated the syscalltbl.sh to match the above change.

changes since v1:
 - optimized/updated the syscall table generation 
   scripts.
 - fixed all mixed indentation issues in syscall.tbl.
 - added "comments" in syscall_*.tbl.
 - changed from generic-y to generated-y in Kbuild.

Firoz Khan (4):
  powerpc: add __NR_syscalls along with NR_syscalls
  powerpc: move macro definition from asm/systbl.h
  powerpc: add system call table generation support
  powerpc: generate uapi header and system call table files

 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/systbl.h   | 396 --
 arch/powerpc/include/asm/unistd.h   |   3 +-
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 389 +
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/syscalls/Makefile   |  63 
 arch/powerpc/kernel/syscalls/syscall.tbl| 427 
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  36 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
 arch/powerpc/kernel/systbl.S|  37 +--
 arch/powerpc/kernel/systbl_chk.c|  60 
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 14 files changed, 591 insertions(+), 892 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

-- 
1.9.1



[PATCH 5/6] powerpc/eeh: Improve recovery of passed-through devices

2018-11-28 Thread Sam Bobroff
Currently, the EEH recovery process considers passed-through devices
as if they were not EEH-aware, which can cause them to be removed as
part of recovery.  Because device removal requires cooperation from
the guest, this may lead to the process stalling or deadlocking.
Also, if devices are removed on the host side, they will be removed
from their IOMMU group, making recovery in the guest impossible.

Therefore, alter the recovery process so that passed-through devices
are not removed but are instead left frozen (and marked isolated)
until the guest performs its own recovery.  If firmware thaws a
passed-through PE because its parent PE has been thawed (because it
was not passed through), re-freeze it.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h |  2 +-
 arch/powerpc/include/asm/ppc-pci.h |  2 +-
 arch/powerpc/kernel/eeh.c  | 47 +++---
 arch/powerpc/kernel/eeh_driver.c   | 32 +---
 drivers/vfio/vfio_spapr_eeh.c  |  6 ++--
 5 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 2ff123f745cc..0b655810f32d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -300,7 +300,7 @@ void eeh_dev_release(struct pci_dev *pdev);
 struct eeh_pe *eeh_iommu_group_to_pe(struct iommu_group *group);
 int eeh_pe_set_option(struct eeh_pe *pe, int option);
 int eeh_pe_get_state(struct eeh_pe *pe);
-int eeh_pe_reset(struct eeh_pe *pe, int option);
+int eeh_pe_reset(struct eeh_pe *pe, int option, bool include_passed);
 int eeh_pe_configure(struct eeh_pe *pe);
 int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
  unsigned long addr, unsigned long mask);
diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 08e094eaeccf..f191ef0d2a0a 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -53,7 +53,7 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev);
 struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr);
 void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
-int eeh_pe_reset_full(struct eeh_pe *pe);
+int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed);
 void eeh_save_bars(struct eeh_dev *edev);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 052512e58b05..df02f55fdfa1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -877,6 +877,24 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void 
*flag)
return NULL;
 }
 
+static void eeh_pe_refreeze_passed(struct eeh_pe *root)
+{
+   struct eeh_pe *pe;
+   int state;
+
+   eeh_for_each_pe(root, pe) {
+   if (eeh_pe_passed(pe)) {
+   state = eeh_ops->get_state(pe, NULL);
+   if (state &
+  (EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED)) {
+   pr_info("EEH: Passed-through PE PHB#%x-PE#%x 
was thawed by reset, re-freezing for safety.\n",
+   pe->phb->global_number, pe->addr);
+   eeh_pe_set_option(pe, EEH_OPT_FREEZE_PE);
+   }
+   }
+   }
+}
+
 /**
  * eeh_pe_reset_full - Complete a full reset process on the indicated PE
  * @pe: EEH PE
@@ -889,7 +907,7 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void 
*flag)
  *
  * This function will attempt to reset a PE three times before failing.
  */
-int eeh_pe_reset_full(struct eeh_pe *pe)
+int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed)
 {
int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED);
int type = EEH_RESET_HOT;
@@ -911,11 +929,11 @@ int eeh_pe_reset_full(struct eeh_pe *pe)
 
/* Make three attempts at resetting the bus */
for (i = 0; i < 3; i++) {
-   ret = eeh_pe_reset(pe, type);
+   ret = eeh_pe_reset(pe, type, include_passed);
if (ret)
break;
 
-   ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE);
+   ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE, include_passed);
if (ret)
break;
 
@@ -936,6 +954,12 @@ int eeh_pe_reset_full(struct eeh_pe *pe)
__func__, state, pe->phb->global_number, pe->addr, (i + 
1));
}
 
+   /* Resetting the PE may have unfrozen child PEs. If those PEs have been
+* (potentially) passed through to a guest, re-freeze them:
+*/
+   if (!include_passed)
+   eeh_pe_refreeze_passed(pe);
+
eeh_pe_state_clear(pe, reset_state, true);
return ret;
 }
@@ -1611,13 +1635,12 @@ int 

[PATCH 2/6] powerpc/eeh: remove sw_state from eeh_unfreeze_pe()

2018-11-28 Thread Sam Bobroff
eeh_unfreeze_pe() performs two operations: unfreezing a PE (which may
cause firmware to unfreeze child PEs as well) and de-isolating the PE
and its children.

To simplify this and support future work, separate out the
de-isolation and perform it at the call sites (when necessary).

There should be no change in behaviour.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   |  2 +-
 arch/powerpc/kernel/eeh.c| 18 ++
 arch/powerpc/kernel/eeh_driver.c |  2 +-
 arch/powerpc/kernel/eeh_sysfs.c  |  3 ++-
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8b596d096ebe..2ff123f745cc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -293,7 +293,7 @@ void eeh_add_device_late(struct pci_dev *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
 void eeh_remove_device(struct pci_dev *);
-int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state);
+int eeh_unfreeze_pe(struct eeh_pe *pe);
 int eeh_pe_reset_and_recover(struct eeh_pe *pe);
 int eeh_dev_open(struct pci_dev *pdev);
 void eeh_dev_release(struct pci_dev *pdev);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 6cae6b56ffd6..ac8e69ee93a7 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -823,7 +823,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum 
pcie_reset_state stat
switch (state) {
case pcie_deassert_reset:
eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
-   eeh_unfreeze_pe(pe, false);
+   eeh_unfreeze_pe(pe);
if (!(pe->type & EEH_PE_VF))
eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev);
@@ -1309,7 +1309,7 @@ void eeh_remove_device(struct pci_dev *dev)
edev->mode &= ~EEH_DEV_SYSFS;
 }
 
-int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state)
+int eeh_unfreeze_pe(struct eeh_pe *pe)
 {
int ret;
 
@@ -1327,10 +1327,6 @@ int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state)
return ret;
}
 
-   /* Clear software isolated state */
-   if (sw_state && (pe->state & EEH_PE_ISOLATED))
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
-
return ret;
 }
 
@@ -1382,7 +1378,10 @@ static int eeh_pe_change_owner(struct eeh_pe *pe)
}
}
 
-   return eeh_unfreeze_pe(pe, true);
+   ret = eeh_unfreeze_pe(pe);
+   if (!ret)
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+   return ret;
 }
 
 /**
@@ -1639,7 +1638,10 @@ static int eeh_pe_reenable_devices(struct eeh_pe *pe)
}
 
/* The PE is still in frozen state */
-   return eeh_unfreeze_pe(pe, true);
+   ret = eeh_unfreeze_pe(pe);
+   if (!ret)
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+   return ret;
 }
 
 
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index aa86a42d98f2..0109d5d7fe63 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -598,7 +598,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *root)
 
eeh_for_each_pe(root, pe) {
for (i = 0; i < 3; i++)
-   if (!eeh_unfreeze_pe(pe, false))
+   if (!eeh_unfreeze_pe(pe))
break;
if (i >= 3)
return -EIO;
diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
index deed906dd8f1..0731d2f01dd9 100644
--- a/arch/powerpc/kernel/eeh_sysfs.c
+++ b/arch/powerpc/kernel/eeh_sysfs.c
@@ -82,8 +82,9 @@ static ssize_t eeh_pe_state_store(struct device *dev,
if (!(edev->pe->state & EEH_PE_ISOLATED))
return count;
 
-   if (eeh_unfreeze_pe(edev->pe, true))
+   if (eeh_unfreeze_pe(edev->pe))
return -EIO;
+   eeh_pe_state_clear(edev->pe, EEH_PE_ISOLATED);
 
return count;
 }
-- 
2.19.0.2.gcad72f5712



[PATCH 3/6] powerpc/eeh: Add include_passed to eeh_pe_state_clear()

2018-11-28 Thread Sam Bobroff
Add a parameter to eeh_pe_state_clear() that allows passed-through PEs
to be excluded. Update callers to always pass true so that there is no
change in behaviour.

Also refactor to use direct traversal, to allow the removal of some
boilerplate.

This is to prepare for follow-up work for passed-through devices.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/ppc-pci.h |  2 +-
 arch/powerpc/kernel/eeh.c  | 18 
 arch/powerpc/kernel/eeh_driver.c   | 20 -
 arch/powerpc/kernel/eeh_pe.c   | 68 +-
 arch/powerpc/kernel/eeh_sysfs.c|  2 +-
 5 files changed, 50 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index f67da277d652..08e094eaeccf 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -59,7 +59,7 @@ int rtas_write_config(struct pci_dn *, int where, int size, 
u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 void eeh_pe_state_mark(struct eeh_pe *pe, int state);
 void eeh_pe_mark_isolated(struct eeh_pe *pe);
-void eeh_pe_state_clear(struct eeh_pe *pe, int state);
+void eeh_pe_state_clear(struct eeh_pe *pe, int state, bool include_passed);
 void eeh_pe_state_mark_with_cfg(struct eeh_pe *pe, int state);
 void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode);
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ac8e69ee93a7..052512e58b05 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -825,13 +825,13 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, 
enum pcie_reset_state stat
eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
eeh_unfreeze_pe(pe);
if (!(pe->type & EEH_PE_VF))
-   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
+   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true);
eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev);
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true);
break;
case pcie_hot_reset:
eeh_pe_mark_isolated(pe);
-   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
+   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true);
eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
if (!(pe->type & EEH_PE_VF))
@@ -840,7 +840,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum 
pcie_reset_state stat
break;
case pcie_warm_reset:
eeh_pe_mark_isolated(pe);
-   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
+   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true);
eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE);
eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev);
if (!(pe->type & EEH_PE_VF))
@@ -848,7 +848,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum 
pcie_reset_state stat
eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
break;
default:
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED);
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED, 
true);
return -EINVAL;
};
 
@@ -936,7 +936,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe)
__func__, state, pe->phb->global_number, pe->addr, (i + 
1));
}
 
-   eeh_pe_state_clear(pe, reset_state);
+   eeh_pe_state_clear(pe, reset_state, true);
return ret;
 }
 
@@ -1380,7 +1380,7 @@ static int eeh_pe_change_owner(struct eeh_pe *pe)
 
ret = eeh_unfreeze_pe(pe);
if (!ret)
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true);
return ret;
 }
 
@@ -1640,7 +1640,7 @@ static int eeh_pe_reenable_devices(struct eeh_pe *pe)
/* The PE is still in frozen state */
ret = eeh_unfreeze_pe(pe);
if (!ret)
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true);
return ret;
 }
 
@@ -1668,7 +1668,7 @@ int eeh_pe_reset(struct eeh_pe *pe, int option)
switch (option) {
case EEH_RESET_DEACTIVATE:
ret = eeh_ops->reset(pe, option);
-   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED);
+   eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true);
if (ret)
break;
 
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 0109d5d7fe63..b2687c14dc40 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -603,7 +603,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *root)
if (i >= 3)

[PATCH 6/6] powerpc/eeh: Correct retries in eeh_pe_reset_full()

2018-11-28 Thread Sam Bobroff
Currently, eeh_pe_reset_full() will only attempt to reset a PE more
than once if activating the reset state and deactivating it both
succeed, but later polling shows that it hasn't become active.

Change this so that it will try up to three times for any reason other
than an unrecoverable slot error, and adjust the message generation so
that it's clear whether the reset has ultimately succeeded or failed.
This allows the reset to succeed in some situations where it would
currently fail.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index df02f55fdfa1..528caf857428 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -912,7 +912,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool 
include_passed)
int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED);
int type = EEH_RESET_HOT;
unsigned int freset = 0;
-   int i, state, ret;
+   int i, state = 0, ret;
 
/*
 * Determine the type of reset to perform - hot or fundamental.
@@ -930,28 +930,32 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool 
include_passed)
/* Make three attempts at resetting the bus */
for (i = 0; i < 3; i++) {
ret = eeh_pe_reset(pe, type, include_passed);
-   if (ret)
-   break;
-
-   ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE, include_passed);
-   if (ret)
-   break;
+   if (!ret)
+   ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE,
+  include_passed);
+   if (ret) {
+   ret = -EIO;
+   pr_warn("EEH: Failure %d resetting PHB#%x-PE#%x 
(attempt %d)\n\n",
+   state, pe->phb->global_number, pe->addr, i + 1);
+   continue;
+   }
+   if (i)
+   pr_warn("EEH: PHB#%x-PE#%x: Successful reset (attempt 
%d)\n",
+   pe->phb->global_number, pe->addr, i + 1);
 
/* Wait until the PE is in a functioning state */
state = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
if (state < 0) {
-   pr_warn("%s: Unrecoverable slot failure on 
PHB#%x-PE#%x",
-   __func__, pe->phb->global_number, pe->addr);
+   pr_warn("EEH: Unrecoverable slot failure on 
PHB#%x-PE#%x",
+   pe->phb->global_number, pe->addr);
ret = -ENOTRECOVERABLE;
break;
}
if (eeh_state_active(state))
break;
-
-   /* Set error in case this is our last attempt */
-   ret = -EIO;
-   pr_warn("%s: Failure %d resetting PHB#%x-PE#%x\n (%d)\n",
-   __func__, state, pe->phb->global_number, pe->addr, (i + 
1));
+   else
+   pr_warn("EEH: PHB#%x-PE#%x: Slot inactive after reset: 
0x%x (attempt %d)\n",
+   pe->phb->global_number, pe->addr, state, i + 1);
}
 
/* Resetting the PE may have unfrozen child PEs. If those PEs have been
-- 
2.19.0.2.gcad72f5712



[PATCH 0/6] powerpc/eeh: Improve recovery of passed-through devices

2018-11-28 Thread Sam Bobroff
Hello,

Here are changes that allow EEH to successfully recover after a failure that
affects both host and guest devices. This happens, for example, when a PHB
containing passed-through devices is fenced. (Failures that include only
passed-through devices are ignored by the host.)

Currently, when an error affects both passed-through and un-passed-through
devices, the passed-through devices are treated as if their driver was not EEH
aware. This causes them to be hot-unplugged as part of recovery.

The hot unplug request is forwarded to the guest which checks the device status
before releasing the device. Because the host is recovering the device, it
reports the device status as EEH_STATE_UNAVAILABLE which causes the guest to
wait for the device to become available. This deadlocks the recovery process.

This change causes the host to recover its own devices but leave
passed-through devices frozen until the guest performs its own recovery. (They
are not removed.) If the guest detects the error and begins recovery itself,
waiting for the device state to change away from EEH_STATE_UNAVAILABLE causes
it to wait until the host has finished its recovery, and the guest's subsequent
recovery can then succeed.

Note that resetting a PE may implicitly thaw both it and child PEs, and to
prevent the device from being accidentally used by the guest (which may be
unaware of the failure and reset) when in this state, we re-freeze those
devices. This does leave a small window of opportunity but that will need to be
addressed with a firmware change.

I've also included a fix to the reset function (the last patch), because
without it some scenarios still fail. An example is injecting an error into
a PHB and then exiting a guest that contains passed-through devices from that
PHB so that an EEH event is raised during the process of passing the device
back to the host.

Cheers,
Sam.

Sam Bobroff (6):
  powerpc/eeh: Cleanup eeh_pe_clear_frozen_state()
  powerpc/eeh: remove sw_state from eeh_unfreeze_pe()
  powerpc/eeh: Add include_passed to eeh_pe_state_clear()
  powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state()
  powerpc/eeh: Improve recovery of passed-through devices
  powerpc/eeh: Correct retries in eeh_pe_reset_full()

 arch/powerpc/include/asm/eeh.h |   4 +-
 arch/powerpc/include/asm/ppc-pci.h |   4 +-
 arch/powerpc/kernel/eeh.c  | 103 +++--
 arch/powerpc/kernel/eeh_driver.c   |  86 ++--
 arch/powerpc/kernel/eeh_pe.c   |  68 ---
 arch/powerpc/kernel/eeh_sysfs.c|   3 +-
 drivers/vfio/vfio_spapr_eeh.c  |   6 +-
 7 files changed, 140 insertions(+), 134 deletions(-)

-- 
2.19.0.2.gcad72f5712



[PATCH 4/6] powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state()

2018-11-28 Thread Sam Bobroff
Add a parameter to eeh_clear_pe_frozen_state() that allows
passed-through PEs to be excluded. Update callers to always pass true
so that there is no change in behaviour.

This is to prepare for follow-up work for passed-through devices.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b2687c14dc40..61c177ebb230 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -591,19 +591,21 @@ static void *eeh_pe_detach_dev(struct eeh_pe *pe, void 
*userdata)
  * PE reset (for 3 times), we try to clear the frozen state
  * for 3 times as well.
  */
-static int eeh_clear_pe_frozen_state(struct eeh_pe *root)
+static int eeh_clear_pe_frozen_state(struct eeh_pe *root, bool include_passed)
 {
struct eeh_pe *pe;
int i;
 
eeh_for_each_pe(root, pe) {
-   for (i = 0; i < 3; i++)
-   if (!eeh_unfreeze_pe(pe))
-   break;
-   if (i >= 3)
-   return -EIO;
+   if (include_passed || !eeh_pe_passed(pe)) {
+   for (i = 0; i < 3; i++)
+   if (!eeh_unfreeze_pe(pe))
+   break;
+   if (i >= 3)
+   return -EIO;
+   }
}
-   eeh_pe_state_clear(root, EEH_PE_ISOLATED, true);
+   eeh_pe_state_clear(root, EEH_PE_ISOLATED, include_passed);
return 0;
 }
 
@@ -629,7 +631,7 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
}
 
/* Unfreeze the PE */
-   ret = eeh_clear_pe_frozen_state(pe);
+   ret = eeh_clear_pe_frozen_state(pe, true);
if (ret) {
eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true);
return ret;
@@ -702,7 +704,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
eeh_pe_restore_bars(pe);
 
/* Clear frozen state */
-   rc = eeh_clear_pe_frozen_state(pe);
+   rc = eeh_clear_pe_frozen_state(pe, true);
if (rc) {
pci_unlock_rescan_remove();
return rc;
-- 
2.19.0.2.gcad72f5712



[PATCH 1/6] powerpc/eeh: Cleanup eeh_pe_clear_frozen_state()

2018-11-28 Thread Sam Bobroff
The 'clear_sw_state' parameter for eeh_pe_clear_frozen_state() is
redundant because it has no effect (except in the rare case of a
hardware error part way through unfreezing a tree of PEs, where it
would dangerously allow partial de-isolation before returning
failure).

It is passed down to __eeh_pe_clear_frozen_state(), and from there to
eeh_unfreeze_pe(), where it causes EEH_PE_ISOLATED to be removed
from the state of each PE during the traversal.  However, when the
traversal finishes, EEH_PE_ISOLATED is unconditionally removed by a
call to eeh_pe_state_clear() regardless of the parameter's value.

So remove the flag and pass false to eeh_unfreeze_pe() (to avoid the
rare case described above, as it was before the flag was introduced).
Also, perform the recursion directly in the function and eliminate a
bit of boilerplate.

There should be no change in functionality, except as mentioned above.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 40 +++-
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 9446248eb6b8..aa86a42d98f2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -591,34 +591,20 @@ static void *eeh_pe_detach_dev(struct eeh_pe *pe, void 
*userdata)
  * PE reset (for 3 times), we try to clear the frozen state
  * for 3 times as well.
  */
-static void *__eeh_clear_pe_frozen_state(struct eeh_pe *pe, void *flag)
+static int eeh_clear_pe_frozen_state(struct eeh_pe *root)
 {
-   bool clear_sw_state = *(bool *)flag;
-   int i, rc = 1;
-
-   for (i = 0; rc && i < 3; i++)
-   rc = eeh_unfreeze_pe(pe, clear_sw_state);
+   struct eeh_pe *pe;
+   int i;
 
-   /* Stop immediately on any errors */
-   if (rc) {
-   pr_warn("%s: Failure %d unfreezing PHB#%x-PE#%x\n",
-   __func__, rc, pe->phb->global_number, pe->addr);
-   return (void *)pe;
+   eeh_for_each_pe(root, pe) {
+   for (i = 0; i < 3; i++)
+   if (!eeh_unfreeze_pe(pe, false))
+   break;
+   if (i >= 3)
+   return -EIO;
}
-
-   return NULL;
-}
-
-static int eeh_clear_pe_frozen_state(struct eeh_pe *pe,
-bool clear_sw_state)
-{
-   void *rc;
-
-   rc = eeh_pe_traverse(pe, __eeh_clear_pe_frozen_state, &clear_sw_state);
-   if (!rc)
-   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
-
-   return rc ? -EIO : 0;
+   eeh_pe_state_clear(root, EEH_PE_ISOLATED);
+   return 0;
 }
 
 int eeh_pe_reset_and_recover(struct eeh_pe *pe)
@@ -643,7 +629,7 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
}
 
/* Unfreeze the PE */
-   ret = eeh_clear_pe_frozen_state(pe, true);
+   ret = eeh_clear_pe_frozen_state(pe);
if (ret) {
eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
return ret;
@@ -716,7 +702,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
eeh_pe_restore_bars(pe);
 
/* Clear frozen state */
-   rc = eeh_clear_pe_frozen_state(pe, false);
+   rc = eeh_clear_pe_frozen_state(pe);
if (rc) {
pci_unlock_rescan_remove();
return rc;
-- 
2.19.0.2.gcad72f5712



Re: [PATCH kernel] powerpc/powernv/npu: Remove unused headers and a macro.

2018-11-28 Thread Alistair Popple
Thanks! Looks like a reasonable cleanup. Assuming it compiles I can't see any 
reason not to add:

Acked-by: Alistair Popple 

On Thursday, 29 November 2018 1:11:34 PM AEDT Alexey Kardashevskiy wrote:
> Ping?
> 
> On 28/09/2018 16:48, Alexey Kardashevskiy wrote:
> > The macro and a few headers are not used, so remove them.
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > 
> >  arch/powerpc/platforms/powernv/npu-dma.c | 14 --
> >  1 file changed, 14 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c
> > b/arch/powerpc/platforms/powernv/npu-dma.c index 8006c54..3a5c4ed 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -9,32 +9,18 @@
> > 
> >   * License as published by the Free Software Foundation.
> >   */
> > 
> > -#include 
> > 
> >  #include 
> >  #include 
> >  #include 
> > 
> > -#include 
> > 
> >  #include 
> >  #include 
> > 
> > -#include 
> > -#include 
> > 
> >  #include 
> > 
> > -#include 
> > 
> >  #include 
> > 
> > -#include 
> > -#include 
> > -#include 
> > -#include 
> > -#include 
> > -#include 
> > 
> >  #include 
> > 
> > -#include "powernv.h"
> > 
> >  #include "pci.h"
> > 
> > -#define npu_to_phb(x) container_of(x, struct pnv_phb, npu)
> > -
> > 
> >  /*
> >  
> >   * spinlock to protect initialisation of an npu_context for a particular
> >   * mm_struct.




Re: [PATCH] selftests/powerpc: New TM signal self test

2018-11-28 Thread Michael Neuling
On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
> when tm_recheckpoint() is called.
> 
> This test is not deterministic since it is hard to guarantee that the page
> access will cause a page fault. Tests have shown that the bug could be
> exposed within a few iterations on a buggy kernel. This test is configured to
> loop 5000x, having a good chance to hit the kernel issue in just one run.
> This self test takes less than two seconds to run.

You could try using sigaltstack() to put the ucontext somewhere else. Then you
could play tricks with that memory to try to force a fault.
madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

This is more extra credit to make it more reliable. Not a requirement.
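For reference, a rough sketch of that idea (illustrative only: the names
are made up, the fault is not guaranteed, and error checking is omitted):

	#define _GNU_SOURCE
	#include <signal.h>
	#include <sys/mman.h>

	static void handler(int sig, siginfo_t *si, void *uc)
	{
		/* Writing the signal frame onto the alt stack may fault. */
	}

	int main(void)
	{
		stack_t ss = { 0 };
		struct sigaction sa = { 0 };

		/* Put the signal frame (and ucontext) on our own mapping. */
		ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		ss.ss_size = SIGSTKSZ;
		sigaltstack(&ss, NULL);

		sa.sa_sigaction = handler;
		sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGUSR1, &sa, NULL);

		/* Drop the backing pages so that writing the signal frame
		 * must fault them back in. */
		madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED);
		raise(SIGUSR1);
		return 0;
	}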


> This test uses set/getcontext because the kernel will recheckpoint
> zeroed structures, causing the test to segfault, which is undesired because
> the test needs to rerun, so, there is a signal handler for SIGSEGV which
> will restart the test.

Please put this description at the top of the test also.

Other than that, it looks good.

Mikey

> 
> Signed-off-by: Breno Leitao 
> ---
>  tools/testing/selftests/powerpc/tm/.gitignore |   1 +
>  tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
>  .../powerpc/tm/tm-signal-force-msr.c  | 115 ++
>  3 files changed, 118 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> 
> diff --git a/tools/testing/selftests/powerpc/tm/.gitignore
> b/tools/testing/selftests/powerpc/tm/.gitignore
> index c3ee8393dae8..89679822ebc9 100644
> --- a/tools/testing/selftests/powerpc/tm/.gitignore
> +++ b/tools/testing/selftests/powerpc/tm/.gitignore
> @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
>  tm-signal-context-chk-gpr
>  tm-signal-context-chk-vmx
>  tm-signal-context-chk-vsx
> +tm-signal-force-msr
>  tm-vmx-unavail
>  tm-unavailable
>  tm-trap
> diff --git a/tools/testing/selftests/powerpc/tm/Makefile
> b/tools/testing/selftests/powerpc/tm/Makefile
> index 9fc2cf6fbc92..58a2ebd13958 100644
> --- a/tools/testing/selftests/powerpc/tm/Makefile
> +++ b/tools/testing/selftests/powerpc/tm/Makefile
> @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-
> signal-context-chk-fpu
>  
>  TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-
> stack \
>   tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable 
> tm-trap 
> \
> - $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
> + $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
>  
>  top_srcdir = ../../../../..
>  include ../../lib.mk
> @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
>  $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
>  $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-
> error=uninitialized -mvsx
>  $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
> +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
>  
>  SIGNAL_CONTEXT_CHK_TESTS := $(patsubst
> %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
>  $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
> diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> new file mode 100644
> index ..4441d61c2328
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
> + */
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "tm.h"
> +#include "utils.h"
> +
> +#define __MASK(X)   (1UL<<(X))
> +#define MSR_TS_S_LG 33  /* Trans Mem state: Suspended */
> +#define MSR_TM  __MASK(MSR_TM_LG)   /* Transactional Mem Available */
> +#define MSR_TS_S    __MASK(MSR_TS_S_LG) /* Transaction Suspended */

Surely we have these defined somewhere else in selftests? 

> +
> +#define COUNT_MAX   5000    /* Number of iterations */
> +
> +/* Setting contexts because the test will crash and we want to recover */
> +ucontext_t init_context, main_context;
> +
> +static int count, first_time;
> +
> +void trap_signal_handler(int signo, siginfo_t *si, void *uc)
> +{
> + ucontext_t *ucp = uc;
> +
> + /*
> +  * Allocating memory in a signal handler, and never freeing it on
> +  * purpose, forcing the heap increase, so, the memory leak is what
> +  * we want here.
> +  */
> + ucp->uc_link = malloc(sizeof(ucontext_t));
> + memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
> +
> + /* Forcing to enable MSR[TM] */
> + ucp->uc_mcontext.gp_regs[PT_MSR] 

Re: [PATCH kernel] powerpc/powernv/npu: Remove unused headers and a macro.

2018-11-28 Thread Alexey Kardashevskiy
Ping?


On 28/09/2018 16:48, Alexey Kardashevskiy wrote:
> The macro and a few headers are not used, so remove them.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 14 --
>  1 file changed, 14 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 8006c54..3a5c4ed 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -9,32 +9,18 @@
>   * License as published by the Free Software Foundation.
>   */
>  
> -#include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
> -#include 
> -#include 
>  
>  #include 
> -#include 
>  #include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
>  #include 
>  
> -#include "powernv.h"
>  #include "pci.h"
>  
> -#define npu_to_phb(x) container_of(x, struct pnv_phb, npu)
> -
>  /*
>   * spinlock to protect initialisation of an npu_context for a particular
>   * mm_struct.
> 

-- 
Alexey


[PATCH] powerpc: annotate implicit fall throughs

2018-11-28 Thread Stephen Rothwell
There is a plan to build the kernel with -Wimplicit-fallthrough and these
places in the code produced warnings, but because we build arch/powerpc
with -Werror, they became errors.  Fix them up.

This patch produces no change in behaviour, but should be reviewed in
case these are actually bugs rather than intentional fallthroughs.

Cc: Kees Cook 
Cc: Gustavo A. R. Silva 
Cc: Nicholas Piggin 
Cc: Balbir Singh 
Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/kernel/nvram_64.c| 1 +
 arch/powerpc/platforms/powermac/feature.c | 1 +
 arch/powerpc/xmon/xmon.c  | 1 +
 3 files changed, 3 insertions(+)

The patch is relative to v4.20-rc4 and has been in linux-next for a
few days.

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index 22e9d281324d..06e2eda2430e 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -809,6 +809,7 @@ static long dev_nvram_ioctl(struct file *file, unsigned int cmd,
 #ifdef CONFIG_PPC_PMAC
case OBSOLETE_PMAC_NVRAM_GET_OFFSET:
		printk(KERN_WARNING "nvram: Using obsolete PMAC_NVRAM_GET_OFFSET ioctl\n");
+   /* fall through */
case IOC_NVRAM_GET_OFFSET: {
int part, offset;
 
diff --git a/arch/powerpc/platforms/powermac/feature.c 
b/arch/powerpc/platforms/powermac/feature.c
index ed2f54b3f173..a7ec06876b53 100644
--- a/arch/powerpc/platforms/powermac/feature.c
+++ b/arch/powerpc/platforms/powermac/feature.c
@@ -1471,6 +1471,7 @@ static long g5_i2s_enable(struct device_node *node, long param, long value)
case 2:
if (macio->type == macio_shasta)
break;
+   /* fall through */
default:
return -ENODEV;
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 36b8dc47a3c3..308326f8b7ed 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -4033,6 +4033,7 @@ static int do_spu_cmd(void)
subcmd = inchar();
if (isxdigit(subcmd) || subcmd == '\n')
termch = subcmd;
+   /* fall through */
case 'f':
		scanhex(&num);
if (num >= XMON_NUM_SPUS || !spu_info[num].spu) {
-- 
2.20.0.rc1

-- 
Cheers,
Stephen Rothwell




Re: [PATCH RESEND] powerpc/perf: Update perf_regs structure to include SIER

2018-11-28 Thread Michael Ellerman
Madhavan Srinivasan  writes:
> On 28/11/18 9:04 AM, Michael Ellerman wrote:
>> Arnaldo Carvalho de Melo  writes:
>>
>>> Em Mon, Nov 26, 2018 at 11:34:08PM +0530, Madhavan Srinivasan escreveu:
 On each sample, Sample Instruction Event Register (SIER) content
is saved in pt_regs. SIER does not have an entry as-is in the pt_regs
 but instead, SIER content is saved in the "dar" register of pt_regs.

 Patch adds another entry to the perf_regs structure to include the "SIER"
 printing which internally maps to the "dar" of pt_regs.
>>> I think the patch is ok, when we talked in Vancouver I thought I saw
>>> something like this before, i.e. adding more registers to a perf_regs.h
>>> file, this was the cset:
>>>
>>>commit 0da0017f72554c005c1a04c3adc5da9eb64fa7e5
>>>Author: Hendrik Brueckner 
>>>Date:   Wed Nov 8 07:30:15 2017 +0100
>>>
>>>s390/perf: extend perf_regs support to include floating-point 
>>> registers
>>>
>>> That I came across because it broke the perf build, making me add this
>>> cset:
>>>
>>>commit 10b9baa701d5023897f70a4acb3bf0235da3dc4f
>>>Author: Arnaldo Carvalho de Melo 
>>>Date:   Tue Nov 28 11:08:41 2017 -0300
>>>
>>>  tools arch s390: Do not include header files from the kernel sources
>>>
>>> :-)
>>>
>>> Michael? What about the ppc specific details?
>> The only possible objection is that not all CPUs have an SIER register,
>> so on CPUs without it you'll get the content of the DAR register rather
>> than the SIER (because we (ab)use the DAR slot of pt_regs for the SIER).
>>
>> Perhaps we should make sure that we return 0 on CPUs that don't have the
>> register?
>
>
> Yes this make sense. We should make it zero instead of having the DAR
> value. I will respin the patch with that change.

Thanks.

There's three cases we need to cover.

The first is when we're not using core-book3s.c at all, ie. when we're
using the Freescale PMU code. You can probably handle that at build time
using CONFIG_FSL_EMB_PERF_EVENT?

Then there's the 32-bit case in core-book3s.c, currently that version of
perf_read_regs() never sets a value in dar.

And then there's the 64-bit case but PPMU_HAS_SIER is not set.

cheers
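
Not part of the thread, but for illustration: one way the three cases above
might collapse into a single guard. PERF_REG_POWERPC_SIER is the new enum
value from the patch under discussion; is_sier_available() is an assumed
helper testing ppmu->flags & PPMU_HAS_SIER, not an existing function:

u64 perf_reg_value(struct pt_regs *regs, int idx)
{
	if (idx == PERF_REG_POWERPC_SIER &&
	    (IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) ||	/* Freescale PMU */
	     IS_ENABLED(CONFIG_PPC32) ||		/* 32-bit never fills dar */
	     !is_sier_available()))			/* no PPMU_HAS_SIER */
		return 0;

	/* otherwise fall through to the normal pt_regs slot lookup */
	return regs_get_register(regs, pt_regs_offset[idx]);
}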


Re: [PATCH V2 0/5] NestMMU pte upgrade workaround for mprotect

2018-11-28 Thread Andrew Morton
On Wed, 28 Nov 2018 20:04:33 +0530 "Aneesh Kumar K.V" 
 wrote:

> We can upgrade pte access (R -> RW transition) via mprotect. We need
> to make sure we follow the recommended pte update sequence, as outlined in
> commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle
> nest MMU hang"), for such updates. This patch series does that.

The mm bits look (mostly) OK to me.  I suggest all these be merged via
the appropriate powerpc tree.


Re: [PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update

2018-11-28 Thread Andrew Morton
On Wed, 28 Nov 2018 20:04:37 +0530 "Aneesh Kumar K.V" 
 wrote:

> Signed-off-by: Aneesh Kumar K.V 

Some explanation of the motivation would be useful.

>  include/linux/hugetlb.h | 18 ++
>  mm/hugetlb.c|  8 +---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 087fd5f48c91..e2a3b0c854eb 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct 
> *mm, unsigned long addr
>   set_huge_pte_at(mm, addr, ptep, pte);
>  }
>  #endif
> +
> +#ifndef huge_ptep_modify_prot_start
> +static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> +}
> +#endif

#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start

> +#ifndef huge_ptep_modify_prot_commit
> +static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + pte_t old_pte, pte_t pte)
> +{
> + set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> +}
> +#endif

#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit




Re: powerpc: Fix COFF zImage booting on old powermacs

2018-11-28 Thread Michael Ellerman
On Mon, 2018-11-26 at 22:01:54 UTC, Paul Mackerras wrote:
> Commit 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper
> as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor
> at the start of crt0.S to have a hard-coded start address of 0x50
> rather than a reference to _zimage_start, presumably because having
> a reference to a symbol introduced a relocation which is awkward to
> handle in a position-independent executable.  Unfortunately, what is
> at 0x50 in the COFF image is not the first instruction, but the
> procedure descriptor itself, that is, a word containing 0x50,
> which is not a valid instruction.  Hence, booting a COFF zImage
> results in a "DEFAULT CATCH!, code=FFF00700" message from Open
> Firmware.
> 
> This fixes the problem by (a) putting the procedure descriptor in the
> data section and (b) adding a branch to _zimage_start as the first
> instruction in the program.
> 
> Fixes: 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a 
> relocatable ET_DYN")
> Signed-off-by: Paul Mackerras 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/5564597d51c8ff5b88d95c76255e18

cheers


Re: powerpc/mm: Fix linux page tables build with some configs

2018-11-28 Thread Michael Ellerman
On Mon, 2018-11-26 at 01:59:16 UTC, Michael Ellerman wrote:
> For some configs the build fails with:
> 
>   arch/powerpc/mm/dump_linuxpagetables.c: In function 'populate_markers':
>   arch/powerpc/mm/dump_linuxpagetables.c:306:39: error: 'PKMAP_BASE' 
> undeclared (first use in this function)
>   arch/powerpc/mm/dump_linuxpagetables.c:314:50: error: 'LAST_PKMAP' 
> undeclared (first use in this function)
> 
> These come from highmem.h, including that fixes the build.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/462951cd32e1496dc64b00051dfb77

cheers


Re: [PATCH] powerpc/boot: Copy serial.c in Makefile

2018-11-28 Thread kbuild test robot
Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.20-rc4 next-20181128]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Axtens/powerpc-boot-Copy-serial-c-in-Makefile/20181129-025021
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/boot/epapr.o: In function `epapr_platform_init':
>> epapr.c:(.text+0x184): undefined reference to `serial_console_init'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: use generic DMA mapping code in powerpc V4

2018-11-28 Thread Michal Suchánek
On Wed, 28 Nov 2018 16:55:30 +0100
Christian Zigotzky  wrote:

> On 28 November 2018 at 12:05PM, Michael Ellerman wrote:
> > Nothing specific yet.
> >
> > I'm a bit worried it might break one of the many old obscure platforms
> > we have that aren't well tested.
> >  
> Please don't apply the new DMA mapping code if you aren't sure it 
> works on all supported PowerPC machines. Is the new DMA mapping code 
> really necessary? It's not really nice to rewrite code when the old code 
> works perfectly. We must not forget that we work for the end users. Does 
> the end user gain any advantage from this new code? Is it faster? The old 
> code works without any problems. 

There is another service provided to the users as well: new code that is
cleaner and simpler which allows easier bug fixes and new features.
Without being familiar with the DMA mapping code I cannot really say if
that's the case here.

> I am also worried about this code. How 
> can I test this new DMA mapping code?

I suppose if your machine works it works for you.

Thanks

Michal


Re: use generic DMA mapping code in powerpc V4

2018-11-28 Thread Christian Zigotzky
I will compile and test the kernel from the following Git on my PowerPC 
machines.

http://git.infradead.org/users/hch/misc.git

On 28 November 2018 at 12:05PM, Michael Ellerman wrote:
Nothing specific yet.

I'm a bit worried it might break one of the many old obscure platforms
we have that aren't well tested.



[PATCH v3] powerpc/mm: add exec protection on powerpc 603

2018-11-28 Thread Christophe Leroy
The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

There is one "reserved" PTE bit available, this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy 
---
 v3: Included the _PAGE_EXEC flag in the existing test in TLB handler, no 
additional insns.

 v2: Amended commit log and removed #ifdef in pagetable dump

 arch/powerpc/include/asm/book3s/32/hash.h  |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h   | 18 +-
 arch/powerpc/include/asm/cputable.h|  8 
 arch/powerpc/kernel/head_32.S  |  2 +-
 arch/powerpc/mm/dump_linuxpagetables-generic.c |  2 --
 arch/powerpc/mm/pgtable.c  | 20 +++-
 6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index f2892c7ab73e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -26,6 +26,7 @@
 #define _PAGE_WRITETHRU	0x040	/* W: cache write-through */
 #define _PAGE_DIRTY	0x080	/* C: page changed */
 #define _PAGE_ACCESSED 0x100   /* R: page referenced */
+#define _PAGE_EXEC 0x200   /* software: exec allowed */
 #define _PAGE_RW   0x400   /* software: user write access allowed */
 #define _PAGE_SPECIAL  0x800   /* software: Special page */
 
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index b849b45429d5..6accf3e686af 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -10,9 +10,9 @@
 /* And here we include common definitions */
 
 #define _PAGE_KERNEL_RO		0
-#define _PAGE_KERNEL_ROX	0
+#define _PAGE_KERNEL_ROX	(_PAGE_EXEC)
 #define _PAGE_KERNEL_RW		(_PAGE_DIRTY | _PAGE_RW)
-#define _PAGE_KERNEL_RWX	(_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX	(_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
 
 #define _PAGE_HPTEFLAGS _PAGE_HASHPTE
 
@@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte)
  */
 #define PAGE_NONE	__pgprot(_PAGE_BASE)
 #define PAGE_SHARED	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
 #define PAGE_COPY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
   int psize)
 {
unsigned long set = pte_val(entry) &
-   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW);
+   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 
pte_update(ptep, 0, set);
 
@@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte)		{ return !!(pte_val(pte) & _PAGE_DIRTY);
 static inline int pte_young(pte_t pte)		{ return !!(pte_val(pte) & _PAGE_ACCESSED); }
 static inline int pte_special(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_SPECIAL); }
 static inline int pte_none(pte_t pte)		{ return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
-static inline bool pte_exec(pte_t pte)		{ return true; }
+static inline bool pte_exec(pte_t pte)		{ return pte_val(pte) & _PAGE_EXEC; }
 
 static inline int pte_present(pte_t pte)
 {
@@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 
 static inline pte_t pte_exprotect(pte_t pte)
 {
-   return pte;
+   return __pte(pte_val(pte) & ~_PAGE_EXEC);
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_mkexec(pte_t pte)
 {
-   return pte;
+   return __pte(pte_val(pte) | _PAGE_EXEC);
 }
 
 static inline pte_t pte_mkpte(pte_t pte)
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 29f49a35d6ee..a0395ccbbe9e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_PPC601	(CPU_FTR_COMMON | CPU_FTR_601 | \
 	CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_USE_RTC)
 

[PATCH 0/4] powerpc/perf: IMC trace-mode support

2018-11-28 Thread Anju T Sudhakar
IMC (In-Memory Collection counters) is a hardware monitoring facility
that collects a large number of hardware performance events.
POWER9 supports two modes for IMC: Accumulation mode and Trace mode.
In Accumulation mode, event counts are accumulated in system memory and
the hypervisor then reads the posted counts periodically or when
requested. In IMC Trace mode, the event counted is fixed to cycles; on
each overflow, hardware snapshots the program counter along with other
details and writes into the memory pointed to by LDBAR (ring buffer
memory; the hardware wraps around). LDBAR has a bit which indicates the
IMC trace-mode.


Trace-IMC Implementation:  
-- 
To enable trace-imc, we need to

* Add a trace node in the DTS file for power9, so that the new trace node can
be discovered by the kernel.   

Information included in the DTS file is as follows (a snippet from
the ima-catalog):

TRACE_IMC: trace-events {  
 #address-cells = <0x1>;
 #size-cells = <0x1>;   
 event@1020 {   
 event-name = "cycles" ;
 reg = <0x1020 0x8>;
 desc = "Reference cycles" ;
 }; 
 }; 
 trace@0 {  
 compatible = "ibm,imc-counters";   
 events-prefix = "trace_";  
 reg = <0x0 0x8>;   
 events = < &TRACE_IMC >;
 type = <0x2>;  
 size = <0x4>;  
 }; 

OP-BUILD changes needed to include the "trace node" are already pulled in
to the ima-catalog repo.

https://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07

* Enhance the opal_imc_counters_* calls to support this new trace mode
in imc. Add support to initialize the trace-mode scom.

TRACE_IMC_SCOM bit representation: 

0:1 : SAMPSEL  
2:33    : CPMC_LOAD
34:40   : CPMC1SEL 
41:47   : CPMC2SEL 
48:50   : BUFFERSIZE   
51:63   : RESERVED 

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determine
the event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with other details, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration.
IMC hardware does not support exceptions, so it quietly wraps around if the
memory buffer reaches the end.
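
For illustration only, a small standalone C sketch of how the TRACE_IMC_SCOM
fields above pack into a 64-bit value, assuming IBM bit numbering (bit 0 is
the MSB); the helper name and the sample field values are invented for the
example:

#include <stdint.h>
#include <stdio.h>

/* Place val into IBM-numbered bits [first:last] of a 64-bit word. */
static uint64_t ibm_field(uint64_t val, int first, int last)
{
	int width = last - first + 1;

	return (val & ((1ULL << width) - 1)) << (63 - last);
}

int main(void)
{
	uint64_t scom;

	scom = ibm_field(0, 0, 1)		/* SAMPSEL */
	     | ibm_field(0x1000, 2, 33)		/* CPMC_LOAD: sampling duration */
	     | ibm_field(0, 34, 40)		/* CPMC1SEL */
	     | ibm_field(0, 41, 47)		/* CPMC2SEL */
	     | ibm_field(0, 48, 50);		/* BUFFERSIZE */

	printf("TRACE_IMC_SCOM = 0x%016llx\n", (unsigned long long)scom);
	return 0;
}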

Link to the skiboot patches to enhance the opal_imc_counters_* calls:
https://lists.ozlabs.org/pipermail/skiboot/2018-October/012442.html
https://lists.ozlabs.org/pipermail/skiboot/2018-October/012441.html
https://lists.ozlabs.org/pipermail/skiboot/2018-October/012439.html
https://lists.ozlabs.org/pipermail/skiboot/2018-October/012440.html

* Set LDBAR spr to enable imc-trace mode.

LDBAR Layout:

0 : Enable/Disable
1 : 0 -> Accumulation Mode
1 -> Trace Mode
2:3   : Reserved
4-6   : PB scope
7 : 

[PATCH 4/4] powerpc/perf: Trace imc PMU functions

2018-11-28 Thread Anju T Sudhakar
Add PMU functions to support trace-imc.

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/perf/imc-pmu.c | 172 
 1 file changed, 172 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index d9ffe7f03f1e..18af7c3e2345 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1117,6 +1117,170 @@ static int trace_imc_cpu_init(void)
  ppc_trace_imc_cpu_offline);
 }
 
+static u64 get_trace_imc_event_base_addr(void)
+{
+   return (u64)per_cpu(trace_imc_mem, smp_processor_id());
+}
+
+/*
+ * Function to parse trace-imc data obtained
+ * and to prepare the perf sample.
+ */
+static int trace_imc_prepare_sample(struct trace_imc_data *mem,
+   struct perf_sample_data *data,
+   u64 *prev_tb,
+   struct perf_event_header *header,
+   struct perf_event *event)
+{
+   /* Sanity checks for a valid record */
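+	/*
+	 * DW0 (tb1) and DW7 (tb2) of a record both carry the timebase
+	 * (see the record layout in patch 1/4 of this series): the record
+	 * is treated as valid only if tb1 is newer than the previous
+	 * record seen and its masked timebase matches tb2, i.e. the
+	 * 64-byte snapshot was not torn by a concurrent hardware overwrite.
+	 */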
+   if (be64_to_cpu(READ_ONCE(mem->tb1)) > *prev_tb)
+   *prev_tb = be64_to_cpu(READ_ONCE(mem->tb1));
+   else
+   return -EINVAL;
+
+   if ((be64_to_cpu(READ_ONCE(mem->tb1)) & IMC_TRACE_RECORD_TB1_MASK) !=
+be64_to_cpu(READ_ONCE(mem->tb2)))
+   return -EINVAL;
+
+   /* Prepare perf sample */
+   data->ip =  be64_to_cpu(READ_ONCE(mem->ip));
+   data->period = event->hw.last_period;
+
+   header->type = PERF_RECORD_SAMPLE;
+   header->size = sizeof(*header) + event->header_size;
+   header->misc = 0;
+
+   if (is_kernel_addr(data->ip))
+   header->misc |= PERF_RECORD_MISC_KERNEL;
+   else
+   header->misc |= PERF_RECORD_MISC_USER;
+
+   perf_event_header__init_id(header, data, event);
+
+   return 0;
+}
+
+static void dump_trace_imc_data(struct perf_event *event)
+{
+   struct trace_imc_data *mem;
+   int i, ret;
+   u64 prev_tb = 0;
+
+   mem = (struct trace_imc_data *)get_trace_imc_event_base_addr();
+   for (i = 0; i < (trace_imc_mem_size / sizeof(struct trace_imc_data));
+   i++, mem++) {
+   struct perf_sample_data data;
+   struct perf_event_header header;
+
+		ret = trace_imc_prepare_sample(mem, &data, &prev_tb, &header, event);
+   if (ret) /* Exit, if not a valid record */
+   break;
+   else {
+   /* If this is a valid record, create the sample */
+   struct perf_output_handle handle;
+
+			if (perf_output_begin(&handle, event, header.size))
+   return;
+
+			perf_output_sample(&handle, &header, &data, event);
+			perf_output_end(&handle);
+   }
+   }
+}
+
+static int trace_imc_event_add(struct perf_event *event, int flags)
+{
+   /* Enable the sched_task to start the engine */
+   perf_sched_cb_inc(event->ctx->pmu);
+   return 0;
+}
+
+static void trace_imc_event_read(struct perf_event *event)
+{
+   dump_trace_imc_data(event);
+}
+
+static void trace_imc_event_stop(struct perf_event *event, int flags)
+{
+   trace_imc_event_read(event);
+}
+
+static void trace_imc_event_start(struct perf_event *event, int flags)
+{
+   return;
+}
+
+static void trace_imc_event_del(struct perf_event *event, int flags)
+{
+   perf_sched_cb_dec(event->ctx->pmu);
+}
+
+void trace_imc_pmu_sched_task(struct perf_event_context *ctx,
+   bool sched_in)
+{
+   int core_id = smp_processor_id() / threads_per_core;
+   struct imc_pmu_ref *ref;
+   u64 local_mem, ldbar_value;
+
+	/* Set trace-imc bit in ldbar and load ldbar with per-thread memory address */
+	local_mem = get_trace_imc_event_base_addr();
+	ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE;
+
+	ref = &core_imc_refc[core_id];
+   if (!ref)
+   return;
+
+   if (sched_in) {
+   mtspr(SPRN_LDBAR, ldbar_value);
+		mutex_lock(&ref->lock);
+		if (ref->refc == 0) {
+			if (opal_imc_counters_start(OPAL_IMC_COUNTERS_TRACE,
+					get_hard_smp_processor_id(smp_processor_id()))) {
+				mutex_unlock(&ref->lock);
+				pr_err("trace-imc: Unable to start the counters for core %d\n", core_id);
+				mtspr(SPRN_LDBAR, 0);
+				return;
+			}
+		}
+		++ref->refc;
+		mutex_unlock(&ref->lock);
+   } else {
+   mtspr(SPRN_LDBAR, 0);
+		mutex_lock(&ref->lock);
+   ref->refc--;
+   if (ref->refc == 0) {
+   if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_TRACE,
+ 

[PATCH 3/4] powerpc/perf: Trace imc events detection and cpuhotplug

2018-11-28 Thread Anju T Sudhakar
This patch detects trace-imc events, does memory initializations for each
online cpu, and registers cpuhotplug call-backs.

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/perf/imc-pmu.c   | 91 +++
 arch/powerpc/platforms/powernv/opal-imc.c |  3 +
 include/linux/cpuhotplug.h|  1 +
 3 files changed, 95 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 3bef46f8417d..d9ffe7f03f1e 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -43,6 +43,10 @@ static DEFINE_PER_CPU(u64 *, thread_imc_mem);
 static struct imc_pmu *thread_imc_pmu;
 static int thread_imc_mem_size;
 
+/* Trace IMC data structures */
+static DEFINE_PER_CPU(u64 *, trace_imc_mem);
+static int trace_imc_mem_size;
+
 static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
return container_of(event->pmu, struct imc_pmu, pmu);
@@ -1065,6 +1069,54 @@ static void thread_imc_event_del(struct perf_event 
*event, int flags)
imc_event_update(event);
 }
 
+/*
+ * Allocate a page of memory for each cpu, and load LDBAR with 0.
+ */
+static int trace_imc_mem_alloc(int cpu_id, int size)
+{
+   u64 *local_mem = per_cpu(trace_imc_mem, cpu_id);
+   int phys_id = cpu_to_node(cpu_id), rc = 0;
+
+   if (!local_mem) {
+   local_mem = page_address(alloc_pages_node(phys_id,
+				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE |
+				__GFP_NOWARN, get_order(size)));
+   if (!local_mem)
+   return -ENOMEM;
+   per_cpu(trace_imc_mem, cpu_id) = local_mem;
+
+   /* Initialise the counters for trace mode */
+		rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_TRACE, __pa((void *)local_mem),
+   get_hard_smp_processor_id(cpu_id));
+   if (rc) {
+   pr_info("IMC:opal init failed for trace imc\n");
+   return rc;
+   }
+   }
+
+   mtspr(SPRN_LDBAR, 0);
+   return 0;
+}
+
+static int ppc_trace_imc_cpu_online(unsigned int cpu)
+{
+   return trace_imc_mem_alloc(cpu, trace_imc_mem_size);
+}
+
+static int ppc_trace_imc_cpu_offline(unsigned int cpu)
+{
+   mtspr(SPRN_LDBAR, 0);
+   return 0;
+}
+
+static int trace_imc_cpu_init(void)
+{
+   return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
+ "perf/powerpc/imc_trace:online",
+ ppc_trace_imc_cpu_online,
+ ppc_trace_imc_cpu_offline);
+}
+
 /* update_pmu_ops : Populate the appropriate operations for "pmu" */
 static int update_pmu_ops(struct imc_pmu *pmu)
 {
@@ -1186,6 +1238,17 @@ static void cleanup_all_thread_imc_memory(void)
}
 }
 
+static void cleanup_all_trace_imc_memory(void)
+{
+   int i, order = get_order(trace_imc_mem_size);
+
+   for_each_online_cpu(i) {
+   if (per_cpu(trace_imc_mem, i))
+   free_pages((u64)per_cpu(trace_imc_mem, i), order);
+
+   }
+}
+
 /* Function to free the attr_groups which are dynamically allocated */
 static void imc_common_mem_free(struct imc_pmu *pmu_ptr)
 {
@@ -1227,6 +1290,11 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu 
*pmu_ptr)
cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE);
cleanup_all_thread_imc_memory();
}
+
+   if (pmu_ptr->domain == IMC_DOMAIN_TRACE) {
+   cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE);
+   cleanup_all_trace_imc_memory();
+   }
 }
 
 /*
@@ -1309,6 +1377,21 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct 
device_node *parent,
 
thread_imc_pmu = pmu_ptr;
break;
+   case IMC_DOMAIN_TRACE:
+   /* Update the pmu name */
+   pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc");
+   if (!pmu_ptr->pmu.name)
+   return -ENOMEM;
+
+   trace_imc_mem_size = pmu_ptr->counter_mem_size;
+   for_each_online_cpu(cpu) {
+   res = trace_imc_mem_alloc(cpu, trace_imc_mem_size);
+   if (res) {
+   cleanup_all_trace_imc_memory();
+   goto err;
+   }
+   }
+   break;
default:
return -EINVAL;
}
@@ -1381,6 +1464,14 @@ int init_imc_pmu(struct device_node *parent, struct 
imc_pmu *pmu_ptr, int pmu_id
goto err_free_mem;
}
 
+   break;
+   case IMC_DOMAIN_TRACE:
+   ret = trace_imc_cpu_init();
+   if (ret) {
+   cleanup_all_trace_imc_memory();
+   goto err_free_mem;
+   }
+
break;
default:
 

[PATCH 1/4] powerpc/include: Add data structures and macros for IMC trace mode

2018-11-28 Thread Anju T Sudhakar
Add the macros needed for IMC (In-Memory Collection Counters) trace-mode
and a data structure to hold the trace-imc record data.
Also, add the new type "OPAL_IMC_COUNTERS_TRACE" in 'opal-api.h', since
there is a new switch case added in the opal-calls for IMC.

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/include/asm/imc-pmu.h  | 39 +
 arch/powerpc/include/asm/opal-api.h |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 69f516ecb2fd..7c2ef0e42661 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -33,6 +33,7 @@
  */
 #define THREAD_IMC_LDBAR_MASK   0x0003ffffffffe000ULL
 #define THREAD_IMC_ENABLE       0x8000000000000000ULL
+#define TRACE_IMC_ENABLE	0x4000000000000000ULL
 
 /*
  * For debugfs interface for imc-mode and imc-command
@@ -59,6 +60,34 @@ struct imc_events {
char *scale;
 };
 
+/*
+ * Trace IMC hardware updates a 64-byte record on
+ * Core Performance Monitoring Counter (CPMC)
+ * overflow. Here is the layout for the trace imc record
+ *
+ * DW 0 : Timebase
+ * DW 1 : Program Counter
+ * DW 2 : PIDR information
+ * DW 3 : CPMC1
+ * DW 4 : CPMC2
+ * DW 5 : CPMC3
+ * DW 6 : CPMC4
+ * DW 7 : Timebase
+ * .
+ *
+ * The following is the data structure to hold trace imc data.
+ */
+struct trace_imc_data {
+   u64 tb1;
+   u64 ip;
+   u64 val;
+   u64 cpmc1;
+   u64 cpmc2;
+   u64 cpmc3;
+   u64 cpmc4;
+   u64 tb2;
+};
+
 /* Event attribute array index */
 #define IMC_FORMAT_ATTR		0
 #define IMC_EVENT_ATTR		1
@@ -68,6 +97,13 @@ struct imc_events {
 /* PMU Format attribute macros */
 #define IMC_EVENT_OFFSET_MASK	0xffffffffULL
 
+/*
+ * Macro to mask bits 0:21 of the first double word (which is the timebase) to
+ * compare with the 8th double word (timebase) of the trace imc record data.
+ */
+#define IMC_TRACE_RECORD_TB1_MASK      0x3ffffffffffULL
+
+
 /*
  * Device tree parser code detects IMC pmu support and
  * registers new IMC pmus. This structure will hold the
@@ -113,6 +149,7 @@ struct imc_pmu_ref {
 
 enum {
IMC_TYPE_THREAD = 0x1,
+   IMC_TYPE_TRACE  = 0x2,
IMC_TYPE_CORE   = 0x4,
IMC_TYPE_CHIP   = 0x10,
 };
@@ -123,6 +160,8 @@ enum {
 #define IMC_DOMAIN_NEST		1
 #define IMC_DOMAIN_CORE		2
 #define IMC_DOMAIN_THREAD	3
+/* For trace-imc the domain is still thread but it operates in trace-mode */
+#define IMC_DOMAIN_TRACE   4
 
 extern int init_imc_pmu(struct device_node *parent,
struct imc_pmu *pmu_ptr, int pmu_id);
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 870fb7b239ea..a4130b21b159 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1118,6 +1118,7 @@ enum {
 enum {
OPAL_IMC_COUNTERS_NEST = 1,
OPAL_IMC_COUNTERS_CORE = 2,
+   OPAL_IMC_COUNTERS_TRACE = 3,
 };
 
 
-- 
2.17.1



[PATCH 2/4] powerpc/perf: Rearrange setting of ldbar for thread-imc

2018-11-28 Thread Anju T Sudhakar
LDBAR holds the memory address allocated for each cpu. For thread-imc,
the mode bit (i.e. bit 1) of LDBAR is set to accumulation.
Currently, ldbar is loaded with the per-cpu memory address, with the mode
set to accumulation, at boot time.

To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to
accommodate the trace-mode of IMC, reposition the setting of ldbar for
thread-imc to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del().

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/perf/imc-pmu.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f292a3f284f1..3bef46f8417d 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -806,8 +806,11 @@ static int core_imc_event_init(struct perf_event *event)
 }
 
 /*
- * Allocates a page of memory for each of the online cpus, and write the
- * physical base address of that page to the LDBAR for that cpu.
+ * Allocates a page of memory for each of the online cpus, and load
+ * LDBAR with 0.
+ * The physical base address of the page allocated for a cpu will be
+ * written to the LDBAR for that cpu, when the thread-imc event
+ * is added.
  *
  * LDBAR Register Layout:
  *
@@ -825,7 +828,7 @@ static int core_imc_event_init(struct perf_event *event)
  */
 static int thread_imc_mem_alloc(int cpu_id, int size)
 {
-   u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, cpu_id);
+   u64 *local_mem = per_cpu(thread_imc_mem, cpu_id);
int nid = cpu_to_node(cpu_id);
 
if (!local_mem) {
@@ -842,9 +845,7 @@ static int thread_imc_mem_alloc(int cpu_id, int size)
per_cpu(thread_imc_mem, cpu_id) = local_mem;
}
 
-	ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
-
-   mtspr(SPRN_LDBAR, ldbar_value);
+   mtspr(SPRN_LDBAR, 0);
return 0;
 }
 
@@ -995,6 +996,7 @@ static int thread_imc_event_add(struct perf_event *event, 
int flags)
 {
int core_id;
struct imc_pmu_ref *ref;
+	u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, smp_processor_id());
 
if (flags & PERF_EF_START)
imc_event_start(event, flags);
@@ -1003,6 +1005,9 @@ static int thread_imc_event_add(struct perf_event *event, 
int flags)
return -EINVAL;
 
core_id = smp_processor_id() / threads_per_core;
+	ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
+   mtspr(SPRN_LDBAR, ldbar_value);
+
/*
 * imc pmus are enabled only when it is used.
 * See if this is triggered for the first time.
@@ -1034,11 +1039,7 @@ static void thread_imc_event_del(struct perf_event 
*event, int flags)
int core_id;
struct imc_pmu_ref *ref;
 
-   /*
-* Take a snapshot and calculate the delta and update
-* the event counter values.
-*/
-   imc_event_update(event);
+   mtspr(SPRN_LDBAR, 0);
 
core_id = smp_processor_id() / threads_per_core;
	ref = &core_imc_refc[core_id];
@@ -1057,6 +1058,11 @@ static void thread_imc_event_del(struct perf_event 
*event, int flags)
ref->refc = 0;
}
	mutex_unlock(&ref->lock);
+   /*
+* Take a snapshot and calculate the delta and update
+* the event counter values.
+*/
+   imc_event_update(event);
 }
 
 /* update_pmu_ops : Populate the appropriate operations for "pmu" */
-- 
2.17.1



Re: use generic DMA mapping code in powerpc V4

2018-11-28 Thread Christian Zigotzky

On 28 November 2018 at 12:05PM, Michael Ellerman wrote:

Nothing specific yet.

I'm a bit worried it might break one of the many old obscure platforms
we have that aren't well tested.

Please don't apply the new DMA mapping code if you aren't sure it 
works on all supported PowerPC machines. Is the new DMA mapping code 
really necessary? It's not really nice to rewrite code when the old code 
works perfectly. We must not forget that we work for the end users. Does 
the end user gain any advantage from this new code? Is it faster? The old 
code works without any problems. I am also worried about this code. How 
can I test this new DMA mapping code?


Thanks



Re: [PATCH 0/1] Fix NULL pointer access in PowerPC MSI teardown code

2018-11-28 Thread Radu Rendec
Hi Michael,

On Wed, Nov 28, 2018 at 6:00 AM Michael Ellerman  wrote:
>
> Radu Rendec  writes:
> >
> > The assumption in arch_teardown_msi_irqs() is wrong and results in a
> > function call on a NULL pointer. An example of how this can happen is
> > included in the actual patch header. In my case, it happens when the PCI
> > hardware is configured during kernel start-up, because my controller
> > doesn't support MSI and the ops are NULL.
>
> What hardware are you on?

I'm on Freescale MPC8378 - old stuff, but still going strong :)
The MSI capable device is a Broadcom PEX 8613 (a 3-port PCIe switch).

> > I'm proposing the attached patch to fix the problem. It basically just
> > checks the pointer before the function call.
>
> Yeah that patch looks good to me.
>
> I suspect this bug was introduced in:
>
>   6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()")
>
> Previously we had that check routine which would run before any of the
> MSI setup had been done, and so if there were no MSI ops then we bailed
> out early and didn't call teardown.
>
> I guess since then (2014) we haven't tested an MSI capable device on a
> system that isn't MSI capable?

Thanks for looking into this. You're probably right, it really looks like
that patch could have introduced this bug.

Cheers,
Radu
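
For readers following along, a minimal sketch of the shape of fix being
discussed, based only on the description above (the actual patch is in the
referenced thread; the surrounding function is paraphrased from
arch/powerpc/kernel/msi.c):

void arch_teardown_msi_irqs(struct pci_dev *dev)
{
	struct pci_controller *phb = pci_bus_to_host(dev->bus);

	/*
	 * We can get here even when arch_setup_msi_irqs() bailed out
	 * because the platform has no MSI ops, so check the pointer
	 * before calling through it.
	 */
	if (phb->controller_ops.teardown_msi_irqs)
		phb->controller_ops.teardown_msi_irqs(dev);
}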


[PATCH V2 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade

2018-11-28 Thread Aneesh Kumar K.V
NestMMU requires us to mark the pte invalid and flush the tlb when we do a
RW upgrade of the pte. We fixed a variant of this in the fault path in
commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle
nest MMU hang").

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 
 arch/powerpc/mm/hugetlbpage-radix.c  | 17 
 arch/powerpc/mm/hugetlbpage.c| 29 
 3 files changed, 58 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 5b0177733994..66c1e4f88d65 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -13,6 +13,10 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned 
long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
 
+extern void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep,
+   pte_t old_pte, pte_t pte);
+
 static inline int hstate_get_psize(struct hstate *hstate)
 {
unsigned long shift;
@@ -42,4 +46,12 @@ static inline bool gigantic_page_supported(void)
 /* hugepd entry valid bit */
 #define HUGEPD_VAL_BITS	(0x8000000000000000UL)
 
+#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
+extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+unsigned long addr, pte_t *ptep);
+
+#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
+extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+unsigned long addr, pte_t *ptep,
+pte_t old_pte, pte_t new_pte);
 #endif
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c 
b/arch/powerpc/mm/hugetlbpage-radix.c
index 2486bee0f93e..1f77d71e7708 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -90,3 +90,20 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned 
long addr,
 
	return vm_unmapped_area(&info);
 }
+
+void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+unsigned long addr, pte_t *ptep,
+pte_t old_pte, pte_t pte)
+{
+   struct mm_struct *mm = vma->vm_mm;
+
+   /*
+	 * To avoid NMMU hang while relaxing access we need to flush the tlb
+	 * before we set the new value.
+*/
+   if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
+	    (atomic_read(&mm->context.copros) > 0))
+   flush_hugetlb_page(vma, addr);
+
+   set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8cf035e68378..39d33a3d0dc6 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -912,3 +912,32 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned 
long addr,
 
return 1;
 }
+
+#ifdef CONFIG_PPC_BOOK3S_64
+pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   unsigned long pte_val;
+   /*
+* Clear the _PAGE_PRESENT so that no hardware parallel update is
+* possible. Also keep the pte_present true so that we don't take
+* wrong fault.
+*/
+   pte_val = pte_update(vma->vm_mm, addr, ptep,
+_PAGE_PRESENT, _PAGE_INVALID, 1);
+
+   return __pte(pte_val);
+}
+EXPORT_SYMBOL(huge_ptep_modify_prot_start);
+
+void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long 
addr,
+ pte_t *ptep, pte_t old_pte, pte_t pte)
+{
+
+   if (radix_enabled())
+   return radix__huge_ptep_modify_prot_commit(vma, addr, ptep,
+  old_pte, pte);
+   set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+EXPORT_SYMBOL(huge_ptep_modify_prot_commit);
+#endif
-- 
2.19.1



[PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update

2018-11-28 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/hugetlb.h | 18 ++
 mm/hugetlb.c|  8 +---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 087fd5f48c91..e2a3b0c854eb 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct 
*mm, unsigned long addr
set_huge_pte_at(mm, addr, ptep, pte);
 }
 #endif
+
+#ifndef huge_ptep_modify_prot_start
+static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+}
+#endif
+
+#ifndef huge_ptep_modify_prot_commit
+static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep,
+   pte_t old_pte, pte_t pte)
+{
+   set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+#endif
+
 #else  /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7f2a28ab46d5..e5f5cda14f28 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4388,10 +4388,12 @@ unsigned long hugetlb_change_protection(struct 
vm_area_struct *vma,
continue;
}
if (!huge_pte_none(pte)) {
-   pte = huge_ptep_get_and_clear(mm, address, ptep);
-   pte = pte_mkhuge(huge_pte_modify(pte, newprot));
+   pte_t old_pte;
+
+			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
+   pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
pte = arch_make_huge_pte(pte, vma, NULL, 0);
-   set_huge_pte_at(mm, address, ptep, pte);
+			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
}
spin_unlock(ptl);
-- 
2.19.1



[PATCH V2 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.

2018-11-28 Thread Aneesh Kumar K.V
NestMMU requires us to mark the pte invalid and flush the tlb when we do a
RW upgrade of the pte. We fixed a variant of this in the fault path in
commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle
nest MMU hang").

Do the same for mprotect upgrades.

Hugetlb is handled in the next patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 +
 arch/powerpc/include/asm/book3s/64/radix.h   |  4 +++
 arch/powerpc/mm/pgtable-book3s64.c   | 27 
 arch/powerpc/mm/pgtable-radix.c  | 18 +
 4 files changed, 67 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 2e6ada28da64..92eaea164700 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1314,6 +1314,24 @@ static inline int pud_pfn(pud_t pud)
BUILD_BUG();
return 0;
 }
+#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
+pte_t *, pte_t, pte_t);
+
+/*
+ * Returns true for a R -> RW upgrade of pte
+ */
+static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val)
+{
+   if (!(old_val & _PAGE_READ))
+   return false;
+
+   if ((!(old_val & _PAGE_WRITE)) && (new_val & _PAGE_WRITE))
+   return true;
+
+   return false;
+}
 
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 7d1a3d1543fc..5ab134eeed20 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -127,6 +127,10 @@ extern void radix__ptep_set_access_flags(struct 
vm_area_struct *vma, pte_t *ptep
 pte_t entry, unsigned long address,
 int psize);
 
+extern void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep,
+  pte_t old_pte, pte_t pte);
+
 static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
   unsigned long set)
 {
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 9f93c9f985c5..3d126353b11e 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -482,3 +482,30 @@ void arch_report_meminfo(struct seq_file *m)
		   atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
 }
 #endif /* CONFIG_PROC_FS */
+
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
+pte_t *ptep)
+{
+   unsigned long pte_val;
+
+   /*
+* Clear the _PAGE_PRESENT so that no hardware parallel update is
+* possible. Also keep the pte_present true so that we don't take
+* wrong fault.
+*/
+	pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);
+
+   return __pte(pte_val);
+
+}
+EXPORT_SYMBOL(ptep_modify_prot_start);
+
+void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
+pte_t *ptep, pte_t old_pte, pte_t pte)
+{
+   if (radix_enabled())
+   return radix__ptep_modify_prot_commit(vma, addr,
+ ptep, old_pte, pte);
+   set_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+EXPORT_SYMBOL(ptep_modify_prot_commit);
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 931156069a81..14938186df5b 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -1063,3 +1063,21 @@ void radix__ptep_set_access_flags(struct vm_area_struct 
*vma, pte_t *ptep,
}
/* See ptesync comment in radix__set_pte_at */
 }
+
+void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep,
+   pte_t old_pte, pte_t pte)
+{
+   struct mm_struct *mm = vma->vm_mm;
+
+   /*
+	 * To avoid NMMU hang while relaxing access we need to flush the tlb
+	 * before we set the new value. We need to do this only for radix, because hash
+* translation does flush when updating the linux pte.
+*/
+   if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
+	    (atomic_read(&mm->context.copros) > 0))
+   flush_tlb_page(vma, addr);
+
+   set_pte_at(mm, addr, ptep, pte);
+}
-- 
2.19.1



[PATCH V2 2/5] mm: update ptep_modify_prot_commit to take old pte value as arg

2018-11-28 Thread Aneesh Kumar K.V
Architectures like ppc64 require a conditional tlb flush based on the old
and new values of the pte. Enable that by passing the old pte value as an arg.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/s390/include/asm/pgtable.h | 3 ++-
 arch/s390/mm/pgtable.c  | 2 +-
 arch/x86/include/asm/paravirt.h | 2 +-
 fs/proc/task_mmu.c  | 8 +---
 include/asm-generic/pgtable.h   | 2 +-
 mm/memory.c | 8 
 mm/mprotect.c   | 6 +++---
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5d730199e37b..76dc344edb8c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1070,7 +1070,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm,
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
 pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
-void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
+pte_t *, pte_t, pte_t);
 
 #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 29c0a21cd34a..b283b92722cc 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -322,7 +322,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, 
unsigned long addr,
 EXPORT_SYMBOL(ptep_modify_prot_start);
 
 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
-pte_t *ptep, pte_t pte)
+pte_t *ptep, pte_t old_pte, pte_t pte)
 {
pgste_t pgste;
struct mm_struct *mm = vma->vm_mm;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1154f154025d..0d75a4f60500 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -429,7 +429,7 @@ static inline pte_t ptep_modify_prot_start(struct 
vm_area_struct *vma, unsigned
 }
 
 static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
-					   pte_t *ptep, pte_t pte)
+					   pte_t *ptep, pte_t old_pte, pte_t pte)
 {
struct mm_struct *mm = vma->vm_mm;
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 9952d7185170..8d62891d38a8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -940,10 +940,12 @@ static inline void clear_soft_dirty(struct vm_area_struct 
*vma,
pte_t ptent = *pte;
 
if (pte_present(ptent)) {
-   ptent = ptep_modify_prot_start(vma, addr, pte);
-   ptent = pte_wrprotect(ptent);
+   pte_t old_pte;
+
+   old_pte = ptep_modify_prot_start(vma, addr, pte);
+   ptent = pte_wrprotect(old_pte);
ptent = pte_clear_soft_dirty(ptent);
-   ptep_modify_prot_commit(vma, addr, pte, ptent);
+   ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
} else if (is_swap_pte(ptent)) {
ptent = pte_swp_clear_soft_dirty(ptent);
set_pte_at(vma->vm_mm, addr, pte, ptent);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index c9897dcc46c4..37039e918f17 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t ptep_modify_prot_start(struct 
vm_area_struct *vma,
  */
 static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
   unsigned long addr,
-					   pte_t *ptep, pte_t pte)
+					   pte_t *ptep, pte_t old_pte, pte_t pte)
 {
__ptep_modify_prot_commit(vma->vm_mm, addr, ptep, pte);
 }
diff --git a/mm/memory.c b/mm/memory.c
index d36b0eaa7862..4f3ddaedc764 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3568,7 +3568,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int last_cpupid;
int target_nid;
bool migrated = false;
-   pte_t pte;
+   pte_t pte, old_pte;
bool was_writable = pte_savedwrite(vmf->orig_pte);
int flags = 0;
 
@@ -3588,12 +3588,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 * Make it present again, Depending on how arch implementes non
 * accessible ptes, some can allow access by kernel mode.
 */
-   pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
-   pte = pte_modify(pte, vma->vm_page_prot);
+   old_pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
+   pte = pte_modify(old_pte, vma->vm_page_prot);
pte = pte_mkyoung(pte);
if (was_writable)
pte = pte_mkwrite(pte);
-   ptep_modify_prot_commit(vma, vmf->address, vmf->pte, pte);

[PATCH V2 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg

2018-11-28 Thread Aneesh Kumar K.V
Some architectures may want to call flush_tlb_range from these helpers.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/s390/include/asm/pgtable.h | 4 ++--
 arch/s390/mm/pgtable.c  | 6 --
 arch/x86/include/asm/paravirt.h | 7 +--
 fs/proc/task_mmu.c  | 4 ++--
 include/asm-generic/pgtable.h   | 8 
 mm/memory.c | 4 ++--
 mm/mprotect.c   | 4 ++--
 7 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 063732414dfb..5d730199e37b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1069,8 +1069,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm,
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
-pte_t ptep_modify_prot_start(struct mm_struct *, unsigned long, pte_t *);
-void ptep_modify_prot_commit(struct mm_struct *, unsigned long, pte_t *, pte_t);
+pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t);
 
 #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index f2cc7da473e4..29c0a21cd34a 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -301,12 +301,13 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long 
addr,
 }
 EXPORT_SYMBOL(ptep_xchg_lazy);
 
-pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 pte_t *ptep)
 {
pgste_t pgste;
pte_t old;
int nodat;
+   struct mm_struct *mm = vma->vm_mm;
 
preempt_disable();
pgste = ptep_xchg_start(mm, addr, ptep);
@@ -320,10 +321,11 @@ pte_t ptep_modify_prot_start(struct mm_struct *mm, 
unsigned long addr,
 }
 EXPORT_SYMBOL(ptep_modify_prot_start);
 
-void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 pte_t *ptep, pte_t pte)
 {
pgste_t pgste;
+   struct mm_struct *mm = vma->vm_mm;
 
if (!MACHINE_HAS_NX)
pte_val(pte) &= ~_PAGE_NOEXEC;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4bf42f9e4eea..1154f154025d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -417,19 +417,22 @@ static inline pgdval_t pgd_val(pgd_t pgd)
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
-static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 					   pte_t *ptep)
 {
pteval_t ret;
+   struct mm_struct *mm = vma->vm_mm;
 
ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, mm, addr, ptep);
 
return (pte_t) { .pte = ret };
 }
 
-static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 					   pte_t *ptep, pte_t pte)
 {
+   struct mm_struct *mm = vma->vm_mm;
+
if (sizeof(pteval_t) > sizeof(long))
/* 5 arg words */
pv_ops.mmu.ptep_modify_prot_commit(mm, addr, ptep, pte);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 47c3764c469b..9952d7185170 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -940,10 +940,10 @@ static inline void clear_soft_dirty(struct vm_area_struct 
*vma,
pte_t ptent = *pte;
 
if (pte_present(ptent)) {
-   ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte);
+   ptent = ptep_modify_prot_start(vma, addr, pte);
ptent = pte_wrprotect(ptent);
ptent = pte_clear_soft_dirty(ptent);
-   ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent);
+   ptep_modify_prot_commit(vma, addr, pte, ptent);
} else if (is_swap_pte(ptent)) {
ptent = pte_swp_clear_soft_dirty(ptent);
set_pte_at(vma->vm_mm, addr, pte, ptent);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..c9897dcc46c4 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -606,22 +606,22 @@ static inline void __ptep_modify_prot_commit(struct 
mm_struct *mm,
  * queue the update to be done at some later time.  The update must be
  * actually committed before the pte lock is released, however.
  */
-static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
   

[PATCH V2 0/5] NestMMU pte upgrade workaround for mprotect

2018-11-28 Thread Aneesh Kumar K.V


We can upgrade pte access (R -> RW transition) via mprotect. We need
to make sure we follow the recommended pte update sequence, as outlined in
commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle
nest MMU hang"), for such updates. This patch series does that; a condensed
sketch of the sequence follows the change log below.

Changes from V1:
* Restrict this only for R->RW upgrade. We don't need to do this for Autonuma
* Restrict this only for radix translation mode.
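
For illustration only, a condensed sketch of the sequence the series enforces,
pieced together from patches 2/5 and 3/5 above (not a literal kernel excerpt):

	/* per pte, inside mprotect's change_pte_range() */
	old_pte = ptep_modify_prot_start(vma, addr, ptep);
				/* clears _PAGE_PRESENT, sets _PAGE_INVALID,
				   so the entry still looks pte_present() */
	pte = pte_modify(old_pte, newprot);
				/* compute the new (possibly RW) pte value */
	ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
				/* install the new pte */

On radix, the commit step flushes the TLB before installing the new value, but
only when is_pte_rw_upgrade(old, new) is true and a coprocessor context is
attached; that is the nest MMU ordering requirement from commit bd5050e38aec.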

Aneesh Kumar K.V (5):
  mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
  mm: update ptep_modify_prot_commit to take old pte value as arg
  arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
  mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
  arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW
upgrade

 arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 
 arch/powerpc/include/asm/book3s/64/radix.h   |  4 +++
 arch/powerpc/mm/hugetlbpage-radix.c  | 17 
 arch/powerpc/mm/hugetlbpage.c| 29 
 arch/powerpc/mm/pgtable-book3s64.c   | 27 ++
 arch/powerpc/mm/pgtable-radix.c  | 18 
 arch/s390/include/asm/pgtable.h  |  5 ++--
 arch/s390/mm/pgtable.c   |  8 --
 arch/x86/include/asm/paravirt.h  |  9 --
 fs/proc/task_mmu.c   |  8 --
 include/asm-generic/pgtable.h| 10 +++
 include/linux/hugetlb.h  | 18 
 mm/hugetlb.c |  8 --
 mm/memory.c  |  8 +++---
 mm/mprotect.c|  6 ++--
 16 files changed, 179 insertions(+), 26 deletions(-)

-- 
2.19.1



[PATCH] selftests/powerpc: New TM signal self test

2018-11-28 Thread Breno Leitao
A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.

This test is not deterministic, since it is hard to guarantee that the page
access will cause a page fault. Tests have shown that the bug can be
exposed within a few iterations on a buggy kernel. This test is configured to
loop 5000x, giving it a good chance to hit the kernel issue in just one run.
This self test takes less than two seconds to run.

This test uses set/getcontext because the kernel will recheckpoint
zeroed structures, causing the test to segfault. That is undesired,
because the test needs to rerun; hence there is a SIGSEGV signal handler
that restarts the test.
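
The restart logic is roughly of the following shape (a simplified
sketch, not the exact patch code; 'init_context' is assumed to be saved
with getcontext() in main, and 'main_context' to be where the result is
reported):

	void seg_signal_handler(int signo, siginfo_t *si, void *uc)
	{
		/* after COUNT_MAX attempts, give up and report the result */
		if (count == COUNT_MAX)
			setcontext(&main_context);

		count++;
		/* jump back to the saved context and rerun the test body */
		setcontext(&init_context);
	}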

Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/tm/.gitignore |   1 +
 tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
 .../powerpc/tm/tm-signal-force-msr.c  | 115 ++
 3 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable 
tm-trap \
-   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized 
-mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
%,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index ..4441d61c2328
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tm.h"
+#include "utils.h"
+
+#define __MASK(X)   (1UL<<(X))
+#define MSR_TS_S_LG 33  /* Trans Mem state: Suspended */
+#define MSR_TM  __MASK(MSR_TM_LG)   /* Transactional Mem Available */
+#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */
+
+#define COUNT_MAX   5000	/* Number of iterations */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void trap_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+   ucontext_t *ucp = uc;
+
+   /*
+    * Allocate memory in this signal handler and never free it, on
+    * purpose: forcing the heap to grow is exactly what we want here,
+    * so the memory leak is intentional.
+    */
+   ucp->uc_link = malloc(sizeof(ucontext_t));
+   memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+   /* Force MSR[TS] to suspended, as if a transaction were active */
+   ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+   /*
+* A fork inside a signal handler seems to be more efficient than a
+* fork() prior to the signal being raised.
+*/
+   if (fork() == 0) {
+   /*
+* Both child and parent will return, but, child returns
+* with count set so it will exit in the next segfault.
+* Parent will continue to loop.
+*/
+   count = COUNT_MAX;
+   }
+
+   /*
+* If the change above does not hit the bug, it will cause a
+* segmentation fault, since the ck structures are NULL.
+*/
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+

[PATCH v10 9/9] powerpc: clean stack pointers naming

2018-11-28 Thread Christophe Leroy
Some stack pointers used to also be thread_info pointers
and were called 'tp'. Now that they are only stack pointers,
rename them to 'sp'.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 20 ++--
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 62cfccf4af89..754f0efc507b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -732,10 +732,7 @@ void irq_ctx_init(void)
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 0b227d0891ec..49765ccbc8c0 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -718,22 +718,22 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
+   void *sp;
 
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE;
 #endif
}
 }
-- 
2.13.3



[PATCH v10 8/9] powerpc/64: Remove CURRENT_THREAD_INFO

2018-11-28 Thread Christophe Leroy
Now that current_thread_info is located at the beginning of the
'current' task struct, the CURRENT_THREAD_INFO macro is not really
needed any more.

This patch replaces it with loads of the value at PACACURRENT(r13).
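
For reference, with CONFIG_THREAD_INFO_IN_TASK the generic header makes
current_thread_info() resolve to the task pointer itself, so the
PACACURRENT(r13) load yields the same pointer the macro used to provide:

	/* include/linux/thread_info.h, with CONFIG_THREAD_INFO_IN_TASK=y */
	#define current_thread_info() ((struct thread_info *)current)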

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..dd6a5ae7a769 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACACURRENT(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index c959b8d66cac..8e1d0195ac36 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 03cbf409c3f8..b017bd3da1ed 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:/* label this so stack 
traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACACURRENT(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -205,7 +205,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACACURRENT(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACACURRENT(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -734,7 +734,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -848,7 +848,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..dfafcd0af009 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 89d32bb79d5e..1cbe1a78df57 100644

[PATCH v10 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2018-11-28 Thread Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2,
so the CURRENT_THREAD_INFO() macro is useless. This patch removes it.

At the same time, as the 'cpu' field is no longer in thread_info,
this patch renames the TI_CPU offset to TASK_CPU.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 43 --
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  8 +++
 arch/powerpc/kernel/idle_e500.S|  8 +++
 arch/powerpc/kernel/misc_32.S  |  3 +--
 arch/powerpc/mm/hash_low_32.S  | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index e04e988c86b1..d43c4036bb74 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -425,7 +425,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Check toolchain versions:
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index d91523c2c7d8..c959b8d66cac 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 94ac190a0b16..03439785c2ea 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -96,7 +96,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index bd3b146e18a3..d0c546ce387e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
+   tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
+   tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
mtmsr   r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-   CURRENT_THREAD_INFO(r10, r1)
-   lwz r11,TI_FLAGS(r10)
+   lwz r11,TI_FLAGS(r2)
andi.   r11,r11,_TIF_SYSCALL_DOTRACE
bne-syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
lwz r3,GPR3(r1)
 #endif
mr  r6,r3
-   CURRENT_THREAD_INFO(r12, r1)
/* disable interrupts so current_thread_info()->flags can't change */
LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
-   lwz r9,TI_FLAGS(r12)
+   lwz r9,TI_FLAGS(r2)
li  r8,-MAX_ERRNO
andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
andi.   r4,r8,MSR_PR
beq 3f
-   CURRENT_THREAD_INFO(r4, r1)
-   ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+   ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
lwz r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
/* Clear per-syscall TIF flags if any are set.  */
 
li  r11,_TIF_PERSYSCALL_MASK
-   addir12,r12,TI_FLAGS
+   addir12,r2,TI_FLAGS
 3: lwarx   r8,0,r12

[PATCH v10 6/9] powerpc: 'current_set' is now a table of task_struct pointers

2018-11-28 Thread Christophe Leroy
The table of pointers 'current_set' has been used for retrieving
the stack and current. Its entries used to be thread_info pointers:
they pointed to the stack, and current was taken from the
'task' field of the thread_info.

Now that thread_info sits at the beginning of task_struct, the
entries of the 'current_set' table are both pointers to task_struct
and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index ec691d489656..b1b999b22a12 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index aa4517686f90..a41fa8924004 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,7 +76,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 bool has_big_cores;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
@@ -664,7 +664,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[PATCH v10 5/9] powerpc: regain entire stack space

2018-11-28 Thread Christophe Leroy
thread_info is no longer in the stack, so the entire stack
can now be used.

There is also no longer a risk of corrupting task_cpu(p) with a
stack overflow, so the patch removes that test.

When doing this, an explicit test for a NULL stack pointer is
needed in validate_sp(), as it is no longer implicitly covered
by the sizeof(thread_info) gap (a sketch of the new guard follows
below).

In the meantime, since the previous patch the pointers to the stacks
are no longer pointers to thread_info, so this patch changes them
to void*
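
A minimal sketch of the kind of guard this adds (illustrative only:
'check_sp' is a hypothetical stand-in for the validate_sp() logic in
process.c, whose diff is not shown in full here):

	/* hypothetical: mirrors the new explicit NULL/low-sp rejection */
	static int check_sp(unsigned long sp, struct task_struct *p,
			    unsigned long nbytes)
	{
		unsigned long stack_page = (unsigned long)task_stack_page(p);

		/* without the thread_info gap, sp == 0 must be rejected here */
		if (sp < THREAD_SIZE)
			return 0;

		return sp >= stack_page &&
		       sp <= stack_page + THREAD_SIZE - nbytes;
	}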

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c| 32 +---
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..966ddd4d2414 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()
@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 15acb282a876..8179b64871ed 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -325,8 +325,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long)&init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT	((unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 1fb52206c106..94ac190a0b16 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -92,7 +92,6 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index fa7a69ffb37a..bd3b146e18a3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdb6b6973cf..62cfccf4af89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+  

[PATCH v10 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2018-11-28 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopying of the thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h, to avoid circular inclusion, and without
including asm/asm-offsets.h, to avoid symbol name duplication
between ASM constants and C constants (see the illustration below).
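
As an illustration of the kbuild trick (the offset value below is
hypothetical), include/generated/asm-offsets.h ends up containing a
line like:

	#define TI_CPU 2432 /* offsetof(struct task_struct, cpu) */

The task_cpu_prepare rule extracts that value with awk and appends
-D_TASK_CPU=2432 to KBUILD_CFLAGS, so that asm/smp.h can read
current->cpu by offset without including linux/sched.h.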

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  7 +
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 ++--
 arch/powerpc/kernel/asm-offsets.c  |  7 +++--
 arch/powerpc/kernel/entry_32.S |  9 +++
 arch/powerpc/kernel/exceptions-64e.S   | 11 
 arch/powerpc/kernel/head_32.S  |  6 ++---
 arch/powerpc/kernel/head_44x.S |  4 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +-
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +++--
 arch/powerpc/kernel/irq.c  | 47 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 ++---
 arch/powerpc/kernel/setup_64.c | 21 ---
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 ++--
 19 files changed, 52 insertions(+), 154 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8be31261aec8..fd634a293d7f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -238,6 +238,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 0bff8bd82ed5..e04e988c86b1 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -421,6 +421,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 0b8a735b6d85..64271e562fed 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -157,7 +157,7 @@ extern int ptrace_put_reg(struct task_struct *task, int 
regno,
  unsigned long data);
 
 #define current_pt_regs() \
-   ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) 
- 1)
+   ((struct pt_regs *)((unsigned long)task_stack_page(current) + 
THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 41695745032c..0de717e16dd6 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of task_struct here, but we need access to the CPU this task is running on.
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from
+ * asm-offsets.h by kbuild to get the current processor ID.
+ *
+ * This also needs to be safeguarded when building asm-offsets.s because at
+ * that time _TASK_CPU is not defined yet. It could have been guarded by
+ * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing
+ * when building something else than asm-offsets.s
+ */
+#ifdef GENERATING_ASM_OFFSETS
+#define raw_smp_processor_id() (0)
+#else
+#define raw_smp_processor_id() (*(unsigned int *)((void *)current + 
_TASK_CPU))
+#endif
 #define hard_smp_processor_id()(smp_hw_index[smp_processor_id()])
 
 static inline int 

[PATCH v10 3/9] powerpc: Prepare for moving thread_info into task_struct

2018-11-28 Thread Christophe Leroy
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point the new stack ==> change it to void* and
rename it 'sp'
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current
- Remove unnecessary casts to thread_info, as they'll become invalid
once thread_info is not in stack anymore.
- Rename THREAD_INFO to TASK_STASK: as it is in fact the offset of the
pointer to the stack in task_struct, this pointer will not be impacted
by the move of THREAD_INFO.
- Makes TASK_STACK available to PPC64. PPC64 will need it to get the
stack pointer from current once the thread_info have been moved.
- Modifies klp_init_thread_info() to take task_struct pointer argument.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irq.h   |  4 ++--
 arch/powerpc/include/asm/livepatch.h |  7 ---
 arch/powerpc/include/asm/processor.h |  4 ++--
 arch/powerpc/include/asm/reg.h   |  2 +-
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/entry_32.S   |  2 +-
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/head_32.S|  4 ++--
 arch/powerpc/kernel/head_40x.S   |  4 ++--
 arch/powerpc/kernel/head_44x.S   |  2 +-
 arch/powerpc/kernel/head_8xx.S   |  2 +-
 arch/powerpc/kernel/head_booke.h |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S |  4 ++--
 arch/powerpc/kernel/irq.c|  2 +-
 arch/powerpc/kernel/misc_32.S|  4 ++--
 arch/powerpc/kernel/process.c|  8 
 arch/powerpc/kernel/setup-common.c   |  2 +-
 arch/powerpc/kernel/setup_32.c   | 15 +--
 arch/powerpc/kernel/smp.c|  4 +++-
 19 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..8a81d10ccc82 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,14 @@ static inline unsigned long 
klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
+   struct thread_info *ti = task_thread_info(p);
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   ti->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 692f7383d461..15acb282a876 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -326,7 +326,7 @@ struct thread_struct {
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long)&init_stack)
 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long)&init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index de52c3166ba4..95b68bdf34df 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1060,7 +1060,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..b2b52e002a76 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,10 +90,10 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);
DEFINE(THREAD_INFO_GAP, 

[PATCH v10 2/9] powerpc: Only use task_struct 'cpu' field on SMP

2018-11-28 Thread Christophe Leroy
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and is only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that the task_struct 'cpu' field is not used directly outside of SMP code.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index e2750b856c8f..05b574f416b3 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
li  r0,0
stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 695b24a2d954..2f0fe8bfc078 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
slwir6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fee96bbabf42..e64f9bb51025 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2995,7 +2995,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, rcu_dereference(tsk->parent)->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.13.3



[PATCH v10 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h

2018-11-28 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves the information from
asm/processor.h that is required by mmu-hash.h into a new header
called asm/task_size_user64.h.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..b2aba048301e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ee58526cb6c2..692f7383d461 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -95,40 +95,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x4000UL)
-#define TASK_SIZE_128TB (0x8000UL)
-#define TASK_SIZE_512TB (0x0002UL)
-#define TASK_SIZE_1PB   (0x0004UL)
-#define TASK_SIZE_2PB   (0x0008UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include 
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h 
b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x4000UL)
+#define TASK_SIZE_128TB (0x8000UL)
+#define TASK_SIZE_512TB (0x0002UL)
+#define TASK_SIZE_1PB   (0x0004UL)
+#define TASK_SIZE_2PB   (0x0008UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



[PATCH v10 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-11-28 Thread Christophe Leroy
The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since v9:
 - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict on xmon

Changes since v8:
 - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename 
memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")

Changes since v7:
 - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")

Changes since v6:
 - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch 
(early crash with CONFIG_KMEMLEAK)

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h
   does not already exist; it was caused by spaces instead of a tab in the
   Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (9):
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming

 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  7 +++
 arch/powerpc/include/asm/asm-prototypes.h  |  4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |  2 +-
 arch/powerpc/include/asm/exception-64s.h   |  4 +-
 arch/powerpc/include/asm/irq.h | 14 ++---
 arch/powerpc/include/asm/livepatch.h   |  7 ++-
 arch/powerpc/include/asm/processor.h   | 39 +
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/reg.h |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +-
 arch/powerpc/include/asm/task_size_user64.h| 42 ++
 arch/powerpc/include/asm/thread_info.h | 19 ---
 arch/powerpc/kernel/asm-offsets.c  | 10 ++--
 arch/powerpc/kernel/entry_32.S | 66 --
 arch/powerpc/kernel/entry_64.S | 12 ++--
 arch/powerpc/kernel/epapr_hcalls.S |  5 +-
 arch/powerpc/kernel/exceptions-64e.S   | 13 +
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/head_32.S  | 14 ++---
 arch/powerpc/kernel/head_40x.S |  4 +-
 arch/powerpc/kernel/head_44x.S |  8 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_8xx.S |  2 +-
 arch/powerpc/kernel/head_booke.h   | 12 +---
 arch/powerpc/kernel/head_fsl_booke.S   | 16 +++---
 arch/powerpc/kernel/idle_6xx.S |  8 +--
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_e500.S|  8 +--
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/irq.c  | 77 +-
 arch/powerpc/kernel/kgdb.c   

[PATCH v7 16/16] powerpc/8xx: regroup TLB handler routines

2018-11-28 Thread Christophe Leroy
As this code runs with the MMU off, the CPU only does speculative
fetch for code in the same page.

Following the significant size reduction of the TLB handler routines,
the side handlers can be brought back close to the main part,
i.e. into the same page.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 112 -
 1 file changed, 54 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0a4f8a9c85ff..b171b7c0a0e7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -399,6 +399,23 @@ InstructionTLBMiss:
rfi
 #endif
 
+#ifndef CONFIG_PIN_TLB_TEXT
+ITLBMissLinear:
+   mtcrr11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MI_PS8MEG | MI_SVALID
+   mtspr   SPRN_MI_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__itlbmiss_exit_2
+#endif
+
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -484,6 +501,43 @@ DataStoreTLBMiss:
rfi
 #endif
 
+DTLBMissIMMR:
+   mtcrr11
+   /* Set 512k byte guarded page and mark it valid */
+   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
+   mtspr   SPRN_MD_TWC, r10
+   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
+   rlwinm  r10, r10, 0, 0xfff8 /* Get 512 kbytes boundary */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT | _PAGE_NO_CACHE
+   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_2
+
+DTLBMissLinear:
+   mtcrr11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MD_PS8MEG | MD_SVALID
+   mtspr   SPRN_MD_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_3
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -583,64 +637,6 @@ InstructionBreakpoint:
 
. = 0x2000
 
-/*
- * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
- * not enough space in the DataStoreTLBMiss area.
- */
-DTLBMissIMMR:
-   mtcrr11
-   /* Set 512k byte guarded page and mark it valid */
-   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
-   mtspr   SPRN_MD_TWC, r10
-   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
-   rlwinm  r10, r10, 0, 0xfff8 /* Get 512 kbytes boundary */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT | _PAGE_NO_CACHE
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_2
-
-DTLBMissLinear:
-   mtcrr11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MD_PS8MEG | MD_SVALID
-   mtspr   SPRN_MD_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_3
-
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
-   mtcrr11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MI_PS8MEG | MI_SVALID
-   mtspr   SPRN_MI_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
-   

[PATCH v7 15/16] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers

2018-11-28 Thread Christophe Leroy
This patch reworks the TLB Miss handler so that it does not use the r12
register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2.

In the DAR Fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.

Then SPRN_SPRG_SCRATCH2 may be used for something else in the future.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 110 ++---
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 85fb4b8bf6c7..0a4f8a9c85ff 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -302,90 +302,87 @@ SystemCall:
  */
 
 #ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) \
-   additmp, addr, PAGE_SIZE;   \
-   tlbie   tmp;\
-   additmp, addr, -PAGE_SIZE;  \
-   tlbie   tmp
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)  \
+   addiaddr, addr, PAGE_SIZE;  \
+   tlbie   addr;   \
+   addiaddr, addr, -(PAGE_SIZE << 1);  \
+   tlbie   addr;   \
+   addiaddr, addr, PAGE_SIZE
 #else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
 #endif
 
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mtspr   SPRN_SPRG_SCRATCH1, r11
-#ifdef ITLB_MISS_KERNEL
-   mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
/* If we are faulting a kernel address, we have to use the
 * kernel page tables.
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
-   INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
-   mfcrr12
+   mfcrr11
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   andis.  r11, r10, 0x8000/* Address >= 0x8000 */
+   cmpicr0, r10, 0 /* Address >= 0x8000 */
 #else
-   rlwinm  r11, r10, 16, 0xfff8
-   cmpli   cr0, r11, PAGE_OFFSET@h
+   rlwinm  r10, r10, 16, 0xfff8
+   cmpli   cr0, r10, PAGE_OFFSET@h
 #ifndef CONFIG_PIN_TLB_TEXT
/* It is assumed that kernel code fits into the first 8M page */
-0: cmpli   cr7, r11, (PAGE_OFFSET + 0x080)@h
+0: cmpli   cr7, r10, (PAGE_OFFSET + 0x080)@h
patch_site  0b, patch__itlbmiss_linmem_top
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   mfspr   r10, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   beq+3f
+   bge+3f
 #else
blt+3f
 #endif
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   rlwinm  r11, r11, 0, 20, 31
-   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+   rlwinm  r10, r10, 0, 20, 31
+   orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
+   lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 
entry */
+   mtspr   SPRN_MI_TWC, r10/* Set segment attributes */
 
-   mtspr   SPRN_MD_TWC, r11
+   mtspr   SPRN_MD_TWC, r10
mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
-   mtcrr12
+   mtcrr11
 #endif
-   /* Load the MI_TWC with the attributes for this "segment." */
-   mtspr   SPRN_MI_TWC, r11/* Set segment attributes */
-
 #ifdef CONFIG_SWAP
rlwinm  r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
rlwimi  r10, r11, 0, _PAGE_PRESENT
 #endif
-   li  r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
 * Software indicator bits 20 and 23 must be clear.
 * Software indicator bits 22, 24, 25, 26, and 27 must be
 * set.  All other Linux PTE bits control the behavior
 * of the MMU.
 */
-   rlwimi  r11, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
-   rlwimi  r10, r11, 0, 0x0ff0 /* Set 22, 24-27, clear 20,23 */
+   rlwimi  r10, r10, 0, 0x0f00 /* Clear bits 20-23 */
+   rlwimi  r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
+   ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
 
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mfspr   r11, SPRN_SPRG_SCRATCH1
-#ifdef ITLB_MISS_KERNEL
-   mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
patch_site  0b, 

[PATCH v7 14/16] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx

2018-11-28 Thread Christophe Leroy
Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table.
- 512k page PTEs have to be spread every 128 bytes in the L2 table.
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

In order to use hardware assistance with 16K pages, this patch does
the following modifications:
- Make the PGD size independent of the main page size.
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update().
- Adapt the size of the hugepage tables.
- Define PTE_FRAG_NR so that a 16k page contains 4 page tables.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  1 +
 arch/powerpc/include/asm/nohash/32/pgtable.h | 19 ++-
 arch/powerpc/include/asm/nohash/pgtable.h|  4 
 arch/powerpc/include/asm/pgtable-types.h |  4 
 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ddfccdf004fe..8be31261aec8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x
+   depends on 44x || PPC_8xx
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index fa05aa566ece..25f05131afd5 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -190,6 +190,7 @@ typedef struct {
struct slice_mask mask_8m;
 # endif
 #endif
+   void *pte_frag;
 } mm_context_t;
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff8)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 31a03e9a42c4..e3e81b078432 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -19,7 +19,14 @@ extern int icache_44x_need_flush;
 
 #endif /* __ASSEMBLY__ */
 
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define PTE_INDEX_SIZE  (PTE_SHIFT - 2)
+#define PTE_FRAG_NR4
+#define PTE_FRAG_SIZE_SHIFT12
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+#else
 #define PTE_INDEX_SIZE PTE_SHIFT
+#endif
 
 #define PMD_INDEX_SIZE 0
 #define PUD_INDEX_SIZE 0
@@ -49,7 +56,11 @@ extern int icache_44x_need_flush;
  * -Matt
  */
 /* PGDIR_SHIFT determines what a top-level page table entry can map */
+#ifdef CONFIG_PPC_8xx
+#define PGDIR_SHIFT22
+#else
 #define PGDIR_SHIFT(PAGE_SHIFT + PTE_INDEX_SIZE)
+#endif
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
@@ -233,7 +244,13 @@ static inline unsigned long pte_update(pte_t *p,
: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
+   unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+   *p = __pte(new);
+#endif
 #endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 70ff23974b59..1ca1c1864b32 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -209,7 +209,11 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
/* Anything else just stores the PTE normally. That covers all 64-bit
 * cases, and 32-bit non-hash with 32-bit PTEs.
 */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
*ptep = pte;
+#endif
 
/*
 * With hardware tablewalk, a sync is needed to ensure that
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
index eccb30b38b47..3b0edf041b2e 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -3,7 +3,11 @@
 #define _ASM_POWERPC_PGTABLE_TYPES_H
 
 /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
 typedef struct { pte_basic_t pte; } pte_t;
+#endif
 #define __pte(x)   ((pte_t) { (x) })
 static inline pte_basic_t pte_val(pte_t x)
 {
-- 
2.13.3



[PATCH v7 11/16] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx

2018-11-28 Thread Christophe Leroy
Today, on the 8xx the TLB handlers do a SW tablewalk by doing all
the calculation in ASM, in order to match the Linux page
table structure.

The 8xx offers hardware assistance which allows a significant size
reduction of the TLB handlers, hence also reducing the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbyte area which is managed by a level 2 table (PTE) also
containing 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k page PTEs have to be spread every 128 bytes in the L2 table
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

This patch modifies the TLB handlers to use HW assistance for 4K pages.

Before that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB misses and 13% for DTLB misses.
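
For reference, the SW walk being replaced can be modelled in C roughly as
follows (a sketch only, using the layout described above - 1024 L1 entries
of 4Mbytes each, 1024 4k PTEs per L2 table; the names and the flag mask
are made up for the example):

#define PGDIR_SHIFT_4M	22	/* each L1 entry covers 4Mbytes */

static unsigned long model_tablewalk(const unsigned long *pgd,
				     unsigned long ea)
{
	unsigned long l1 = pgd[ea >> PGDIR_SHIFT_4M];	/* level 1 entry */
	/* mask off the flag bits to get the level 2 table address */
	unsigned long *l2 = (unsigned long *)(l1 & ~0xfffUL);

	return l2[(ea >> 12) & 0x3ff];			/* the 4k pte */
}

With HW assistance, the mtspr/mfspr pair on SPRN_MD_TWC in the hunks below
performs the equivalent of the mask-and-index step in hardware.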

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 +-
 arch/powerpc/mm/8xx_mmu.c  |  4 +--
 2 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01f58b1d9ae7..85fb4b8bf6c7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,7 +292,7 @@ SystemCall:
. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
- * TLB.  The task switch loads the M_TW register with the pointer to the first
+ * TLB.  The task switch loads the M_TWB register with the pointer to the first
  * level table.
  * If we discover there is no second level table (value is zero) or if there
  * is an invalid pte, we load that into the TLB, which causes another fault
@@ -323,6 +323,7 @@ InstructionTLBMiss:
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
@@ -339,7 +340,7 @@ InstructionTLBMiss:
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
beq+3f
@@ -349,16 +350,14 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
mtcrr12
@@ -417,7 +416,7 @@ DataStoreTLBMiss:
mfspr   r10, SPRN_MD_EPN
rlwinm  r11, r10, 16, 0xfff8
cmpli   cr0, r11, PAGE_OFFSET@h
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
blt+3f
rlwinm  r11, r10, 16, 0xfff8
 #ifndef CONFIG_PIN_TLB_IMMR
@@ -430,20 +429,16 @@ DataStoreTLBMiss:
patch_site  0b, patch__dtlbmiss_immr_jmp
 #endif
blt cr7, DTLBMissLinear
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
-
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* We have a pte table, so load fetch the pte from the table.
-*/
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-4:
+
mtcrr12
 
/* Insert the Guarded flag into the TWC from the Linux PTE.
@@ -668,9 +663,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
mtspr   

[PATCH v7 13/16] powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx

2018-11-28 Thread Christophe Leroy
To use 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.
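
A sketch of the resulting indexing, as a standalone rendering of the 8xx
branch added to hugepte_offset() below (4k base pages assumed, hence
PAGE_SHIFT = 12; the function name is made up for the example):

#define PAGE_SHIFT	12	/* 4k base pages assumed for this sketch */

static inline unsigned long hugepte_idx_8xx(unsigned long addr,
					    unsigned int pdshift)
{
	/* Index by the 4k page number within the area the directory
	 * covers, not by the hugepage number, so 512k PTEs end up
	 * spread through the L2 table rather than packed at its start. */
	return (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
}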

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h |  4 +++-
 arch/powerpc/mm/hugetlbpage.c  | 13 +
 arch/powerpc/mm/tlb_nohash.c   |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index dfb8bf236586..62a0ca02ca7d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -74,7 +74,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned 
long addr,
unsigned long idx = 0;
 
pte_t *dir = hugepd_page(hpd);
-#ifndef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_8xx
+   idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+#elif !defined(CONFIG_PPC_FSL_BOOK3E)
idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
 #endif
 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc97874d7c74..d0b92a0a072d 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -66,7 +66,11 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
} else {
+#ifdef CONFIG_PPC_8xx
+   cachep = PGT_CACHE(PTE_SHIFT);
+#else
cachep = PGT_CACHE(pdshift - pshift);
+#endif
num_hugepd = 1;
}
 
@@ -332,8 +336,13 @@ static void free_hugepd_range(struct mmu_gather *tlb, 
hugepd_t *hpdp, int pdshif
if (shift >= pdshift)
hugepd_free(tlb, hugepte);
else
+#ifdef CONFIG_PPC_8xx
+   pgtable_free_tlb(tlb, hugepte,
+get_hugepd_cache_index(PTE_SHIFT));
+#else
pgtable_free_tlb(tlb, hugepte,
 get_hugepd_cache_index(pdshift - shift));
+#endif
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -701,7 +710,11 @@ static int __init hugetlbpage_init(void)
 * use pgt cache for hugepd.
 */
if (pdshift > shift)
+#ifdef CONFIG_PPC_8xx
+   pgtable_cache_add(PTE_SHIFT);
+#else
pgtable_cache_add(pdshift - shift);
+#endif
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
pgtable_cache_add(PTE_T_ORDER);
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 8ad7aab150b7..ae5d568e267f 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_512K] = {
+   .shift  = 19,
+   },
[MMU_PAGE_8M] = {
.shift  = 23,
},
-- 
2.13.3



[PATCH v7 12/16] powerpc/mm: Enable 8M hugepage support with HW assistance on the 8xx

2018-11-28 Thread Christophe Leroy
HW assistance naturally supports 8M huge pages without
further modifications.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/tlb_nohash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 4f79639e432f..8ad7aab150b7 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_8M] = {
+   .shift  = 23,
+   },
 };
 #else
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
-- 
2.13.3



[PATCH v7 10/16] powerpc/8xx: Temporarily disable 16k pages and hugepages

2018-11-28 Thread Christophe Leroy
In preparation for making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the Linux model
fits the HW model for 4K pages and 8M pages.

However, for 16K pages and 512k hugepages, some additional work is
needed to make the Linux model fit the HW model.
The 8M pages will naturally come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the current special
handling for 8M pages is removed here as well.

Therefore the 4K pages mode will be implemented first, without
support for 512k hugepages. Then the 512k hugepages will be brought
back, and the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/head_8xx.S | 74 +++---
 arch/powerpc/mm/tlb_nohash.c   |  6 
 3 files changed, 6 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8be31261aec8..ddfccdf004fe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x || PPC_8xx
+   depends on 44x
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c203defe49a4..01f58b1d9ae7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -314,7 +314,7 @@ SystemCall:
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
mtspr   SPRN_SPRG_SCRATCH1, r11
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
@@ -325,10 +325,8 @@ InstructionTLBMiss:
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-   mfcrr12
-#endif
 #ifdef ITLB_MISS_KERNEL
+   mfcrr12
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
andis.  r11, r10, 0x8000/* Address >= 0x8000 */
 #else
@@ -360,15 +358,9 @@ InstructionTLBMiss:
 
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
lwz r10, 0(r10) /* Get the pte */
-4:
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtcrr12
 #endif
/* Load the MI_TWC with the attributes for this "segment." */
@@ -393,7 +385,7 @@ InstructionTLBMiss:
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
@@ -406,35 +398,12 @@ InstructionTLBMiss:
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
 #endif
 
-#ifdef CONFIG_HUGETLB_PAGE
-10:/* 8M pages */
-#ifdef CONFIG_PPC_16K_PAGES
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M 
- (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
-   /* Level 2 base */
-   rlwinm  r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-
-20:/* 512k pages */
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + 
PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-#endif
-
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -472,11 +441,6 @@ DataStoreTLBMiss:
 */
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
   

[PATCH v7 08/16] powerpc/mm: Extend pte_fragment functionality to PPC32

2018-11-28 Thread Christophe Leroy
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  5 -
 arch/powerpc/include/asm/book3s/32/pgalloc.h  | 18 ++
 arch/powerpc/include/asm/book3s/32/pgtable.h  |  5 +++--
 arch/powerpc/include/asm/mmu_context.h|  2 +-
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 +++-
 arch/powerpc/include/asm/nohash/32/pgalloc.h  | 23 ---
 arch/powerpc/include/asm/nohash/32/pgtable.h  |  8 +---
 arch/powerpc/mm/Makefile  |  1 +
 arch/powerpc/mm/mmu_context.c | 10 ++
 arch/powerpc/mm/mmu_context_nohash.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c  | 25 -
 11 files changed, 54 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 5bd26c218b94..2bb500d25de6 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
 #define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
+
 /*
  * 32-bit hash table MMU support
  */
@@ -9,6 +10,8 @@
  * BATs
  */
 
+#include 
+
 /* Block size masks */
 #define BL_128K0x000
 #define BL_256K 0x001
@@ -43,7 +46,7 @@ struct ppc_bat {
u32 batl;
 };
 
-typedef struct page *pgtable_t;
+typedef pte_t *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index a70f3cf16dc8..56e805107352 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -27,6 +27,10 @@ extern void __bad_pte(pmd_t *pmd);
 extern struct kmem_cache *pgtable_cache[];
 #define PGT_CACHE(shift) pgtable_cache[shift]
 
+void pte_frag_destroy(void *pte_frag);
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int 
kernel);
+void pte_fragment_free(unsigned long *table, int kernel);
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
@@ -56,30 +60,28 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmdp,
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
 {
-   *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT);
+   *pmdp = __pmd(__pa(pte_page) | _PMD_PRESENT);
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pte_fragment_free((unsigned long *)pte, 1);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
+   pte_fragment_free((unsigned long *)ptepage, 0);
 }
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
-   pgtable_page_dtor(virt_to_page(table));
-   free_page((unsigned long)table);
+   pte_fragment_free((unsigned long *)table, 0);
} else {
BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
kmem_cache_free(PGT_CACHE(index_size), table);
@@ -117,6 +119,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
-   pgtable_free_tlb(tlb, page_address(table), 0);
+   pgtable_free_tlb(tlb, table, 0);
 }
 #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 32c33eccc0e2..47156b93f9af 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -329,7 +329,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
 #define pmd_page_vaddr(pmd)\
-   ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+   ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
 
@@ -346,7 +346,8 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_offset_kernel(dir, addr)   \
((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr))
 #define pte_offset_map(dir, addr)  \
-   ((pte_t *) 

[PATCH v7 09/16] powerpc/8xx: Move SW perf counters in first 32kb of memory

2018-11-28 Thread Christophe Leroy
In order to simplify the handling of the 8xx-specific SW perf
counters in time-critical exceptions, this patch moves the counters
to the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.

By doing this, we avoid having to set up a second register with
the upper part of the counters' address.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 --
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3b67b9533c82..c203defe49a4 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -106,6 +106,23 @@ turn_on_mmu:
mtspr   SPRN_SRR0,r0
rfi /* enables MMU */
 
+
+#ifdef CONFIG_PERF_EVENTS
+   .align  4
+
+   .globl  itlb_miss_counter
+itlb_miss_counter:
+   .space  4
+
+   .globl  dtlb_miss_counter
+dtlb_miss_counter:
+   .space  4
+
+   .globl  instruction_counter
+instruction_counter:
+   .space  4
+#endif
+
 /*
  * Exception entry code.  This code runs with address translation
  * turned off, i.e. using physical addresses.
@@ -384,17 +401,16 @@ InstructionTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__itlbmiss_perf
-0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -509,15 +525,14 @@ DataStoreTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__dtlbmiss_perf
-0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
mfspr   r12, SPRN_SPRG_SCRATCH2
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -625,16 +640,13 @@ DataBreakpoint:
. = 0x1d00
 InstructionBreakpoint:
mtspr   SPRN_SPRG_SCRATCH0, r10
-   mtspr   SPRN_SPRG_SCRATCH1, r11
-   lis r10, (instruction_counter - PAGE_OFFSET)@ha
-   lwz r11, (instruction_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, -1
-   stw r11, (instruction_counter - PAGE_OFFSET)@l(r10)
+   lwz r10, (instruction_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, -1
+   stw r10, (instruction_counter - PAGE_OFFSET)@l(0)
lis r10, 0x
ori r10, r10, 0x01
mtspr   SPRN_COUNTA, r10
mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
rfi
 #else
EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
@@ -1065,17 +1077,3 @@ swapper_pg_dir:
  */
 abatron_pteptrs:
.space  8
-
-#ifdef CONFIG_PERF_EVENTS
-   .globl  itlb_miss_counter
-itlb_miss_counter:
-   .space  4
-
-   .globl  dtlb_miss_counter
-dtlb_miss_counter:
-   .space  4
-
-   .globl  instruction_counter
-instruction_counter:
-   .space  4
-#endif
-- 
2.13.3



[PATCH v7 07/16] powerpc/mm: add helpers to get/set mm.context->pte_frag

2018-11-28 Thread Christophe Leroy
In order to handle the pte_fragment functions with a single fragment
without adding pte_frag to every mm_context_t, this patch creates
two helpers which do nothing on platforms using a single fragment.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pgtable.h | 31 +++
 arch/powerpc/mm/pgtable-frag.c |  8 
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 1c49ca31dcfe..74810bba45d2 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -110,6 +110,37 @@ void mark_initmem_nx(void);
 static inline void mark_initmem_nx(void) { }
 #endif
 
+/*
+ * When used, PTE_FRAG_NR is defined in subarch pgtable.h
+ * so we are sure it is included when arriving here.
+ */
+#ifndef PTE_FRAG_NR
+#define PTE_FRAG_NR1
+#define PTE_FRAG_SIZE_SHIFTPAGE_SHIFT
+#define PTE_FRAG_SIZE  (1UL << PTE_FRAG_SIZE_SHIFT)
+#endif
+
+#if PTE_FRAG_NR != 1
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return ctx->pte_frag;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+   ctx->pte_frag = p;
+}
+#else
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return NULL;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 7544d0d7177d..af23a587f019 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -38,7 +38,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
return NULL;
 
spin_lock(>page_table_lock);
-   ret = mm->context.pte_frag;
+   ret = pte_frag_get(>context);
if (ret) {
pte_frag = ret + PTE_FRAG_SIZE;
/*
@@ -46,7 +46,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 */
if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
+   pte_frag_set(>context, pte_frag);
}
spin_unlock(>page_table_lock);
return (pte_t *)ret;
@@ -86,9 +86,9 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int 
kernel)
 * the allocated page with single fragement
 * count.
 */
-   if (likely(!mm->context.pte_frag)) {
+   if (likely(!pte_frag_get(>context))) {
atomic_set(>pt_frag_refcount, PTE_FRAG_NR);
-   mm->context.pte_frag = ret + PTE_FRAG_SIZE;
+   pte_frag_set(>context, ret + PTE_FRAG_SIZE);
}
spin_unlock(>page_table_lock);
 
-- 
2.13.3



[PATCH v7 06/16] powerpc/mm: Move pgtable_t into platform headers

2018-11-28 Thread Christophe Leroy
This patch moves pgtable_t into platform headers.

It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |  9 +
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 
 arch/powerpc/include/asm/nohash/64/mmu.h  |  4 
 arch/powerpc/include/asm/page.h   | 14 --
 5 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index e38c91388c40..5bd26c218b94 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -42,6 +42,8 @@ struct ppc_bat {
u32 batu;
u32 batl;
 };
+
+typedef struct page *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 6328857f259f..1ceee000c18d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_
 #define _ASM_POWERPC_BOOK3S_64_MMU_H_
 
+#include 
+
 #ifndef __ASSEMBLY__
 /*
  * Page size definition
@@ -24,6 +26,13 @@ struct mmu_psize_def {
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
+/*
+ * For BOOK3s 64 with 4k and 64K linux page size
+ * we want to use pointers, because the page table
+ * actually store pfn
+ */
+typedef pte_t *pgtable_t;
+
 #endif /* __ASSEMBLY__ */
 
 /* 64-bit classic hash table MMU */
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
index af0e8b54876a..f61f933a4cd8 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -16,4 +16,8 @@
 #include 
 #endif
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index 87871d027b75..e6585480dfc4 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -5,4 +5,8 @@
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 9ea903221a9f..a7624a3b1435 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -335,20 +335,6 @@ void arch_free_page(struct page *page, int order);
 #endif
 
 struct vm_area_struct;
-#ifdef CONFIG_PPC_BOOK3S_64
-/*
- * For BOOK3s 64 with 4k and 64K linux page size
- * we want to use pointers, because the page table
- * actually store pfn
- */
-typedef pte_t *pgtable_t;
-#else
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
-typedef pte_t *pgtable_t;
-#else
-typedef struct page *pgtable_t;
-#endif
-#endif
 
 #include 
 #endif /* __ASSEMBLY__ */
-- 
2.13.3



[PATCH v7 05/16] powerpc/mm: move platform specific mmu-xxx.h in platform directories

2018-11-28 Thread Christophe Leroy
The purpose of this patch is to move the platform specific
mmu-xxx.h files into platform directories, like the pte-xxx.h files.

At the same time, this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu.h | 14 ++
 arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h |  0
 arch/powerpc/include/asm/nohash/32/mmu.h   | 19 +++
 arch/powerpc/include/asm/nohash/64/mmu.h   |  8 
 arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h |  0
 arch/powerpc/include/asm/nohash/mmu.h  | 11 +++
 arch/powerpc/kernel/cpu_setup_fsl_booke.S  |  2 +-
 arch/powerpc/kvm/e500.h|  2 +-
 10 files changed, 42 insertions(+), 14 deletions(-)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
 create mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h
 rename arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/mmu.h

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..2184021b0e1c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,18 +341,8 @@ static inline void mmu_early_init_devtree(void) { }
 #if defined(CONFIG_PPC_STD_MMU_32)
 /* 32-bit classic hash table MMU */
 #include 
-#elif defined(CONFIG_40x)
-/* 40x-style software loaded TLB */
-#  include 
-#elif defined(CONFIG_44x)
-/* 44x-style software loaded TLB */
-#  include 
-#elif defined(CONFIG_PPC_BOOK3E_MMU)
-/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#  include 
-#elif defined (CONFIG_PPC_8xx)
-/* Motorola/Freescale 8xx software loaded TLB */
-#  include 
+#elif defined(CONFIG_PPC_MMU_NOHASH)
+#include 
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/mmu-40x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-40x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-40x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-40x.h
diff --git a/arch/powerpc/include/asm/mmu-44x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-44x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-44x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-44x.h
diff --git a/arch/powerpc/include/asm/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-8xx.h
rename to arch/powerpc/include/asm/nohash/32/mmu-8xx.h
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
new file mode 100644
index ..af0e8b54876a
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_32_MMU_H_
+#define _ASM_POWERPC_NOHASH_32_MMU_H_
+
+#if defined(CONFIG_40x)
+/* 40x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_44x)
+/* 44x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_PPC_BOOK3E_MMU)
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include 
+#elif defined (CONFIG_PPC_8xx)
+/* Motorola/Freescale 8xx software loaded TLB */
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
new file mode 100644
index ..87871d027b75
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
+#define _ASM_POWERPC_NOHASH_64_MMU_H_
+
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include 
+
+#endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/nohash/mmu-book3e.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-book3e.h
rename to arch/powerpc/include/asm/nohash/mmu-book3e.h
diff --git a/arch/powerpc/include/asm/nohash/mmu.h 
b/arch/powerpc/include/asm/nohash/mmu.h
new file mode 100644
index ..a037cb1efb57
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/mmu.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_MMU_H_
+#define _ASM_POWERPC_NOHASH_MMU_H_
+
+#ifdef CONFIG_PPC64
+#include 
+#else
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_MMU_H_ */
diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S 
b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 8d142e5d84cd..5fbc890d1094 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ 

[PATCH v7 02/16] powerpc/8xx: Remove PTE_ATOMIC_UPDATES

2018-11-28 Thread Christophe Leroy
commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non-atomic PTE updates and started the work of removing
PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */

commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in TLB miss handlers.

Therefore, atomic PTE updates are not needed anymore for the 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 6bfe041ef59d..c9e4b2d90f65 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -65,9 +65,6 @@
 
 #define _PTE_NONE_MASK 0
 
-/* Until my rework is finished, 8xx still needs atomic PTE updates */
-#define PTE_ATOMIC_UPDATES 1
-
 #ifdef CONFIG_PPC_16K_PAGES
 #define _PAGE_PSIZE_PAGE_SPS
 #else
-- 
2.13.3



[PATCH v7 04/16] powerpc/mm: Avoid useless lock with single page fragments

2018-11-28 Thread Christophe Leroy
There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable-book3s64.c | 3 +++
 arch/powerpc/mm/pgtable-frag.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 0c0fd173208a..f3c31f5e1026 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -244,6 +244,9 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 {
void *pmd_frag, *ret;
 
+   if (PMD_FRAG_NR == 1)
+   return NULL;
+
spin_lock(>page_table_lock);
ret = mm->context.pmd_frag;
if (ret) {
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index d61e7c2a9a79..7544d0d7177d 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -34,6 +34,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 {
void *pte_frag, *ret;
 
+   if (PTE_FRAG_NR == 1)
+   return NULL;
+
spin_lock(>page_table_lock);
ret = mm->context.pte_frag;
if (ret) {
-- 
2.13.3



[PATCH v7 03/16] powerpc/mm: Move pte_fragment_alloc() to a common location

2018-11-28 Thread Christophe Leroy
In preparation for the next patch, which generalises the use of
pte_fragment_alloc() for all, this patch moves the related functions
to a place that is common to all subarches.

The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.

Since a pte_fragment with only one fragment is not different
from what is done in the general case, we can easily migrate all
subarchs to pte fragments.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h |   1 +
 arch/powerpc/mm/Makefile |   4 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |  15 
 arch/powerpc/mm/pgtable-book3s64.c   |  85 
 arch/powerpc/mm/pgtable-frag.c   | 116 +++
 5 files changed, 120 insertions(+), 101 deletions(-)
 create mode 100644 arch/powerpc/mm/pgtable-frag.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index bfed4cf3b2f3..6c2808c0f052 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -39,6 +39,7 @@ extern struct vmemmap_backing *vmemmap_list;
 extern struct kmem_cache *pgtable_cache[];
 #define PGT_CACHE(shift) pgtable_cache[shift]
 
+void pte_frag_destroy(void *pte_frag);
 extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
 extern void pte_fragment_free(unsigned long *, int);
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ca96e7be4d0e..3cbb1acf0745 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -15,7 +15,9 @@ obj-$(CONFIG_PPC_MMU_NOHASH)  += mmu_context_nohash.o 
tlb_nohash.o \
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(BITS)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
-obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o 
$(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
+obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o \
+  $(hash64-y) mmu_context_book3s64.o \
+  pgtable-book3s64.o pgtable-frag.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= pgtable-radix.o tlb-radix.o
 obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o 
mmu_context_hash32.o
 obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(BITS).o
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index 510f103d7813..f720c5cc0b5e 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -164,21 +164,6 @@ static void destroy_contexts(mm_context_t *ctx)
}
 }
 
-static void pte_frag_destroy(void *pte_frag)
-{
-   int count;
-   struct page *page;
-
-   page = virt_to_page(pte_frag);
-   /* drop all the pending references */
-   count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
-   /* We allow PTE_FRAG_NR fragments from a PTE page */
-   if (atomic_sub_and_test(PTE_FRAG_NR - count, >pt_frag_refcount)) {
-   pgtable_page_dtor(page);
-   __free_page(page);
-   }
-}
-
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 9f93c9f985c5..0c0fd173208a 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -322,91 +322,6 @@ void pmd_fragment_free(unsigned long *pmd)
}
 }
 
-static pte_t *get_pte_from_cache(struct mm_struct *mm)
-{
-   void *pte_frag, *ret;
-
-   spin_lock(>page_table_lock);
-   ret = mm->context.pte_frag;
-   if (ret) {
-   pte_frag = ret + PTE_FRAG_SIZE;
-   /*
-* If we have taken up all the fragments mark PTE page NULL
-*/
-   if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
-   pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
-   }
-   spin_unlock(>page_table_lock);
-   return (pte_t *)ret;
-}
-
-static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
-{
-   void *ret = NULL;
-   struct page *page;
-
-   if (!kernel) {
-   page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-   if (!page)
-   return NULL;
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   } else {
-   page = alloc_page(PGALLOC_GFP);
-   if (!page)
-   return NULL;
-   }
-
-   atomic_set(>pt_frag_refcount, 1);
-
-   ret = page_address(page);
-   /*
-* if we support only one fragment just return the
-* allocated page.

[PATCH v7 01/16] powerpc/book3s32: Remove CONFIG_BOOKE dependent code

2018-11-28 Thread Christophe Leroy
BOOK3S/32 cannot be BOOKE, so remove useless code

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 --
 arch/powerpc/include/asm/book3s/32/pgtable.h | 14 --
 2 files changed, 32 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 96138ab3ddd6..a70f3cf16dc8 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -47,8 +47,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 #define __pmd_free_tlb(tlb,x,a)do { } while (0)
 /* #define pgd_populate(mm, pmd, pte)  BUG() */
 
-#ifndef CONFIG_BOOKE
-
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
   pte_t *pte)
 {
@@ -62,22 +60,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
 }
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
-#else
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
-  pte_t *pte)
-{
-   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
-   pgtable_t pte_page)
-{
-   *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | 
_PMD_PRESENT);
-}
-
-#define pmd_pgtable(pmd) pmd_page(pmd)
-#endif
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index c21d33704633..32c33eccc0e2 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -328,24 +328,10 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
-/*
- * Note that on Book E processors, the pmd contains the kernel virtual
- * (lowmem) address of the pte page.  The physical address is less useful
- * because everything runs with translation enabled (even the TLB miss
- * handler).  On everything else the pmd contains the physical address
- * of the pte page.  -- paulus
- */
-#ifndef CONFIG_BOOKE
 #define pmd_page_vaddr(pmd)\
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
-#else
-#define pmd_page_vaddr(pmd)\
-   ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
-#define pmd_page(pmd)  \
-   pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
-#endif
 
 /* to find an entry in a kernel page-table-directory */
 #define pgd_offset_k(address) pgd_offset(_mm, address)
-- 
2.13.3



[PATCH v7 00/16] Implement use of HW assistance on TLB table walk on 8xx

2018-11-28 Thread Christophe Leroy
The purpose of this series is to implement hardware assistance for TLB table walk
on the 8xx.

First part prepares for using HW assistance in TLB routines:
- Reverts a former patch which broke SWAP on the 8xx
- Moves the book3s64 page fragment code into a common part so it can be
reused by the 8xx, as 16k page size mode still uses 4k page tables.
- Switches to patch_site instead of patch_instruction, as it makes the code
clearer and avoids pollution with global symbols.
- Optimises access to perf counters (hence reducing the number of registers
used)

Second part implements HW assistance in TLB routines in the following steps:
- Disable 16k page size mode and 512k hugepages
- Switch 4k to HW assistance
- Bring back 512k hugepages
- Bring back 16k page size mode.

Tested successfully on 8xx and 83xx (book3s/32)

Changes in v7:
 - Reordered to get trivial and already reviewed patches in front.
 - Reordered to regroup all HW assistance related patches together.
 - Rebased on today's merge branch (28 Nov)
 - Added a helper for access to mm_context_t.frag
 - Reduced the amount of changes in PPC32 to support pte_fragment
 - Applied pte_fragment to both nohash/32 and book3s/32

Changes in v6:
 - Dropped the part related to handling the GUARD attribute at PGD/PMD level.
 - Moved the commonalisation of page_fragment to the beginning (this part
   has been reviewed by Aneesh)
 - Rebased on today's merge branch (19 Oct)

Changes in v5:
 - Also avoid useless lock in get_pmd_from_cache()
 - A new patch to relocate mmu headers in platform specific directories
 - A new patch to distribute pgtable_t typedefs in platform specific
   mmu headers instead of the ugly #ifdef
 - Moved early_pte_alloc_kernel() in platform specific pgalloc
 - Restricted definition of PTE_FRAG_SIZE and PTE_FRAG_NR to platforms
   using the pte fragmentation.
 - arch_exit_mmap() and destroy_pagetable_cache() are now platform specific.

Changes in v4:
 - Reordered the series to put at the end the modifications which make
   L1 and L2 entries independent.
 - No modifications to ppc64 ioremap (we still have an opportunity to
   merge them, for a future patch series)
 - 8xx code modified to use patch_site instead of patch_instruction
   to get a clearer code and avoid object pollution with global symbols
 - Moved perf counters in first 32kb of memory to optimise access
 - Split the big bang to HW assistance in several steps:
   1. Temporarily removes support of 16k pages and 512k hugepages
   2. Change TLB routines to use HW assistance for 4k pages and 8M hugepages
   3. Add back support for 512k hugepages
   4. Add back support for 16k pages (using pte_fragment as page tables are 
still 4k)

Changes in v3:
 - Fixed an issue in the 09/14 when CONFIG_PIN_TLB_TEXT was not enabled
 - Added performance measurement in the 09/14 commit log
 - Rebased on latest 'powerpc/merge' tree, which conflicted with 13/14

Changes in v2:
 - Removed the 3 first patchs which have been applied already
 - Fixed compilation errors reported by Michael
 - Squashed the commonalisation of ioremap functions into a single patch
 - Fixed the use of pte_fragment
 - Added a patch optimising perf counting of TLB misses and instructions


Christophe Leroy (16):
  powerpc/book3s32: Remove CONFIG_BOOKE dependent code
  powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  powerpc/mm: Move pte_fragment_alloc() to a common location
  powerpc/mm: Avoid useless lock with single page fragments
  powerpc/mm: move platform specific mmu-xxx.h in platform directories
  powerpc/mm: Move pgtable_t into platform headers
  powerpc/mm: add helpers to get/set mm.context->pte_frag
  powerpc/mm: Extend pte_fragment functionality to PPC32
  powerpc/8xx: Move SW perf counters in first 32kb of memory
  powerpc/8xx: Temporarily disable 16k pages and hugepages
  powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
  powerpc/mm: Enable 8M hugepage support with HW assistance on the 8xx
  powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx
  powerpc/mm: reintroduce 16K pages with HW assistance on 8xx
  powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
  powerpc/8xx: regroup TLB handler routines

 arch/powerpc/include/asm/book3s/32/mmu-hash.h  |   5 +
 arch/powerpc/include/asm/book3s/32/pgalloc.h   |  36 +-
 arch/powerpc/include/asm/book3s/32/pgtable.h   |  19 +-
 arch/powerpc/include/asm/book3s/64/mmu.h   |   9 +
 arch/powerpc/include/asm/book3s/64/pgalloc.h   |   1 +
 arch/powerpc/include/asm/hugetlb.h |   4 +-
 arch/powerpc/include/asm/mmu.h |  14 +-
 arch/powerpc/include/asm/mmu_context.h |   2 +-
 arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h |   0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h |   0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h |   1 +
 arch/powerpc/include/asm/nohash/32/mmu.h   |  25 ++
 arch/powerpc/include/asm/nohash/32/pgalloc.h   |  23 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h   

[PATCH] powerpc/8xx: hide itlbie and dtlbie symbols

2018-11-28 Thread Christophe Leroy
When disassembling InstructionTLBError we get the following messy code:

c000138c:   7d 84 63 78 mr  r4,r12
c0001390:   75 25 58 00 andis.  r5,r9,22528
c0001394:   75 2a 40 00 andis.  r10,r9,16384
c0001398:   41 a2 00 08 beq c00013a0 
c000139c:   7c 00 22 64 tlbie   r4,r0

c00013a0 :
c00013a0:   39 40 04 01 li  r10,1025
c00013a4:   91 4b 00 b0 stw r10,176(r11)
c00013a8:   39 40 10 32 li  r10,4146
c00013ac:   48 00 cc 59 bl  c000e004 

For a cleaner code dump, this patch replaces the itlbie and dtlbie
symbols with numeric local labels.

c000138c:   7d 84 63 78 mr  r4,r12
c0001390:   75 25 58 00 andis.  r5,r9,22528
c0001394:   75 2a 40 00 andis.  r10,r9,16384
c0001398:   41 a2 00 08 beq c00013a0 
c000139c:   7c 00 22 64 tlbie   r4,r0
c00013a0:   39 40 04 01 li  r10,1025
c00013a4:   91 4b 00 b0 stw r10,176(r11)
c00013a8:   39 40 10 32 li  r10,4146
c00013ac:   48 00 cc 59 bl  c000e004 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3b67b9533c82..8c848acfe249 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -552,11 +552,10 @@ InstructionTLBError:
mr  r4,r12
andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
andis.  r10,r9,SRR1_ISI_NOPT@h
-   beq+1f
+   beq+1301f
tlbie   r4
-itlbie:
/* 0x400 is InstructionAccess exception, needed by bad_page_fault() */
-1: EXC_XFER_LITE(0x400, handle_page_fault)
+1301:  EXC_XFER_LITE(0x400, handle_page_fault)
 
 /* This is the data TLB error on the MPC8xx.  This could be due to
  * many reasons, including a dirty update to a pte.  We bail out to
@@ -578,10 +577,9 @@ DARFixed:/* Return from dcbx instruction bug workaround */
stw r5,_DSISR(r11)
mfspr   r4,SPRN_DAR
andis.  r10,r5,DSISR_NOHPTE@h
-   beq+1f
+   beq+1401f
tlbie   r4
-dtlbie:
-1: li  r10,RPN_PATTERN
+1401:  li  r10,RPN_PATTERN
mtspr   SPRN_DAR,r10/* Tag DAR, to be used in DTLB Error */
/* 0x300 is DataAccess exception, needed by bad_page_fault() */
EXC_XFER_LITE(0x300, handle_page_fault)
@@ -604,8 +602,8 @@ DataBreakpoint:
mtspr   SPRN_SPRG_SCRATCH1, r11
mfcrr10
mfspr   r11, SPRN_SRR0
-   cmplwi  cr0, r11, (dtlbie - PAGE_OFFSET)@l
-   cmplwi  cr7, r11, (itlbie - PAGE_OFFSET)@l
+   cmplwi  cr0, r11, (1401b - PAGE_OFFSET)@l
+   cmplwi  cr7, r11, (1301b - PAGE_OFFSET)@l
beq-cr0, 11f
beq-cr7, 11f
EXCEPTION_PROLOG_1
-- 
2.13.3



Re: use generic DMA mapping code in powerpc V4

2018-11-28 Thread Michael Ellerman
Christoph Hellwig  writes:

> Any comments?  I'd like to at least get the ball moving on the easy
> bits.

Nothing specific yet.

I'm a bit worried it might break one of the many old obscure platforms
we have that aren't well tested.

There's not much we can do about that, but I'll just try and test it on
everything I can find.

Is the plan that you take these via the dma-mapping tree or that they go
via powerpc?

cheers

> On Wed, Nov 14, 2018 at 09:22:40AM +0100, Christoph Hellwig wrote:
>> Hi all,
>> 
>> this series switches the powerpc port to use the generic swiotlb and
>> noncoherent dma ops, and to use more generic code for the coherent
>> direct mapping, as well as removing a lot of dead code.
>> 
>> As this series is very large and depends on the dma-mapping tree I've
>> also published a git tree:
>> 
>> git://git.infradead.org/users/hch/misc.git powerpc-dma.4
>> 
>> Gitweb:
>> 
>> 
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4
>> 
>> Changes since v3:
>>  - rebase on the powerpc fixes tree
>>  - add a new patch to actually make the baseline amigaone config
>>configure without warnings
>>  - only use ZONE_DMA for 64-bit embedded CPUs, on pseries an IOMMU is
>>always present
>>  - fix compile in mem.c for one configuration
>>  - drop the full npu removal for now, will be resent separately
>>  - a few git bisection fixes
>> 
>> The changes since v1 are to big to list and v2 was not posted in public.
>> 
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> ---end quoted text---


Re: [PATCH 0/1] Fix NULL pointer access in PowerPC MSI teardown code

2018-11-28 Thread Michael Ellerman
Hi Radu,

Radu Rendec  writes:
> Hi everyone,
>
> It seems there's an unchecked access to a NULL pointer (to a function)
> in the PowerPC MSI teardown code. I found this on kernel 4.9, but the
> code looks identical in the latest 4.20-rc. I don't see any reason why
> this wouldn't happen on recent kernels too.
>
> The PowerPC architecture specific MSI setup and teardown functions are
> in arch/powerpc/kernel/msi.c:
>
>   * arch_setup_msi_irqs() checks pointers for both the setup_msi_irqs
> and teardown_msi_irqs ops and returns -ENOSYS if either one is NULL.
>
>   * arch_teardown_msi_irqs() calls on the teardown_msi_irqs op pointer
> without checking it and assumes the function is never called unless
> arch_setup_msi_irqs() returns successfully.
>
> The assumption in arch_teardown_msi_irqs() is wrong and results in a
> function call on a NULL pointer. An example of how this can happen is
> included in the actual patch header. In my case, it happens when the PCI
> hardware is configured during kernel start-up, because my controller
> doesn't support MSI and the ops are NULL.

What hardware are you on?

> I'm proposing the attached patch to fix the problem. It basically just
> checks the pointer before the function call.
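
A minimal sketch of the check being proposed, assuming the 4.9-era shape
of arch/powerpc/kernel/msi.c (this is an illustration, not the attached
patch itself):

#include <linux/pci.h>
#include <asm/pci-bridge.h>

void arch_teardown_msi_irqs(struct pci_dev *dev)
{
	struct pci_controller *phb = pci_bus_to_host(dev->bus);

	/* The op can be NULL when arch_setup_msi_irqs() bailed out with
	 * -ENOSYS, so check it before calling through the pointer. */
	if (phb->controller_ops.teardown_msi_irqs)
		phb->controller_ops.teardown_msi_irqs(dev);
}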

Yeah that patch looks good to me.

I suspect this bug was introduced in:

  6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()")

Previously we had that check routine which would run before any of the
MSI setup had been done, and so if there were no MSI ops then we bailed
out early and didn't call teardown.

I guess since then (2014) we haven't tested an MSI capable device on a
system that isn't MSI capable?

cheers


Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection

2018-11-28 Thread Christophe LEROY




On 22/11/2018 at 02:25, Russell Currey wrote:

On Wed, 2018-11-21 at 09:32 +0100, Christophe LEROY wrote:


On 21/11/2018 at 03:26, Russell Currey wrote:

On Wed, 2018-11-07 at 16:56 +, Christophe Leroy wrote:

This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and
lock/unlock_user_rd/wr_access().

We separate read and write accesses because some subarches like
book3s32 might only support write access protection.

Signed-off-by: Christophe Leroy 


Separating reads and writes does have a performance impact; I'm doing
some benchmarking to find out exactly how much - but at least for
radix it means we have to do a RMW instead of just a write.  It does
add some amount of security, though.

The other issue I have is that you're just locking everything here
(like I was), and not doing anything different for just reads or
writes.  In theory, wouldn't someone assume that they could (for
example) unlock reads, lock writes, then attempt to read?  At which
point the read would fail, because the lock actually locks both.

I would think we either need to bundle read/write locking/unlocking
together, or only implement this on platforms that can do one at a
time, unless there's a cleaner way to handle this.  Glancing at the
values you use for 8xx, this doesn't seem possible there, and it's a
definite performance hit for radix.

At the same time, as you say, it would suck for book3s32 that can
only do writes, but maybe just doing both at the same time and if
implemented for that platform it could just have a warning that it
only applies to writes on init?


Well, I see your points. My idea was not to separate read and write
on platforms that can lock both. I think it is no problem to also
unlock writes when we are doing a read, so on platforms that can do
both I think both should do the same.

The idea was to avoid spending time unlocking writes for doing a read
on platforms on which reads are not locked. And for platforms able to
independently unlock/lock reads and writes, if only unlocking reads
can improve performance, it can be interesting as well.

For book3s/32, locking/unlocking will be done through the Kp/Ks bits
in the segment registers; the function won't be trivial as it may
involve more than one segment at a time. So I just wanted to avoid
spending time doing that for reads, as reads won't be protected. That
may also be the case on older book3s/64, may it not?
On Book3s/32, the page protection bits are as follows:

        Key 0   Key 1
PP=00   RW      NA
PP=01   RW      RO
PP=10   RW      RW
PP=11   RO      RO

So the idea is to encode user RW with PP01 (instead of PP10 today) and
user RO with PP11 (as done today), giving Key 0 to user and Key 1 to
kernel (today both user and kernel have Key 1). Then, when the kernel
needs to write, we change Ks to Key 0 in the segment registers for the
involved segments.
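
To make that concrete, a rough sketch of what unlocking kernel write
access to a single user segment could look like (hypothetical helper
built on the mfsrin()/mtsrin() accessors; the real code has to walk
all segments covering the access):

	/*
	 * Sketch only: clear Ks in the segment register covering 'addr'
	 * so kernel accesses use Key 0 and PP01 pages become writable.
	 */
	static inline void kernel_unlock_segment(unsigned long addr)
	{
		u32 sr = mfsrin(addr);

		mtsrin(sr & ~0x40000000, addr);	/* clear Ks */
		isync();
	}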

I'm not sure there is any risk that someone nests unlocks/locks for
reads and unlocks/locks for writes, because the unlocks/locks are
done in very limited places.


Yeah I don't think it's a risk since the scope is so limited, it just
needs to be clearly documented that locking/unlocking reads/writes
could have the side effect of covering the other.  My concern is less
about a problem in practice and more about functions that don't do
exactly what their names say.

Another option is to again have a single lock/unlock function that
takes a bitmask (so read/write/both), which due to being a singular
function might be a bit more obvious that it could lock/unlock
everything, but at this point I'm just bikeshedding.


In order to support book3s/32, I needed to add arguments to the 
unlock/lock functions, as the address is needed to identify the affected 
segments.


Therefore, I changed it to single functions as you suggested. These 
functions have 'to', 'from' and 'size' arguments. When it is a read, 
'to' is NULL. When it is a write, 'from' is NULL. When it is a copy, 
none is NULL.
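
As a sketch of the resulting call sites (hypothetical code following
the convention described above, not lifted from the series):

	/* Read from userspace: 'to' is NULL. */
	unlock_user_access(NULL, ufrom, size);
	ret = __copy_tofrom_user(kto, ufrom, size);
	lock_user_access(NULL, ufrom, size);

	/* Write to userspace: 'from' is NULL. */
	unlock_user_access(uto, NULL, size);
	ret = __copy_tofrom_user(uto, kfrom, size);
	lock_user_access(uto, NULL, size);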


See RFC v2 for the details.

Christophe



Doing it this way should be fine, I'm just cautious that some future
developer might be caught off guard.

Planning on sending my series based on top of yours for radix today.

- Russell



Christophe



Curious for people's thoughts on this.

- Russell



Re: [RFC PATCH v2 07/11] powerpc/mm/radix: Use KUEP API for Radix MMU

2018-11-28 Thread Christophe LEROY
Sorry, I forgot to reset the author so the patch appears as coming from 
yourself.


Le 28/11/2018 à 10:27, Russell Currey a écrit :

Execution protection already exists on radix; this just refactors
the radix init to provide the KUEP setup function instead.

Thus, the only functional change is that it can now be disabled.

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/mm/pgtable-radix.c| 9 ++---
  arch/powerpc/platforms/Kconfig.cputype | 1 +
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 931156069a81..45aa9e501e76 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -535,8 +535,13 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, (3ul << 62));
  }
  
-static void radix_init_iamr(void)

+void setup_kuep(bool disabled)
  {
+   if (disabled)
+   return;
+
+   pr_info("Activating Kernel Userspace Execution Prevention\n");
+
/*
 * Radix always uses key0 of the IAMR to determine if an access is
 * allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
@@ -605,7 +610,6 @@ void __init radix__early_init_mmu(void)
  
  	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
  
-	radix_init_iamr();

radix_init_pgtable();
/* Switch to the guard PID before turning on MMU */
	radix__switch_mmu_context(NULL, &init_mm);
@@ -627,7 +631,6 @@ void radix__early_init_mmu_secondary(void)
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
radix_init_amor();
}
-   radix_init_iamr();
  
	radix__switch_mmu_context(NULL, &init_mm);

if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index a20669a9ec13..e6831d0ec159 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -334,6 +334,7 @@ config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
+   select PPC_HAVE_KUEP
default y
help
  Enable support for the Power ISA 3.0 Radix style MMU. Currently this



Re: [PATCH 4/4] powerpc/64s: Implement KUAP for Radix MMU

2018-11-28 Thread Christophe Leroy




On 11/22/2018 02:04 PM, Russell Currey wrote:

Kernel Userspace Access Prevention utilises a feature of
the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety checks,
such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

 - exiting the kernel and entering userspace
 - performing an operation like copy_{to/from}_user()
 - context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering
the kernel.

This feature has a slight performance impact which I roughly measured to be
3% slower in the worst case (performing 1GB of 1 byte read()/write()
syscalls), and is gated behind the CONFIG_PPC_KUAP option for
performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
performing the following:

 echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

if enabled, this should send SIGSEGV to the thread.

Signed-off-by: Russell Currey 


I squashed the paca thing into this one in RFC v2

Christophe


---
  arch/powerpc/include/asm/book3s/64/radix.h | 43 ++
  arch/powerpc/include/asm/exception-64e.h   |  3 ++
  arch/powerpc/include/asm/exception-64s.h   | 19 +-
  arch/powerpc/include/asm/mmu.h |  9 -
  arch/powerpc/include/asm/reg.h |  1 +
  arch/powerpc/kernel/entry_64.S | 16 +++-
  arch/powerpc/mm/pgtable-radix.c| 12 ++
  arch/powerpc/mm/pkeys.c|  7 +++-
  arch/powerpc/platforms/Kconfig.cputype |  1 +
  9 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 7d1a3d1543fc..9af93d05e6fa 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -284,5 +284,48 @@ static inline unsigned long radix__get_tree_size(void)
  int radix__create_section_mapping(unsigned long start, unsigned long end, int 
nid);
  int radix__remove_section_mapping(unsigned long start, unsigned long end);
  #endif /* CONFIG_MEMORY_HOTPLUG */
+
+#ifdef CONFIG_PPC_KUAP
+#include 
+/*
+ * We do have the ability to individually lock/unlock reads and writes rather
+ * than both at once, however it's a significant performance hit due to needing
+ * to do a read-modify-write, which adds a mfspr, which is slow.  As a result,
+ * locking/unlocking both at once is preferred.
+ */
+static inline void __unlock_user_rd_access(void)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, 0);
+   isync();
+}
+
+static inline void __lock_user_rd_access(void)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, AMR_LOCKED);
+}
+
+static inline void __unlock_user_wr_access(void)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, 0);
+   isync();
+}
+
+static inline void __lock_user_wr_access(void)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, AMR_LOCKED);
+}
+#endif /* CONFIG_PPC_KUAP */
  #endif /* __ASSEMBLY__ */
  #endif
diff --git a/arch/powerpc/include/asm/exception-64e.h 
b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
  #define RFI_TO_USER   \
rfi
  
+#define UNLOCK_USER_ACCESS(reg)

+#define LOCK_USER_ACCESS(reg)
+
  #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
  
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h

index 3b4767ed3ec5..d92614c66d87 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)  
\
std ra,offset(r13); \
  END_FTR_SECTION_NESTED(ftr,ftr,943)
  
+#define LOCK_USER_ACCESS(reg)			\

+BEGIN_MMU_FTR_SECTION_NESTED(944)  \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   mtspr   SPRN_AMR,reg;   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)
\
+BEGIN_MMU_FTR_SECTION_NESTED(945)  \
+   li  reg,0;  \
+   mtspr   SPRN_AMR,reg;  

Re: [PATCH 2/4] powerpc/64: Setup KUP before feature fixups

2018-11-28 Thread Christophe Leroy




On 11/22/2018 02:04 PM, Russell Currey wrote:

The subsequent implementation of KUAP for radix makes use of a MMU
feature in order to patch out assembly when KUAP is disabled or
unsupported.  This won't work unless there's an entry point for
KUP support before the feature magic happens, so relocate
setup_kup() earlier in setup.

Signed-off-by: Russell Currey 


I squashed it in my RFC v2

Christophe


---
  arch/powerpc/kernel/setup_64.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 0f4e06ab70a5..cc20dc3e7b69 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -331,6 +331,12 @@ void __init early_setup(unsigned long dt_ptr)
 */
configure_exceptions();
  
+	/*

+* Configure Kernel Userspace Protection. This needs to happen before
+* feature fixups for platforms that implement this using features.
+*/
+   setup_kup();
+
/* Apply all the dynamic patching */
apply_feature_fixups();
setup_feature_keys();
@@ -372,7 +378,6 @@ void __init early_setup(unsigned long dt_ptr)
 */
btext_map();
  #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
-   setup_kup();
  }
  
  #ifdef CONFIG_SMP




Re: [PATCH 1/4] powerpc: Track KUAP state in the PACA

2018-11-28 Thread Christophe Leroy

On 11/22/2018 02:04 PM, Russell Currey wrote:

Necessary for subsequent patches that enable KUAP support for radix.
Could plausibly be useful for other platforms too, if, as in the
radix case, reading the register that manages these accesses is
costly.

Has the unfortunate downside of another layer of abstraction for
platforms that implement the locks and unlocks, but this could be
useful in future for other things too, like counters for benchmarking
or smartly handling lots of small accesses at once.

Signed-off-by: Russell Currey 


Build failure.

[root@po14163vm linux-powerpc]# make mpc885_ads_defconfig
#
# configuration written to .config
#
[root@po14163vm linux-powerpc]# make
scripts/kconfig/conf  --syncconfig Kconfig
  UPD include/config/kernel.release
  UPD include/generated/utsrelease.h
  CC  kernel/bounds.s
  CC  arch/powerpc/kernel/asm-offsets.s
In file included from ./include/linux/uaccess.h:14:0,
 from ./include/linux/compat.h:19,
 from arch/powerpc/kernel/asm-offsets.c:16:
./arch/powerpc/include/asm/uaccess.h: In function ‘unlock_user_rd_access’:
./arch/powerpc/include/asm/uaccess.h:70:2: error: implicit declaration 
of function ‘get_paca’ [-Werror=implicit-function-declaration]

  get_paca()->user_access_allowed = 1;
  ^
./arch/powerpc/include/asm/uaccess.h:70:12: error: invalid type argument 
of ‘->’ (have ‘int’)

  get_paca()->user_access_allowed = 1;
^
./arch/powerpc/include/asm/uaccess.h: In function ‘lock_user_rd_access’:
./arch/powerpc/include/asm/uaccess.h:75:12: error: invalid type argument 
of ‘->’ (have ‘int’)

  get_paca()->user_access_allowed = 0;
^
./arch/powerpc/include/asm/uaccess.h: In function ‘unlock_user_wr_access’:
./arch/powerpc/include/asm/uaccess.h:80:12: error: invalid type argument 
of ‘->’ (have ‘int’)

  get_paca()->user_access_allowed = 1;
^
./arch/powerpc/include/asm/uaccess.h: In function ‘lock_user_wr_access’:
./arch/powerpc/include/asm/uaccess.h:85:12: error: invalid type argument 
of ‘->’ (have ‘int’)

  get_paca()->user_access_allowed = 0;
^
cc1: some warnings being treated as errors
make[1]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2

Christophe


---
this is all because I can't do PACA things from radix.h and I spent
an hour figuring this out at midnight
---
  arch/powerpc/include/asm/nohash/32/pte-8xx.h |  8 +++
  arch/powerpc/include/asm/paca.h  |  3 +++
  arch/powerpc/include/asm/uaccess.h   | 23 +++-
  arch/powerpc/kernel/asm-offsets.c|  1 +
  4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index f1ec7cf949d5..7bc0955a56e9 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -137,22 +137,22 @@ static inline pte_t pte_mkhuge(pte_t pte)
  #define pte_mkhuge pte_mkhuge
  
  #ifdef CONFIG_PPC_KUAP

-static inline void lock_user_wr_access(void)
+static inline void __lock_user_wr_access(void)
  {
mtspr(SPRN_MD_AP, MD_APG_KUAP);
  }
  
-static inline void unlock_user_wr_access(void)

+static inline void __unlock_user_wr_access(void)
  {
mtspr(SPRN_MD_AP, MD_APG_INIT);
  }
  
-static inline void lock_user_rd_access(void)

+static inline void __lock_user_rd_access(void)
  {
mtspr(SPRN_MD_AP, MD_APG_KUAP);
  }
  
-static inline void unlock_user_rd_access(void)

+static inline void __unlock_user_rd_access(void)
  {
mtspr(SPRN_MD_AP, MD_APG_INIT);
  }
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e843bc5d1a0f..56236f6d8c89 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -169,6 +169,9 @@ struct paca_struct {
u64 saved_r1;   /* r1 save for RTAS calls or PM or EE=0 
*/
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
+#ifdef CONFIG_PPC_KUAP
+   u8 user_access_allowed; /* can the kernel access user memory? */
+#endif
u8 irq_soft_mask;   /* mask for irq soft masking */
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 2f3625cbfcee..76dae1095f7e 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -63,7 +63,28 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
  
  #endif
  
-#ifndef CONFIG_PPC_KUAP

+#ifdef CONFIG_PPC_KUAP
+static inline void unlock_user_rd_access(void)
+{
+   __unlock_user_rd_access();
+   get_paca()->user_access_allowed = 1;
+}
+static inline void 

[RFC PATCH v2 11/11] powerpc/book3s32: Implement Kernel Userspace Access Protection

2018-11-28 Thread Christophe Leroy
This patch implements Kernel Userspace Access Protection for
book3s/32.

Due to limitations of the processor page protection capabilities,
the protection is only against writing. Read protection cannot be
achieved using page protection.

In order to provide the protection, Ku and Ks keys are modified in
Userspace Segment registers, and different PP bits are used to:

PP01 provides RW for Key 0 and RO for Key 1
PP10 provides RW for all
PP11 provides RO for all

Today PP10 is used for RW pages and PP11 for RO pages. This patch
modifies page protection to PP01 for RW pages.

Then segment registers are set to Ku 0 and Ks 1. When the kernel needs
to write to RW pages, the associated segment register is changed to
Ks 0 in order to allow the kernel write access.

In order to avoid having to read all segment registers when
locking/unlocking the access, some data is kept in the thread_struct
and saved on the stack on exceptions. The field identifies both the
first unlocked segment and the first segment following the last
unlocked one. When no segment is unlocked, it contains the value 0.
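
For illustration, one plausible encoding of that field packs the two
segment indexes into nibbles (a sketch of the scheme described above,
not necessarily the exact encoding the patch uses):

	/*
	 * Sketch only: top nibble = first unlocked segment, low nibble =
	 * first segment after the last unlocked one, 0 = nothing unlocked.
	 * Assumes the addr..end range stays within the user segments.
	 */
	current->thread.kuap = (addr & 0xf0000000) | (((end - 1) >> 28) + 1);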

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/kup.h | 98 
 arch/powerpc/include/asm/kup.h   |  3 +
 arch/powerpc/kernel/head_32.S|  2 +-
 arch/powerpc/mm/ppc_mmu_32.c | 10 
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 5 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/book3s/32/kup.h

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
new file mode 100644
index ..7455ecaab3f9
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_BOOK3S_32_KUP_H
+#define _ASM_POWERPC_BOOK3S_32_KUP_H
+
+#ifdef CONFIG_PPC_KUAP
+#define LOCK_USER_ACCESS(val, sp, sr, srmax, thread)   \
+   lwz sr, KUAP(thread);   \
+   stw sr, _KUAP(sp);  \
+   cmpli   cr7, sr, 0; \
+   beq+    cr7, 102f;  \
+   li  val, 0; \
+   stw val, KUAP(thread);  \
+   rlwinm  srmax, sr, 28, 0xf0000000;  \
+   mfsrin  val, sr;\
+   oris    val, val, 0x4000;   /* Set Ks */\
+101:   \
+   mtsrin  val, sr;\
+   addi    val, val, 0x111;    /* next VSID */ \
+   rlwinm  val, val, 0, 8, 3;  /* clear VSID overflow */   \
+   addis   sr, sr, 0x1000; /* address of next segment */   \
+   cmpl    cr7, sr, srmax; \
+   blt-    cr7, 101b;  \
+102:
+
+#define REST_USER_ACCESS(val, sp, sr, srmax, curr) \
+   lwz sr, _KUAP(sp);  \
+   stw sr, THREAD+KUAP(curr);  \
+   cmpli   cr7, sr, 0; \
+   beq+    cr7, 102f;  \
+   rlwinm  srmax, sr, 28, 0xf0000000;  \
+   mfsrin  val, sr;\
+   rlwinm  val, val, 0, ~0x40000000;   /* Clear Ks */  \
+101:   \
+   mtsrin  val, sr;\
+   addi    val, val, 0x111;    /* next VSID */ \
+   rlwinm  val, val, 0, 8, 3;  /* clear VSID overflow */   \
+   addis   sr, sr, 0x1000; /* address of next segment */   \
+   cmpl    cr7, sr, srmax; \
+   blt-    cr7, 101b;  \
+102:
+
+#define KUAP_START 0
+#endif
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_PPC_KUAP
+
+#include 
+
+static inline void lock_user_access(void __user *to, const void __user *from,
+   unsigned long size)
+{
+   unsigned long addr = (unsigned long)to;
+   unsigned long end = addr + size;
+   unsigned long sr;
+
+   if (!to)
+   return;
+
+   current->thread.kuap = 0;
+   sr = mfsrin(addr);
+   sr |= 0x40000000;   /* set Ks */
+   mb();   /* make sure all writes are done before SR are updated */
+   while (addr < end) {
+   mtsrin(sr, addr);
+   sr += 0x111;/* next VSID */
+ 

[RFC PATCH v2 10/11] powerpc/book3s32: Prepare Kernel Userspace Access Protection

2018-11-28 Thread Christophe Leroy
This patch prepares Kernel Userspace Access Protection for
book3s/32.

Due to limitations of the processor page protection capabilities,
the protection is only against writing. Read protection cannot be
achieved using page protection.

In order to provide the protection, Ku and Ks keys are modified in
Userspace Segment registers, and different PP bits are used to:

PP01 provides RW for Key 0 and RO for Key 1
PP10 provides RW for all
PP11 provides RO for all

Today PP10 is used for RW pages and PP11 for RO pages, with SR Ku and
Ks set to 1. This patch modifies page protection to use PP01 for RW pages.

Then segment registers are set to Ku 0 and Ks 0. This will allow
setting up Userspace write access protection by setting Ks to 1 in the
following patch.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S | 20 +++-
 arch/powerpc/mm/hash_low_32.S |  6 +++---
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 61ca27929355..1aca0dba0ec1 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -522,13 +522,13 @@ InstructionTLBMiss:
 */
stw r0,0(r2)/* update PTE (accessed bit) */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwinm  r1,r0,32-10,31,31   /* _PAGE_RW -> PP lsb */
-   rlwinm  r2,r0,32-7,31,31/* _PAGE_DIRTY -> PP lsb */
+   rlwinm  r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */
+   rlwinm  r2,r0,32-6,30,30/* _PAGE_DIRTY -> PP msb */
and r1,r1,r2/* writable if _RW and _DIRTY */
rlwimi  r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */
rlwimi  r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */
ori r1,r1,0xe04 /* clear out reserved bits */
-   andc    r1,r0,r1    /* PP = user? (rw? 2: 3): 0 */
+   andc    r1,r0,r1    /* PP = user? (rw? 1: 3): 0 */
 BEGIN_FTR_SECTION
rlwinm  r1,r1,0,~_PAGE_COHERENT /* clear M (coherence not required) */
 END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
@@ -596,8 +596,8 @@ DataLoadTLBMiss:
 */
stw r0,0(r2)/* update PTE (accessed bit) */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwinm  r1,r0,32-10,31,31   /* _PAGE_RW -> PP lsb */
-   rlwinm  r2,r0,32-7,31,31/* _PAGE_DIRTY -> PP lsb */
+   rlwinm  r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */
+   rlwinm  r2,r0,32-6,30,30/* _PAGE_DIRTY -> PP msb */
and r1,r1,r2/* writable if _RW and _DIRTY */
rlwimi  r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */
rlwimi  r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */
@@ -680,9 +680,9 @@ DataStoreTLBMiss:
 */
stw r0,0(r2)/* update PTE (accessed/dirty bits) */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwimi  r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */
-   li  r1,0xe05/* clear out reserved bits & PP lsb */
-   andc    r1,r0,r1    /* PP = user? 2: 0 */
+   rlwimi  r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */
+   li  r1,0xe06/* clear out reserved bits & PP msb */
+   andc    r1,r0,r1    /* PP = user? 1: 0 */
 BEGIN_FTR_SECTION
rlwinm  r1,r1,0,~_PAGE_COHERENT /* clear M (coherence not required) */
 END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
@@ -1014,7 +1014,9 @@ _ENTRY(switch_mmu_context)
	blt-    4f
mulli   r3,r3,897   /* multiply context by skew factor */
rlwinm  r3,r3,4,8,27/* VSID = (context & 0xf) << 4 */
-   addis   r3,r3,0x6000/* Set Ks, Ku bits */
+#ifdef CONFIG_PPC_KUAP
+   addis   r3,r3,0x4000/* Set Ks, clear Ku bits */
+#endif
li  r0,NUM_USER_SEGMENTS
mtctr   r0
 
diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
index 26acf6c8c20c..0e549eb91823 100644
--- a/arch/powerpc/mm/hash_low_32.S
+++ b/arch/powerpc/mm/hash_low_32.S
@@ -316,13 +316,13 @@ Hash_msk = (((1 << Hash_bits) - 1) * 64)
 
 _GLOBAL(create_hpte)
/* Convert linux-style PTE (r5) to low word of PPC-style PTE (r8) */
-   rlwinm  r8,r5,32-10,31,31   /* _PAGE_RW -> PP lsb */
-   rlwinm  r0,r5,32-7,31,31/* _PAGE_DIRTY -> PP lsb */
+   rlwinm  r8,r5,32-9,30,30/* _PAGE_RW -> PP msb */
+   rlwinm  r0,r5,32-6,30,30/* _PAGE_DIRTY -> PP msb */
and r8,r8,r0/* writable if _RW & _DIRTY */
rlwimi  r5,r5,32-1,30,30/* _PAGE_USER -> PP msb */
rlwimi  r5,r5,32-2,31,31/* _PAGE_USER -> PP lsb */
ori r8,r8,0xe04 /* clear out reserved bits */
-   andc    r8,r5,r8    /* PP = user? (rw? 2: 3): 0 */
+   andc    r8,r5,r8    /* PP = user? (rw? 1: 3): 0 */

[RFC PATCH v2 09/11] powerpc/32: add helper to write into segment registers

2018-11-28 Thread Christophe Leroy
This patch adds a helper that wraps the 'mtsrin' instruction
to write into segment registers.
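
For example (usage sketch, not from the series itself), setting the Ks
bit (0x40000000) in the segment register that covers a given address
then becomes:

	u32 sr = mfsrin(addr);		/* read the SR covering addr */

	mtsrin(sr | 0x40000000, addr);	/* write it back with Ks set */
	isync();			/* context-synchronise the update */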

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d9598e6790d8..6b5d2a61af5a 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1424,6 +1424,11 @@ static inline void msr_check_and_clear(unsigned long 
bits)
 #define mfsrin(v)  ({unsigned int rval; \
asm volatile("mfsrin %0,%1" : "=r" (rval) : "r" (v)); \
rval;})
+
+static inline void mtsrin(u32 val, u32 idx)
+{
+   asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx));
+}
 #endif
 
 #define proc_trap()asm volatile("trap")
-- 
2.13.3



[RFC PATCH v2 08/11] powerpc/64s: Implement KUAP for Radix MMU

2018-11-28 Thread Russell Currey
Kernel Userspace Access Prevention utilises a feature of
the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety
checks, such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

- exiting the kernel and entering userspace
- performing an operation like copy_{to/from}_user()
- context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and
entering the kernel.

This feature has a slight performance impact which I roughly measured
to be 3% slower in the worst case (performing 1GB of 1 byte
read()/write() syscalls), and is gated behind the CONFIG_PPC_KUAP
option for performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y)
and performing the following:

echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

if enabled, this should send SIGSEGV to the thread.

The KUAP state is tracked in the PACA because reading the register
that manages these accesses is costly. This has the unfortunate
downside of another layer of abstraction for platforms that implement
the locks and unlocks, but this could be useful in future for other
things too, like counters for benchmarking or smartly handling lots
of small accesses at once.

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/kup-radix.h | 36 ++
 arch/powerpc/include/asm/exception-64s.h   | 14 --
 arch/powerpc/include/asm/kup.h |  3 +++
 arch/powerpc/include/asm/mmu.h |  9 ++-
 arch/powerpc/include/asm/reg.h |  1 +
 arch/powerpc/mm/pgtable-radix.c| 12 +
 arch/powerpc/mm/pkeys.c|  7 +++--
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 8 files changed, 78 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
new file mode 100644
index ..93273ca99310
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_KUP_RADIX_H
+#define _ASM_POWERPC_KUP_RADIX_H
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_PPC_KUAP
+#include 
+/*
+ * We do have the ability to individually lock/unlock reads and writes rather
+ * than both at once, however it's a significant performance hit due to needing
+ * to do a read-modify-write, which adds a mfspr, which is slow.  As a result,
+ * locking/unlocking both at once is preferred.
+ */
+static inline void unlock_user_access(void __user *to, const void __user *from,
+ unsigned long size)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, 0);
+   isync();
+   get_paca()->user_access_allowed = 1;
+}
+
+static inline void lock_user_access(void __user *to, const void __user *from,
+   unsigned long size)
+{
+   if (!mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return;
+
+   mtspr(SPRN_AMR, AMR_LOCKED);
+   get_paca()->user_access_allowed = 0;
+}
+#endif /* CONFIG_PPC_KUAP */
+#endif /* __ASSEMBLY__ */
+#endif
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 4d971ca1e69b..d92614c66d87 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,8 +264,18 @@ BEGIN_FTR_SECTION_NESTED(943)  
\
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
-#define LOCK_USER_ACCESS(reg)
-#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)  
\
+BEGIN_MMU_FTR_SECTION_NESTED(944)  \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   mtspr   SPRN_AMR,reg;   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)
\
+BEGIN_MMU_FTR_SECTION_NESTED(945)  \
+   li  reg,0;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync;  \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,945)
 
 #define EXCEPTION_PROLOG_0(area)   \
GET_PACA(r13);   

[RFC PATCH v2 07/11] powerpc/mm/radix: Use KUEP API for Radix MMU

2018-11-28 Thread Russell Currey
Execution protection already exists on radix; this just refactors
the radix init to provide the KUEP setup function instead.

Thus, the only functional change is that it can now be disabled.

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable-radix.c| 9 ++---
 arch/powerpc/platforms/Kconfig.cputype | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 931156069a81..45aa9e501e76 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -535,8 +535,13 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, (3ul << 62));
 }
 
-static void radix_init_iamr(void)
+void setup_kuep(bool disabled)
 {
+   if (disabled)
+   return;
+
+   pr_info("Activating Kernel Userspace Execution Prevention\n");
+
/*
 * Radix always uses key0 of the IAMR to determine if an access is
 * allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
@@ -605,7 +610,6 @@ void __init radix__early_init_mmu(void)
 
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
 
-   radix_init_iamr();
radix_init_pgtable();
/* Switch to the guard PID before turning on MMU */
	radix__switch_mmu_context(NULL, &init_mm);
@@ -627,7 +631,6 @@ void radix__early_init_mmu_secondary(void)
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
radix_init_amor();
}
-   radix_init_iamr();
 
	radix__switch_mmu_context(NULL, &init_mm);
if (cpu_has_feature(CPU_FTR_HVMODE))
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index a20669a9ec13..e6831d0ec159 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -334,6 +334,7 @@ config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
+   select PPC_HAVE_KUEP
default y
help
  Enable support for the Power ISA 3.0 Radix style MMU. Currently this
-- 
2.13.3



[RFC PATCH v2 06/11] powerpc/8xx: Add Kernel Userspace Access Protection

2018-11-28 Thread Christophe Leroy
This patch adds Kernel Userspace Access Protection on the 8xx.

When a page is RO or RW, it is set RO or RW for Key 0 and NA
for Key 1.

Up to now, the User group is defined with Key 0 for both User and
Supervisor.

By changing the group to Key 0 for User and Key 1 for Supervisor,
this patch prevents the Kernel from being able to access user data.

At exception entry, the kernel saves SPRN_MD_AP in the regs struct,
and reapplies the protection. At exception exit, it restores
SPRN_MD_AP with the value it had on exception entry.

For the time being, the unused mq field of pt_regs struct is used for
that.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kup.h   |  4 
 arch/powerpc/include/asm/mmu-8xx.h   |  6 +
 arch/powerpc/include/asm/nohash/32/kup-8xx.h | 34 
 arch/powerpc/mm/8xx_mmu.c| 12 ++
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 5 files changed, 57 insertions(+)
 create mode 100644 arch/powerpc/include/asm/nohash/32/kup-8xx.h

diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 2ac540fb488f..f7262f4c427e 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -2,6 +2,10 @@
 #ifndef _ASM_POWERPC_KUP_H_
 #define _ASM_POWERPC_KUP_H_
 
+#ifdef CONFIG_PPC_8xx
+#include <asm/nohash/32/kup-8xx.h>
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/powerpc/include/asm/mmu-8xx.h 
b/arch/powerpc/include/asm/mmu-8xx.h
index 53dbf0788fce..01a0a1694ebd 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -120,6 +120,12 @@
  */
 #define MD_APG_INIT0x
 
+/*
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 10 (all accesses performed according to swapped page definition)
+ */
+#define MD_APG_KUAP0x
+
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MD_RPN is written, bits in
  * this register are used to create the TLB entry.
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h 
b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
new file mode 100644
index ..8f4975c0de22
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_KUP_8XX_H_
+#define _ASM_POWERPC_KUP_8XX_H_
+
+#ifdef CONFIG_PPC_KUAP
+#define LOCK_USER_ACCESS(val, sp, sr, srmax, current)  \
+   mfspr   val, SPRN_MD_AP;\
+   stw val, _KUAP(sp); \
+   lis val, MD_APG_KUAP@h; \
+   ori val, val, MD_APG_KUAP@l;\
+   mtspr   SPRN_MD_AP, val
+
+#define REST_USER_ACCESS(val, sp, sr, srmax, current)  \
+   lwz val, _KUAP(sp); \
+   mtspr   SPRN_MD_AP, val
+
+#define KUAP_START MD_APG_KUAP
+
+#ifndef __ASSEMBLY__
+static inline void lock_user_access(void __user *to, const void __user *from,
+   unsigned long size)
+{
+   mtspr(SPRN_MD_AP, MD_APG_KUAP);
+}
+
+static inline void unlock_user_access(void __user *to, const void __user *from,
+ unsigned long size)
+{
+   mtspr(SPRN_MD_AP, MD_APG_INIT);
+}
+#endif /* !__ASSEMBLY__ */
+#endif /* CONFIG_PPC_KUAP */
+
+#endif /* _ASM_POWERPC_KUP_8XX_H_ */
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index f14ceb507d98..2bba4fd2eed7 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -206,3 +206,15 @@ void __init setup_kuep(bool disabled)
mtspr(SPRN_MI_AP, MI_APG_KUEP);
 }
 #endif
+
+#ifdef CONFIG_PPC_KUAP
+void __init setup_kuap(bool disabled)
+{
+   pr_info("Activating Kernel Userspace Access Protection\n");
+
+   if (disabled)
+   pr_warn("KUAP cannot be disabled yet on 8xx when compiled 
in\n");
+
+   mtspr(SPRN_MD_AP, MD_APG_KUAP);
+}
+#endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index d1757cedf60b..a20669a9ec13 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -34,6 +34,7 @@ config PPC_8xx
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
select PPC_HAVE_KUEP
+   select PPC_HAVE_KUAP
 
 config 40x
bool "AMCC 40x"
-- 
2.13.3



[RFC PATCH v2 05/11] powerpc/8xx: Add Kernel Userspace Execution Prevention

2018-11-28 Thread Christophe Leroy
This patch adds Kernel Userspace Execution Prevention on the 8xx.

When a page is Executable, it is set Executable for Key 0 and NX
for Key 1.

Up to now, the User group is defined with Key 0 for both User and
Supervisor.

By changing the group to Key 0 for User and Key 1 for Supervisor,
this patch prevents the Kernel from being able to execute user code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu-8xx.h |  6 ++
 arch/powerpc/mm/8xx_mmu.c  | 12 
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h 
b/arch/powerpc/include/asm/mmu-8xx.h
index fa05aa566ece..53dbf0788fce 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -41,6 +41,12 @@
  */
 #define MI_APG_INIT0x
 
+/*
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 10 (all accesses performed according to swapped page definition)
+ */
+#define MI_APG_KUEP0x
+
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MI_RPN is written, bits in
  * this register are used to create the TLB entry.
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 01b7f5107c3a..f14ceb507d98 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -194,3 +194,15 @@ void flush_instruction_cache(void)
mtspr(SPRN_IC_CST, IDC_INVALL);
isync();
 }
+
+#ifdef CONFIG_PPC_KUEP
+void __init setup_kuep(bool disabled)
+{
+   if (disabled)
+   return;
+
+   pr_info("Activating Kernel Userspace Execution Prevention\n");
+
+   mtspr(SPRN_MI_AP, MI_APG_KUEP);
+}
+#endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 68eaafd54aca..d1757cedf60b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -33,6 +33,7 @@ config PPC_8xx
bool "Freescale 8xx"
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
+   select PPC_HAVE_KUEP
 
 config 40x
bool "AMCC 40x"
-- 
2.13.3



[RFC PATCH v2 04/11] powerpc/mm: Add a framework for Kernel Userspace Access Protection

2018-11-28 Thread Christophe Leroy
This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and lock/unlock_user_access().

Some platforms will need to know the area accessed and whether it is
accessed for read, write or both. Therefore source, destination and
size are handed over to the two functions.

Signed-off-by: Christophe Leroy 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/powerpc/include/asm/exception-64e.h|  3 ++
 arch/powerpc/include/asm/exception-64s.h|  9 +-
 arch/powerpc/include/asm/futex.h|  4 +++
 arch/powerpc/include/asm/kup.h  | 21 ++
 arch/powerpc/include/asm/paca.h |  3 ++
 arch/powerpc/include/asm/processor.h|  3 ++
 arch/powerpc/include/asm/ptrace.h   |  3 ++
 arch/powerpc/include/asm/uaccess.h  | 38 +++--
 arch/powerpc/kernel/asm-offsets.c   |  7 +
 arch/powerpc/kernel/entry_32.S  |  8 +-
 arch/powerpc/kernel/entry_64.S  | 16 +--
 arch/powerpc/kernel/process.c   |  3 ++
 arch/powerpc/lib/checksum_wrappers.c|  4 +++
 arch/powerpc/mm/fault.c | 17 ---
 arch/powerpc/mm/init-common.c   | 10 +++
 arch/powerpc/platforms/Kconfig.cputype  | 12 
 17 files changed, 146 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1103549363bb..0d059b141ff8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2792,7 +2792,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
 
-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
 
diff --git a/arch/powerpc/include/asm/exception-64e.h 
b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
 #define RFI_TO_USER\
rfi
 
+#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)
+
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
 
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..4d971ca1e69b 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,9 @@ BEGIN_FTR_SECTION_NESTED(943)   
\
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
+#define LOCK_USER_ACCESS(reg)
+#define UNLOCK_USER_ACCESS(reg)
+
 #define EXCEPTION_PROLOG_0(area)   \
GET_PACA(r13);  \
std r9,area+EX_R9(r13); /* save r9 */   \
@@ -500,7 +503,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
SAVE_PPR(area, r9);\
-4: EXCEPTION_PROLOG_COMMON_2(area)\
+4: lbz r9,PACA_USER_ACCESS_ALLOWED(r13);  \
+   cmpwi   cr1,r9,0;  \
+   beq 5f;\
+   LOCK_USER_ACCESS(r9);  \
+5: EXCEPTION_PROLOG_COMMON_2(area) \
EXCEPTION_PROLOG_COMMON_3(n)   \
ACCOUNT_STOLEN_TIME
 
diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..32230f9a1c32 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int 
oparg, int *oval,
 {
int oldval = 0, ret;
 
+   unlock_user_access(uaddr, NULL, sizeof(*uaddr));
pagefault_disable();
 
switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int 
oparg, int *oval,
if (!ret)
*oval = oldval;
 
+   lock_user_access(uaddr, NULL, sizeof(*uaddr));
return ret;
 }
 
@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, 

[RFC PATCH v2 03/11] powerpc: Add skeleton for Kernel Userspace Execution Prevention

2018-11-28 Thread Christophe Leroy
This patch adds a skeleton for Kernel Userspace Execution Prevention.

Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP
and provide a setup_kuep() function.

Signed-off-by: Christophe Leroy 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/powerpc/include/asm/kup.h  |  6 ++
 arch/powerpc/mm/fault.c |  3 ++-
 arch/powerpc/mm/init-common.c   | 11 +++
 arch/powerpc/platforms/Kconfig.cputype  | 12 
 5 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 81d1d5a74728..1103549363bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2796,7 +2796,7 @@
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
 
-   nosmep  [X86]
+   nosmep  [X86,PPC]
Disable SMEP (Supervisor Mode Execution Prevention)
even if it is supported by processor.
 
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 7a88b8b9b54d..af4b5f854ca4 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -6,6 +6,12 @@
 
 void setup_kup(void);
 
+#ifdef CONFIG_PPC_KUEP
+void setup_kuep(bool disabled);
+#else
+static inline void setup_kuep(bool disabled) { }
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_KUP_H_ */
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 50e5c790d11e..e57bd46cf25b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -230,8 +230,9 @@ static bool bad_kernel_fault(bool is_exec, unsigned long 
error_code,
if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
  DSISR_PROTFAULT))) {
printk_ratelimited(KERN_CRIT "kernel tried to execute"
-  " exec-protected page (%lx) -"
+  " %s page (%lx) -"
   "exploit attempt? (uid: %d)\n",
+  address >= TASK_SIZE ? "exec-protected" : "user",
   address, from_kuid(&init_user_ns,
  current_uid()));
}
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index a72bbfc3add6..37f84a43b822 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -26,8 +26,19 @@
 #include 
 #include 
 
+static bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP);
+
+static int __init parse_nosmep(char *p)
+{
+   disable_kuep = true;
+   pr_warn("Disabling Kernel Userspace Execution Prevention\n");
+   return 0;
+}
+early_param("nosmep", parse_nosmep);
+
 void __init setup_kup(void)
 {
+   setup_kuep(disable_kuep);
 }
 
 static void pgd_ctor(void *addr)
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index f4e2c5729374..70830cb3c18a 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -351,6 +351,18 @@ config PPC_RADIX_MMU_DEFAULT
 
  If you're unsure, say Y.
 
+config PPC_HAVE_KUEP
+   bool
+
+config PPC_KUEP
+   bool "Kernel Userspace Execution Prevention"
+   depends on PPC_HAVE_KUEP
+   default y
+   help
+ Enable support for Kernel Userspace Execution Prevention (KUEP)
+
+ If you're unsure, say Y.
+
 config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
-- 
2.13.3



[RFC PATCH v2 02/11] powerpc: Add framework for Kernel Userspace Protection

2018-11-28 Thread Christophe Leroy
This patch adds a skeleton for Kernel Userspace Protection
functionalities like Kernel Userspace Access Protection and
Kernel Userspace Execution Prevention.

The subsequent implementation of KUAP for radix makes use of a MMU
feature in order to patch out assembly when KUAP is disabled or
unsupported.  This won't work unless there's an entry point for
KUP support before the feature magic happens, so for PPC64
setup_kup() is called early in setup.

On PPC32, feature fixups are applied too early to allow the same.

Suggested-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kup.h | 11 +++
 arch/powerpc/kernel/setup_64.c |  7 +++
 arch/powerpc/mm/init-common.c  |  5 +
 arch/powerpc/mm/init_32.c  |  3 +++
 4 files changed, 26 insertions(+)
 create mode 100644 arch/powerpc/include/asm/kup.h

diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
new file mode 100644
index ..7a88b8b9b54d
--- /dev/null
+++ b/arch/powerpc/include/asm/kup.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_KUP_H_
+#define _ASM_POWERPC_KUP_H_
+
+#ifndef __ASSEMBLY__
+
+void setup_kup(void);
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* _ASM_POWERPC_KUP_H_ */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c1151a3a7..771f280a6bf6 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include <asm/kup.h>
 
 #include "setup.h"
 
@@ -331,6 +332,12 @@ void __init early_setup(unsigned long dt_ptr)
 */
configure_exceptions();
 
+   /*
+* Configure Kernel Userspace Protection. This needs to happen before
+* feature fixups for platforms that implement this using features.
+*/
+   setup_kup();
+
/* Apply all the dynamic patching */
apply_feature_fixups();
setup_feature_keys();
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2b656e67f2ea..a72bbfc3add6 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 #include 
+#include <asm/kup.h>
+
+void __init setup_kup(void)
+{
+}
 
 static void pgd_ctor(void *addr)
 {
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d64b01..93cfa8cf015d 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include <asm/kup.h>
 
 #include "mmu_decl.h"
 
@@ -182,6 +183,8 @@ void __init MMU_init(void)
btext_unmap();
 #endif
 
+   setup_kup();
+
/* Shortly after that, the entire linear mapping will be available */
memblock_set_current_limit(lowmem_end_addr);
 }
-- 
2.13.3



[PATCH v2 01/11] powerpc/mm: Fix reporting of kernel execute faults on the 8xx

2018-11-28 Thread Christophe Leroy
On the 8xx, no-execute is set via PPP bits in the PTE. Therefore
a no-exec fault generates DSISR_PROTFAULT error bits,
not DSISR_NOEXEC_OR_G.

This patch adds DSISR_PROTFAULT in the test mask.

Fixes: d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute faults")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1697e903bbf2..50e5c790d11e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -226,7 +226,9 @@ static int mm_fault_error(struct pt_regs *regs, unsigned 
long addr,
 static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 unsigned long address)
 {
-   if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {
+   /* NX faults set DSISR_PROTFAULT on the 8xx, DSISR_NOEXEC_OR_G on others */
+   if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
+ DSISR_PROTFAULT))) {
printk_ratelimited(KERN_CRIT "kernel tried to execute"
   " exec-protected page (%lx) -"
   "exploit attempt? (uid: %d)\n",
-- 
2.13.3