Re: [PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
On 11/29/18 3:40 AM, Andrew Morton wrote: On Wed, 28 Nov 2018 20:04:37 +0530 "Aneesh Kumar K.V" wrote: Signed-off-by: Aneesh Kumar K.V Some explanation of the motivation would be useful. I will update the commit message. include/linux/hugetlb.h | 18 ++ mm/hugetlb.c| 8 +--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 087fd5f48c91..e2a3b0c854eb 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr set_huge_pte_at(mm, addr, ptep, pte); } #endif + +#ifndef huge_ptep_modify_prot_start +static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); +} +#endif #define huge_ptep_modify_prot_start huge_ptep_modify_prot_start +#ifndef huge_ptep_modify_prot_commit +static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte) +{ + set_huge_pte_at(vma->vm_mm, addr, ptep, pte); +} +#endif #define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit Will update. -aneesh
[PATCH AUTOSEL 4.14 30/35] ibmvnic: Fix RX queue buffer cleanup
From: Thomas Falcon [ Upstream commit b7cdec3d699db2e5985ad39de0f25d3b6111928e ] The wrong index is used when cleaning up RX buffer objects during release of RX queues. Update to use the correct index counter. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/ibm/ibmvnic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 5c7134ccc1fd..14c53ed5cca6 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -457,8 +457,8 @@ static void release_rx_pools(struct ibmvnic_adapter *adapter) for (j = 0; j < rx_pool->size; j++) { if (rx_pool->rx_buff[j].skb) { - dev_kfree_skb_any(rx_pool->rx_buff[i].skb); - rx_pool->rx_buff[i].skb = NULL; + dev_kfree_skb_any(rx_pool->rx_buff[j].skb); + rx_pool->rx_buff[j].skb = NULL; } } -- 2.17.1
[PATCH AUTOSEL 4.19 62/68] ibmvnic: Update driver queues after change in ring size support
From: Thomas Falcon [ Upstream commit 5bf032ef08e6a110edc1e3bfb3c66a208fb55125 ] During device reset, queue memory is not being updated to accommodate changes in ring buffer sizes supported by backing hardware. Track any differences in ring buffer sizes following the reset and update queue memory when possible. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/ibm/ibmvnic.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index f1d4d7a1278b..5ab21a1b5444 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1737,6 +1737,7 @@ static int do_reset(struct ibmvnic_adapter *adapter, struct ibmvnic_rwi *rwi, u32 reset_state) { u64 old_num_rx_queues, old_num_tx_queues; + u64 old_num_rx_slots, old_num_tx_slots; struct net_device *netdev = adapter->netdev; int i, rc; @@ -1748,6 +1749,8 @@ static int do_reset(struct ibmvnic_adapter *adapter, old_num_rx_queues = adapter->req_rx_queues; old_num_tx_queues = adapter->req_tx_queues; + old_num_rx_slots = adapter->req_rx_add_entries_per_subcrq; + old_num_tx_slots = adapter->req_tx_entries_per_subcrq; ibmvnic_cleanup(netdev); @@ -1810,7 +1813,11 @@ static int do_reset(struct ibmvnic_adapter *adapter, if (rc) return rc; } else if (adapter->req_rx_queues != old_num_rx_queues || - adapter->req_tx_queues != old_num_tx_queues) { + adapter->req_tx_queues != old_num_tx_queues || + adapter->req_rx_add_entries_per_subcrq != + old_num_rx_slots || + adapter->req_tx_entries_per_subcrq != + old_num_tx_slots) { release_rx_pools(adapter); release_tx_pools(adapter); release_napi(adapter); -- 2.17.1
[PATCH AUTOSEL 4.19 61/68] ibmvnic: Fix RX queue buffer cleanup
From: Thomas Falcon [ Upstream commit b7cdec3d699db2e5985ad39de0f25d3b6111928e ] The wrong index is used when cleaning up RX buffer objects during release of RX queues. Update to use the correct index counter. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/ibm/ibmvnic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index a646de07cbdc..f1d4d7a1278b 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -485,8 +485,8 @@ static void release_rx_pools(struct ibmvnic_adapter *adapter) for (j = 0; j < rx_pool->size; j++) { if (rx_pool->rx_buff[j].skb) { - dev_kfree_skb_any(rx_pool->rx_buff[i].skb); - rx_pool->rx_buff[i].skb = NULL; + dev_kfree_skb_any(rx_pool->rx_buff[j].skb); + rx_pool->rx_buff[j].skb = NULL; } } -- 2.17.1
[PATCH AUTOSEL 4.19 49/68] net/ibmnvic: Fix deadlock problem in reset
From: Juliet Kim [ Upstream commit a5681e20b541a507c7d4fd48ae4a4040d32ee1ef ] This patch changes to use rtnl_lock only during a reset to avoid deadlock that could occur when a thread operating close is holding rtnl_lock and waiting for reset_lock acquired by another thread, which is waiting for rtnl_lock in order to set the number of tx/rx queues during a reset. Also, we now setting the number of tx/rx queues during a soft reset for failover or LPM events. Signed-off-by: Juliet Kim Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/ibm/ibmvnic.c | 59 +++--- drivers/net/ethernet/ibm/ibmvnic.h | 2 +- 2 files changed, 22 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 7661064c815b..a646de07cbdc 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1103,20 +1103,15 @@ static int ibmvnic_open(struct net_device *netdev) return 0; } - mutex_lock(>reset_lock); - if (adapter->state != VNIC_CLOSED) { rc = ibmvnic_login(netdev); - if (rc) { - mutex_unlock(>reset_lock); + if (rc) return rc; - } rc = init_resources(adapter); if (rc) { netdev_err(netdev, "failed to initialize resources\n"); release_resources(adapter); - mutex_unlock(>reset_lock); return rc; } } @@ -1124,8 +1119,6 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); - mutex_unlock(>reset_lock); - return rc; } @@ -1269,10 +1262,8 @@ static int ibmvnic_close(struct net_device *netdev) return 0; } - mutex_lock(>reset_lock); rc = __ibmvnic_close(netdev); ibmvnic_cleanup(netdev); - mutex_unlock(>reset_lock); return rc; } @@ -1820,20 +1811,15 @@ static int do_reset(struct ibmvnic_adapter *adapter, return rc; } else if (adapter->req_rx_queues != old_num_rx_queues || adapter->req_tx_queues != old_num_tx_queues) { - adapter->map_id = 1; release_rx_pools(adapter); release_tx_pools(adapter); - rc = init_rx_pools(netdev); - if (rc) - return rc; - rc = init_tx_pools(netdev); - if (rc) - return rc; - release_napi(adapter); - rc = init_napi(adapter); + release_vpd_data(adapter); + + rc = init_resources(adapter); if (rc) return rc; + } else { rc = reset_tx_pools(adapter); if (rc) @@ -1917,17 +1903,8 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, adapter->state = VNIC_PROBED; return 0; } - /* netif_set_real_num_xx_queues needs to take rtnl lock here -* unless wait_for_reset is set, in which case the rtnl lock -* has already been taken before initializing the reset -*/ - if (!adapter->wait_for_reset) { - rtnl_lock(); - rc = init_resources(adapter); - rtnl_unlock(); - } else { - rc = init_resources(adapter); - } + + rc = init_resources(adapter); if (rc) return rc; @@ -1986,13 +1963,21 @@ static void __ibmvnic_reset(struct work_struct *work) struct ibmvnic_rwi *rwi; struct ibmvnic_adapter *adapter; struct net_device *netdev; + bool we_lock_rtnl = false; u32 reset_state; int rc = 0; adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset); netdev = adapter->netdev; - mutex_lock(>reset_lock); + /* netif_set_real_num_xx_queues needs to take rtnl lock here +* unless wait_for_reset is set, in which case the rtnl lock +* has already been taken before initializing the reset +*/ + if (!adapter->wait_for_reset) { + rtnl_lock(); + we_lock_rtnl = true; + } reset_state = adapter->state; rwi = get_next_rwi(adapter); @@ -2020,12 +2005,11 @@ static void __ibmvnic_reset(struct work_struct *work) if (rc) { netdev_dbg(adapter->netdev, "Reset failed\n"); free_all_rwi(adapter); - mutex_unlock(>reset_lock); - return; } adapter->resetting =
Re: [PATCH] powerpc/boot: Copy serial.c in Makefile
Right, so as both 0-day and snowpatch tell me, this patch is wrong. It turns out that this: > $(obj)/serial.c: $(obj)/autoconf.h > + $(Q)cp $< $@ is identical to: cp arch/powerpc/boot/autoconf.h arch/powerpc/boot/serial.c (Clearly my make mastery is inadequate.) Amusingly this which works for my 64e uImage but obviously not for anything that actually needs code from serial.c. Further analysis suggests that making with -j1 triggers the issue, but everything works with -j2 and above. That would make sense with the timeline of when I discovered the issue because I changed my build script to not build in parallel. Regards, Daniel
[PATCH v3 4/4] powerpc: generate uapi header and system call table files
System call table generation script must be run to gener- ate unistd_32/64.h and syscall_table_32/64/c32/spu.h files. This patch will have changes which will invokes the script. This patch will generate unistd_32/64.h and syscall_table- _32/64/c32/spu.h files by the syscall table generation script invoked by parisc/Makefile and the generated files against the removed files must be identical. The generated uapi header file will be included in uapi/- asm/unistd.h and generated system call table header file will be included by kernel/systbl.S file. Signed-off-by: Firoz Khan --- arch/powerpc/Makefile | 3 + arch/powerpc/include/asm/Kbuild | 4 + arch/powerpc/include/asm/systbl.h | 395 arch/powerpc/include/uapi/asm/Kbuild| 2 + arch/powerpc/include/uapi/asm/unistd.h | 392 +-- arch/powerpc/kernel/Makefile| 10 - arch/powerpc/kernel/systbl.S| 36 +-- arch/powerpc/kernel/systbl_chk.c| 60 - arch/powerpc/platforms/cell/spu_callbacks.c | 17 +- 9 files changed, 27 insertions(+), 892 deletions(-) delete mode 100644 arch/powerpc/include/asm/systbl.h delete mode 100644 arch/powerpc/kernel/systbl_chk.c diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 8a2ce14..34897191 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -402,6 +402,9 @@ archclean: archprepare: checkbin +archheaders: + $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all + ifdef CONFIG_STACKPROTECTOR prepare: stack_protector_prepare diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 3196d22..77ff7fb 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -1,3 +1,7 @@ +generated-y += syscall_table_32.h +generated-y += syscall_table_64.h +generated-y += syscall_table_c32.h +generated-y += syscall_table_spu.h generic-y += div64.h generic-y += export.h generic-y += irq_regs.h diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h deleted file mode 100644 index c4321b9..000 --- a/arch/powerpc/include/asm/systbl.h +++ /dev/null @@ -1,395 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * List of powerpc syscalls. For the meaning of the _SPU suffix see - * arch/powerpc/platforms/cell/spu_callbacks.c - */ - -SYSCALL(restart_syscall) -SYSCALL(exit) -PPC_SYS(fork) -SYSCALL_SPU(read) -SYSCALL_SPU(write) -COMPAT_SYS_SPU(open) -SYSCALL_SPU(close) -SYSCALL_SPU(waitpid) -SYSCALL_SPU(creat) -SYSCALL_SPU(link) -SYSCALL_SPU(unlink) -COMPAT_SYS(execve) -SYSCALL_SPU(chdir) -COMPAT_SYS_SPU(time) -SYSCALL_SPU(mknod) -SYSCALL_SPU(chmod) -SYSCALL_SPU(lchown) -SYSCALL(ni_syscall) -OLDSYS(stat) -COMPAT_SYS_SPU(lseek) -SYSCALL_SPU(getpid) -COMPAT_SYS(mount) -SYSX(sys_ni_syscall,sys_oldumount,sys_oldumount) -SYSCALL_SPU(setuid) -SYSCALL_SPU(getuid) -COMPAT_SYS_SPU(stime) -COMPAT_SYS(ptrace) -SYSCALL_SPU(alarm) -OLDSYS(fstat) -SYSCALL(pause) -COMPAT_SYS(utime) -SYSCALL(ni_syscall) -SYSCALL(ni_syscall) -SYSCALL_SPU(access) -SYSCALL_SPU(nice) -SYSCALL(ni_syscall) -SYSCALL_SPU(sync) -SYSCALL_SPU(kill) -SYSCALL_SPU(rename) -SYSCALL_SPU(mkdir) -SYSCALL_SPU(rmdir) -SYSCALL_SPU(dup) -SYSCALL_SPU(pipe) -COMPAT_SYS_SPU(times) -SYSCALL(ni_syscall) -SYSCALL_SPU(brk) -SYSCALL_SPU(setgid) -SYSCALL_SPU(getgid) -SYSCALL(signal) -SYSCALL_SPU(geteuid) -SYSCALL_SPU(getegid) -SYSCALL(acct) -SYSCALL(umount) -SYSCALL(ni_syscall) -COMPAT_SYS_SPU(ioctl) -COMPAT_SYS_SPU(fcntl) -SYSCALL(ni_syscall) -SYSCALL_SPU(setpgid) -SYSCALL(ni_syscall) -SYSX(sys_ni_syscall,sys_olduname,sys_olduname) -SYSCALL_SPU(umask) -SYSCALL_SPU(chroot) -COMPAT_SYS(ustat) -SYSCALL_SPU(dup2) -SYSCALL_SPU(getppid) -SYSCALL_SPU(getpgrp) -SYSCALL_SPU(setsid) -SYS32ONLY(sigaction) -SYSCALL_SPU(sgetmask) -SYSCALL_SPU(ssetmask) -SYSCALL_SPU(setreuid) -SYSCALL_SPU(setregid) -SYS32ONLY(sigsuspend) -SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending) -SYSCALL_SPU(sethostname) -COMPAT_SYS_SPU(setrlimit) -SYSX(sys_ni_syscall,compat_sys_old_getrlimit,sys_old_getrlimit) -COMPAT_SYS_SPU(getrusage) -COMPAT_SYS_SPU(gettimeofday) -COMPAT_SYS_SPU(settimeofday) -SYSCALL_SPU(getgroups) -SYSCALL_SPU(setgroups) -SYSX(sys_ni_syscall,sys_ni_syscall,ppc_select) -SYSCALL_SPU(symlink) -OLDSYS(lstat) -SYSCALL_SPU(readlink) -SYSCALL(uselib) -SYSCALL(swapon) -SYSCALL(reboot) -SYSX(sys_ni_syscall,compat_sys_old_readdir,sys_old_readdir) -SYSCALL_SPU(mmap) -SYSCALL_SPU(munmap) -COMPAT_SYS_SPU(truncate) -COMPAT_SYS_SPU(ftruncate) -SYSCALL_SPU(fchmod) -SYSCALL_SPU(fchown) -SYSCALL_SPU(getpriority) -SYSCALL_SPU(setpriority) -SYSCALL(ni_syscall) -COMPAT_SYS(statfs) -COMPAT_SYS(fstatfs) -SYSCALL(ni_syscall) -COMPAT_SYS_SPU(socketcall) -SYSCALL_SPU(syslog) -COMPAT_SYS_SPU(setitimer) -COMPAT_SYS_SPU(getitimer) -COMPAT_SYS_SPU(newstat) -COMPAT_SYS_SPU(newlstat) -COMPAT_SYS_SPU(newfstat) -SYSX(sys_ni_syscall,sys_uname,sys_uname)
[PATCH v3 3/4] powerpc: add system call table generation support
The system call tables are in different format in all architecture and it will be difficult to manually add or modify the system calls in the respective files. To make it easy by keeping a script and which will generate the uapi header and syscall table file. This change will also help to unify the implementation across all architectures. The system call table generation script is added in syscalls directory which contain the script to generate both uapi header file and system call table files. The syscall.tbl file will be the input for the scripts. syscall.tbl contains the list of available system calls along with system call number and corresponding entry point. Add a new system call in this architecture will be possible by adding new entry in the syscall.tbl file. Adding a new table entry consisting of: - System call number. - ABI. - System call name. - Entry point name. - Compat entry name, if required. syscallhdr.sh and syscalltbl.sh will generate uapi header- unistd_32/64.h and syscall_table_32/64/c32/spu.h files respectively. File syscall_table_32/64/c32/spu.h is incl- uded by syscall.S - the real system call table. Both *.sh files will parse the content syscall.tbl to generate the header and table files. ARM, s390 and x86 architecuture does have similar support. I leverage their implementation to come up with a generic solution. Signed-off-by: Firoz Khan --- arch/powerpc/kernel/syscalls/Makefile | 63 + arch/powerpc/kernel/syscalls/syscall.tbl | 427 + arch/powerpc/kernel/syscalls/syscallhdr.sh | 36 +++ arch/powerpc/kernel/syscalls/syscalltbl.sh | 36 +++ 4 files changed, 562 insertions(+) create mode 100644 arch/powerpc/kernel/syscalls/Makefile create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh diff --git a/arch/powerpc/kernel/syscalls/Makefile b/arch/powerpc/kernel/syscalls/Makefile new file mode 100644 index 000..27b4895 --- /dev/null +++ b/arch/powerpc/kernel/syscalls/Makefile @@ -0,0 +1,63 @@ +# SPDX-License-Identifier: GPL-2.0 +kapi := arch/$(SRCARCH)/include/generated/asm +uapi := arch/$(SRCARCH)/include/generated/uapi/asm + +_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \ + $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)') + +syscall := $(srctree)/$(src)/syscall.tbl +syshdr := $(srctree)/$(src)/syscallhdr.sh +systbl := $(srctree)/$(src)/syscalltbl.sh + +quiet_cmd_syshdr = SYSHDR $@ + cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \ + '$(syshdr_abis_$(basetarget))' \ + '$(syshdr_pfx_$(basetarget))'\ + '$(syshdr_offset_$(basetarget))' + +quiet_cmd_systbl = SYSTBL $@ + cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \ + '$(systbl_abis_$(basetarget))' \ + '$(systbl_abi_$(basetarget))'\ + '$(systbl_offset_$(basetarget))' + +syshdr_abis_unistd_32 := common,nospu,32 +$(uapi)/unistd_32.h: $(syscall) $(syshdr) + $(call if_changed,syshdr) + +syshdr_abis_unistd_64 := common,nospu,64 +$(uapi)/unistd_64.h: $(syscall) $(syshdr) + $(call if_changed,syshdr) + +systbl_abis_syscall_table_32 := common,nospu,32 +systbl_abi_syscall_table_32 := 32 +$(kapi)/syscall_table_32.h: $(syscall) $(systbl) + $(call if_changed,systbl) + +systbl_abis_syscall_table_64 := common,nospu,64 +systbl_abi_syscall_table_64 := 64 +$(kapi)/syscall_table_64.h: $(syscall) $(systbl) + $(call if_changed,systbl) + +systbl_abis_syscall_table_c32 := common,nospu,32 +systbl_abi_syscall_table_c32 := c32 +$(kapi)/syscall_table_c32.h: $(syscall) $(systbl) + $(call if_changed,systbl) + +systbl_abis_syscall_table_spu := common,spu +systbl_abi_syscall_table_spu := spu +$(kapi)/syscall_table_spu.h: $(syscall) $(systbl) + $(call if_changed,systbl) + +uapisyshdr-y += unistd_32.h unistd_64.h +kapisyshdr-y += syscall_table_32.h \ + syscall_table_64.h \ + syscall_table_c32.h \ + syscall_table_spu.h + +targets+= $(uapisyshdr-y) $(kapisyshdr-y) + +PHONY += all +all: $(addprefix $(uapi)/,$(uapisyshdr-y)) +all: $(addprefix $(kapi)/,$(kapisyshdr-y)) + @: diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl new file mode 100644 index 000..db3bbb8 --- /dev/null +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -0,0 +1,427 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# system call numbers and entry vectors for powerpc +# +# The format is: +# +# +# The can be common, spu, nospu, 64, or 32 for this file. +# +0 nospu restart_syscall sys_restart_syscall
[PATCH v3 2/4] powerpc: move macro definition from asm/systbl.h
Move the macro definition for compat_sys_sigsuspend from asm/systbl.h to the file which it is getting included. One of the patch in this patch series is generating uapi header and syscall table files. In order to come up with a common implimentation across all architecture, we need to do this change. This change will simplify the implementation of system call table generation script and help to come up a common implementation across all architecture. Signed-off-by: Firoz Khan --- arch/powerpc/include/asm/systbl.h | 1 - arch/powerpc/kernel/systbl.S | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 01b5171..c4321b9 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -76,7 +76,6 @@ SYSCALL_SPU(ssetmask) SYSCALL_SPU(setreuid) SYSCALL_SPU(setregid) -#define compat_sys_sigsuspend sys_sigsuspend SYS32ONLY(sigsuspend) SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending) SYSCALL_SPU(sethostname) diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S index 919a327..9ff1913 100644 --- a/arch/powerpc/kernel/systbl.S +++ b/arch/powerpc/kernel/systbl.S @@ -47,4 +47,5 @@ .globl sys_call_table sys_call_table: +#define compat_sys_sigsuspend sys_sigsuspend #include -- 1.9.1
[PATCH v3 1/4] powerpc: add __NR_syscalls along with NR_syscalls
NR_syscalls macro holds the number of system call exist in powerpc architecture. We have to change the value of NR_syscalls, if we add or delete a system call. One of the patch in this patch series has a script which will generate a uapi header based on syscall.tbl file. The syscall.tbl file contains the number of system call information. So we have two option to update NR_syscalls value. 1. Update NR_syscalls in asm/unistd.h manually by count- ing the no.of system calls. No need to update NR_sys- calls until we either add a new system call or delete existing system call. 2. We can keep this feature in above mentioned script, that will count the number of syscalls and keep it in a generated file. In this case we don't need to expli- citly update NR_syscalls in asm/unistd.h file. The 2nd option will be the recommended one. For that, I added the __NR_syscalls macro in uapi/asm/unistd.h along with NR_syscalls asm/unistd.h. The macro __NR_syscalls also added for making the name convention same across all architecture. While __NR_syscalls isn't strictly part of the uapi, having it as part of the generated header to simplifies the implementation. We also need to enclose this macro with #ifdef __KERNEL__ to avoid side effects. Signed-off-by: Firoz Khan --- arch/powerpc/include/asm/unistd.h | 3 +-- arch/powerpc/include/uapi/asm/unistd.h | 5 - 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index b0de85b..a3c35e6 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -11,8 +11,7 @@ #include - -#define NR_syscalls389 +#define NR_syscalls__NR_syscalls #define __NR__exit __NR_exit diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index 985534d..7195868 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -10,7 +10,6 @@ #ifndef _UAPI_ASM_POWERPC_UNISTD_H_ #define _UAPI_ASM_POWERPC_UNISTD_H_ - #define __NR_restart_syscall 0 #define __NR_exit1 #define __NR_fork2 @@ -401,4 +400,8 @@ #define __NR_rseq 387 #define __NR_io_pgetevents 388 +#ifdef __KERNEL__ +#define __NR_syscalls 389 +#endif + #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 1.9.1
[PATCH v3 0/4] powerpc: system call table generation support
The purpose of this patch series is, we can easily add/modify/delete system call table support by cha- nging entry in syscall.tbl file instead of manually changing many files. The other goal is to unify the system call table generation support implementation across all the architectures. The system call tables are in different format in all architecture. It will be difficult to manually add, modify or delete the system calls in the resp- ective files manually. To make it easy by keeping a script and which'll generate uapi header file and syscall table file. syscall.tbl contains the list of available system calls along with system call number and correspond- ing entry point. Add a new system call in this arch- itecture will be possible by adding new entry in the syscall.tbl file. Adding a new table entry consisting of: - System call number. - ABI. - System call name. - Entry point name. - Compat entry name, if required. - spu entry name, if required. ARM, s390 and x86 architecuture does exist the sim- ilar support. I leverage their implementation to come up with a generic solution. I have done the same support for work for alpha, ia64, m68k, microblaze, mips, parisc, sh, sparc, and xtensa. Below mentioned git repository contains more details about the workflow. https://github.com/frzkhn/system_call_table_generator/ Finally, this is the ground work to solve the Y2038 issue. We need to add two dozen of system calls to solve Y2038 issue. So this patch series will help to add new system calls easily by adding new entry in the syscall.tbl. changes since v2: - modified/optimized the syscall.tbl to avoid duplicate for the spu entries. - updated the syscalltbl.sh to meet the above point. changes since v1: - optimized/updated the syscall table generation scripts. - fixed all mixed indentation issues in syscall.tbl. - added "comments" in syscall_*.tbl. - changed from generic-y to generated-y in Kbuild. Firoz Khan (4): powerpc: add __NR_syscalls along with NR_syscalls powerpc: move macro definition from asm/systbl.h powerpc: add system call table generation support powerpc: generate uapi header and system call table files arch/powerpc/Makefile | 3 + arch/powerpc/include/asm/Kbuild | 4 + arch/powerpc/include/asm/systbl.h | 396 -- arch/powerpc/include/asm/unistd.h | 3 +- arch/powerpc/include/uapi/asm/Kbuild| 2 + arch/powerpc/include/uapi/asm/unistd.h | 389 + arch/powerpc/kernel/Makefile| 10 - arch/powerpc/kernel/syscalls/Makefile | 63 arch/powerpc/kernel/syscalls/syscall.tbl| 427 arch/powerpc/kernel/syscalls/syscallhdr.sh | 36 +++ arch/powerpc/kernel/syscalls/syscalltbl.sh | 36 +++ arch/powerpc/kernel/systbl.S| 37 +-- arch/powerpc/kernel/systbl_chk.c| 60 arch/powerpc/platforms/cell/spu_callbacks.c | 17 +- 14 files changed, 591 insertions(+), 892 deletions(-) delete mode 100644 arch/powerpc/include/asm/systbl.h create mode 100644 arch/powerpc/kernel/syscalls/Makefile create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh delete mode 100644 arch/powerpc/kernel/systbl_chk.c -- 1.9.1
[PATCH 5/6] powerpc/eeh: Improve recovery of passed-through devices
Currently, the EEH recovery process considers passed-through devices as if they were not EEH-aware, which can cause them to be removed as part of recovery. Because device removal requires cooperation from the guest, this may lead to the process stalling or deadlocking. Also, if devices are removed on the host side, they will be removed from their IOMMU group, making recovery in the guest impossible. Therefore, alter the recovery process so that passed-through devices are not removed but are instead left frozen (and marked isolated) until the guest performs it's own recovery. If firmware thaws a passed-through PE because it's parent PE has been thawed (because it was not passed through), re-freeze it. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 2 +- arch/powerpc/include/asm/ppc-pci.h | 2 +- arch/powerpc/kernel/eeh.c | 47 +++--- arch/powerpc/kernel/eeh_driver.c | 32 +--- drivers/vfio/vfio_spapr_eeh.c | 6 ++-- 5 files changed, 55 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 2ff123f745cc..0b655810f32d 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -300,7 +300,7 @@ void eeh_dev_release(struct pci_dev *pdev); struct eeh_pe *eeh_iommu_group_to_pe(struct iommu_group *group); int eeh_pe_set_option(struct eeh_pe *pe, int option); int eeh_pe_get_state(struct eeh_pe *pe); -int eeh_pe_reset(struct eeh_pe *pe, int option); +int eeh_pe_reset(struct eeh_pe *pe, int option, bool include_passed); int eeh_pe_configure(struct eeh_pe *pe); int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func, unsigned long addr, unsigned long mask); diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h index 08e094eaeccf..f191ef0d2a0a 100644 --- a/arch/powerpc/include/asm/ppc-pci.h +++ b/arch/powerpc/include/asm/ppc-pci.h @@ -53,7 +53,7 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev); struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr); void eeh_slot_error_detail(struct eeh_pe *pe, int severity); int eeh_pci_enable(struct eeh_pe *pe, int function); -int eeh_pe_reset_full(struct eeh_pe *pe); +int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed); void eeh_save_bars(struct eeh_dev *edev); int rtas_write_config(struct pci_dn *, int where, int size, u32 val); int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 052512e58b05..df02f55fdfa1 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -877,6 +877,24 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag) return NULL; } +static void eeh_pe_refreeze_passed(struct eeh_pe *root) +{ + struct eeh_pe *pe; + int state; + + eeh_for_each_pe(root, pe) { + if (eeh_pe_passed(pe)) { + state = eeh_ops->get_state(pe, NULL); + if (state & + (EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED)) { + pr_info("EEH: Passed-through PE PHB#%x-PE#%x was thawed by reset, re-freezing for safety.\n", + pe->phb->global_number, pe->addr); + eeh_pe_set_option(pe, EEH_OPT_FREEZE_PE); + } + } + } +} + /** * eeh_pe_reset_full - Complete a full reset process on the indicated PE * @pe: EEH PE @@ -889,7 +907,7 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag) * * This function will attempt to reset a PE three times before failing. */ -int eeh_pe_reset_full(struct eeh_pe *pe) +int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed) { int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED); int type = EEH_RESET_HOT; @@ -911,11 +929,11 @@ int eeh_pe_reset_full(struct eeh_pe *pe) /* Make three attempts at resetting the bus */ for (i = 0; i < 3; i++) { - ret = eeh_pe_reset(pe, type); + ret = eeh_pe_reset(pe, type, include_passed); if (ret) break; - ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE); + ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE, include_passed); if (ret) break; @@ -936,6 +954,12 @@ int eeh_pe_reset_full(struct eeh_pe *pe) __func__, state, pe->phb->global_number, pe->addr, (i + 1)); } + /* Resetting the PE may have unfrozen child PEs. If those PEs have been +* (potentially) passed through to a guest, re-freeze them: +*/ + if (!include_passed) + eeh_pe_refreeze_passed(pe); + eeh_pe_state_clear(pe, reset_state, true); return ret; } @@ -1611,13 +1635,12 @@ int
[PATCH 2/6] powerpc/eeh: remove sw_state from eeh_unfreeze_pe()
eeh_unfreeze_pe() performs two operations: unfreezing a PE (which may cause firmware to unfreeze child PEs as well) and de-isolating the PE and it's children. To simplify this and support future work, separate out the de-isolation and perform it at the call sites (when necessary). There should be no change in behaviour. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 2 +- arch/powerpc/kernel/eeh.c| 18 ++ arch/powerpc/kernel/eeh_driver.c | 2 +- arch/powerpc/kernel/eeh_sysfs.c | 3 ++- 4 files changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 8b596d096ebe..2ff123f745cc 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -293,7 +293,7 @@ void eeh_add_device_late(struct pci_dev *); void eeh_add_device_tree_late(struct pci_bus *); void eeh_add_sysfs_files(struct pci_bus *); void eeh_remove_device(struct pci_dev *); -int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state); +int eeh_unfreeze_pe(struct eeh_pe *pe); int eeh_pe_reset_and_recover(struct eeh_pe *pe); int eeh_dev_open(struct pci_dev *pdev); void eeh_dev_release(struct pci_dev *pdev); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 6cae6b56ffd6..ac8e69ee93a7 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -823,7 +823,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat switch (state) { case pcie_deassert_reset: eeh_ops->reset(pe, EEH_RESET_DEACTIVATE); - eeh_unfreeze_pe(pe, false); + eeh_unfreeze_pe(pe); if (!(pe->type & EEH_PE_VF)) eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev); @@ -1309,7 +1309,7 @@ void eeh_remove_device(struct pci_dev *dev) edev->mode &= ~EEH_DEV_SYSFS; } -int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state) +int eeh_unfreeze_pe(struct eeh_pe *pe) { int ret; @@ -1327,10 +1327,6 @@ int eeh_unfreeze_pe(struct eeh_pe *pe, bool sw_state) return ret; } - /* Clear software isolated state */ - if (sw_state && (pe->state & EEH_PE_ISOLATED)) - eeh_pe_state_clear(pe, EEH_PE_ISOLATED); - return ret; } @@ -1382,7 +1378,10 @@ static int eeh_pe_change_owner(struct eeh_pe *pe) } } - return eeh_unfreeze_pe(pe, true); + ret = eeh_unfreeze_pe(pe); + if (!ret) + eeh_pe_state_clear(pe, EEH_PE_ISOLATED); + return ret; } /** @@ -1639,7 +1638,10 @@ static int eeh_pe_reenable_devices(struct eeh_pe *pe) } /* The PE is still in frozen state */ - return eeh_unfreeze_pe(pe, true); + ret = eeh_unfreeze_pe(pe); + if (!ret) + eeh_pe_state_clear(pe, EEH_PE_ISOLATED); + return ret; } diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index aa86a42d98f2..0109d5d7fe63 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -598,7 +598,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *root) eeh_for_each_pe(root, pe) { for (i = 0; i < 3; i++) - if (!eeh_unfreeze_pe(pe, false)) + if (!eeh_unfreeze_pe(pe)) break; if (i >= 3) return -EIO; diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c index deed906dd8f1..0731d2f01dd9 100644 --- a/arch/powerpc/kernel/eeh_sysfs.c +++ b/arch/powerpc/kernel/eeh_sysfs.c @@ -82,8 +82,9 @@ static ssize_t eeh_pe_state_store(struct device *dev, if (!(edev->pe->state & EEH_PE_ISOLATED)) return count; - if (eeh_unfreeze_pe(edev->pe, true)) + if (eeh_unfreeze_pe(edev->pe)) return -EIO; + eeh_pe_state_clear(edev->pe, EEH_PE_ISOLATED); return count; } -- 2.19.0.2.gcad72f5712
[PATCH 3/6] powerpc/eeh: Add include_passed to eeh_pe_state_clear()
Add a parameter to eeh_pe_state_clear() that allows passed-through PEs to be excluded. Update callers to always pass true so that there is no change in behaviour. Also refactor to use direct traversal, to allow the removal of some boilerplate. This is to prepare for follow-up work for passed-through devices. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/ppc-pci.h | 2 +- arch/powerpc/kernel/eeh.c | 18 arch/powerpc/kernel/eeh_driver.c | 20 - arch/powerpc/kernel/eeh_pe.c | 68 +- arch/powerpc/kernel/eeh_sysfs.c| 2 +- 5 files changed, 50 insertions(+), 60 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h index f67da277d652..08e094eaeccf 100644 --- a/arch/powerpc/include/asm/ppc-pci.h +++ b/arch/powerpc/include/asm/ppc-pci.h @@ -59,7 +59,7 @@ int rtas_write_config(struct pci_dn *, int where, int size, u32 val); int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); void eeh_pe_state_mark(struct eeh_pe *pe, int state); void eeh_pe_mark_isolated(struct eeh_pe *pe); -void eeh_pe_state_clear(struct eeh_pe *pe, int state); +void eeh_pe_state_clear(struct eeh_pe *pe, int state, bool include_passed); void eeh_pe_state_mark_with_cfg(struct eeh_pe *pe, int state); void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index ac8e69ee93a7..052512e58b05 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -825,13 +825,13 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat eeh_ops->reset(pe, EEH_RESET_DEACTIVATE); eeh_unfreeze_pe(pe); if (!(pe->type & EEH_PE_VF)) - eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true); eeh_pe_dev_traverse(pe, eeh_restore_dev_state, dev); - eeh_pe_state_clear(pe, EEH_PE_ISOLATED); + eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true); break; case pcie_hot_reset: eeh_pe_mark_isolated(pe); - eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true); eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE); eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev); if (!(pe->type & EEH_PE_VF)) @@ -840,7 +840,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat break; case pcie_warm_reset: eeh_pe_mark_isolated(pe); - eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true); eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE); eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev); if (!(pe->type & EEH_PE_VF)) @@ -848,7 +848,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL); break; default: - eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED); + eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED, true); return -EINVAL; }; @@ -936,7 +936,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe) __func__, state, pe->phb->global_number, pe->addr, (i + 1)); } - eeh_pe_state_clear(pe, reset_state); + eeh_pe_state_clear(pe, reset_state, true); return ret; } @@ -1380,7 +1380,7 @@ static int eeh_pe_change_owner(struct eeh_pe *pe) ret = eeh_unfreeze_pe(pe); if (!ret) - eeh_pe_state_clear(pe, EEH_PE_ISOLATED); + eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true); return ret; } @@ -1640,7 +1640,7 @@ static int eeh_pe_reenable_devices(struct eeh_pe *pe) /* The PE is still in frozen state */ ret = eeh_unfreeze_pe(pe); if (!ret) - eeh_pe_state_clear(pe, EEH_PE_ISOLATED); + eeh_pe_state_clear(pe, EEH_PE_ISOLATED, true); return ret; } @@ -1668,7 +1668,7 @@ int eeh_pe_reset(struct eeh_pe *pe, int option) switch (option) { case EEH_RESET_DEACTIVATE: ret = eeh_ops->reset(pe, option); - eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED, true); if (ret) break; diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 0109d5d7fe63..b2687c14dc40 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -603,7 +603,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *root) if (i >= 3)
[PATCH 6/6] powerpc/eeh: Correct retries in eeh_pe_reset_full()
Currently, eeh_pe_reset_full() will only attempt to reset a PE more than once if activating the reset state and deactivating it both succeed, but later polling shows that it hasn't become active. Change this so that it will try up to three times for any reason other than an unrecoverable slot error and adjust the message generation so that it's clear weather the reset has ultimately succeeded or failed. This allows the reset to succeed in some situations where it would currently fail. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 32 ++-- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index df02f55fdfa1..528caf857428 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -912,7 +912,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed) int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED); int type = EEH_RESET_HOT; unsigned int freset = 0; - int i, state, ret; + int i, state = 0, ret; /* * Determine the type of reset to perform - hot or fundamental. @@ -930,28 +930,32 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed) /* Make three attempts at resetting the bus */ for (i = 0; i < 3; i++) { ret = eeh_pe_reset(pe, type, include_passed); - if (ret) - break; - - ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE, include_passed); - if (ret) - break; + if (!ret) + ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE, + include_passed); + if (ret) { + ret = -EIO; + pr_warn("EEH: Failure %d resetting PHB#%x-PE#%x (attempt %d)\n\n", + state, pe->phb->global_number, pe->addr, i + 1); + continue; + } + if (i) + pr_warn("EEH: PHB#%x-PE#%x: Successful reset (attempt %d)\n", + pe->phb->global_number, pe->addr, i + 1); /* Wait until the PE is in a functioning state */ state = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); if (state < 0) { - pr_warn("%s: Unrecoverable slot failure on PHB#%x-PE#%x", - __func__, pe->phb->global_number, pe->addr); + pr_warn("EEH: Unrecoverable slot failure on PHB#%x-PE#%x", + pe->phb->global_number, pe->addr); ret = -ENOTRECOVERABLE; break; } if (eeh_state_active(state)) break; - - /* Set error in case this is our last attempt */ - ret = -EIO; - pr_warn("%s: Failure %d resetting PHB#%x-PE#%x\n (%d)\n", - __func__, state, pe->phb->global_number, pe->addr, (i + 1)); + else + pr_warn("EEH: PHB#%x-PE#%x: Slot inactive after reset: 0x%x (attempt %d)\n", + pe->phb->global_number, pe->addr, state, i + 1); } /* Resetting the PE may have unfrozen child PEs. If those PEs have been -- 2.19.0.2.gcad72f5712
[PATCH 0/6] powerpc/eeh: Improve recovery of passed-through devices
Hello, Here are changes that allow EEH to successfully recover after a failure that affects of both host and guest devices. This happens, for example, when a PHB containing passed-through devices is fenced. (Failures that include only passed-through devices are ignored by the host.) Currently, when an error affects both passed-through and un-passed-through devices, the passed-through devices are treated as if their driver was not EEH aware. This causes them to be hot-unplugged as part of recovery. The hot unplug request is forwarded to the guest which checks the device status before releasing the device. Because the host is recovering the device, it reports the device status as EEH_STATE_UNAVAILABLE which causes the guest to wait for the device to become available. This deadlocks the recovery process. This change causes the host to recover it's own devices but leave passed-through devices frozen until the guest performs it's own recovery. (They are not removed.) If the guest detects the error and begins recovery itself, waiting for the device state to change away from EEH_STATE_UNAVAILABLE causes it to wait until the host has finished it's recovery and the guest's subsequent recovery can then succeed. Note that resetting a PE may implicitly thaw both it and child PEs, and to prevent the device from being accidentally used by the guest (which may be unaware of the failure and reset) when in this state, we re-freeze those devices. This does leave a small window of opportunity but that will need to be addressed with a firmware change. I've also included a fix to the reset function (the last patch), because without it some scenarios still fail. An example is injecting an error into a PHB and then exiting a guest that contains passed-through devices from that PHB so that an EEH event is raised during the process of passing the device back to the host. Cheers, Sam. Sam Bobroff (6): powerpc/eeh: Cleanup eeh_pe_clear_frozen_state() powerpc/eeh: remove sw_state from eeh_unfreeze_pe() powerpc/eeh: Add include_passed to eeh_pe_state_clear() powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state() powerpc/eeh: Improve recovery of passed-through devices powerpc/eeh: Correct retries in eeh_pe_reset_full() arch/powerpc/include/asm/eeh.h | 4 +- arch/powerpc/include/asm/ppc-pci.h | 4 +- arch/powerpc/kernel/eeh.c | 103 +++-- arch/powerpc/kernel/eeh_driver.c | 86 ++-- arch/powerpc/kernel/eeh_pe.c | 68 --- arch/powerpc/kernel/eeh_sysfs.c| 3 +- drivers/vfio/vfio_spapr_eeh.c | 6 +- 7 files changed, 140 insertions(+), 134 deletions(-) -- 2.19.0.2.gcad72f5712
[PATCH 4/6] powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state()
Add a parameter to eeh_clear_pe_frozen_state() that allows passed-through PEs to be excluded. Update callers to always pass true so that there is no change in behaviour. This is to prepare for follow-up work for passed-through devices. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index b2687c14dc40..61c177ebb230 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -591,19 +591,21 @@ static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata) * PE reset (for 3 times), we try to clear the frozen state * for 3 times as well. */ -static int eeh_clear_pe_frozen_state(struct eeh_pe *root) +static int eeh_clear_pe_frozen_state(struct eeh_pe *root, bool include_passed) { struct eeh_pe *pe; int i; eeh_for_each_pe(root, pe) { - for (i = 0; i < 3; i++) - if (!eeh_unfreeze_pe(pe)) - break; - if (i >= 3) - return -EIO; + if (include_passed || !eeh_pe_passed(pe)) { + for (i = 0; i < 3; i++) + if (!eeh_unfreeze_pe(pe)) + break; + if (i >= 3) + return -EIO; + } } - eeh_pe_state_clear(root, EEH_PE_ISOLATED, true); + eeh_pe_state_clear(root, EEH_PE_ISOLATED, include_passed); return 0; } @@ -629,7 +631,7 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe) } /* Unfreeze the PE */ - ret = eeh_clear_pe_frozen_state(pe); + ret = eeh_clear_pe_frozen_state(pe, true); if (ret) { eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true); return ret; @@ -702,7 +704,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, eeh_pe_restore_bars(pe); /* Clear frozen state */ - rc = eeh_clear_pe_frozen_state(pe); + rc = eeh_clear_pe_frozen_state(pe, true); if (rc) { pci_unlock_rescan_remove(); return rc; -- 2.19.0.2.gcad72f5712
[PATCH 1/6] powerpc/eeh: Cleanup eeh_pe_clear_frozen_state()
The 'clear_sw_state' parameter for eeh_pe_clear_frozen_state() is redundant because it has no effect (except in the rare case of a hardware error part way through unfreezing a tree of PEs, where it would dangerously allow partial de-isolation before returning failure). It is passed down to __eeh_pe_clear_frozen_state(), and from there to eeh_unfreeze_pe(), where it causes EEH_PE_ISOLATED to be removed from the state of each PE during the traversal. However, when the traversal finishes, EEH_PE_ISOLATED is unconditionally removed by a call to eeh_pe_state_clear() regardless of the parameter's value. So remove the flag and pass false to eeh_unfreeze_pe() (to avoid the rare case described above, as it was before the flag was introduced). Also, perform the recursion directly in the function and eliminate a bit of boilerplate. There should be no change in functionality, except as mentioned above. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 40 +++- 1 file changed, 13 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 9446248eb6b8..aa86a42d98f2 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -591,34 +591,20 @@ static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata) * PE reset (for 3 times), we try to clear the frozen state * for 3 times as well. */ -static void *__eeh_clear_pe_frozen_state(struct eeh_pe *pe, void *flag) +static int eeh_clear_pe_frozen_state(struct eeh_pe *root) { - bool clear_sw_state = *(bool *)flag; - int i, rc = 1; - - for (i = 0; rc && i < 3; i++) - rc = eeh_unfreeze_pe(pe, clear_sw_state); + struct eeh_pe *pe; + int i; - /* Stop immediately on any errors */ - if (rc) { - pr_warn("%s: Failure %d unfreezing PHB#%x-PE#%x\n", - __func__, rc, pe->phb->global_number, pe->addr); - return (void *)pe; + eeh_for_each_pe(root, pe) { + for (i = 0; i < 3; i++) + if (!eeh_unfreeze_pe(pe, false)) + break; + if (i >= 3) + return -EIO; } - - return NULL; -} - -static int eeh_clear_pe_frozen_state(struct eeh_pe *pe, -bool clear_sw_state) -{ - void *rc; - - rc = eeh_pe_traverse(pe, __eeh_clear_pe_frozen_state, _sw_state); - if (!rc) - eeh_pe_state_clear(pe, EEH_PE_ISOLATED); - - return rc ? -EIO : 0; + eeh_pe_state_clear(root, EEH_PE_ISOLATED); + return 0; } int eeh_pe_reset_and_recover(struct eeh_pe *pe) @@ -643,7 +629,7 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe) } /* Unfreeze the PE */ - ret = eeh_clear_pe_frozen_state(pe, true); + ret = eeh_clear_pe_frozen_state(pe); if (ret) { eeh_pe_state_clear(pe, EEH_PE_RECOVERING); return ret; @@ -716,7 +702,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, eeh_pe_restore_bars(pe); /* Clear frozen state */ - rc = eeh_clear_pe_frozen_state(pe, false); + rc = eeh_clear_pe_frozen_state(pe); if (rc) { pci_unlock_rescan_remove(); return rc; -- 2.19.0.2.gcad72f5712
Re: [PATCH kernel] powerpc/powernv/npu: Remove unused headers and a macro.
Thanks! Looks like a reasonable cleanup. Assuming it compiles I can't see any reason not to add: Acked-by: Alistair Popple On Thursday, 29 November 2018 1:11:34 PM AEDT Alexey Kardashevskiy wrote: > Ping? > > On 28/09/2018 16:48, Alexey Kardashevskiy wrote: > > The macro and few headers are not used so remove them. > > > > Signed-off-by: Alexey Kardashevskiy > > --- > > > > arch/powerpc/platforms/powernv/npu-dma.c | 14 -- > > 1 file changed, 14 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > > b/arch/powerpc/platforms/powernv/npu-dma.c index 8006c54..3a5c4ed 100644 > > --- a/arch/powerpc/platforms/powernv/npu-dma.c > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > > @@ -9,32 +9,18 @@ > > > > * License as published by the Free Software Foundation. > > */ > > > > -#include > > > > #include > > #include > > #include > > > > -#include > > > > #include > > #include > > > > -#include > > -#include > > > > #include > > > > -#include > > > > #include > > > > -#include > > -#include > > -#include > > -#include > > -#include > > -#include > > > > #include > > > > -#include "powernv.h" > > > > #include "pci.h" > > > > -#define npu_to_phb(x) container_of(x, struct pnv_phb, npu) > > - > > > > /* > > > > * spinlock to protect initialisation of an npu_context for a particular > > * mm_struct.
Re: [PATCH] selftests/powerpc: New TM signal self test
On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote: > A new self test that forces MSR[TS] to be set without calling any TM > instruction. This test also tries to cause a page fault at a signal > handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing > thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG > when tm_recheckpoint() is called. > > This test is not deterministic since it is hard to guarantee that the page > access will cause a page fault. Tests have shown that the bug could be > exposed with few interactions in a buggy kernel. This test is configured to > loop 5000x, having a good chance to hit the kernel issue in just one run. > This self test takes less than two seconds to run. You could try using sigaltstack() to put the ucontext somewhere else. Then you could play tricks with that memory to try to force a fault. madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick. This is more extra credit to make it more reliable. Not a requirement. > This test uses set/getcontext because the kernel will recheckpoint > zeroed structures, causing the test to segfault, which is undesired because > the test needs to rerun, so, there is a signal handler for SIGSEGV which > will restart the test. Please put this description at the top of the test also. Other than that, it looks good. Mikey > > Signed-off-by: Breno Leitao > --- > tools/testing/selftests/powerpc/tm/.gitignore | 1 + > tools/testing/selftests/powerpc/tm/Makefile | 3 +- > .../powerpc/tm/tm-signal-force-msr.c | 115 ++ > 3 files changed, 118 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > > diff --git a/tools/testing/selftests/powerpc/tm/.gitignore > b/tools/testing/selftests/powerpc/tm/.gitignore > index c3ee8393dae8..89679822ebc9 100644 > --- a/tools/testing/selftests/powerpc/tm/.gitignore > +++ b/tools/testing/selftests/powerpc/tm/.gitignore > @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu > tm-signal-context-chk-gpr > tm-signal-context-chk-vmx > tm-signal-context-chk-vsx > +tm-signal-force-msr > tm-vmx-unavail > tm-unavailable > tm-trap > diff --git a/tools/testing/selftests/powerpc/tm/Makefile > b/tools/testing/selftests/powerpc/tm/Makefile > index 9fc2cf6fbc92..58a2ebd13958 100644 > --- a/tools/testing/selftests/powerpc/tm/Makefile > +++ b/tools/testing/selftests/powerpc/tm/Makefile > @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm- > signal-context-chk-fpu > > TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal- > stack \ > tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable > tm-trap > \ > - $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn > + $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr > > top_srcdir = ../../../../.. > include ../../lib.mk > @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 > $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c > $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno- > error=uninitialized -mvsx > $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64 > +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread > > SIGNAL_CONTEXT_CHK_TESTS := $(patsubst > %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) > $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S > diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > new file mode 100644 > index ..4441d61c2328 > --- /dev/null > +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > @@ -0,0 +1,115 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp. > + */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "tm.h" > +#include "utils.h" > + > +#define __MASK(X) (1UL<<(X)) > +#define MSR_TS_S_LG 33 /* Trans Mem state: Suspended */ > +#define MSR_TM __MASK(MSR_TM_LG) /* Transactional Mem Available */ > +#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */ Surely we have these defined somewhere else in selftests? > + > +#define COUNT_MAX 5000/* Number of interactions */ > + > +/* Setting contexts because the test will crash and we want to recover */ > +ucontext_t init_context, main_context; > + > +static int count, first_time; > + > +void trap_signal_handler(int signo, siginfo_t *si, void *uc) > +{ > + ucontext_t *ucp = uc; > + > + /* > + * Allocating memory in a signal handler, and never freeing it on > + * purpose, forcing the heap increase, so, the memory leak is what > + * we want here. > + */ > + ucp->uc_link = malloc(sizeof(ucontext_t)); > + memcpy(>uc_link, >uc_mcontext, sizeof(ucp->uc_mcontext)); > + > + /* Forcing to enable MSR[TM] */ > + ucp->uc_mcontext.gp_regs[PT_MSR]
Re: [PATCH kernel] powerpc/powernv/npu: Remove unused headers and a macro.
Ping? On 28/09/2018 16:48, Alexey Kardashevskiy wrote: > The macro and few headers are not used so remove them. > > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/platforms/powernv/npu-dma.c | 14 -- > 1 file changed, 14 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > b/arch/powerpc/platforms/powernv/npu-dma.c > index 8006c54..3a5c4ed 100644 > --- a/arch/powerpc/platforms/powernv/npu-dma.c > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > @@ -9,32 +9,18 @@ > * License as published by the Free Software Foundation. > */ > > -#include > #include > #include > #include > -#include > #include > #include > -#include > -#include > > #include > -#include > #include > -#include > -#include > -#include > -#include > -#include > -#include > #include > > -#include "powernv.h" > #include "pci.h" > > -#define npu_to_phb(x) container_of(x, struct pnv_phb, npu) > - > /* > * spinlock to protect initialisation of an npu_context for a particular > * mm_struct. > -- Alexey
[PATCH] powerpc: annotate implicit fall throughs
There is a plan to build the kernel with -Wimplicit-fallthrough and these places in the code produced warnings, but because we build arch/powerpc with -Werror, they became errors. Fix them up. This patch produces no change in behaviour, but should be reviewed in case these are actually bugs not intentional fallthoughs. Cc: Kees Cook Cc: Gustavo A. R. Silva Cc: Nicholas Piggin Cc: Balbir Singh Signed-off-by: Stephen Rothwell --- arch/powerpc/kernel/nvram_64.c| 1 + arch/powerpc/platforms/powermac/feature.c | 1 + arch/powerpc/xmon/xmon.c | 1 + 3 files changed, 3 insertions(+) The patch is relative to v4.20-rc4 and has been in linux-next for a few days. diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c index 22e9d281324d..06e2eda2430e 100644 --- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -809,6 +809,7 @@ static long dev_nvram_ioctl(struct file *file, unsigned int cmd, #ifdef CONFIG_PPC_PMAC case OBSOLETE_PMAC_NVRAM_GET_OFFSET: printk(KERN_WARNING "nvram: Using obsolete PMAC_NVRAM_GET_OFFSET ioctl\n"); + /* fall through */ case IOC_NVRAM_GET_OFFSET: { int part, offset; diff --git a/arch/powerpc/platforms/powermac/feature.c b/arch/powerpc/platforms/powermac/feature.c index ed2f54b3f173..a7ec06876b53 100644 --- a/arch/powerpc/platforms/powermac/feature.c +++ b/arch/powerpc/platforms/powermac/feature.c @@ -1471,6 +1471,7 @@ static long g5_i2s_enable(struct device_node *node, long param, long value) case 2: if (macio->type == macio_shasta) break; + /* fall through */ default: return -ENODEV; } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 36b8dc47a3c3..308326f8b7ed 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -4033,6 +4033,7 @@ static int do_spu_cmd(void) subcmd = inchar(); if (isxdigit(subcmd) || subcmd == '\n') termch = subcmd; + /* fall through */ case 'f': scanhex(); if (num >= XMON_NUM_SPUS || !spu_info[num].spu) { -- 2.20.0.rc1 -- Cheers, Stephen Rothwell pgpuIEL50L1jB.pgp Description: OpenPGP digital signature
Re: [PATCH RESEND] powerpc/perf: Update perf_regs structure to include SIER
Madhavan Srinivasan writes: > On 28/11/18 9:04 AM, Michael Ellerman wrote: >> Arnaldo Carvalho de Melo writes: >> >>> Em Mon, Nov 26, 2018 at 11:34:08PM +0530, Madhavan Srinivasan escreveu: On each sample, Sample Instruction Event Register (SIER) content is saved in pt_regs. SIER does not have a entry as-is in the pt_regs but instead, SIER content is saved in the "dar" register of pt_regs. Patch adds another entry to the perf_regs structure to include the "SIER" printing which internally maps to the "dar" of pt_regs. >>> I think the patch is ok, when we talked in Vancouver I thought I saw >>> something like this before, i.e. adding more registers to a perf_regs.h >>> file, this was the cset: >>> >>>commit 0da0017f72554c005c1a04c3adc5da9eb64fa7e5 >>>Author: Hendrik Brueckner >>>Date: Wed Nov 8 07:30:15 2017 +0100 >>> >>>s390/perf: extend perf_regs support to include floating-point >>> registers >>> >>> That I came across because it broke the perf build, making me add this >>> cset: >>> >>>commit 10b9baa701d5023897f70a4acb3bf0235da3dc4f >>>Author: Arnaldo Carvalho de Melo >>>Date: Tue Nov 28 11:08:41 2017 -0300 >>> >>> tools arch s390: Do not include header files from the kernel sources >>> >>> :-) >>> >>> Michael? What about the ppc specific details? >> The only possible objection is that not all CPUs have an SIER register, >> so on CPUs without it you'll get the content of the DAR register rather >> than the SIER (because we (ab)use the DAR slot of pt_regs for the SIER). >> >> Perhaps we should make sure that we return 0 on CPUs that don't have the >> register? > > > Yes this make sense. We should make it zero instead of having the DAR > value. I will respin the patch with that change. Thanks. There's three cases we need to cover. The first is when we're not using core-book3s.c at all, ie. when we're using the Freescale PMU code. You can probably handle that at build time using CONFIG_FSL_EMB_PERF_EVENT? Then there's the 32-bit case in core-book3s.c, currently that version of perf_read_regs() never sets a value in dar. And then there's the 64-bit case but PPMU_HAS_SIER is not set. cheers
Re: [PATCH V2 0/5] NestMMU pte upgrade workaround for mprotect
On Wed, 28 Nov 2018 20:04:33 +0530 "Aneesh Kumar K.V" wrote: > We can upgrade pte access (R -> RW transition) via mprotect. We need > to make sure we follow the recommended pte update sequence as outlined in > commit: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle > nest MMU hang") > for such updates. This patch series do that. The mm bits look (mostly) OK to me. I suggest all these be merged via the appropriate powerpc tree.
Re: [PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
On Wed, 28 Nov 2018 20:04:37 +0530 "Aneesh Kumar K.V" wrote: > Signed-off-by: Aneesh Kumar K.V Some explanation of the motivation would be useful. > include/linux/hugetlb.h | 18 ++ > mm/hugetlb.c| 8 +--- > 2 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 087fd5f48c91..e2a3b0c854eb 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct > *mm, unsigned long addr > set_huge_pte_at(mm, addr, ptep, pte); > } > #endif > + > +#ifndef huge_ptep_modify_prot_start > +static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep) > +{ > + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); > +} > +#endif #define huge_ptep_modify_prot_start huge_ptep_modify_prot_start > +#ifndef huge_ptep_modify_prot_commit > +static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep, > + pte_t old_pte, pte_t pte) > +{ > + set_huge_pte_at(vma->vm_mm, addr, ptep, pte); > +} > +#endif #define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
Re: powerpc: Fix COFF zImage booting on old powermacs
On Mon, 2018-11-26 at 22:01:54 UTC, Paul Mackerras wrote: > Commit 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper > as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor > at the start of crt0.S to have a hard-coded start address of 0x50 > rather than a reference to _zimage_start, presumably because having > a reference to a symbol introduced a relocation which is awkward to > handle in a position-independent executable. Unfortunately, what is > at 0x50 in the COFF image is not the first instruction, but the > procedure descriptor itself, that is, a word containing 0x50, > which is not a valid instruction. Hence, booting a COFF zImage > results in a "DEFAULT CATCH!, code=FFF00700" message from Open > Firmware. > > This fixes the problem by (a) putting the procedure descriptor in the > data section and (b) adding a branch to _zimage_start as the first > instruction in the program. > > Fixes: 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a > relocatable ET_DYN") > Signed-off-by: Paul Mackerras Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/5564597d51c8ff5b88d95c76255e18 cheers
Re: powerpc/mm: Fix linux page tables build with some configs
On Mon, 2018-11-26 at 01:59:16 UTC, Michael Ellerman wrote: > For some configs the build fails with: > > arch/powerpc/mm/dump_linuxpagetables.c: In function 'populate_markers': > arch/powerpc/mm/dump_linuxpagetables.c:306:39: error: 'PKMAP_BASE' > undeclared (first use in this function) > arch/powerpc/mm/dump_linuxpagetables.c:314:50: error: 'LAST_PKMAP' > undeclared (first use in this function) > > These come from highmem.h, including that fixes the build. > > Signed-off-by: Michael Ellerman Applied to powerpc fixes. https://git.kernel.org/powerpc/c/462951cd32e1496dc64b00051dfb77 cheers
Re: [PATCH] powerpc/boot: Copy serial.c in Makefile
Hi Daniel, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.20-rc4 next-20181128] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Daniel-Axtens/powerpc-boot-Copy-serial-c-in-Makefile/20181129-025021 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-defconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): arch/powerpc/boot/epapr.o: In function `epapr_platform_init': >> epapr.c:(.text+0x184): undefined reference to `serial_console_init' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: use generic DMA mapping code in powerpc V4
On Wed, 28 Nov 2018 16:55:30 +0100 Christian Zigotzky wrote: > On 28 November 2018 at 12:05PM, Michael Ellerman wrote: > > Nothing specific yet. > > > > I'm a bit worried it might break one of the many old obscure platforms > > we have that aren't well tested. > > > Please don't apply the new DMA mapping code if you don't be sure if it > works on all supported PowerPC machines. Is the new DMA mapping code > really necessary? It's not really nice, to rewrote code if the old code > works perfect. We must not forget, that we work for the end users. Does > the end user have advantages with this new code? Is it faster? The old > code works without any problems. There is another service provided to the users as well: new code that is cleaner and simpler which allows easier bug fixes and new features. Without being familiar with the DMA mapping code I cannot really say if that's the case here. > I am also worried about this code. How > can I test this new DMA mapping code? I suppose if your machine works it works for you. Thanks Michal
Re: use generic DMA mapping code in powerpc V4
I will compile and test the kernel from the following Git on my PowerPC machines. http://git.infradead.org/users/hch/misc.git On 28 November 2018 at 12:05PM, Michael Ellerman wrote: Nothing specific yet. I'm a bit worried it might break one of the many old obscure platforms we have that aren't well tested.
[PATCH v3] powerpc/mm: add exec protection on powerpc 603
The 603 doesn't have a HASH table, TLB misses are handled by software. It is then possible to generate page fault when _PAGE_EXEC is not set like in nohash/32. There is one "reserved" PTE bit available, this patch uses it for _PAGE_EXEC. In order to support it, set_pte_filter() and set_access_flags_filter() are made common, and the handling is made dependent on MMU_FTR_HPTE_TABLE Signed-off-by: Christophe Leroy --- v3: Included the _PAGE_EXEC flag in the existing test in TLB handler, no additional insns. v2: Amended commit log and removed #ifdef in pagetable dump arch/powerpc/include/asm/book3s/32/hash.h | 1 + arch/powerpc/include/asm/book3s/32/pgtable.h | 18 +- arch/powerpc/include/asm/cputable.h| 8 arch/powerpc/kernel/head_32.S | 2 +- arch/powerpc/mm/dump_linuxpagetables-generic.c | 2 -- arch/powerpc/mm/pgtable.c | 20 +++- 6 files changed, 26 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h index f2892c7ab73e..2a0a467d2985 100644 --- a/arch/powerpc/include/asm/book3s/32/hash.h +++ b/arch/powerpc/include/asm/book3s/32/hash.h @@ -26,6 +26,7 @@ #define _PAGE_WRITETHRU0x040 /* W: cache write-through */ #define _PAGE_DIRTY0x080 /* C: page changed */ #define _PAGE_ACCESSED 0x100 /* R: page referenced */ +#define _PAGE_EXEC 0x200 /* software: exec allowed */ #define _PAGE_RW 0x400 /* software: user write access allowed */ #define _PAGE_SPECIAL 0x800 /* software: Special page */ diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index b849b45429d5..6accf3e686af 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -10,9 +10,9 @@ /* And here we include common definitions */ #define _PAGE_KERNEL_RO0 -#define _PAGE_KERNEL_ROX 0 +#define _PAGE_KERNEL_ROX (_PAGE_EXEC) #define _PAGE_KERNEL_RW(_PAGE_DIRTY | _PAGE_RW) -#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW) +#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC) #define _PAGE_HPTEFLAGS _PAGE_HASHPTE @@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte) */ #define PAGE_NONE __pgprot(_PAGE_BASE) #define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) -#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC) #define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) -#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) -#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) /* Permission masks used for kernel mappings */ #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) @@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, int psize) { unsigned long set = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); pte_update(ptep, 0, set); @@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); } static inline int pte_special(pte_t pte) { return !!(pte_val(pte) & _PAGE_SPECIAL); } static inline int pte_none(pte_t pte) { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; } -static inline bool pte_exec(pte_t pte) { return true; } +static inline bool pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_EXEC; } static inline int pte_present(pte_t pte) { @@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte) static inline pte_t pte_exprotect(pte_t pte) { - return pte; + return __pte(pte_val(pte) & ~_PAGE_EXEC); } static inline pte_t pte_mkclean(pte_t pte) @@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_mkexec(pte_t pte) { - return pte; + return __pte(pte_val(pte) | _PAGE_EXEC); } static inline pte_t pte_mkpte(pte_t pte) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 29f49a35d6ee..a0395ccbbe9e 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_PPC601(CPU_FTR_COMMON | CPU_FTR_601 | \ CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_USE_RTC)
[PATCH 0/4] powerpc/perf: IMC trace-mode support
IMC (In-Memory collection counters) is a hardware monitoring facility that collects large number of hardware performance events. POWER9 support two modes for IMC which are the Accumulation mode and Trace mode. In Accumulation mode, event counts are accumulated in system Memory. Hypervisor then reads the posted counts periodically or when requested. In IMC Trace mode, event counted is fixed for cycles and on each overflow, hardware snapshots the program counter along with other details and writes into memory pointed by LDBAR(ring buffer memory, hardware wraps around). LDBAR has bit which indicates the IMC trace-mode. Trace-IMC Implementation: -- To enable trace-imc, we need to * Add trace node in the DTS file for power9, so that the new trace node can be discovered by the kernel. Information included in the DTS file are as follows, (a snippet from the ima-catalog) TRACE_IMC: trace-events { #address-cells = <0x1>; #size-cells = <0x1>; event@1020 { event-name = "cycles" ; reg = <0x1020 0x8>; desc = "Reference cycles" ; }; }; trace@0 { compatible = "ibm,imc-counters"; events-prefix = "trace_"; reg = <0x0 0x8>; events = < _IMC >; type = <0x2>; size = <0x4>; }; OP-BUILD changes needed to include the "trace node" is already pulled in to the ima-catalog repo. ps://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07 * Enchance the opal_imc_counters_* calls to support this new trace mode in imc. Add support to initialize the trace-mode scom. TRACE_IMC_SCOM bit representation: 0:1 : SAMPSEL 2:33: CPMC_LOAD 34:40 : CPMC1SEL 41:47 : CPMC2SEL 48:50 : BUFFERSIZE 51:63 : RESERVED CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determines the event to count. BUFFRSIZE indicates the memory range. On each overflow, hardware snapshots program counter along with other details and update the memory and reloads the CMPC_LOAD value for the next sampling duration. IMC hardware does not support exceptions, so it quietly wraps around if memory buffer reaches the end. Link to the skiboot patches to enhance the opal_imc_counters_* calls: https://lists.ozlabs.org/pipermail/skiboot/2018-October/012442.html https://lists.ozlabs.org/pipermail/skiboot/2018-October/012441.html https://lists.ozlabs.org/pipermail/skiboot/2018-October/012439.html https://lists.ozlabs.org/pipermail/skiboot/2018-October/012440.html * Set LDBAR spr to enable imc-trace mode. LDBAR Layout: 0 : Enable/Disable 1 : 0 -> Accumulation Mode 1 -> Trace Mode 2:3 : Reserved 4-6 : PB scope 7 :
[PATCH 4/4] powerpc/perf: Trace imc PMU functions
Add PMU functions to support trace-imc. Signed-off-by: Anju T Sudhakar --- arch/powerpc/perf/imc-pmu.c | 172 1 file changed, 172 insertions(+) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index d9ffe7f03f1e..18af7c3e2345 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -1117,6 +1117,170 @@ static int trace_imc_cpu_init(void) ppc_trace_imc_cpu_offline); } +static u64 get_trace_imc_event_base_addr(void) +{ + return (u64)per_cpu(trace_imc_mem, smp_processor_id()); +} + +/* + * Function to parse trace-imc data obtained + * and to prepare the perf sample. + */ +static int trace_imc_prepare_sample(struct trace_imc_data *mem, + struct perf_sample_data *data, + u64 *prev_tb, + struct perf_event_header *header, + struct perf_event *event) +{ + /* Sanity checks for a valid record */ + if (be64_to_cpu(READ_ONCE(mem->tb1)) > *prev_tb) + *prev_tb = be64_to_cpu(READ_ONCE(mem->tb1)); + else + return -EINVAL; + + if ((be64_to_cpu(READ_ONCE(mem->tb1)) & IMC_TRACE_RECORD_TB1_MASK) != +be64_to_cpu(READ_ONCE(mem->tb2))) + return -EINVAL; + + /* Prepare perf sample */ + data->ip = be64_to_cpu(READ_ONCE(mem->ip)); + data->period = event->hw.last_period; + + header->type = PERF_RECORD_SAMPLE; + header->size = sizeof(*header) + event->header_size; + header->misc = 0; + + if (is_kernel_addr(data->ip)) + header->misc |= PERF_RECORD_MISC_KERNEL; + else + header->misc |= PERF_RECORD_MISC_USER; + + perf_event_header__init_id(header, data, event); + + return 0; +} + +static void dump_trace_imc_data(struct perf_event *event) +{ + struct trace_imc_data *mem; + int i, ret; + u64 prev_tb = 0; + + mem = (struct trace_imc_data *)get_trace_imc_event_base_addr(); + for (i = 0; i < (trace_imc_mem_size / sizeof(struct trace_imc_data)); + i++, mem++) { + struct perf_sample_data data; + struct perf_event_header header; + + ret = trace_imc_prepare_sample(mem, , _tb, , event); + if (ret) /* Exit, if not a valid record */ + break; + else { + /* If this is a valid record, create the sample */ + struct perf_output_handle handle; + + if (perf_output_begin(, event, header.size)) + return; + + perf_output_sample(, , , event); + perf_output_end(); + } + } +} + +static int trace_imc_event_add(struct perf_event *event, int flags) +{ + /* Enable the sched_task to start the engine */ + perf_sched_cb_inc(event->ctx->pmu); + return 0; +} + +static void trace_imc_event_read(struct perf_event *event) +{ + dump_trace_imc_data(event); +} + +static void trace_imc_event_stop(struct perf_event *event, int flags) +{ + trace_imc_event_read(event); +} + +static void trace_imc_event_start(struct perf_event *event, int flags) +{ + return; +} + +static void trace_imc_event_del(struct perf_event *event, int flags) +{ + perf_sched_cb_dec(event->ctx->pmu); +} + +void trace_imc_pmu_sched_task(struct perf_event_context *ctx, + bool sched_in) +{ + int core_id = smp_processor_id() / threads_per_core; + struct imc_pmu_ref *ref; + u64 local_mem, ldbar_value; + + /* Set trace-imc bit in ldbar and load ldbar with per-thread memory address */ + local_mem = get_trace_imc_event_base_addr(); + ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE; + + ref = _imc_refc[core_id]; + if (!ref) + return; + + if (sched_in) { + mtspr(SPRN_LDBAR, ldbar_value); + mutex_lock(>lock); + if (ref->refc == 0) { + if (opal_imc_counters_start(OPAL_IMC_COUNTERS_TRACE, + get_hard_smp_processor_id(smp_processor_id( { + mutex_unlock(>lock); + pr_err("trace-imc: Unable to start the counters for core %d\n", core_id); + mtspr(SPRN_LDBAR, 0); + return; + } + } + ++ref->refc; + mutex_unlock(>lock); + } else { + mtspr(SPRN_LDBAR, 0); + mutex_lock(>lock); + ref->refc--; + if (ref->refc == 0) { + if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_TRACE, +
[PATCH 3/4] powerpc/perf: Trace imc events detection and cpuhotplug
Patch detects trace-imc events, does memory initilizations for each online cpu, and registers cpuhotplug call-backs. Signed-off-by: Anju T Sudhakar --- arch/powerpc/perf/imc-pmu.c | 91 +++ arch/powerpc/platforms/powernv/opal-imc.c | 3 + include/linux/cpuhotplug.h| 1 + 3 files changed, 95 insertions(+) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 3bef46f8417d..d9ffe7f03f1e 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -43,6 +43,10 @@ static DEFINE_PER_CPU(u64 *, thread_imc_mem); static struct imc_pmu *thread_imc_pmu; static int thread_imc_mem_size; +/* Trace IMC data structures */ +static DEFINE_PER_CPU(u64 *, trace_imc_mem); +static int trace_imc_mem_size; + static struct imc_pmu *imc_event_to_pmu(struct perf_event *event) { return container_of(event->pmu, struct imc_pmu, pmu); @@ -1065,6 +1069,54 @@ static void thread_imc_event_del(struct perf_event *event, int flags) imc_event_update(event); } +/* + * Allocate a page of memory for each cpu, and load LDBAR with 0. + */ +static int trace_imc_mem_alloc(int cpu_id, int size) +{ + u64 *local_mem = per_cpu(trace_imc_mem, cpu_id); + int phys_id = cpu_to_node(cpu_id), rc = 0; + + if (!local_mem) { + local_mem = page_address(alloc_pages_node(phys_id, + GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE | + __GFP_NOWARN, get_order(size))); + if (!local_mem) + return -ENOMEM; + per_cpu(trace_imc_mem, cpu_id) = local_mem; + + /* Initialise the counters for trace mode */ + rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_TRACE, __pa((void *)local_mem), + get_hard_smp_processor_id(cpu_id)); + if (rc) { + pr_info("IMC:opal init failed for trace imc\n"); + return rc; + } + } + + mtspr(SPRN_LDBAR, 0); + return 0; +} + +static int ppc_trace_imc_cpu_online(unsigned int cpu) +{ + return trace_imc_mem_alloc(cpu, trace_imc_mem_size); +} + +static int ppc_trace_imc_cpu_offline(unsigned int cpu) +{ + mtspr(SPRN_LDBAR, 0); + return 0; +} + +static int trace_imc_cpu_init(void) +{ + return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE, + "perf/powerpc/imc_trace:online", + ppc_trace_imc_cpu_online, + ppc_trace_imc_cpu_offline); +} + /* update_pmu_ops : Populate the appropriate operations for "pmu" */ static int update_pmu_ops(struct imc_pmu *pmu) { @@ -1186,6 +1238,17 @@ static void cleanup_all_thread_imc_memory(void) } } +static void cleanup_all_trace_imc_memory(void) +{ + int i, order = get_order(trace_imc_mem_size); + + for_each_online_cpu(i) { + if (per_cpu(trace_imc_mem, i)) + free_pages((u64)per_cpu(trace_imc_mem, i), order); + + } +} + /* Function to free the attr_groups which are dynamically allocated */ static void imc_common_mem_free(struct imc_pmu *pmu_ptr) { @@ -1227,6 +1290,11 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr) cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE); cleanup_all_thread_imc_memory(); } + + if (pmu_ptr->domain == IMC_DOMAIN_TRACE) { + cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE); + cleanup_all_trace_imc_memory(); + } } /* @@ -1309,6 +1377,21 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent, thread_imc_pmu = pmu_ptr; break; + case IMC_DOMAIN_TRACE: + /* Update the pmu name */ + pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc"); + if (!pmu_ptr->pmu.name) + return -ENOMEM; + + trace_imc_mem_size = pmu_ptr->counter_mem_size; + for_each_online_cpu(cpu) { + res = trace_imc_mem_alloc(cpu, trace_imc_mem_size); + if (res) { + cleanup_all_trace_imc_memory(); + goto err; + } + } + break; default: return -EINVAL; } @@ -1381,6 +1464,14 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id goto err_free_mem; } + break; + case IMC_DOMAIN_TRACE: + ret = trace_imc_cpu_init(); + if (ret) { + cleanup_all_trace_imc_memory(); + goto err_free_mem; + } + break; default:
[PATCH 1/4] powerpc/include: Add data structures and macros for IMC trace mode
Add the macros needed for IMC (In-Memory Collection Counters) trace-mode and data structure to hold the trace-imc record data. Also, add the new type "OPAL_IMC_COUNTERS_TRACE" in 'opal-api.h', since there is a new switch case added in the opal-calls for IMC. Signed-off-by: Anju T Sudhakar --- arch/powerpc/include/asm/imc-pmu.h | 39 + arch/powerpc/include/asm/opal-api.h | 1 + 2 files changed, 40 insertions(+) diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h index 69f516ecb2fd..7c2ef0e42661 100644 --- a/arch/powerpc/include/asm/imc-pmu.h +++ b/arch/powerpc/include/asm/imc-pmu.h @@ -33,6 +33,7 @@ */ #define THREAD_IMC_LDBAR_MASK 0x0003e000ULL #define THREAD_IMC_ENABLE 0x8000ULL +#define TRACE_IMC_ENABLE 0x4000ULL /* * For debugfs interface for imc-mode and imc-command @@ -59,6 +60,34 @@ struct imc_events { char *scale; }; +/* + * Trace IMC hardware updates a 64bytes record on + * Core Performance Monitoring Counter (CPMC) + * overflow. Here is the layout for the trace imc record + * + * DW 0 : Timebase + * DW 1 : Program Counter + * DW 2 : PIDR information + * DW 3 : CPMC1 + * DW 4 : CPMC2 + * DW 5 : CPMC3 + * Dw 6 : CPMC4 + * DW 7 : Timebase + * . + * + * The following is the data structure to hold trace imc data. + */ +struct trace_imc_data { + u64 tb1; + u64 ip; + u64 val; + u64 cpmc1; + u64 cpmc2; + u64 cpmc3; + u64 cpmc4; + u64 tb2; +}; + /* Event attribute array index */ #define IMC_FORMAT_ATTR0 #define IMC_EVENT_ATTR 1 @@ -68,6 +97,13 @@ struct imc_events { /* PMU Format attribute macros */ #define IMC_EVENT_OFFSET_MASK 0xULL +/* + * Macro to mask bits 0:21 of first double word(which is the timebase) to + * compare with 8th double word (timebase) of trace imc record data. + */ +#define IMC_TRACE_RECORD_TB1_MASK 0x3ffULL + + /* * Device tree parser code detects IMC pmu support and * registers new IMC pmus. This structure will hold the @@ -113,6 +149,7 @@ struct imc_pmu_ref { enum { IMC_TYPE_THREAD = 0x1, + IMC_TYPE_TRACE = 0x2, IMC_TYPE_CORE = 0x4, IMC_TYPE_CHIP = 0x10, }; @@ -123,6 +160,8 @@ enum { #define IMC_DOMAIN_NEST1 #define IMC_DOMAIN_CORE2 #define IMC_DOMAIN_THREAD 3 +/* For trace-imc the domain is still thread but it operates in trace-mode */ +#define IMC_DOMAIN_TRACE 4 extern int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id); diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 870fb7b239ea..a4130b21b159 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -1118,6 +1118,7 @@ enum { enum { OPAL_IMC_COUNTERS_NEST = 1, OPAL_IMC_COUNTERS_CORE = 2, + OPAL_IMC_COUNTERS_TRACE = 3, }; -- 2.17.1
[PATCH 2/4] powerpc/perf: Rearrange setting of ldbar for thread-imc
LDBAR holds the memory address allocated for each cpu. For thread-imc the mode bit (i.e bit 1) of LDBAR is set to accumulation. Currently, ldbar is loaded with per cpu memory address and mode set to accumulation at boot time. To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to accommodate trace-mode of IMC, reposition setting of ldbar for thread-imc to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del(). Signed-off-by: Anju T Sudhakar --- arch/powerpc/perf/imc-pmu.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index f292a3f284f1..3bef46f8417d 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -806,8 +806,11 @@ static int core_imc_event_init(struct perf_event *event) } /* - * Allocates a page of memory for each of the online cpus, and write the - * physical base address of that page to the LDBAR for that cpu. + * Allocates a page of memory for each of the online cpus, and load + * LDBAR with 0. + * The physical base address of the page allocated for a cpu will be + * written to the LDBAR for that cpu, when the thread-imc event + * is added. * * LDBAR Register Layout: * @@ -825,7 +828,7 @@ static int core_imc_event_init(struct perf_event *event) */ static int thread_imc_mem_alloc(int cpu_id, int size) { - u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, cpu_id); + u64 *local_mem = per_cpu(thread_imc_mem, cpu_id); int nid = cpu_to_node(cpu_id); if (!local_mem) { @@ -842,9 +845,7 @@ static int thread_imc_mem_alloc(int cpu_id, int size) per_cpu(thread_imc_mem, cpu_id) = local_mem; } - ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE; - - mtspr(SPRN_LDBAR, ldbar_value); + mtspr(SPRN_LDBAR, 0); return 0; } @@ -995,6 +996,7 @@ static int thread_imc_event_add(struct perf_event *event, int flags) { int core_id; struct imc_pmu_ref *ref; + u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, smp_processor_id()); if (flags & PERF_EF_START) imc_event_start(event, flags); @@ -1003,6 +1005,9 @@ static int thread_imc_event_add(struct perf_event *event, int flags) return -EINVAL; core_id = smp_processor_id() / threads_per_core; + ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE; + mtspr(SPRN_LDBAR, ldbar_value); + /* * imc pmus are enabled only when it is used. * See if this is triggered for the first time. @@ -1034,11 +1039,7 @@ static void thread_imc_event_del(struct perf_event *event, int flags) int core_id; struct imc_pmu_ref *ref; - /* -* Take a snapshot and calculate the delta and update -* the event counter values. -*/ - imc_event_update(event); + mtspr(SPRN_LDBAR, 0); core_id = smp_processor_id() / threads_per_core; ref = _imc_refc[core_id]; @@ -1057,6 +1058,11 @@ static void thread_imc_event_del(struct perf_event *event, int flags) ref->refc = 0; } mutex_unlock(>lock); + /* +* Take a snapshot and calculate the delta and update +* the event counter values. +*/ + imc_event_update(event); } /* update_pmu_ops : Populate the appropriate operations for "pmu" */ -- 2.17.1
Re: use generic DMA mapping code in powerpc V4
On 28 November 2018 at 12:05PM, Michael Ellerman wrote: Nothing specific yet. I'm a bit worried it might break one of the many old obscure platforms we have that aren't well tested. Please don't apply the new DMA mapping code if you don't be sure if it works on all supported PowerPC machines. Is the new DMA mapping code really necessary? It's not really nice, to rewrote code if the old code works perfect. We must not forget, that we work for the end users. Does the end user have advantages with this new code? Is it faster? The old code works without any problems. I am also worried about this code. How can I test this new DMA mapping code? Thanks
Re: [PATCH 0/1] Fix NULL pointer access in PowerPC MSI teardown code
Hi Michael, On Wed, Nov 28, 2018 at 6:00 AM Michael Ellerman wrote: > > Radu Rendec writes: > > > > The assumption in arch_teardown_msi_irqs() is wrong and results in a > > function call on a NULL pointer. An example of how this can happen is > > included in the actual patch header. In my case, it happens when the PCI > > hardware is configured during kernel start-up, because my controller > > doesn't support MSI and the ops are NULL. > > What hardware are you on? I'm on Freescale MPC8378 - old stuff, but still going strong :) The MSI capable device is a Broadcom PEX 8613 (a 3-port PCIe switch). > > I'm proposing the attached patch to fix the problem. It basically just > > checks the pointer before the function call. > > Yeah that patch looks good to me. > > I suspect this bug was introduced in: > > 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()") > > Previously we had that check routine which would run before any of the > MSI setup had been done, and so if there were no MSI ops then we bailed > out early and didn't call teardown. > > I guess since then (2014) we haven't tested an MSI capable device on a > system that isn't MSI capable? Thanks for looking into this. You're probably right, it really looks like that patch could have introduced this bug. Cheers, Radu
[PATCH V2 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade
NestMMU requires us to mark the pte invalid and flush the tlb when we do a RW upgrade of pte. We fixed a variant of this in the fault path in commit Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 arch/powerpc/mm/hugetlbpage-radix.c | 17 arch/powerpc/mm/hugetlbpage.c| 29 3 files changed, 58 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h index 5b0177733994..66c1e4f88d65 100644 --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h @@ -13,6 +13,10 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); +extern void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte); + static inline int hstate_get_psize(struct hstate *hstate) { unsigned long shift; @@ -42,4 +46,12 @@ static inline bool gigantic_page_supported(void) /* hugepd entry valid bit */ #define HUGEPD_VAL_BITS(0x8000UL) +#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start +extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, +unsigned long addr, pte_t *ptep); + +#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit +extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, +unsigned long addr, pte_t *ptep, +pte_t old_pte, pte_t new_pte); #endif diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c index 2486bee0f93e..1f77d71e7708 100644 --- a/arch/powerpc/mm/hugetlbpage-radix.c +++ b/arch/powerpc/mm/hugetlbpage-radix.c @@ -90,3 +90,20 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr, return vm_unmapped_area(); } + +void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma, +unsigned long addr, pte_t *ptep, +pte_t old_pte, pte_t pte) +{ + struct mm_struct *mm = vma->vm_mm; + + /* +* To avoid NMMU hang while relaxing access we need to flush the tlb before +* we set the new value. +*/ + if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) && + (atomic_read(>context.copros) > 0)) + flush_hugetlb_page(vma, addr); + + set_huge_pte_at(vma->vm_mm, addr, ptep, pte); +} diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 8cf035e68378..39d33a3d0dc6 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -912,3 +912,32 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, return 1; } + +#ifdef CONFIG_PPC_BOOK3S_64 +pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + unsigned long pte_val; + /* +* Clear the _PAGE_PRESENT so that no hardware parallel update is +* possible. Also keep the pte_present true so that we don't take +* wrong fault. +*/ + pte_val = pte_update(vma->vm_mm, addr, ptep, +_PAGE_PRESENT, _PAGE_INVALID, 1); + + return __pte(pte_val); +} +EXPORT_SYMBOL(huge_ptep_modify_prot_start); + +void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, + pte_t *ptep, pte_t old_pte, pte_t pte) +{ + + if (radix_enabled()) + return radix__huge_ptep_modify_prot_commit(vma, addr, ptep, + old_pte, pte); + set_huge_pte_at(vma->vm_mm, addr, ptep, pte); +} +EXPORT_SYMBOL(huge_ptep_modify_prot_commit); +#endif -- 2.19.1
[PATCH V2 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 18 ++ mm/hugetlb.c| 8 +--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 087fd5f48c91..e2a3b0c854eb 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -543,6 +543,24 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr set_huge_pte_at(mm, addr, ptep, pte); } #endif + +#ifndef huge_ptep_modify_prot_start +static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); +} +#endif + +#ifndef huge_ptep_modify_prot_commit +static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte) +{ + set_huge_pte_at(vma->vm_mm, addr, ptep, pte); +} +#endif + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; #define alloc_huge_page(v, a, r) NULL diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 7f2a28ab46d5..e5f5cda14f28 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4388,10 +4388,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, continue; } if (!huge_pte_none(pte)) { - pte = huge_ptep_get_and_clear(mm, address, ptep); - pte = pte_mkhuge(huge_pte_modify(pte, newprot)); + pte_t old_pte; + + old_pte = huge_ptep_modify_prot_start(vma, address, ptep); + pte = pte_mkhuge(huge_pte_modify(old_pte, newprot)); pte = arch_make_huge_pte(pte, vma, NULL, 0); - set_huge_pte_at(mm, address, ptep, pte); + huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; } spin_unlock(ptl); -- 2.19.1
[PATCH V2 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
NestMMU requires us to mark the pte invalid and flush the tlb when we do a RW upgrade of pte. We fixed a variant of this in the fault path in commit Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") Do the same for mprotect upgrades. Hugetlb is handled in the next patch. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/pgtable.h | 18 + arch/powerpc/include/asm/book3s/64/radix.h | 4 +++ arch/powerpc/mm/pgtable-book3s64.c | 27 arch/powerpc/mm/pgtable-radix.c | 18 + 4 files changed, 67 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 2e6ada28da64..92eaea164700 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1314,6 +1314,24 @@ static inline int pud_pfn(pud_t pud) BUILD_BUG(); return 0; } +#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION +pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *); +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, +pte_t *, pte_t, pte_t); + +/* + * Returns true for a R -> RW upgrade of pte + */ +static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val) +{ + if (!(old_val & _PAGE_READ)) + return false; + + if ((!(old_val & _PAGE_WRITE)) && (new_val & _PAGE_WRITE)) + return true; + + return false; +} #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */ diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index 7d1a3d1543fc..5ab134eeed20 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -127,6 +127,10 @@ extern void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep pte_t entry, unsigned long address, int psize); +extern void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte); + static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr, unsigned long set) { diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c index 9f93c9f985c5..3d126353b11e 100644 --- a/arch/powerpc/mm/pgtable-book3s64.c +++ b/arch/powerpc/mm/pgtable-book3s64.c @@ -482,3 +482,30 @@ void arch_report_meminfo(struct seq_file *m) atomic_long_read(_pages_count[MMU_PAGE_1G]) << 20); } #endif /* CONFIG_PROC_FS */ + +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, +pte_t *ptep) +{ + unsigned long pte_val; + + /* +* Clear the _PAGE_PRESENT so that no hardware parallel update is +* possible. Also keep the pte_present true so that we don't take +* wrong fault. +*/ + pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0); + + return __pte(pte_val); + +} +EXPORT_SYMBOL(ptep_modify_prot_start); + +void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, +pte_t *ptep, pte_t old_pte, pte_t pte) +{ + if (radix_enabled()) + return radix__ptep_modify_prot_commit(vma, addr, + ptep, old_pte, pte); + set_pte_at(vma->vm_mm, addr, ptep, pte); +} +EXPORT_SYMBOL(ptep_modify_prot_commit); diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c index 931156069a81..14938186df5b 100644 --- a/arch/powerpc/mm/pgtable-radix.c +++ b/arch/powerpc/mm/pgtable-radix.c @@ -1063,3 +1063,21 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep, } /* See ptesync comment in radix__set_pte_at */ } + +void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte) +{ + struct mm_struct *mm = vma->vm_mm; + + /* +* To avoid NMMU hang while relaxing access we need to flush the tlb before +* we set the new value. We need to do this only for radix, because hash +* translation does flush when updating the linux pte. +*/ + if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) && + (atomic_read(>context.copros) > 0)) + flush_tlb_page(vma, addr); + + set_pte_at(mm, addr, ptep, pte); +} -- 2.19.1
[PATCH V2 2/5] mm: update ptep_modify_prot_commit to take old pte value as arg
Architectures like ppc64 requires to do a conditional tlb flush based on the old and new value of pte. Enable that by passing old pte value as the arg. Signed-off-by: Aneesh Kumar K.V --- arch/s390/include/asm/pgtable.h | 3 ++- arch/s390/mm/pgtable.c | 2 +- arch/x86/include/asm/paravirt.h | 2 +- fs/proc/task_mmu.c | 8 +--- include/asm-generic/pgtable.h | 2 +- mm/memory.c | 8 mm/mprotect.c | 6 +++--- 7 files changed, 17 insertions(+), 14 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 5d730199e37b..76dc344edb8c 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1070,7 +1070,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *); -void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t); +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, +pte_t *, pte_t, pte_t); #define __HAVE_ARCH_PTEP_CLEAR_FLUSH static inline pte_t ptep_clear_flush(struct vm_area_struct *vma, diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 29c0a21cd34a..b283b92722cc 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -322,7 +322,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, EXPORT_SYMBOL(ptep_modify_prot_start); void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, -pte_t *ptep, pte_t pte) +pte_t *ptep, pte_t old_pte, pte_t pte) { pgste_t pgste; struct mm_struct *mm = vma->vm_mm; diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 1154f154025d..0d75a4f60500 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -429,7 +429,7 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned } static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, - pte_t *ptep, pte_t pte) + pte_t *ptep, pte_t old_pte, pte_t pte) { struct mm_struct *mm = vma->vm_mm; diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 9952d7185170..8d62891d38a8 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -940,10 +940,12 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, pte_t ptent = *pte; if (pte_present(ptent)) { - ptent = ptep_modify_prot_start(vma, addr, pte); - ptent = pte_wrprotect(ptent); + pte_t old_pte; + + old_pte = ptep_modify_prot_start(vma, addr, pte); + ptent = pte_wrprotect(old_pte); ptent = pte_clear_soft_dirty(ptent); - ptep_modify_prot_commit(vma, addr, pte, ptent); + ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); } else if (is_swap_pte(ptent)) { ptent = pte_swp_clear_soft_dirty(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index c9897dcc46c4..37039e918f17 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -619,7 +619,7 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, */ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, - pte_t *ptep, pte_t pte) + pte_t *ptep, pte_t old_pte, pte_t pte) { __ptep_modify_prot_commit(vma->vm_mm, addr, ptep, pte); } diff --git a/mm/memory.c b/mm/memory.c index d36b0eaa7862..4f3ddaedc764 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3568,7 +3568,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) int last_cpupid; int target_nid; bool migrated = false; - pte_t pte; + pte_t pte, old_pte; bool was_writable = pte_savedwrite(vmf->orig_pte); int flags = 0; @@ -3588,12 +3588,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) * Make it present again, Depending on how arch implementes non * accessible ptes, some can allow access by kernel mode. */ - pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte); - pte = pte_modify(pte, vma->vm_page_prot); + old_pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte); + pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (was_writable) pte = pte_mkwrite(pte); - ptep_modify_prot_commit(vma, vmf->address, vmf->pte, pte);
[PATCH V2 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
Some architecture may want to call flush_tlb_range from these helpers. Signed-off-by: Aneesh Kumar K.V --- arch/s390/include/asm/pgtable.h | 4 ++-- arch/s390/mm/pgtable.c | 6 -- arch/x86/include/asm/paravirt.h | 7 +-- fs/proc/task_mmu.c | 4 ++-- include/asm-generic/pgtable.h | 8 mm/memory.c | 4 ++-- mm/mprotect.c | 4 ++-- 7 files changed, 21 insertions(+), 16 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 063732414dfb..5d730199e37b 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1069,8 +1069,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION -pte_t ptep_modify_prot_start(struct mm_struct *, unsigned long, pte_t *); -void ptep_modify_prot_commit(struct mm_struct *, unsigned long, pte_t *, pte_t); +pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *); +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t); #define __HAVE_ARCH_PTEP_CLEAR_FLUSH static inline pte_t ptep_clear_flush(struct vm_area_struct *vma, diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..29c0a21cd34a 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -301,12 +301,13 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr, } EXPORT_SYMBOL(ptep_xchg_lazy); -pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { pgste_t pgste; pte_t old; int nodat; + struct mm_struct *mm = vma->vm_mm; preempt_disable(); pgste = ptep_xchg_start(mm, addr, ptep); @@ -320,10 +321,11 @@ pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, } EXPORT_SYMBOL(ptep_modify_prot_start); -void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr, +void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t pte) { pgste_t pgste; + struct mm_struct *mm = vma->vm_mm; if (!MACHINE_HAS_NX) pte_val(pte) &= ~_PAGE_NOEXEC; diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 4bf42f9e4eea..1154f154025d 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -417,19 +417,22 @@ static inline pgdval_t pgd_val(pgd_t pgd) } #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION -static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, +static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { pteval_t ret; + struct mm_struct *mm = vma->vm_mm; ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, mm, addr, ptep); return (pte_t) { .pte = ret }; } -static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr, +static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t pte) { + struct mm_struct *mm = vma->vm_mm; + if (sizeof(pteval_t) > sizeof(long)) /* 5 arg words */ pv_ops.mmu.ptep_modify_prot_commit(mm, addr, ptep, pte); diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 47c3764c469b..9952d7185170 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -940,10 +940,10 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, pte_t ptent = *pte; if (pte_present(ptent)) { - ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte); + ptent = ptep_modify_prot_start(vma, addr, pte); ptent = pte_wrprotect(ptent); ptent = pte_clear_soft_dirty(ptent); - ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent); + ptep_modify_prot_commit(vma, addr, pte, ptent); } else if (is_swap_pte(ptent)) { ptent = pte_swp_clear_soft_dirty(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 359fb935ded6..c9897dcc46c4 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -606,22 +606,22 @@ static inline void __ptep_modify_prot_commit(struct mm_struct *mm, * queue the update to be done at some later time. The update must be * actually committed before the pte lock is released, however. */ -static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, +static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
[PATCH V2 0/5] NestMMU pte upgrade workaround for mprotect
We can upgrade pte access (R -> RW transition) via mprotect. We need to make sure we follow the recommended pte update sequence as outlined in commit: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") for such updates. This patch series do that. Changes from V1: * Restrict ths only for R->RW upgrade. We don't need to do this for Autonuma * Restrict this only for radix translation mode. Aneesh Kumar K.V (5): mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg mm: update ptep_modify_prot_commit to take old pte value as arg arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade. mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 arch/powerpc/include/asm/book3s/64/radix.h | 4 +++ arch/powerpc/mm/hugetlbpage-radix.c | 17 arch/powerpc/mm/hugetlbpage.c| 29 arch/powerpc/mm/pgtable-book3s64.c | 27 ++ arch/powerpc/mm/pgtable-radix.c | 18 arch/s390/include/asm/pgtable.h | 5 ++-- arch/s390/mm/pgtable.c | 8 -- arch/x86/include/asm/paravirt.h | 9 -- fs/proc/task_mmu.c | 8 -- include/asm-generic/pgtable.h| 10 +++ include/linux/hugetlb.h | 18 mm/hugetlb.c | 8 -- mm/memory.c | 8 +++--- mm/mprotect.c| 6 ++-- 16 files changed, 179 insertions(+), 26 deletions(-) -- 2.19.1
[PATCH] selftests/powerpc: New TM signal self test
A new self test that forces MSR[TS] to be set without calling any TM instruction. This test also tries to cause a page fault at a signal handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG when tm_recheckpoint() is called. This test is not deterministic since it is hard to guarantee that the page access will cause a page fault. Tests have shown that the bug could be exposed with few interactions in a buggy kernel. This test is configured to loop 5000x, having a good chance to hit the kernel issue in just one run. This self test takes less than two seconds to run. This test uses set/getcontext because the kernel will recheckpoint zeroed structures, causing the test to segfault, which is undesired because the test needs to rerun, so, there is a signal handler for SIGSEGV which will restart the test. Signed-off-by: Breno Leitao --- tools/testing/selftests/powerpc/tm/.gitignore | 1 + tools/testing/selftests/powerpc/tm/Makefile | 3 +- .../powerpc/tm/tm-signal-force-msr.c | 115 ++ 3 files changed, 118 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index c3ee8393dae8..89679822ebc9 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu tm-signal-context-chk-gpr tm-signal-context-chk-vmx tm-signal-context-chk-vsx +tm-signal-force-msr tm-vmx-unavail tm-unavailable tm-trap diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 9fc2cf6fbc92..58a2ebd13958 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \ - $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn + $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr top_srcdir = ../../../../.. include ../../lib.mk @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64 +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c new file mode 100644 index ..4441d61c2328 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include + +#include "tm.h" +#include "utils.h" + +#define __MASK(X) (1UL<<(X)) +#define MSR_TS_S_LG 33 /* Trans Mem state: Suspended */ +#define MSR_TM __MASK(MSR_TM_LG) /* Transactional Mem Available */ +#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */ + +#define COUNT_MAX 5000/* Number of interactions */ + +/* Setting contexts because the test will crash and we want to recover */ +ucontext_t init_context, main_context; + +static int count, first_time; + +void trap_signal_handler(int signo, siginfo_t *si, void *uc) +{ + ucontext_t *ucp = uc; + + /* +* Allocating memory in a signal handler, and never freeing it on +* purpose, forcing the heap increase, so, the memory leak is what +* we want here. +*/ + ucp->uc_link = malloc(sizeof(ucontext_t)); + memcpy(>uc_link, >uc_mcontext, sizeof(ucp->uc_mcontext)); + + /* Forcing to enable MSR[TM] */ + ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S; + + /* +* A fork inside a signal handler seems to be more efficient than a +* fork() prior to the signal being raised. +*/ + if (fork() == 0) { + /* +* Both child and parent will return, but, child returns +* with count set so it will exit in the next segfault. +* Parent will continue to loop. +*/ + count = COUNT_MAX; + } + + /* +* If the change above does not hit the bug, it will cause a +* segmentation fault, since the ck structures are NULL. +*/ +} + +void seg_signal_handler(int signo, siginfo_t *si, void *uc) +{ +
[PATCH v10 9/9] powerpc: clean stack pointers naming
Some stack pointers used to also be thread_info pointers and were called tp. Now that they are only stack pointers, rename them sp. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/irq.c | 17 +++-- arch/powerpc/kernel/setup_64.c | 20 ++-- 2 files changed, 17 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 62cfccf4af89..754f0efc507b 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs) void do_IRQ(struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); - void *curtp, *irqtp, *sirqtp; + void *cursp, *irqsp, *sirqsp; /* Switch to the irq stack to handle this */ - curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1)); - irqtp = hardirq_ctx[raw_smp_processor_id()]; - sirqtp = softirq_ctx[raw_smp_processor_id()]; + cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1)); + irqsp = hardirq_ctx[raw_smp_processor_id()]; + sirqsp = softirq_ctx[raw_smp_processor_id()]; /* Already there ? */ - if (unlikely(curtp == irqtp || curtp == sirqtp)) { + if (unlikely(cursp == irqsp || cursp == sirqsp)) { __do_irq(regs); set_irq_regs(old_regs); return; } /* Switch stack and call */ - call_do_irq(regs, irqtp); + call_do_irq(regs, irqsp); set_irq_regs(old_regs); } @@ -732,10 +732,7 @@ void irq_ctx_init(void) void do_softirq_own_stack(void) { - void *irqtp; - - irqtp = softirq_ctx[smp_processor_id()]; - call_do_softirq(irqtp); + call_do_softirq(softirq_ctx[smp_processor_id()]); } irq_hw_number_t virq_to_hw(unsigned int virq) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 0b227d0891ec..49765ccbc8c0 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -718,22 +718,22 @@ void __init emergency_stack_init(void) limit = min(ppc64_bolted_size(), ppc64_rma_size); for_each_possible_cpu(i) { - void *ti; + void *sp; - ti = alloc_stack(limit, i); - memset(ti, 0, THREAD_SIZE); - paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE; + sp = alloc_stack(limit, i); + memset(sp, 0, THREAD_SIZE); + paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE; #ifdef CONFIG_PPC_BOOK3S_64 /* emergency stack for NMI exception handling. */ - ti = alloc_stack(limit, i); - memset(ti, 0, THREAD_SIZE); - paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE; + sp = alloc_stack(limit, i); + memset(sp, 0, THREAD_SIZE); + paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE; /* emergency stack for machine check exception handling. */ - ti = alloc_stack(limit, i); - memset(ti, 0, THREAD_SIZE); - paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE; + sp = alloc_stack(limit, i); + memset(sp, 0, THREAD_SIZE); + paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE; #endif } } -- 2.13.3
[PATCH v10 8/9] powerpc/64: Remove CURRENT_THREAD_INFO
Now that current_thread_info is located at the beginning of 'current' task struct, CURRENT_THREAD_INFO macro is not really needed any more. This patch replaces it by loads of the value at PACACURRENT(r13). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/exception-64s.h | 4 ++-- arch/powerpc/include/asm/thread_info.h | 4 arch/powerpc/kernel/entry_64.S | 10 +- arch/powerpc/kernel/exceptions-64e.S | 2 +- arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/idle_book3e.S | 2 +- arch/powerpc/kernel/idle_power4.S | 2 +- arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 6 +++--- 8 files changed, 14 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 3b4767ed3ec5..dd6a5ae7a769 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define RUNLATCH_ON\ BEGIN_FTR_SECTION \ - CURRENT_THREAD_INFO(r3, r1);\ + ld r3, PACACURRENT(r13); \ ld r4,TI_LOCAL_FLAGS(r3); \ andi. r0,r4,_TLF_RUNLATCH;\ beqlppc64_runlatch_on_trampoline; \ @@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) #ifdef CONFIG_PPC_970_NAP #define FINISH_NAP \ BEGIN_FTR_SECTION \ - CURRENT_THREAD_INFO(r11, r1); \ + ld r11, PACACURRENT(r13); \ ld r9,TI_LOCAL_FLAGS(r11); \ andi. r10,r9,_TLF_NAPPING;\ bnelpower4_fixup_nap; \ diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index c959b8d66cac..8e1d0195ac36 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -17,10 +17,6 @@ #define THREAD_SIZE(1 << THREAD_SHIFT) -#ifdef CONFIG_PPC64 -#define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(ld dest, PACACURRENT(r13)) -#endif - #ifndef __ASSEMBLY__ #include #include diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 03cbf409c3f8..b017bd3da1ed 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -158,7 +158,7 @@ system_call:/* label this so stack traces look sane */ li r10,IRQS_ENABLED std r10,SOFTE(r1) - CURRENT_THREAD_INFO(r11, r1) + ld r11, PACACURRENT(r13) ld r10,TI_FLAGS(r11) andi. r11,r10,_TIF_SYSCALL_DOTRACE bne .Lsyscall_dotrace /* does not return */ @@ -205,7 +205,7 @@ system_call:/* label this so stack traces look sane */ ld r3,RESULT(r1) #endif - CURRENT_THREAD_INFO(r12, r1) + ld r12, PACACURRENT(r13) ld r8,_MSR(r1) #ifdef CONFIG_PPC_BOOK3S @@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* Repopulate r9 and r10 for the syscall path */ addir9,r1,STACK_FRAME_OVERHEAD - CURRENT_THREAD_INFO(r10, r1) + ld r10, PACACURRENT(r13) ld r10,TI_FLAGS(r10) cmpldi r0,NR_syscalls @@ -734,7 +734,7 @@ _GLOBAL(ret_from_except_lite) mtmsrd r10,1 /* Update machine state */ #endif /* CONFIG_PPC_BOOK3E */ - CURRENT_THREAD_INFO(r9, r1) + ld r9, PACACURRENT(r13) ld r3,_MSR(r1) #ifdef CONFIG_PPC_BOOK3E ld r10,PACACURRENT(r13) @@ -848,7 +848,7 @@ resume_kernel: 1: bl preempt_schedule_irq /* Re-test flags and eventually loop */ - CURRENT_THREAD_INFO(r9, r1) + ld r9, PACACURRENT(r13) ld r4,TI_FLAGS(r9) andi. r0,r4,_TIF_NEED_RESCHED bne 1b diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 231d066b4a3d..dfafcd0af009 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -469,7 +469,7 @@ exc_##n##_bad_stack: \ * interrupts happen before the wait instruction. */ #define CHECK_NAPPING() \ - CURRENT_THREAD_INFO(r11, r1); \ + ld r11, PACACURRENT(r13); \ ld r10,TI_LOCAL_FLAGS(r11);\ andi. r9,r10,_TLF_NAPPING;\ beq+1f; \ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 89d32bb79d5e..1cbe1a78df57 100644
[PATCH v10 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
Now that thread_info is similar to task_struct, it's address is in r2 so CURRENT_THREAD_INFO() macro is useless. This patch removes it. At the same time, as the 'cpu' field is not anymore in thread_info, this patch renames it to TASK_CPU. Signed-off-by: Christophe Leroy --- arch/powerpc/Makefile | 2 +- arch/powerpc/include/asm/thread_info.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/entry_32.S | 43 -- arch/powerpc/kernel/epapr_hcalls.S | 5 ++-- arch/powerpc/kernel/head_fsl_booke.S | 5 ++-- arch/powerpc/kernel/idle_6xx.S | 8 +++ arch/powerpc/kernel/idle_e500.S| 8 +++ arch/powerpc/kernel/misc_32.S | 3 +-- arch/powerpc/mm/hash_low_32.S | 14 --- arch/powerpc/sysdev/6xx-suspend.S | 5 ++-- 11 files changed, 35 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index e04e988c86b1..d43c4036bb74 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -425,7 +425,7 @@ ifdef CONFIG_SMP prepare: task_cpu_prepare task_cpu_prepare: prepare0 - $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h)) + $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h)) endif # Check toolchain versions: diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index d91523c2c7d8..c959b8d66cac 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -19,8 +19,6 @@ #ifdef CONFIG_PPC64 #define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(ld dest, PACACURRENT(r13)) -#else -#define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(mr dest, r2) #endif #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 94ac190a0b16..03439785c2ea 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -96,7 +96,7 @@ int main(void) #endif /* CONFIG_PPC64 */ OFFSET(TASK_STACK, task_struct, stack); #ifdef CONFIG_SMP - OFFSET(TI_CPU, task_struct, cpu); + OFFSET(TASK_CPU, task_struct, cpu); #endif #ifdef CONFIG_LIVEPATCH diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index bd3b146e18a3..d0c546ce387e 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -168,8 +168,7 @@ transfer_to_handler: tophys(r11,r11) addir11,r11,global_dbcr0@l #ifdef CONFIG_SMP - CURRENT_THREAD_INFO(r9, r1) - lwz r9,TI_CPU(r9) + lwz r9,TASK_CPU(r2) slwir9,r9,3 add r11,r11,r9 #endif @@ -180,8 +179,7 @@ transfer_to_handler: stw r12,4(r11) #endif #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE - CURRENT_THREAD_INFO(r9, r1) - tophys(r9, r9) + tophys(r9, r2) ACCOUNT_CPU_USER_ENTRY(r9, r11, r12) #endif @@ -195,8 +193,7 @@ transfer_to_handler: ble-stack_ovf /* then the kernel stack overflowed */ 5: #if defined(CONFIG_6xx) || defined(CONFIG_E500) - CURRENT_THREAD_INFO(r9, r1) - tophys(r9,r9) /* check local flags */ + tophys(r9,r2) /* check local flags */ lwz r12,TI_LOCAL_FLAGS(r9) mtcrf 0x01,r12 bt- 31-TLF_NAPPING,4f @@ -345,8 +342,7 @@ _GLOBAL(DoSyscall) mtmsr r11 1: #endif /* CONFIG_TRACE_IRQFLAGS */ - CURRENT_THREAD_INFO(r10, r1) - lwz r11,TI_FLAGS(r10) + lwz r11,TI_FLAGS(r2) andi. r11,r11,_TIF_SYSCALL_DOTRACE bne-syscall_dotrace syscall_dotrace_cont: @@ -379,13 +375,12 @@ ret_from_syscall: lwz r3,GPR3(r1) #endif mr r6,r3 - CURRENT_THREAD_INFO(r12, r1) /* disable interrupts so current_thread_info()->flags can't change */ LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ /* Note: We don't bother telling lockdep about it */ SYNC MTMSRD(r10) - lwz r9,TI_FLAGS(r12) + lwz r9,TI_FLAGS(r2) li r8,-MAX_ERRNO andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) bne-syscall_exit_work @@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE andi. r4,r8,MSR_PR beq 3f - CURRENT_THREAD_INFO(r4, r1) - ACCOUNT_CPU_USER_EXIT(r4, r5, r7) + ACCOUNT_CPU_USER_EXIT(r2, r5, r7) 3: #endif lwz r4,_LINK(r1) @@ -526,7 +520,7 @@ syscall_exit_work: /* Clear per-syscall TIF flags if any are set. */ li r11,_TIF_PERSYSCALL_MASK - addir12,r12,TI_FLAGS + addir12,r2,TI_FLAGS 3: lwarx r8,0,r12
[PATCH v10 6/9] powerpc: 'current_set' is now a table of task_struct pointers
The table of pointers 'current_set' has been used for retrieving the stack and current. They used to be thread_info pointers as they were pointing to the stack and current was taken from the 'task' field of the thread_info. Now, the pointers of 'current_set' table are now both pointers to task_struct and pointers to thread_info. As they are used to get current, and the stack pointer is retrieved from current's stack field, this patch changes their type to task_struct, and renames secondary_ti to secondary_current. Reviewed-by: Nicholas Piggin Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/asm-prototypes.h | 4 ++-- arch/powerpc/kernel/head_32.S | 6 +++--- arch/powerpc/kernel/head_44x.S| 4 ++-- arch/powerpc/kernel/head_fsl_booke.S | 4 ++-- arch/powerpc/kernel/smp.c | 10 -- 5 files changed, 13 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index ec691d489656..b1b999b22a12 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -23,8 +23,8 @@ #include /* SMP */ -extern struct thread_info *current_set[NR_CPUS]; -extern struct thread_info *secondary_ti; +extern struct task_struct *current_set[NR_CPUS]; +extern struct task_struct *secondary_current; void start_secondary(void *unused); /* kexec */ diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 44dfd73b2a62..ba0341bd5a00 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -842,9 +842,9 @@ __secondary_start: #endif /* CONFIG_6xx */ /* get current's stack and current */ - lis r1,secondary_ti@ha - tophys(r1,r1) - lwz r2,secondary_ti@l(r1) + lis r2,secondary_current@ha + tophys(r2,r2) + lwz r2,secondary_current@l(r2) tophys(r1,r2) lwz r1,TASK_STACK(r1) diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S index 2c7e90f36358..48e4de4dfd0c 100644 --- a/arch/powerpc/kernel/head_44x.S +++ b/arch/powerpc/kernel/head_44x.S @@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x) /* Now we can get our task struct and real stack pointer */ /* Get current's stack and current */ - lis r1,secondary_ti@ha - lwz r2,secondary_ti@l(r1) + lis r2,secondary_current@ha + lwz r2,secondary_current@l(r2) lwz r1,TASK_STACK(r2) /* Current stack pointer */ diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index b8a2b789677e..0d27bfff52dd 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -1076,8 +1076,8 @@ __secondary_start: bl call_setup_cpu /* get current's stack and current */ - lis r1,secondary_ti@ha - lwz r2,secondary_ti@l(r1) + lis r2,secondary_current@ha + lwz r2,secondary_current@l(r2) lwz r1,TASK_STACK(r2) /* stack */ diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index aa4517686f90..a41fa8924004 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -76,7 +76,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 }; #endif -struct thread_info *secondary_ti; +struct task_struct *secondary_current; bool has_big_cores; DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map); @@ -664,7 +664,7 @@ void smp_send_stop(void) } #endif /* CONFIG_NMI_IPI */ -struct thread_info *current_set[NR_CPUS]; +struct task_struct *current_set[NR_CPUS]; static void smp_store_cpu_info(int id) { @@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void) paca_ptrs[boot_cpuid]->__current = current; #endif set_numa_node(numa_cpu_lookup_table[boot_cpuid]); - current_set[boot_cpuid] = task_thread_info(current); + current_set[boot_cpuid] = current; } #ifdef CONFIG_HOTPLUG_CPU @@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void) static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle) { - struct thread_info *ti = task_thread_info(idle); - #ifdef CONFIG_PPC64 paca_ptrs[cpu]->__current = idle; paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) + THREAD_SIZE - STACK_FRAME_OVERHEAD; #endif idle->cpu = cpu; - secondary_ti = current_set[cpu] = ti; + secondary_current = current_set[cpu] = idle; } int __cpu_up(unsigned int cpu, struct task_struct *tidle) -- 2.13.3
[PATCH v10 5/9] powerpc: regain entire stack space
thread_info is not anymore in the stack, so the entire stack can now be used. There is also no risk anymore of corrupting task_cpu(p) with a stack overflow so the patch removes the test. When doing this, an explicit test for NULL stack pointer is needed in validate_sp() as it is not anymore implicitely covered by the sizeof(thread_info) gap. In the meantime, with the previous patch all pointers to the stacks are not anymore pointers to thread_info so this patch changes them to void* Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/irq.h | 10 +- arch/powerpc/include/asm/processor.h | 3 +-- arch/powerpc/kernel/asm-offsets.c| 1 - arch/powerpc/kernel/entry_32.S | 14 -- arch/powerpc/kernel/irq.c| 19 +-- arch/powerpc/kernel/misc_32.S| 6 ++ arch/powerpc/kernel/process.c| 32 +--- arch/powerpc/kernel/setup_64.c | 8 8 files changed, 38 insertions(+), 55 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index 2efbae8d93be..966ddd4d2414 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -48,9 +48,9 @@ struct pt_regs; * Per-cpu stacks for handling critical, debug and machine check * level interrupts. */ -extern struct thread_info *critirq_ctx[NR_CPUS]; -extern struct thread_info *dbgirq_ctx[NR_CPUS]; -extern struct thread_info *mcheckirq_ctx[NR_CPUS]; +extern void *critirq_ctx[NR_CPUS]; +extern void *dbgirq_ctx[NR_CPUS]; +extern void *mcheckirq_ctx[NR_CPUS]; extern void exc_lvl_ctx_init(void); #else #define exc_lvl_ctx_init() @@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void); /* * Per-cpu stacks for handling hard and soft interrupts. */ -extern struct thread_info *hardirq_ctx[NR_CPUS]; -extern struct thread_info *softirq_ctx[NR_CPUS]; +extern void *hardirq_ctx[NR_CPUS]; +extern void *softirq_ctx[NR_CPUS]; extern void irq_ctx_init(void); void call_do_softirq(void *sp); diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 15acb282a876..8179b64871ed 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -325,8 +325,7 @@ struct thread_struct { #define ARCH_MIN_TASKALIGN 16 #define INIT_SP(sizeof(init_stack) + (unsigned long) _stack) -#define INIT_SP_LIMIT \ - (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack) +#define INIT_SP_LIMIT ((unsigned long)_stack) #ifdef CONFIG_SPE #define SPEFSCR_INIT \ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 1fb52206c106..94ac190a0b16 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -92,7 +92,6 @@ int main(void) DEFINE(SIGSEGV, SIGSEGV); DEFINE(NMI_MASK, NMI_MASK); #else - DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16)); OFFSET(KSP_LIMIT, thread_struct, ksp_limit); #endif /* CONFIG_PPC64 */ OFFSET(TASK_STACK, task_struct, stack); diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index fa7a69ffb37a..bd3b146e18a3 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -97,14 +97,11 @@ crit_transfer_to_handler: mfspr r0,SPRN_SRR1 stw r0,_SRR1(r11) - /* set the stack limit to the current stack -* and set the limit to protect the thread_info -* struct -*/ + /* set the stack limit to the current stack */ mfspr r8,SPRN_SPRG_THREAD lwz r0,KSP_LIMIT(r8) stw r0,SAVED_KSP_LIMIT(r11) - rlwimi r0,r1,0,0,(31-THREAD_SHIFT) + rlwinm r0,r1,0,0,(31 - THREAD_SHIFT) stw r0,KSP_LIMIT(r8) /* fall through */ #endif @@ -121,14 +118,11 @@ crit_transfer_to_handler: mfspr r0,SPRN_SRR1 stw r0,crit_srr1@l(0) - /* set the stack limit to the current stack -* and set the limit to protect the thread_info -* struct -*/ + /* set the stack limit to the current stack */ mfspr r8,SPRN_SPRG_THREAD lwz r0,KSP_LIMIT(r8) stw r0,saved_ksp_limit@l(0) - rlwimi r0,r1,0,0,(31-THREAD_SHIFT) + rlwinm r0,r1,0,0,(31 - THREAD_SHIFT) stw r0,KSP_LIMIT(r8) /* fall through */ #endif diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 3fdb6b6973cf..62cfccf4af89 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -618,9 +618,8 @@ static inline void check_stack_overflow(void) sp = current_stack_pointer() & (THREAD_SIZE-1); /* check for stack overflow: is there less than 2KB free? */ - if (unlikely(sp < (sizeof(struct thread_info) + 2048))) { - pr_err("do_IRQ: stack overflow: %ld\n", - sp - sizeof(struct thread_info)); +
[PATCH v10 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info doesn't have anymore the 'task' field. This patch: - Removes all recopy of thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol names duplication between ASM constants and C constants. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 7 + arch/powerpc/include/asm/ptrace.h | 2 +- arch/powerpc/include/asm/smp.h | 17 +++- arch/powerpc/include/asm/thread_info.h | 17 ++-- arch/powerpc/kernel/asm-offsets.c | 7 +++-- arch/powerpc/kernel/entry_32.S | 9 +++ arch/powerpc/kernel/exceptions-64e.S | 11 arch/powerpc/kernel/head_32.S | 6 ++--- arch/powerpc/kernel/head_44x.S | 4 +-- arch/powerpc/kernel/head_64.S | 1 + arch/powerpc/kernel/head_booke.h | 8 +- arch/powerpc/kernel/head_fsl_booke.S | 7 +++-- arch/powerpc/kernel/irq.c | 47 +- arch/powerpc/kernel/kgdb.c | 28 arch/powerpc/kernel/machine_kexec_64.c | 6 ++--- arch/powerpc/kernel/setup_64.c | 21 --- arch/powerpc/kernel/smp.c | 2 +- arch/powerpc/net/bpf_jit32.h | 5 ++-- 19 files changed, 52 insertions(+), 154 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8be31261aec8..fd634a293d7f 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -238,6 +238,7 @@ config PPC select RTC_LIB select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE + select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 # # Please keep this list sorted alphabetically. diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 0bff8bd82ed5..e04e988c86b1 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -421,6 +421,13 @@ else endif endif +ifdef CONFIG_SMP +prepare: task_cpu_prepare + +task_cpu_prepare: prepare0 + $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h)) +endif + # Check toolchain versions: # - gcc-4.6 is the minimum kernel-wide version so nothing required. checkbin: diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 0b8a735b6d85..64271e562fed 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -157,7 +157,7 @@ extern int ptrace_put_reg(struct task_struct *task, int regno, unsigned long data); #define current_pt_regs() \ - ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1) + ((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1) /* * We use the least-significant bit of the trap field to indicate * whether we have saved the full set of registers, or only a diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 41695745032c..0de717e16dd6 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu); /* 32-bit */ extern int smp_hw_index[]; -#define raw_smp_processor_id() (current_thread_info()->cpu) +/* + * This is particularly ugly: it appears we can't actually get the definition + * of task_struct here, but we need access to the CPU this task is running on. + * Instead of using task_struct we're using _TASK_CPU which is extracted from + * asm-offsets.h by kbuild to get the current processor ID. + * + * This also needs to be safeguarded when building asm-offsets.s because at + * that time _TASK_CPU is not defined yet. It could have been guarded by + * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing + * when building something else than asm-offsets.s + */ +#ifdef GENERATING_ASM_OFFSETS +#define raw_smp_processor_id() (0) +#else +#define raw_smp_processor_id() (*(unsigned int *)((void *)current + _TASK_CPU)) +#endif #define hard_smp_processor_id()(smp_hw_index[smp_processor_id()]) static inline int
[PATCH v10 3/9] powerpc: Prepare for moving thread_info into task_struct
This patch cleans the powerpc kernel before activating CONFIG_THREAD_INFO_IN_TASK: - The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point the new stack ==> change it to void* and rename it 'sp' - Don't use CURRENT_THREAD_INFO() to locate the stack. - Fix a few comments. - Replace current_thread_info()->task by current - Remove unnecessary casts to thread_info, as they'll become invalid once thread_info is not in stack anymore. - Rename THREAD_INFO to TASK_STASK: as it is in fact the offset of the pointer to the stack in task_struct, this pointer will not be impacted by the move of THREAD_INFO. - Makes TASK_STACK available to PPC64. PPC64 will need it to get the stack pointer from current once the thread_info have been moved. - Modifies klp_init_thread_info() to take task_struct pointer argument. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/irq.h | 4 ++-- arch/powerpc/include/asm/livepatch.h | 7 --- arch/powerpc/include/asm/processor.h | 4 ++-- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/kernel/asm-offsets.c| 2 +- arch/powerpc/kernel/entry_32.S | 2 +- arch/powerpc/kernel/entry_64.S | 2 +- arch/powerpc/kernel/head_32.S| 4 ++-- arch/powerpc/kernel/head_40x.S | 4 ++-- arch/powerpc/kernel/head_44x.S | 2 +- arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_booke.h | 4 ++-- arch/powerpc/kernel/head_fsl_booke.S | 4 ++-- arch/powerpc/kernel/irq.c| 2 +- arch/powerpc/kernel/misc_32.S| 4 ++-- arch/powerpc/kernel/process.c| 8 arch/powerpc/kernel/setup-common.c | 2 +- arch/powerpc/kernel/setup_32.c | 15 +-- arch/powerpc/kernel/smp.c| 4 +++- 19 files changed, 38 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index ee39ce56b2a2..2efbae8d93be 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS]; extern struct thread_info *softirq_ctx[NR_CPUS]; extern void irq_ctx_init(void); -extern void call_do_softirq(struct thread_info *tp); -extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp); +void call_do_softirq(void *sp); +void call_do_irq(struct pt_regs *regs, void *sp); extern void do_IRQ(struct pt_regs *regs); extern void __init init_IRQ(void); extern void __do_irq(struct pt_regs *regs); diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h index 47a03b9b528b..8a81d10ccc82 100644 --- a/arch/powerpc/include/asm/livepatch.h +++ b/arch/powerpc/include/asm/livepatch.h @@ -43,13 +43,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr) return ftrace_location_range(faddr, faddr + 16); } -static inline void klp_init_thread_info(struct thread_info *ti) +static inline void klp_init_thread_info(struct task_struct *p) { + struct thread_info *ti = task_thread_info(p); /* + 1 to account for STACK_END_MAGIC */ - ti->livepatch_sp = (unsigned long *)(ti + 1) + 1; + ti->livepatch_sp = end_of_stack(p) + 1; } #else -static void klp_init_thread_info(struct thread_info *ti) { } +static inline void klp_init_thread_info(struct task_struct *p) { } #endif /* CONFIG_LIVEPATCH */ #endif /* _ASM_POWERPC_LIVEPATCH_H */ diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 692f7383d461..15acb282a876 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -40,7 +40,7 @@ #ifndef __ASSEMBLY__ #include -#include +#include #include #include @@ -326,7 +326,7 @@ struct thread_struct { #define INIT_SP(sizeof(init_stack) + (unsigned long) _stack) #define INIT_SP_LIMIT \ - (_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) _stack) + (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack) #ifdef CONFIG_SPE #define SPEFSCR_INIT \ diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index de52c3166ba4..95b68bdf34df 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1060,7 +1060,7 @@ * - SPRG9 debug exception scratch * * All 32-bit: - * - SPRG3 current thread_info pointer + * - SPRG3 current thread_struct physical addr pointer *(virtual on BookE, physical on others) * * 32-bit classic: diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9ffc72ded73a..b2b52e002a76 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -90,10 +90,10 @@ int main(void) DEFINE(SIGSEGV, SIGSEGV); DEFINE(NMI_MASK, NMI_MASK); #else - OFFSET(THREAD_INFO, task_struct, stack); DEFINE(THREAD_INFO_GAP,
[PATCH v10 2/9] powerpc: Only use task_struct 'cpu' field on SMP
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and only defined when CONFIG_SMP is set. This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that task_struct 'cpu' field is not used directly out of SMP code. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/head_fsl_booke.S | 2 ++ arch/powerpc/kernel/misc_32.S| 4 arch/powerpc/xmon/xmon.c | 2 +- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index e2750b856c8f..05b574f416b3 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -243,8 +243,10 @@ set_ivor: li r0,0 stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) +#ifdef CONFIG_SMP CURRENT_THREAD_INFO(r22, r1) stw r24, TI_CPU(r22) +#endif bl early_init diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 695b24a2d954..2f0fe8bfc078 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll) or r4,r4,r5 mtspr SPRN_HID1,r4 +#ifdef CONFIG_SMP /* Store new HID1 image */ CURRENT_THREAD_INFO(r6, r1) lwz r6,TI_CPU(r6) slwir6,r6,2 +#else + li r6, 0 +#endif addis r6,r6,nap_save_hid1@ha stw r4,nap_save_hid1@l(r6) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index fee96bbabf42..e64f9bb51025 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2995,7 +2995,7 @@ static void show_task(struct task_struct *tsk) printf("%px %016lx %6d %6d %c %2d %s\n", tsk, tsk->thread.ksp, tsk->pid, rcu_dereference(tsk->parent)->pid, - state, task_thread_info(tsk)->cpu, + state, task_cpu(tsk), tsk->comm); } -- 2.13.3
[PATCH v10 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h In order to do that, this patch moves into a new header called asm/task_size_user64.h the information from asm/processor.h required by mmu-hash.h Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/processor.h | 34 +- arch/powerpc/include/asm/task_size_user64.h | 42 +++ arch/powerpc/kvm/book3s_hv_hmi.c | 1 + 4 files changed, 45 insertions(+), 34 deletions(-) create mode 100644 arch/powerpc/include/asm/task_size_user64.h diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 12e522807f9f..b2aba048301e 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -23,7 +23,7 @@ */ #include #include -#include +#include #include /* diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index ee58526cb6c2..692f7383d461 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -95,40 +95,8 @@ void release_thread(struct task_struct *); #endif #ifdef CONFIG_PPC64 -/* - * 64-bit user address space can have multiple limits - * For now supported values are: - */ -#define TASK_SIZE_64TB (0x4000UL) -#define TASK_SIZE_128TB (0x8000UL) -#define TASK_SIZE_512TB (0x0002UL) -#define TASK_SIZE_1PB (0x0004UL) -#define TASK_SIZE_2PB (0x0008UL) -/* - * With 52 bits in the address we can support - * upto 4PB of range. - */ -#define TASK_SIZE_4PB (0x0010UL) -/* - * For now 512TB is only supported with book3s and 64K linux page size. - */ -#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) -/* - * Max value currently used: - */ -#define TASK_SIZE_USER64 TASK_SIZE_4PB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB -#define TASK_CONTEXT_SIZE TASK_SIZE_512TB -#else -#define TASK_SIZE_USER64 TASK_SIZE_64TB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB -/* - * We don't need to allocate extended context ids for 4K page size, because - * we limit the max effective address on this config to 64TB. - */ -#define TASK_CONTEXT_SIZE TASK_SIZE_64TB -#endif +#include /* * 32-bit user address space is 4GB - 1 page diff --git a/arch/powerpc/include/asm/task_size_user64.h b/arch/powerpc/include/asm/task_size_user64.h new file mode 100644 index ..a4043075864b --- /dev/null +++ b/arch/powerpc/include/asm/task_size_user64.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H +#define _ASM_POWERPC_TASK_SIZE_USER64_H + +#ifdef CONFIG_PPC64 +/* + * 64-bit user address space can have multiple limits + * For now supported values are: + */ +#define TASK_SIZE_64TB (0x4000UL) +#define TASK_SIZE_128TB (0x8000UL) +#define TASK_SIZE_512TB (0x0002UL) +#define TASK_SIZE_1PB (0x0004UL) +#define TASK_SIZE_2PB (0x0008UL) +/* + * With 52 bits in the address we can support + * upto 4PB of range. + */ +#define TASK_SIZE_4PB (0x0010UL) + +/* + * For now 512TB is only supported with book3s and 64K linux page size. + */ +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) +/* + * Max value currently used: + */ +#define TASK_SIZE_USER64 TASK_SIZE_4PB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB +#define TASK_CONTEXT_SIZE TASK_SIZE_512TB +#else +#define TASK_SIZE_USER64 TASK_SIZE_64TB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB +/* + * We don't need to allocate extended context ids for 4K page size, because + * we limit the max effective address on this config to 64TB. + */ +#define TASK_CONTEXT_SIZE TASK_SIZE_64TB +#endif + +#endif /* CONFIG_PPC64 */ +#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */ diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c index e3f738eb1cac..64b5011475c7 100644 --- a/arch/powerpc/kvm/book3s_hv_hmi.c +++ b/arch/powerpc/kvm/book3s_hv_hmi.c @@ -24,6 +24,7 @@ #include #include #include +#include void wait_for_subcore_guest_exit(void) { -- 2.13.3
[PATCH v10 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. Changes since v9: - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 'fixes' into merge") ==> Fixed conflict on xmon Changes since v8: - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 'fixes' into merge") ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*") Changes since v7: - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 'fixes' into merge") Changes since v6: - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch (early crash with CONFIG_KMEMLEAK) Changes since v5: - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding - Fixed PPC_BPF_LOAD_CPU() macro Changes since v4: - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h is not already existing, was due to spaces instead of a tab in the Makefile Changes since RFC v3: (based on Nick's review) - Renamed task_size.h to task_size_user64.h to better relate to what it contains. - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs moved to a separate patch. - Removed CURRENT_THREAD_INFO macro completely. - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is defined. - Added a patch at the end to rename 'tp' pointers to 'sp' pointers - Renamed 'tp' into 'sp' pointers in preparation patch when relevant - Fixed a few commit logs - Fixed checkpatch report. Changes since RFC v2: - Removed the modification of names in asm-offsets - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu in CFLAGS - Modified asm/smp.h to use the offset set in CFLAGS - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch - Moved the modification of current_pt_regs in the patch activating CONFIG_THREAD_INFO_IN_TASK Changes since RFC v1: - Removed the first patch which was modifying header inclusion order in timer - Modified some names in asm-offsets to avoid conflicts when including asm-offsets in C files - Modified asm/smp.h to avoid having to include linux/sched.h (using asm-offsets instead) - Moved some changes from the activation patch to the preparation patch. Christophe Leroy (9): book3s/64: avoid circular header inclusion in mmu-hash.h powerpc: Only use task_struct 'cpu' field on SMP powerpc: Prepare for moving thread_info into task_struct powerpc: Activate CONFIG_THREAD_INFO_IN_TASK powerpc: regain entire stack space powerpc: 'current_set' is now a table of task_struct pointers powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU powerpc/64: Remove CURRENT_THREAD_INFO powerpc: clean stack pointers naming arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 7 +++ arch/powerpc/include/asm/asm-prototypes.h | 4 +- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/exception-64s.h | 4 +- arch/powerpc/include/asm/irq.h | 14 ++--- arch/powerpc/include/asm/livepatch.h | 7 ++- arch/powerpc/include/asm/processor.h | 39 + arch/powerpc/include/asm/ptrace.h | 2 +- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/include/asm/smp.h | 17 +- arch/powerpc/include/asm/task_size_user64.h| 42 ++ arch/powerpc/include/asm/thread_info.h | 19 --- arch/powerpc/kernel/asm-offsets.c | 10 ++-- arch/powerpc/kernel/entry_32.S | 66 -- arch/powerpc/kernel/entry_64.S | 12 ++-- arch/powerpc/kernel/epapr_hcalls.S | 5 +- arch/powerpc/kernel/exceptions-64e.S | 13 + arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/head_32.S | 14 ++--- arch/powerpc/kernel/head_40x.S | 4 +- arch/powerpc/kernel/head_44x.S | 8 +-- arch/powerpc/kernel/head_64.S | 1 + arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_booke.h | 12 +--- arch/powerpc/kernel/head_fsl_booke.S | 16 +++--- arch/powerpc/kernel/idle_6xx.S | 8 +-- arch/powerpc/kernel/idle_book3e.S | 2 +- arch/powerpc/kernel/idle_e500.S| 8 +-- arch/powerpc/kernel/idle_power4.S | 2 +- arch/powerpc/kernel/irq.c | 77 +- arch/powerpc/kernel/kgdb.c
[PATCH v7 16/16] powerpc/8xx: regroup TLB handler routines
As this is running with MMU off, the CPU only does speculative fetch for code in the same page. Following the significant size reduction of TLB handler routines, the side handlers can be brought back close to the main part, ie in the same page. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 112 - 1 file changed, 54 insertions(+), 58 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 0a4f8a9c85ff..b171b7c0a0e7 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -399,6 +399,23 @@ InstructionTLBMiss: rfi #endif +#ifndef CONFIG_PIN_TLB_TEXT +ITLBMissLinear: + mtcrr11 + /* Set 8M byte page and mark it valid */ + li r11, MI_PS8MEG | MI_SVALID + mtspr SPRN_MI_TWC, r11 + rlwinm r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */ + ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ + _PAGE_PRESENT + mtspr SPRN_MI_RPN, r10/* Update TLB entry */ + +0: mfspr r10, SPRN_SPRG_SCRATCH0 + mfspr r11, SPRN_SPRG_SCRATCH1 + rfi + patch_site 0b, patch__itlbmiss_exit_2 +#endif + . = 0x1200 DataStoreTLBMiss: mtspr SPRN_SPRG_SCRATCH0, r10 @@ -484,6 +501,43 @@ DataStoreTLBMiss: rfi #endif +DTLBMissIMMR: + mtcrr11 + /* Set 512k byte guarded page and mark it valid */ + li r10, MD_PS512K | MD_GUARDED | MD_SVALID + mtspr SPRN_MD_TWC, r10 + mfspr r10, SPRN_IMMR /* Get current IMMR */ + rlwinm r10, r10, 0, 0xfff8 /* Get 512 kbytes boundary */ + ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ + _PAGE_PRESENT | _PAGE_NO_CACHE + mtspr SPRN_MD_RPN, r10/* Update TLB entry */ + + li r11, RPN_PATTERN + mtspr SPRN_DAR, r11 /* Tag DAR */ + +0: mfspr r10, SPRN_SPRG_SCRATCH0 + mfspr r11, SPRN_SPRG_SCRATCH1 + rfi + patch_site 0b, patch__dtlbmiss_exit_2 + +DTLBMissLinear: + mtcrr11 + /* Set 8M byte page and mark it valid */ + li r11, MD_PS8MEG | MD_SVALID + mtspr SPRN_MD_TWC, r11 + rlwinm r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */ + ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ + _PAGE_PRESENT + mtspr SPRN_MD_RPN, r10/* Update TLB entry */ + + li r11, RPN_PATTERN + mtspr SPRN_DAR, r11 /* Tag DAR */ + +0: mfspr r10, SPRN_SPRG_SCRATCH0 + mfspr r11, SPRN_SPRG_SCRATCH1 + rfi + patch_site 0b, patch__dtlbmiss_exit_3 + /* This is an instruction TLB error on the MPC8xx. This could be due * to many reasons, such as executing guarded memory or illegal instruction * addresses. There is nothing to do but handle a big time error fault. @@ -583,64 +637,6 @@ InstructionBreakpoint: . = 0x2000 -/* - * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM. - * not enough space in the DataStoreTLBMiss area. - */ -DTLBMissIMMR: - mtcrr11 - /* Set 512k byte guarded page and mark it valid */ - li r10, MD_PS512K | MD_GUARDED | MD_SVALID - mtspr SPRN_MD_TWC, r10 - mfspr r10, SPRN_IMMR /* Get current IMMR */ - rlwinm r10, r10, 0, 0xfff8 /* Get 512 kbytes boundary */ - ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ - _PAGE_PRESENT | _PAGE_NO_CACHE - mtspr SPRN_MD_RPN, r10/* Update TLB entry */ - - li r11, RPN_PATTERN - mtspr SPRN_DAR, r11 /* Tag DAR */ - -0: mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 - rfi - patch_site 0b, patch__dtlbmiss_exit_2 - -DTLBMissLinear: - mtcrr11 - /* Set 8M byte page and mark it valid */ - li r11, MD_PS8MEG | MD_SVALID - mtspr SPRN_MD_TWC, r11 - rlwinm r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */ - ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ - _PAGE_PRESENT - mtspr SPRN_MD_RPN, r10/* Update TLB entry */ - - li r11, RPN_PATTERN - mtspr SPRN_DAR, r11 /* Tag DAR */ - -0: mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 - rfi - patch_site 0b, patch__dtlbmiss_exit_3 - -#ifndef CONFIG_PIN_TLB_TEXT -ITLBMissLinear: - mtcrr11 - /* Set 8M byte page and mark it valid */ - li r11, MI_PS8MEG | MI_SVALID - mtspr SPRN_MI_TWC, r11 - rlwinm r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */ - ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ -
[PATCH v7 15/16] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
This patch reworks the TLB Miss handler in order to not use r12 register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2. In the DAR Fixup code we can now use SPRN_M_TW, freeing SPRN_SPRG_SCRATCH2. Then SPRN_SPRG_SCRATCH2 may be used for something else in the future. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 110 ++--- 1 file changed, 49 insertions(+), 61 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 85fb4b8bf6c7..0a4f8a9c85ff 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -302,90 +302,87 @@ SystemCall: */ #ifdef CONFIG_8xx_CPU15 -#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) \ - additmp, addr, PAGE_SIZE; \ - tlbie tmp;\ - additmp, addr, -PAGE_SIZE; \ - tlbie tmp +#define INVALIDATE_ADJACENT_PAGES_CPU15(addr) \ + addiaddr, addr, PAGE_SIZE; \ + tlbie addr; \ + addiaddr, addr, -(PAGE_SIZE << 1); \ + tlbie addr; \ + addiaddr, addr, PAGE_SIZE #else -#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) +#define INVALIDATE_ADJACENT_PAGES_CPU15(addr) #endif InstructionTLBMiss: mtspr SPRN_SPRG_SCRATCH0, r10 +#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) mtspr SPRN_SPRG_SCRATCH1, r11 -#ifdef ITLB_MISS_KERNEL - mtspr SPRN_SPRG_SCRATCH2, r12 #endif /* If we are faulting a kernel address, we have to use the * kernel page tables. */ mfspr r10, SPRN_SRR0 /* Get effective address of fault */ - INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10) + INVALIDATE_ADJACENT_PAGES_CPU15(r10) mtspr SPRN_MD_EPN, r10 /* Only modules will cause ITLB Misses as we always * pin the first 8MB of kernel memory */ #ifdef ITLB_MISS_KERNEL - mfcrr12 + mfcrr11 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT) - andis. r11, r10, 0x8000/* Address >= 0x8000 */ + cmpicr0, r10, 0 /* Address >= 0x8000 */ #else - rlwinm r11, r10, 16, 0xfff8 - cmpli cr0, r11, PAGE_OFFSET@h + rlwinm r10, r10, 16, 0xfff8 + cmpli cr0, r10, PAGE_OFFSET@h #ifndef CONFIG_PIN_TLB_TEXT /* It is assumed that kernel code fits into the first 8M page */ -0: cmpli cr7, r11, (PAGE_OFFSET + 0x080)@h +0: cmpli cr7, r10, (PAGE_OFFSET + 0x080)@h patch_site 0b, patch__itlbmiss_linmem_top #endif #endif #endif - mfspr r11, SPRN_M_TWB /* Get level 1 table */ + mfspr r10, SPRN_M_TWB /* Get level 1 table */ #ifdef ITLB_MISS_KERNEL #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT) - beq+3f + bge+3f #else blt+3f #endif #ifndef CONFIG_PIN_TLB_TEXT blt cr7, ITLBMissLinear #endif - rlwinm r11, r11, 0, 20, 31 - orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha + rlwinm r10, r10, 0, 20, 31 + orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha 3: #endif - lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 1 entry */ + lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 entry */ + mtspr SPRN_MI_TWC, r10/* Set segment attributes */ - mtspr SPRN_MD_TWC, r11 + mtspr SPRN_MD_TWC, r10 mfspr r10, SPRN_MD_TWC lwz r10, 0(r10) /* Get the pte */ #ifdef ITLB_MISS_KERNEL - mtcrr12 + mtcrr11 #endif - /* Load the MI_TWC with the attributes for this "segment." */ - mtspr SPRN_MI_TWC, r11/* Set segment attributes */ - #ifdef CONFIG_SWAP rlwinm r11, r10, 32-5, _PAGE_PRESENT and r11, r11, r10 rlwimi r10, r11, 0, _PAGE_PRESENT #endif - li r11, RPN_PATTERN | 0x200 /* The Linux PTE won't go exactly into the MMU TLB. * Software indicator bits 20 and 23 must be clear. * Software indicator bits 22, 24, 25, 26, and 27 must be * set. All other Linux PTE bits control the behavior * of the MMU. */ - rlwimi r11, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */ - rlwimi r10, r11, 0, 0x0ff0 /* Set 22, 24-27, clear 20,23 */ + rlwimi r10, r10, 0, 0x0f00 /* Clear bits 20-23 */ + rlwimi r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */ + ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */ mtspr SPRN_MI_RPN, r10/* Update TLB entry */ /* Restore registers */ 0: mfspr r10, SPRN_SPRG_SCRATCH0 +#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) mfspr r11, SPRN_SPRG_SCRATCH1 -#ifdef ITLB_MISS_KERNEL - mfspr r12, SPRN_SPRG_SCRATCH2 #endif rfi patch_site 0b,
[PATCH v7 14/16] powerpc/mm: reintroduce 16K pages with HW assistance on 8xx
Using this HW assistance implies some constraints on the page table structure: - Regardless of the main page size used (4k or 16k), the level 1 table (PGD) contains 1024 entries and each PGD entry covers a 4Mbytes area which is managed by a level 2 table (PTE) containing also 1024 entries each describing a 4k page. - 16k pages require 4 identifical entries in the L2 table - 512k pages PTE have to be spread every 128 bytes in the L2 table - 8M pages PTE are at the address pointed by the L1 entry and each 8M page require 2 identical entries in the PGD. In order to use hardware assistance with 16K pages, this patch does the following modifications: - Make PGD size independent of the main page size - In 16k pages mode, redefine pte_t as a struct with 4 elements, and populate those 4 elements in __set_pte_at() and pte_update() - Adapt the size of the hugepage tables. - Define a PTE_FRAGMENT_NB so that a 16k page contains 4 page tables. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 1 + arch/powerpc/include/asm/nohash/32/pgtable.h | 19 ++- arch/powerpc/include/asm/nohash/pgtable.h| 4 arch/powerpc/include/asm/pgtable-types.h | 4 5 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index ddfccdf004fe..8be31261aec8 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -689,7 +689,7 @@ config PPC_4K_PAGES config PPC_16K_PAGES bool "16k page size" - depends on 44x + depends on 44x || PPC_8xx config PPC_64K_PAGES bool "64k page size" diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h index fa05aa566ece..25f05131afd5 100644 --- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h @@ -190,6 +190,7 @@ typedef struct { struct slice_mask mask_8m; # endif #endif + void *pte_frag; } mm_context_t; #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff8) diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index 31a03e9a42c4..e3e81b078432 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -19,7 +19,14 @@ extern int icache_44x_need_flush; #endif /* __ASSEMBLY__ */ +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) +#define PTE_INDEX_SIZE (PTE_SHIFT - 2) +#define PTE_FRAG_NR4 +#define PTE_FRAG_SIZE_SHIFT12 +#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT) +#else #define PTE_INDEX_SIZE PTE_SHIFT +#endif #define PMD_INDEX_SIZE 0 #define PUD_INDEX_SIZE 0 @@ -49,7 +56,11 @@ extern int icache_44x_need_flush; * -Matt */ /* PGDIR_SHIFT determines what a top-level page table entry can map */ +#ifdef CONFIG_PPC_8xx +#define PGDIR_SHIFT22 +#else #define PGDIR_SHIFT(PAGE_SHIFT + PTE_INDEX_SIZE) +#endif #define PGDIR_SIZE (1UL << PGDIR_SHIFT) #define PGDIR_MASK (~(PGDIR_SIZE-1)) @@ -233,7 +244,13 @@ static inline unsigned long pte_update(pte_t *p, : "cc" ); #else /* PTE_ATOMIC_UPDATES */ unsigned long old = pte_val(*p); - *p = __pte((old & ~clr) | set); + unsigned long new = (old & ~clr) | set; + +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) + p->pte = p->pte1 = p->pte2 = p->pte3 = new; +#else + *p = __pte(new); +#endif #endif /* !PTE_ATOMIC_UPDATES */ #ifdef CONFIG_44x diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h index 70ff23974b59..1ca1c1864b32 100644 --- a/arch/powerpc/include/asm/nohash/pgtable.h +++ b/arch/powerpc/include/asm/nohash/pgtable.h @@ -209,7 +209,11 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, /* Anything else just stores the PTE normally. That covers all 64-bit * cases, and 32-bit non-hash with 32-bit PTEs. */ +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) + ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte); +#else *ptep = pte; +#endif /* * With hardware tablewalk, a sync is needed to ensure that diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h index eccb30b38b47..3b0edf041b2e 100644 --- a/arch/powerpc/include/asm/pgtable-types.h +++ b/arch/powerpc/include/asm/pgtable-types.h @@ -3,7 +3,11 @@ #define _ASM_POWERPC_PGTABLE_TYPES_H /* PTE level */ +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t; +#else typedef struct { pte_basic_t pte; } pte_t; +#endif #define __pte(x) ((pte_t) { (x) }) static inline pte_basic_t pte_val(pte_t x) { -- 2.13.3
[PATCH v7 11/16] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
Today, on the 8xx the TLB handlers do SW tablewalk by doing all the calculation in ASM, in order to match with the Linux page table structure. The 8xx offers hardware assistance which allows significant size reduction of the TLB handlers, hence also reduces the time spent in the handlers. However, using this HW assistance implies some constraints on the page table structure: - Regardless of the main page size used (4k or 16k), the level 1 table (PGD) contains 1024 entries and each PGD entry covers a 4Mbytes area which is managed by a level 2 table (PTE) containing also 1024 entries each describing a 4k page. - 16k pages require 4 identifical entries in the L2 table - 512k pages PTE have to be spread every 128 bytes in the L2 table - 8M pages PTE are at the address pointed by the L1 entry and each 8M page require 2 identical entries in the PGD. This patch modifies the TLB handlers to use HW assistance for 4K PAGES. Before that patch, the mean time spent in TLB miss handlers is: - ITLB miss: 80 ticks - DTLB miss: 62 ticks After that patch, the mean time spent in TLB miss handlers is: - ITLB miss: 72 ticks - DTLB miss: 54 ticks So the improvement is 10% for ITLB and 13% for DTLB misses Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 58 +- arch/powerpc/mm/8xx_mmu.c | 4 +-- 2 files changed, 26 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 01f58b1d9ae7..85fb4b8bf6c7 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -292,7 +292,7 @@ SystemCall: . = 0x1100 /* * For the MPC8xx, this is a software tablewalk to load the instruction - * TLB. The task switch loads the M_TW register with the pointer to the first + * TLB. The task switch loads the M_TWB register with the pointer to the first * level table. * If we discover there is no second level table (value is zero) or if there * is an invalid pte, we load that into the TLB, which causes another fault @@ -323,6 +323,7 @@ InstructionTLBMiss: */ mfspr r10, SPRN_SRR0 /* Get effective address of fault */ INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10) + mtspr SPRN_MD_EPN, r10 /* Only modules will cause ITLB Misses as we always * pin the first 8MB of kernel memory */ #ifdef ITLB_MISS_KERNEL @@ -339,7 +340,7 @@ InstructionTLBMiss: #endif #endif #endif - mfspr r11, SPRN_M_TW /* Get level 1 table */ + mfspr r11, SPRN_M_TWB /* Get level 1 table */ #ifdef ITLB_MISS_KERNEL #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT) beq+3f @@ -349,16 +350,14 @@ InstructionTLBMiss: #ifndef CONFIG_PIN_TLB_TEXT blt cr7, ITLBMissLinear #endif - lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha + rlwinm r11, r11, 0, 20, 31 + orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha 3: #endif - /* Insert level 1 index */ - rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29 lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 1 entry */ - /* Extract level 2 index */ - rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29 - rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */ + mtspr SPRN_MD_TWC, r11 + mfspr r10, SPRN_MD_TWC lwz r10, 0(r10) /* Get the pte */ #ifdef ITLB_MISS_KERNEL mtcrr12 @@ -417,7 +416,7 @@ DataStoreTLBMiss: mfspr r10, SPRN_MD_EPN rlwinm r11, r10, 16, 0xfff8 cmpli cr0, r11, PAGE_OFFSET@h - mfspr r11, SPRN_M_TW /* Get level 1 table */ + mfspr r11, SPRN_M_TWB /* Get level 1 table */ blt+3f rlwinm r11, r10, 16, 0xfff8 #ifndef CONFIG_PIN_TLB_IMMR @@ -430,20 +429,16 @@ DataStoreTLBMiss: patch_site 0b, patch__dtlbmiss_immr_jmp #endif blt cr7, DTLBMissLinear - lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha + mfspr r11, SPRN_M_TWB /* Get level 1 table */ + rlwinm r11, r11, 0, 20, 31 + orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha 3: - - /* Insert level 1 index */ - rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29 lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 1 entry */ - /* We have a pte table, so load fetch the pte from the table. -*/ - /* Extract level 2 index */ - rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29 - rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */ + mtspr SPRN_MD_TWC, r11 + mfspr r10, SPRN_MD_TWC lwz r10, 0(r10) /* Get the pte */ -4: + mtcrr12 /* Insert the Guarded flag into the TWC from the Linux PTE. @@ -668,9 +663,10 @@ FixupDAR:/* Entry point for dcbx workaround. */ mtspr
[PATCH v7 13/16] powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx
For using 512k pages with hardware assistance, the PTEs have to be spread every 128 bytes in the L2 table. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/hugetlb.h | 4 +++- arch/powerpc/mm/hugetlbpage.c | 13 + arch/powerpc/mm/tlb_nohash.c | 3 +++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h index dfb8bf236586..62a0ca02ca7d 100644 --- a/arch/powerpc/include/asm/hugetlb.h +++ b/arch/powerpc/include/asm/hugetlb.h @@ -74,7 +74,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr, unsigned long idx = 0; pte_t *dir = hugepd_page(hpd); -#ifndef CONFIG_PPC_FSL_BOOK3E +#ifdef CONFIG_PPC_8xx + idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT; +#elif !defined(CONFIG_PPC_FSL_BOOK3E) idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd); #endif diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index bc97874d7c74..d0b92a0a072d 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -66,7 +66,11 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp, cachep = PGT_CACHE(PTE_T_ORDER); num_hugepd = 1 << (pshift - pdshift); } else { +#ifdef CONFIG_PPC_8xx + cachep = PGT_CACHE(PTE_SHIFT); +#else cachep = PGT_CACHE(pdshift - pshift); +#endif num_hugepd = 1; } @@ -332,8 +336,13 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif if (shift >= pdshift) hugepd_free(tlb, hugepte); else +#ifdef CONFIG_PPC_8xx + pgtable_free_tlb(tlb, hugepte, +get_hugepd_cache_index(PTE_SHIFT)); +#else pgtable_free_tlb(tlb, hugepte, get_hugepd_cache_index(pdshift - shift)); +#endif } static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud, @@ -701,7 +710,11 @@ static int __init hugetlbpage_init(void) * use pgt cache for hugepd. */ if (pdshift > shift) +#ifdef CONFIG_PPC_8xx + pgtable_cache_add(PTE_SHIFT); +#else pgtable_cache_add(pdshift - shift); +#endif #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx) else pgtable_cache_add(PTE_T_ORDER); diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c index 8ad7aab150b7..ae5d568e267f 100644 --- a/arch/powerpc/mm/tlb_nohash.c +++ b/arch/powerpc/mm/tlb_nohash.c @@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = { .shift = 14, }, #endif + [MMU_PAGE_512K] = { + .shift = 19, + }, [MMU_PAGE_8M] = { .shift = 23, }, -- 2.13.3
[PATCH v7 12/16] powerpc/mm: Enable 8M hugepage support with HW assistance on the 8xx
HW assistance naturally supports 8M huge pages without further modifications. Signed-off-by: Christophe Leroy --- arch/powerpc/mm/tlb_nohash.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c index 4f79639e432f..8ad7aab150b7 100644 --- a/arch/powerpc/mm/tlb_nohash.c +++ b/arch/powerpc/mm/tlb_nohash.c @@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = { .shift = 14, }, #endif + [MMU_PAGE_8M] = { + .shift = 23, + }, }; #else struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = { -- 2.13.3
[PATCH v7 10/16] powerpc/8xx: Temporarily disable 16k pages and hugepages
In preparation of making use of hardware assistance in TLB handlers, this patch temporarily disables 16K pages and hugepages. The reason is that when using HW assistance in 4K pages mode, the linux model fit with the HW model for 4K pages and 8M pages. However for 16K pages and 512K mode some additional work is needed to get linux model fit with HW model. For the 8M pages, they will naturaly come back when we switch to HW assistance, without any additional handling. In order to keep the following patch smaller, the removal of the current special handling for 8M pages gets removed here as well. Therefore the 4K pages mode will be implemented first and without support for 512k hugepages. Then the 512k hugepages will be brought back. And the 16K pages will be implemented in the following step. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/head_8xx.S | 74 +++--- arch/powerpc/mm/tlb_nohash.c | 6 3 files changed, 6 insertions(+), 76 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8be31261aec8..ddfccdf004fe 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -689,7 +689,7 @@ config PPC_4K_PAGES config PPC_16K_PAGES bool "16k page size" - depends on 44x || PPC_8xx + depends on 44x config PPC_64K_PAGES bool "64k page size" diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index c203defe49a4..01f58b1d9ae7 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -314,7 +314,7 @@ SystemCall: InstructionTLBMiss: mtspr SPRN_SPRG_SCRATCH0, r10 mtspr SPRN_SPRG_SCRATCH1, r11 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) +#ifdef ITLB_MISS_KERNEL mtspr SPRN_SPRG_SCRATCH2, r12 #endif @@ -325,10 +325,8 @@ InstructionTLBMiss: INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10) /* Only modules will cause ITLB Misses as we always * pin the first 8MB of kernel memory */ -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) - mfcrr12 -#endif #ifdef ITLB_MISS_KERNEL + mfcrr12 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT) andis. r11, r10, 0x8000/* Address >= 0x8000 */ #else @@ -360,15 +358,9 @@ InstructionTLBMiss: /* Extract level 2 index */ rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29 -#ifdef CONFIG_HUGETLB_PAGE - mtcrr11 - bt- 28, 10f /* bit 28 = Large page (8M) */ - bt- 29, 20f /* bit 29 = Large page (8M or 512k) */ -#endif rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */ lwz r10, 0(r10) /* Get the pte */ -4: -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) +#ifdef ITLB_MISS_KERNEL mtcrr12 #endif /* Load the MI_TWC with the attributes for this "segment." */ @@ -393,7 +385,7 @@ InstructionTLBMiss: /* Restore registers */ 0: mfspr r10, SPRN_SPRG_SCRATCH0 mfspr r11, SPRN_SPRG_SCRATCH1 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) +#ifdef ITLB_MISS_KERNEL mfspr r12, SPRN_SPRG_SCRATCH2 #endif rfi @@ -406,35 +398,12 @@ InstructionTLBMiss: stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0) mfspr r10, SPRN_SPRG_SCRATCH0 mfspr r11, SPRN_SPRG_SCRATCH1 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) +#ifdef ITLB_MISS_KERNEL mfspr r12, SPRN_SPRG_SCRATCH2 #endif rfi #endif -#ifdef CONFIG_HUGETLB_PAGE -10:/* 8M pages */ -#ifdef CONFIG_PPC_16K_PAGES - /* Extract level 2 index */ - rlwinm r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29 - /* Add level 2 base */ - rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1 -#else - /* Level 2 base */ - rlwinm r10, r11, 0, ~HUGEPD_SHIFT_MASK -#endif - lwz r10, 0(r10) /* Get the pte */ - b 4b - -20:/* 512k pages */ - /* Extract level 2 index */ - rlwinm r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29 - /* Add level 2 base */ - rlwimi r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1 - lwz r10, 0(r10) /* Get the pte */ - b 4b -#endif - . = 0x1200 DataStoreTLBMiss: mtspr SPRN_SPRG_SCRATCH0, r10 @@ -472,11 +441,6 @@ DataStoreTLBMiss: */ /* Extract level 2 index */ rlwinm r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29 -#ifdef CONFIG_HUGETLB_PAGE - mtcrr11 - bt- 28, 10f /* bit 28 = Large page (8M) */ - bt- 29, 20f /* bit 29 = Large page (8M or 512k) */ -#endif rlwimi r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
[PATCH v7 08/16] powerpc/mm: Extend pte_fragment functionality to PPC32
In order to allow the 8xx to handle pte_fragments, this patch extends the use of pte_fragments to PPC32 platforms. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/mmu-hash.h | 5 - arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 ++ arch/powerpc/include/asm/book3s/32/pgtable.h | 5 +++-- arch/powerpc/include/asm/mmu_context.h| 2 +- arch/powerpc/include/asm/nohash/32/mmu.h | 4 +++- arch/powerpc/include/asm/nohash/32/pgalloc.h | 23 --- arch/powerpc/include/asm/nohash/32/pgtable.h | 8 +--- arch/powerpc/mm/Makefile | 1 + arch/powerpc/mm/mmu_context.c | 10 ++ arch/powerpc/mm/mmu_context_nohash.c | 2 +- arch/powerpc/mm/pgtable_32.c | 25 - 11 files changed, 54 insertions(+), 49 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h index 5bd26c218b94..2bb500d25de6 100644 --- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_ #define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_ + /* * 32-bit hash table MMU support */ @@ -9,6 +10,8 @@ * BATs */ +#include + /* Block size masks */ #define BL_128K0x000 #define BL_256K 0x001 @@ -43,7 +46,7 @@ struct ppc_bat { u32 batl; }; -typedef struct page *pgtable_t; +typedef pte_t *pgtable_t; #endif /* !__ASSEMBLY__ */ /* diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h index a70f3cf16dc8..56e805107352 100644 --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h @@ -27,6 +27,10 @@ extern void __bad_pte(pmd_t *pmd); extern struct kmem_cache *pgtable_cache[]; #define PGT_CACHE(shift) pgtable_cache[shift] +void pte_frag_destroy(void *pte_frag); +pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel); +void pte_fragment_free(unsigned long *table, int kernel); + static inline pgd_t *pgd_alloc(struct mm_struct *mm) { return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), @@ -56,30 +60,28 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t pte_page) { - *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT); + *pmdp = __pmd(__pa(pte_page) | _PMD_PRESENT); } -#define pmd_pgtable(pmd) pmd_page(pmd) +#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd)) extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr); extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr); static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) { - free_page((unsigned long)pte); + pte_fragment_free((unsigned long *)pte, 1); } static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage) { - pgtable_page_dtor(ptepage); - __free_page(ptepage); + pte_fragment_free((unsigned long *)ptepage, 0); } static inline void pgtable_free(void *table, unsigned index_size) { if (!index_size) { - pgtable_page_dtor(virt_to_page(table)); - free_page((unsigned long)table); + pte_fragment_free((unsigned long *)table, 0); } else { BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE); kmem_cache_free(PGT_CACHE(index_size), table); @@ -117,6 +119,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb, static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table, unsigned long address) { - pgtable_free_tlb(tlb, page_address(table), 0); + pgtable_free_tlb(tlb, table, 0); } #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */ diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 32c33eccc0e2..47156b93f9af 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -329,7 +329,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, #define pte_same(A,B) (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0) #define pmd_page_vaddr(pmd)\ - ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) + ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1))) #define pmd_page(pmd) \ pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT) @@ -346,7 +346,8 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, #define pte_offset_kernel(dir, addr) \ ((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr)) #define pte_offset_map(dir, addr) \ - ((pte_t *)
[PATCH v7 09/16] powerpc/8xx: Move SW perf counters in first 32kb of memory
In order to simplify time critical exceptions handling 8xx specific SW perf counters, this patch moves the counters into the beginning of memory. This is possible because .text is readable and the counters are never modified outside of the handlers. By doing this, we avoid having to set a second register with the upper part of the address of the counters. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 58 -- 1 file changed, 28 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 3b67b9533c82..c203defe49a4 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -106,6 +106,23 @@ turn_on_mmu: mtspr SPRN_SRR0,r0 rfi /* enables MMU */ + +#ifdef CONFIG_PERF_EVENTS + .align 4 + + .globl itlb_miss_counter +itlb_miss_counter: + .space 4 + + .globl dtlb_miss_counter +dtlb_miss_counter: + .space 4 + + .globl instruction_counter +instruction_counter: + .space 4 +#endif + /* * Exception entry code. This code runs with address translation * turned off, i.e. using physical addresses. @@ -384,17 +401,16 @@ InstructionTLBMiss: #ifdef CONFIG_PERF_EVENTS patch_site 0f, patch__itlbmiss_perf -0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha - lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10) - addir11, r11, 1 - stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10) -#endif +0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0) + addir10, r10, 1 + stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0) mfspr r10, SPRN_SPRG_SCRATCH0 mfspr r11, SPRN_SPRG_SCRATCH1 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE) mfspr r12, SPRN_SPRG_SCRATCH2 #endif rfi +#endif #ifdef CONFIG_HUGETLB_PAGE 10:/* 8M pages */ @@ -509,15 +525,14 @@ DataStoreTLBMiss: #ifdef CONFIG_PERF_EVENTS patch_site 0f, patch__dtlbmiss_perf -0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha - lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10) - addir11, r11, 1 - stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10) -#endif +0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) + addir10, r10, 1 + stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) mfspr r10, SPRN_SPRG_SCRATCH0 mfspr r11, SPRN_SPRG_SCRATCH1 mfspr r12, SPRN_SPRG_SCRATCH2 rfi +#endif #ifdef CONFIG_HUGETLB_PAGE 10:/* 8M pages */ @@ -625,16 +640,13 @@ DataBreakpoint: . = 0x1d00 InstructionBreakpoint: mtspr SPRN_SPRG_SCRATCH0, r10 - mtspr SPRN_SPRG_SCRATCH1, r11 - lis r10, (instruction_counter - PAGE_OFFSET)@ha - lwz r11, (instruction_counter - PAGE_OFFSET)@l(r10) - addir11, r11, -1 - stw r11, (instruction_counter - PAGE_OFFSET)@l(r10) + lwz r10, (instruction_counter - PAGE_OFFSET)@l(0) + addir10, r10, -1 + stw r10, (instruction_counter - PAGE_OFFSET)@l(0) lis r10, 0x ori r10, r10, 0x01 mtspr SPRN_COUNTA, r10 mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 rfi #else EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE) @@ -1065,17 +1077,3 @@ swapper_pg_dir: */ abatron_pteptrs: .space 8 - -#ifdef CONFIG_PERF_EVENTS - .globl itlb_miss_counter -itlb_miss_counter: - .space 4 - - .globl dtlb_miss_counter -dtlb_miss_counter: - .space 4 - - .globl instruction_counter -instruction_counter: - .space 4 -#endif -- 2.13.3
[PATCH v7 07/16] powerpc/mm: add helpers to get/set mm.context->pte_frag
In order to handle pte_fragment functions with single fragment without adding pte_frag in all mm_context_t, this patch creates two helpers which do nothing on platforms using a single fragment. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/pgtable.h | 31 +++ arch/powerpc/mm/pgtable-frag.c | 8 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 1c49ca31dcfe..74810bba45d2 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -110,6 +110,37 @@ void mark_initmem_nx(void); static inline void mark_initmem_nx(void) { } #endif +/* + * When used, PTE_FRAG_NR is defined in subarch pgtable.h + * so we are sure it is included when arriving here. + */ +#ifndef PTE_FRAG_NR +#define PTE_FRAG_NR1 +#define PTE_FRAG_SIZE_SHIFTPAGE_SHIFT +#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT) +#endif + +#if PTE_FRAG_NR != 1 +static inline void *pte_frag_get(mm_context_t *ctx) +{ + return ctx->pte_frag; +} + +static inline void pte_frag_set(mm_context_t *ctx, void *p) +{ + ctx->pte_frag = p; +} +#else +static inline void *pte_frag_get(mm_context_t *ctx) +{ + return NULL; +} + +static inline void pte_frag_set(mm_context_t *ctx, void *p) +{ +} +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PGTABLE_H */ diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c index 7544d0d7177d..af23a587f019 100644 --- a/arch/powerpc/mm/pgtable-frag.c +++ b/arch/powerpc/mm/pgtable-frag.c @@ -38,7 +38,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm) return NULL; spin_lock(>page_table_lock); - ret = mm->context.pte_frag; + ret = pte_frag_get(>context); if (ret) { pte_frag = ret + PTE_FRAG_SIZE; /* @@ -46,7 +46,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm) */ if (((unsigned long)pte_frag & ~PAGE_MASK) == 0) pte_frag = NULL; - mm->context.pte_frag = pte_frag; + pte_frag_set(>context, pte_frag); } spin_unlock(>page_table_lock); return (pte_t *)ret; @@ -86,9 +86,9 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel) * the allocated page with single fragement * count. */ - if (likely(!mm->context.pte_frag)) { + if (likely(!pte_frag_get(>context))) { atomic_set(>pt_frag_refcount, PTE_FRAG_NR); - mm->context.pte_frag = ret + PTE_FRAG_SIZE; + pte_frag_set(>context, ret + PTE_FRAG_SIZE); } spin_unlock(>page_table_lock); -- 2.13.3
[PATCH v7 06/16] powerpc/mm: Move pgtable_t into platform headers
This patch move pgtable_t into platform headers. It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64 as nohash/64 doesn't support CONFIG_PPC_64K_PAGES. Reviewed-by: Aneesh Kumar K.V Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/mmu-hash.h | 2 ++ arch/powerpc/include/asm/book3s/64/mmu.h | 9 + arch/powerpc/include/asm/nohash/32/mmu.h | 4 arch/powerpc/include/asm/nohash/64/mmu.h | 4 arch/powerpc/include/asm/page.h | 14 -- 5 files changed, 19 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h index e38c91388c40..5bd26c218b94 100644 --- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h @@ -42,6 +42,8 @@ struct ppc_bat { u32 batu; u32 batl; }; + +typedef struct page *pgtable_t; #endif /* !__ASSEMBLY__ */ /* diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index 6328857f259f..1ceee000c18d 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -2,6 +2,8 @@ #ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_ #define _ASM_POWERPC_BOOK3S_64_MMU_H_ +#include + #ifndef __ASSEMBLY__ /* * Page size definition @@ -24,6 +26,13 @@ struct mmu_psize_def { }; extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT]; +/* + * For BOOK3s 64 with 4k and 64K linux page size + * we want to use pointers, because the page table + * actually store pfn + */ +typedef pte_t *pgtable_t; + #endif /* __ASSEMBLY__ */ /* 64-bit classic hash table MMU */ diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h b/arch/powerpc/include/asm/nohash/32/mmu.h index af0e8b54876a..f61f933a4cd8 100644 --- a/arch/powerpc/include/asm/nohash/32/mmu.h +++ b/arch/powerpc/include/asm/nohash/32/mmu.h @@ -16,4 +16,8 @@ #include #endif +#ifndef __ASSEMBLY__ +typedef struct page *pgtable_t; +#endif + #endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */ diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h index 87871d027b75..e6585480dfc4 100644 --- a/arch/powerpc/include/asm/nohash/64/mmu.h +++ b/arch/powerpc/include/asm/nohash/64/mmu.h @@ -5,4 +5,8 @@ /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ #include +#ifndef __ASSEMBLY__ +typedef struct page *pgtable_t; +#endif + #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */ diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 9ea903221a9f..a7624a3b1435 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -335,20 +335,6 @@ void arch_free_page(struct page *page, int order); #endif struct vm_area_struct; -#ifdef CONFIG_PPC_BOOK3S_64 -/* - * For BOOK3s 64 with 4k and 64K linux page size - * we want to use pointers, because the page table - * actually store pfn - */ -typedef pte_t *pgtable_t; -#else -#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64) -typedef pte_t *pgtable_t; -#else -typedef struct page *pgtable_t; -#endif -#endif #include #endif /* __ASSEMBLY__ */ -- 2.13.3
[PATCH v7 05/16] powerpc/mm: move platform specific mmu-xxx.h in platform directories
The purpose of this patch is to move platform specific mmu-xxx.h files in platform directories like pte-xxx.h files. In the meantime this patch creates common nohash and nohash/32 + nohash/64 mmu.h files for future common parts. Reviewed-by: Aneesh Kumar K.V Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/mmu.h | 14 ++ arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h | 0 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h | 0 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h | 0 arch/powerpc/include/asm/nohash/32/mmu.h | 19 +++ arch/powerpc/include/asm/nohash/64/mmu.h | 8 arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h | 0 arch/powerpc/include/asm/nohash/mmu.h | 11 +++ arch/powerpc/kernel/cpu_setup_fsl_booke.S | 2 +- arch/powerpc/kvm/e500.h| 2 +- 10 files changed, 42 insertions(+), 14 deletions(-) rename arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h (100%) rename arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h (100%) rename arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h (100%) create mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h create mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h rename arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h (100%) create mode 100644 arch/powerpc/include/asm/nohash/mmu.h diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index eb20eb3b8fb0..2184021b0e1c 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -341,18 +341,8 @@ static inline void mmu_early_init_devtree(void) { } #if defined(CONFIG_PPC_STD_MMU_32) /* 32-bit classic hash table MMU */ #include -#elif defined(CONFIG_40x) -/* 40x-style software loaded TLB */ -# include -#elif defined(CONFIG_44x) -/* 44x-style software loaded TLB */ -# include -#elif defined(CONFIG_PPC_BOOK3E_MMU) -/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ -# include -#elif defined (CONFIG_PPC_8xx) -/* Motorola/Freescale 8xx software loaded TLB */ -# include +#elif defined(CONFIG_PPC_MMU_NOHASH) +#include #endif #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/mmu-40x.h b/arch/powerpc/include/asm/nohash/32/mmu-40x.h similarity index 100% rename from arch/powerpc/include/asm/mmu-40x.h rename to arch/powerpc/include/asm/nohash/32/mmu-40x.h diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/nohash/32/mmu-44x.h similarity index 100% rename from arch/powerpc/include/asm/mmu-44x.h rename to arch/powerpc/include/asm/nohash/32/mmu-44x.h diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h similarity index 100% rename from arch/powerpc/include/asm/mmu-8xx.h rename to arch/powerpc/include/asm/nohash/32/mmu-8xx.h diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h b/arch/powerpc/include/asm/nohash/32/mmu.h new file mode 100644 index ..af0e8b54876a --- /dev/null +++ b/arch/powerpc/include/asm/nohash/32/mmu.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_NOHASH_32_MMU_H_ +#define _ASM_POWERPC_NOHASH_32_MMU_H_ + +#if defined(CONFIG_40x) +/* 40x-style software loaded TLB */ +#include +#elif defined(CONFIG_44x) +/* 44x-style software loaded TLB */ +#include +#elif defined(CONFIG_PPC_BOOK3E_MMU) +/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ +#include +#elif defined (CONFIG_PPC_8xx) +/* Motorola/Freescale 8xx software loaded TLB */ +#include +#endif + +#endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */ diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h new file mode 100644 index ..87871d027b75 --- /dev/null +++ b/arch/powerpc/include/asm/nohash/64/mmu.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_NOHASH_64_MMU_H_ +#define _ASM_POWERPC_NOHASH_64_MMU_H_ + +/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ +#include + +#endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */ diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/nohash/mmu-book3e.h similarity index 100% rename from arch/powerpc/include/asm/mmu-book3e.h rename to arch/powerpc/include/asm/nohash/mmu-book3e.h diff --git a/arch/powerpc/include/asm/nohash/mmu.h b/arch/powerpc/include/asm/nohash/mmu.h new file mode 100644 index ..a037cb1efb57 --- /dev/null +++ b/arch/powerpc/include/asm/nohash/mmu.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_NOHASH_MMU_H_ +#define _ASM_POWERPC_NOHASH_MMU_H_ + +#ifdef CONFIG_PPC64 +#include +#else +#include +#endif + +#endif /* _ASM_POWERPC_NOHASH_MMU_H_ */ diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S index 8d142e5d84cd..5fbc890d1094 100644 --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S +++
[PATCH v7 02/16] powerpc/8xx: Remove PTE_ATOMIC_UPDATES
commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss") introduced non atomic PTE updates and started the work of removing PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the 8xx with the following comment: /* Until my rework is finished, 8xx still needs atomic PTE updates */ commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as linux mm expects") removed all PTE updates done in TLB miss handlers Therefore, atomic PTE updates are not needed anymore for the 8xx Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index 6bfe041ef59d..c9e4b2d90f65 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -65,9 +65,6 @@ #define _PTE_NONE_MASK 0 -/* Until my rework is finished, 8xx still needs atomic PTE updates */ -#define PTE_ATOMIC_UPDATES 1 - #ifdef CONFIG_PPC_16K_PAGES #define _PAGE_PSIZE_PAGE_SPS #else -- 2.13.3
[PATCH v7 04/16] powerpc/mm: Avoid useless lock with single page fragments
There is no point in taking the page table lock as pte_frag or pmd_frag are always NULL when we have only one fragment. Reviewed-by: Aneesh Kumar K.V Signed-off-by: Christophe Leroy --- arch/powerpc/mm/pgtable-book3s64.c | 3 +++ arch/powerpc/mm/pgtable-frag.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c index 0c0fd173208a..f3c31f5e1026 100644 --- a/arch/powerpc/mm/pgtable-book3s64.c +++ b/arch/powerpc/mm/pgtable-book3s64.c @@ -244,6 +244,9 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm) { void *pmd_frag, *ret; + if (PMD_FRAG_NR == 1) + return NULL; + spin_lock(>page_table_lock); ret = mm->context.pmd_frag; if (ret) { diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c index d61e7c2a9a79..7544d0d7177d 100644 --- a/arch/powerpc/mm/pgtable-frag.c +++ b/arch/powerpc/mm/pgtable-frag.c @@ -34,6 +34,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm) { void *pte_frag, *ret; + if (PTE_FRAG_NR == 1) + return NULL; + spin_lock(>page_table_lock); ret = mm->context.pte_frag; if (ret) { -- 2.13.3
[PATCH v7 03/16] powerpc/mm: Move pte_fragment_alloc() to a common location
In preparation of next patch which generalises the use of pte_fragment_alloc() for all, this patch moves the related functions in a place that is common to all subarches. The 8xx will need that for supporting 16k pages, as in that mode page tables still have a size of 4k. Since pte_fragment with only once fragment is not different from what is done in the general case, we can easily migrate all subarchs to pte fragments. Reviewed-by: Aneesh Kumar K.V Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/pgalloc.h | 1 + arch/powerpc/mm/Makefile | 4 +- arch/powerpc/mm/mmu_context_book3s64.c | 15 arch/powerpc/mm/pgtable-book3s64.c | 85 arch/powerpc/mm/pgtable-frag.c | 116 +++ 5 files changed, 120 insertions(+), 101 deletions(-) create mode 100644 arch/powerpc/mm/pgtable-frag.c diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h index bfed4cf3b2f3..6c2808c0f052 100644 --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h @@ -39,6 +39,7 @@ extern struct vmemmap_backing *vmemmap_list; extern struct kmem_cache *pgtable_cache[]; #define PGT_CACHE(shift) pgtable_cache[shift] +void pte_frag_destroy(void *pte_frag); extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int); extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long); extern void pte_fragment_free(unsigned long *, int); diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile index ca96e7be4d0e..3cbb1acf0745 100644 --- a/arch/powerpc/mm/Makefile +++ b/arch/powerpc/mm/Makefile @@ -15,7 +15,9 @@ obj-$(CONFIG_PPC_MMU_NOHASH) += mmu_context_nohash.o tlb_nohash.o \ obj-$(CONFIG_PPC_BOOK3E) += tlb_low_$(BITS)e.o hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o -obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o +obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o \ + $(hash64-y) mmu_context_book3s64.o \ + pgtable-book3s64.o pgtable-frag.o obj-$(CONFIG_PPC_RADIX_MMU)+= pgtable-radix.o tlb-radix.o obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c index 510f103d7813..f720c5cc0b5e 100644 --- a/arch/powerpc/mm/mmu_context_book3s64.c +++ b/arch/powerpc/mm/mmu_context_book3s64.c @@ -164,21 +164,6 @@ static void destroy_contexts(mm_context_t *ctx) } } -static void pte_frag_destroy(void *pte_frag) -{ - int count; - struct page *page; - - page = virt_to_page(pte_frag); - /* drop all the pending references */ - count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT; - /* We allow PTE_FRAG_NR fragments from a PTE page */ - if (atomic_sub_and_test(PTE_FRAG_NR - count, >pt_frag_refcount)) { - pgtable_page_dtor(page); - __free_page(page); - } -} - static void pmd_frag_destroy(void *pmd_frag) { int count; diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c index 9f93c9f985c5..0c0fd173208a 100644 --- a/arch/powerpc/mm/pgtable-book3s64.c +++ b/arch/powerpc/mm/pgtable-book3s64.c @@ -322,91 +322,6 @@ void pmd_fragment_free(unsigned long *pmd) } } -static pte_t *get_pte_from_cache(struct mm_struct *mm) -{ - void *pte_frag, *ret; - - spin_lock(>page_table_lock); - ret = mm->context.pte_frag; - if (ret) { - pte_frag = ret + PTE_FRAG_SIZE; - /* -* If we have taken up all the fragments mark PTE page NULL -*/ - if (((unsigned long)pte_frag & ~PAGE_MASK) == 0) - pte_frag = NULL; - mm->context.pte_frag = pte_frag; - } - spin_unlock(>page_table_lock); - return (pte_t *)ret; -} - -static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel) -{ - void *ret = NULL; - struct page *page; - - if (!kernel) { - page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT); - if (!page) - return NULL; - if (!pgtable_page_ctor(page)) { - __free_page(page); - return NULL; - } - } else { - page = alloc_page(PGALLOC_GFP); - if (!page) - return NULL; - } - - atomic_set(>pt_frag_refcount, 1); - - ret = page_address(page); - /* -* if we support only one fragment just return the -* allocated page.
[PATCH v7 01/16] powerpc/book3s32: Remove CONFIG_BOOKE dependent code
BOOK3S/32 cannot be BOOKE, so remove useless code Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 -- arch/powerpc/include/asm/book3s/32/pgtable.h | 14 -- 2 files changed, 32 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h index 96138ab3ddd6..a70f3cf16dc8 100644 --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h @@ -47,8 +47,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) #define __pmd_free_tlb(tlb,x,a)do { } while (0) /* #define pgd_populate(mm, pmd, pte) BUG() */ -#ifndef CONFIG_BOOKE - static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *pte) { @@ -62,22 +60,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp, } #define pmd_pgtable(pmd) pmd_page(pmd) -#else - -static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, - pte_t *pte) -{ - *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT); -} - -static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pte_page) -{ - *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | _PMD_PRESENT); -} - -#define pmd_pgtable(pmd) pmd_page(pmd) -#endif extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr); extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr); diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index c21d33704633..32c33eccc0e2 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -328,24 +328,10 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, #define __HAVE_ARCH_PTE_SAME #define pte_same(A,B) (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0) -/* - * Note that on Book E processors, the pmd contains the kernel virtual - * (lowmem) address of the pte page. The physical address is less useful - * because everything runs with translation enabled (even the TLB miss - * handler). On everything else the pmd contains the physical address - * of the pte page. -- paulus - */ -#ifndef CONFIG_BOOKE #define pmd_page_vaddr(pmd)\ ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) #define pmd_page(pmd) \ pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT) -#else -#define pmd_page_vaddr(pmd)\ - ((unsigned long) (pmd_val(pmd) & PAGE_MASK)) -#define pmd_page(pmd) \ - pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT)) -#endif /* to find an entry in a kernel page-table-directory */ #define pgd_offset_k(address) pgd_offset(_mm, address) -- 2.13.3
[PATCH v7 00/16] Implement use of HW assistance on TLB table walk on 8xx
The purpose of this serie is to implement hardware assistance for TLB table walk on the 8xx. First part prepares for using HW assistance in TLB routines: - Reverts a former patch which broke SWAP on the 8xx - move book3s64 page fragment code in a common part for reusing it by the 8xx as 16k page size mode still uses 4k page tables. - switches to patch_site instead of patch_instruction, as it makes the code clearer and avoids pollution with global symbols. - Optimise access to perf counters (hence reducing number of registers used) Second part implements HW assistance in TLB routines in the following steps: - Disable 16k page size mode and 512k hugepages - Switch 4k to HW assistance - Bring back 512k hugepages - Bring back 16k page size mode. Tested successfully on 8xx and 83xx (book3s/32) Changes in v7: - Reordered to get trivial and already reviewed patches in front. - Reordered to regroup all HW assistance related patches together. - Rebased on today merge branch (28 Nov) - Added a helper for access to mm_context_t.frag - Reduced the amount of changes in PPC32 to support pte_fragment - Applied pte_fragment to both nohash/32 and book3s/32 Changes in v6: - Droped the part related to handling GUARD attribute at PGD/PMD level. - Moved the commonalisation of page_fragment in the begining (this part has been reviewed by Aneesh) - Rebased on today merge branch (19 Oct) Changes in v5: - Also avoid useless lock in get_pmd_from_cache() - A new patch to relocate mmu headers in platform specific directories - A new patch to distribute pgtable_t typedefs in platform specific mmu headers instead of the uggly #ifdef - Moved early_pte_alloc_kernel() in platform specific pgalloc - Restricted definition of PTE_FRAG_SIZE and PTE_FRAG_NR to platforms using the pte fragmentation. - arch_exit_mmap() and destroy_pagetable_cache() are now platform specific. Changes in v4: - Reordered the serie to put at the end the modifications which makes L1 and L2 entries independant. - No modifications to ppc64 ioremap (we still have an opportunity to merge them, for a future patch serie) - 8xx code modified to use patch_site instead of patch_instruction to get a clearer code and avoid object pollution with global symbols - Moved perf counters in first 32kb of memory to optimise access - Split the big bang to HW assistance in several steps: 1. Temporarily removes support of 16k pages and 512k hugepages 2. Change TLB routines to use HW assistance for 4k pages and 8M hugepages 3. Add back support for 512k hugepages 4. Add back support for 16k pages (using pte_fragment as page tables are still 4k) Changes in v3: - Fixed an issue in the 09/14 when CONFIG_PIN_TLB_TEXT was not enabled - Added performance measurement in the 09/14 commit log - Rebased on latest 'powerpc/merge' tree, which conflicted with 13/14 Changes in v2: - Removed the 3 first patchs which have been applied already - Fixed compilation errors reported by Michael - Squashed the commonalisation of ioremap functions into a single patch - Fixed the use of pte_fragment - Added a patch optimising perf counting of TLB misses and instructions Christophe Leroy (16): powerpc/book3s32: Remove CONFIG_BOOKE dependent code powerpc/8xx: Remove PTE_ATOMIC_UPDATES powerpc/mm: Move pte_fragment_alloc() to a common location powerpc/mm: Avoid useless lock with single page fragments powerpc/mm: move platform specific mmu-xxx.h in platform directories powerpc/mm: Move pgtable_t into platform headers powerpc/mm: add helpers to get/set mm.context->pte_frag powerpc/mm: Extend pte_fragment functionality to PPC32 powerpc/8xx: Move SW perf counters in first 32kb of memory powerpc/8xx: Temporarily disable 16k pages and hugepages powerpc/mm: Use hardware assistance in TLB handlers on the 8xx powerpc/mm: Enable 8M hugepage support with HW assistance on the 8xx powerpc/mm: Enable 512k hugepage support with HW assistance on the 8xx powerpc/mm: reintroduce 16K pages with HW assistance on 8xx powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers powerpc/8xx: regroup TLB handler routines arch/powerpc/include/asm/book3s/32/mmu-hash.h | 5 + arch/powerpc/include/asm/book3s/32/pgalloc.h | 36 +- arch/powerpc/include/asm/book3s/32/pgtable.h | 19 +- arch/powerpc/include/asm/book3s/64/mmu.h | 9 + arch/powerpc/include/asm/book3s/64/pgalloc.h | 1 + arch/powerpc/include/asm/hugetlb.h | 4 +- arch/powerpc/include/asm/mmu.h | 14 +- arch/powerpc/include/asm/mmu_context.h | 2 +- arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h | 0 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h | 0 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h | 1 + arch/powerpc/include/asm/nohash/32/mmu.h | 25 ++ arch/powerpc/include/asm/nohash/32/pgalloc.h | 23 +- arch/powerpc/include/asm/nohash/32/pgtable.h
[PATCH] powerpc/8xx: hide itlbie and dtlbie symbols
When disassembling InstructionTLBError we get the following messy code: c000138c: 7d 84 63 78 mr r4,r12 c0001390: 75 25 58 00 andis. r5,r9,22528 c0001394: 75 2a 40 00 andis. r10,r9,16384 c0001398: 41 a2 00 08 beq c00013a0 c000139c: 7c 00 22 64 tlbie r4,r0 c00013a0 : c00013a0: 39 40 04 01 li r10,1025 c00013a4: 91 4b 00 b0 stw r10,176(r11) c00013a8: 39 40 10 32 li r10,4146 c00013ac: 48 00 cc 59 bl c000e004 For a cleaner code dump, this patch replaces itlbie and dtlbie symbols by numeric symbols. c000138c: 7d 84 63 78 mr r4,r12 c0001390: 75 25 58 00 andis. r5,r9,22528 c0001394: 75 2a 40 00 andis. r10,r9,16384 c0001398: 41 a2 00 08 beq c00013a0 c000139c: 7c 00 22 64 tlbie r4,r0 c00013a0: 39 40 04 01 li r10,1025 c00013a4: 91 4b 00 b0 stw r10,176(r11) c00013a8: 39 40 10 32 li r10,4146 c00013ac: 48 00 cc 59 bl c000e004 Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 3b67b9533c82..8c848acfe249 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -552,11 +552,10 @@ InstructionTLBError: mr r4,r12 andis. r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */ andis. r10,r9,SRR1_ISI_NOPT@h - beq+1f + beq+1301f tlbie r4 -itlbie: /* 0x400 is InstructionAccess exception, needed by bad_page_fault() */ -1: EXC_XFER_LITE(0x400, handle_page_fault) +1301: EXC_XFER_LITE(0x400, handle_page_fault) /* This is the data TLB error on the MPC8xx. This could be due to * many reasons, including a dirty update to a pte. We bail out to @@ -578,10 +577,9 @@ DARFixed:/* Return from dcbx instruction bug workaround */ stw r5,_DSISR(r11) mfspr r4,SPRN_DAR andis. r10,r5,DSISR_NOHPTE@h - beq+1f + beq+1401f tlbie r4 -dtlbie: -1: li r10,RPN_PATTERN +1401: li r10,RPN_PATTERN mtspr SPRN_DAR,r10/* Tag DAR, to be used in DTLB Error */ /* 0x300 is DataAccess exception, needed by bad_page_fault() */ EXC_XFER_LITE(0x300, handle_page_fault) @@ -604,8 +602,8 @@ DataBreakpoint: mtspr SPRN_SPRG_SCRATCH1, r11 mfcrr10 mfspr r11, SPRN_SRR0 - cmplwi cr0, r11, (dtlbie - PAGE_OFFSET)@l - cmplwi cr7, r11, (itlbie - PAGE_OFFSET)@l + cmplwi cr0, r11, (1401b - PAGE_OFFSET)@l + cmplwi cr7, r11, (1301b - PAGE_OFFSET)@l beq-cr0, 11f beq-cr7, 11f EXCEPTION_PROLOG_1 -- 2.13.3
Re: use generic DMA mapping code in powerpc V4
Christoph Hellwig writes: > Any comments? I'd like to at least get the ball moving on the easy > bits. Nothing specific yet. I'm a bit worried it might break one of the many old obscure platforms we have that aren't well tested. There's not much we can do about that, but I'll just try and test it on everything I can find. Is the plan that you take these via the dma-mapping tree or that they go via powerpc? cheers > On Wed, Nov 14, 2018 at 09:22:40AM +0100, Christoph Hellwig wrote: >> Hi all, >> >> this series switches the powerpc port to use the generic swiotlb and >> noncoherent dma ops, and to use more generic code for the coherent >> direct mapping, as well as removing a lot of dead code. >> >> As this series is very large and depends on the dma-mapping tree I've >> also published a git tree: >> >> git://git.infradead.org/users/hch/misc.git powerpc-dma.4 >> >> Gitweb: >> >> >> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4 >> >> Changes since v3: >> - rebase on the powerpc fixes tree >> - add a new patch to actually make the baseline amigaone config >>configure without warnings >> - only use ZONE_DMA for 64-bit embedded CPUs, on pseries an IOMMU is >>always present >> - fix compile in mem.c for one configuration >> - drop the full npu removal for now, will be resent separately >> - a few git bisection fixes >> >> The changes since v1 are to big to list and v2 was not posted in public. >> >> ___ >> iommu mailing list >> io...@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu > ---end quoted text---
Re: [PATCH 0/1] Fix NULL pointer access in PowerPC MSI teardown code
Hi Radu, Radu Rendec writes: > Hi everyone, > > It seems there's an unchecked access to a NULL pointer (to a function) > in the PowerPC MSI teardown code. I found this on kernel 4.9, but the > code looks identical in the latest 4.20-rc. I don't see any reason why > this wouldn't happen on recent kernels too. > > The PowerPC architecture specific MSI setup and teardown functions are > in arch/powerpc/kernel/msi.c: > > * arch_setup_msi_irqs() checks pointers for both the setup_msi_irqs > and teardown_msi_irqs ops and returns -ENOSYS if either one is NULL. > > * arch_teardown_msi_irqs() calls on the teardown_msi_irqs op pointer > without checking it and assumes the function is never called unless > arch_setup_msi_irqs() returns successfully. > > The assumption in arch_teardown_msi_irqs() is wrong and results in a > function call on a NULL pointer. An example of how this can happen is > included in the actual patch header. In my case, it happens when the PCI > hardware is configured during kernel start-up, because my controller > doesn't support MSI and the ops are NULL. What hardware are you on? > I'm proposing the attached patch to fix the problem. It basically just > checks the pointer before the function call. Yeah that patch looks good to me. I suspect this bug was introduced in: 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()") Previously we had that check routine which would run before any of the MSI setup had been done, and so if there were no MSI ops then we bailed out early and didn't call teardown. I guess since then (2014) we haven't tested an MSI capable device on a system that isn't MSI capable? cheers
Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection
Le 22/11/2018 à 02:25, Russell Currey a écrit : On Wed, 2018-11-21 at 09:32 +0100, Christophe LEROY wrote: Le 21/11/2018 à 03:26, Russell Currey a écrit : On Wed, 2018-11-07 at 16:56 +, Christophe Leroy wrote: This patch implements a framework for Kernel Userspace Access Protection. Then subarches will have to possibility to provide their own implementation by providing setup_kuap(), and lock/unlock_user_rd/wr_access We separate read and write accesses because some subarches like book3s32 might only support write access protection. Signed-off-by: Christophe Leroy Separating read and writes does have a performance impact, I'm doing some benchmarking to find out exactly how much - but at least for radix it means we have to do a RMW instead of just a write. It does add some amount of security, though. The other issue I have is that you're just locking everything here (like I was), and not doing anything different for just reads or writes. In theory, wouldn't someone assume that they could (for example) unlock reads, lock writes, then attempt to read? At which point the read would fail, because the lock actually locks both. I would think we either need to bundle read/write locking/unlocking together, or only implement this on platforms that can do one at a time, unless there's a cleaner way to handle this. Glancing at the values you use for 8xx, this doesn't seem possible there, and it's a definite performance hit for radix. At the same time, as you say, it would suck for book3s32 that can only do writes, but maybe just doing both at the same time and if implemented for that platform it could just have a warning that it only applies to writes on init? Well, I see your points. My idea was not to separate read and write on platform that can lock both. I think it is no problem to also unlocking writes when we are doing a read, so on platforms that can do both I think both should do the same.. The idea was to avoid spending time unlocking writes for doing a read on platforms on which reads are not locked. And for platforms able to independently unlock/lock reads and writes, if only unlocking reads can improve performance it can be interesting as well. For book3s/32, locking/unlocking will be done through Kp/Ks bits in segment registers, the function won't be trivial as it may involve more than one segment at a time. So I just wanted to avoid spending time doing that for reads as reads won't be protected. And may also be the case on older book3s/64, may not it ? On Book3s/32, the page protection bits are as follows: Key 0 1 PP 00 RW NA 01 RW RO 10 RW RW 11 RO RO So the idea is to encode user RW with PP01 (instead of PP10 today) and user RO with PP11 (as done today), giving Key0 to user and Key1 to kernel (today both user and kernel have Key1). Then when kernel needs to write, we change Ks to Key0 in segment register for the involved segments. I'm not sure there is any risk that someone nests unlocks/locks for reads and unlocks/locks for writes, because the unlocks/locks are done in very limited places. Yeah I don't think it's a risk since the scope is so limited, it just needs to be clearly documented that locking/unlocking reads/writes could have the side effect of covering the other. My concern is less about a problem in practice as much as functions that only don't exactly do what the function name says. Another option is to again have a single lock/unlock function that takes a bitmask (so read/write/both), which due to being a singular function might be a bit more obvious that it could lock/unlock everything, but at this point I'm just bikeshedding. In order to support book3s/32, I needed to add arguments to the unlock/lock functions, as the address is needed to identify the affected segments. Therefore, I changed it to single functions as you suggested. These functions have 'to', 'from' and 'size' arguments. When it is a read, 'to' is NULL. When it is a write, 'from' is NULL. When it is a copy, none is NULL. See RFC v2 for the details. Christophe Doing it this way should be fine, I'm just cautious that some future developer might be caught off guard. Planning on sending my series based on top of yours for radix today. - Russell Christophe Curious for people's thoughts on this. - Russell
Re: [RFC PATCH v2 07/11] powerpc/mm/radix: Use KUEP API for Radix MMU
Sorry, I forgot to reset the author so the patch appears as coming from yourself. Le 28/11/2018 à 10:27, Russell Currey a écrit : Execution protection already exists on radix, this just refactors the radix init to provide the KUEP setup function instead. Thus, the only functional change is that it can now be disabled. Signed-off-by: Russell Currey Signed-off-by: Christophe Leroy --- arch/powerpc/mm/pgtable-radix.c| 9 ++--- arch/powerpc/platforms/Kconfig.cputype | 1 + 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c index 931156069a81..45aa9e501e76 100644 --- a/arch/powerpc/mm/pgtable-radix.c +++ b/arch/powerpc/mm/pgtable-radix.c @@ -535,8 +535,13 @@ static void radix_init_amor(void) mtspr(SPRN_AMOR, (3ul << 62)); } -static void radix_init_iamr(void) +void setup_kuep(bool disabled) { + if (disabled) + return; + + pr_info("Activating Kernel Userspace Execution Prevention\n"); + /* * Radix always uses key0 of the IAMR to determine if an access is * allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction @@ -605,7 +610,6 @@ void __init radix__early_init_mmu(void) memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE); - radix_init_iamr(); radix_init_pgtable(); /* Switch to the guard PID before turning on MMU */ radix__switch_mmu_context(NULL, _mm); @@ -627,7 +631,6 @@ void radix__early_init_mmu_secondary(void) __pa(partition_tb) | (PATB_SIZE_SHIFT - 12)); radix_init_amor(); } - radix_init_iamr(); radix__switch_mmu_context(NULL, _mm); if (cpu_has_feature(CPU_FTR_HVMODE)) diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index a20669a9ec13..e6831d0ec159 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -334,6 +334,7 @@ config PPC_RADIX_MMU bool "Radix MMU Support" depends on PPC_BOOK3S_64 select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA + select PPC_HAVE_KUEP default y help Enable support for the Power ISA 3.0 Radix style MMU. Currently this
Re: [PATCH 4/4] powerpc/64s: Implement KUAP for Radix MMU
On 11/22/2018 02:04 PM, Russell Currey wrote: Kernel Userspace Access Prevention utilises a feature of the Radix MMU which disallows read and write access to userspace addresses. By utilising this, the kernel is prevented from accessing user data from outside of trusted paths that perform proper safety checks, such as copy_{to/from}_user() and friends. Userspace access is disabled from early boot and is only enabled when: - exiting the kernel and entering userspace - performing an operation like copy_{to/from}_user() - context switching to a process that has access enabled and similarly, access is disabled again when exiting userspace and entering the kernel. This feature has a slight performance impact which I roughly measured to be 3% slower in the worst case (performing 1GB of 1 byte read()/write() syscalls), and is gated behind the CONFIG_PPC_KUAP option for performance-critical builds. This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and performing the following: echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT if enabled, this should send SIGSEGV to the thread. Signed-off-by: Russell Currey I squashed the paca thing into this one in RFC v2 Christophe --- arch/powerpc/include/asm/book3s/64/radix.h | 43 ++ arch/powerpc/include/asm/exception-64e.h | 3 ++ arch/powerpc/include/asm/exception-64s.h | 19 +- arch/powerpc/include/asm/mmu.h | 9 - arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kernel/entry_64.S | 16 +++- arch/powerpc/mm/pgtable-radix.c| 12 ++ arch/powerpc/mm/pkeys.c| 7 +++- arch/powerpc/platforms/Kconfig.cputype | 1 + 9 files changed, 105 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index 7d1a3d1543fc..9af93d05e6fa 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -284,5 +284,48 @@ static inline unsigned long radix__get_tree_size(void) int radix__create_section_mapping(unsigned long start, unsigned long end, int nid); int radix__remove_section_mapping(unsigned long start, unsigned long end); #endif /* CONFIG_MEMORY_HOTPLUG */ + +#ifdef CONFIG_PPC_KUAP +#include +/* + * We do have the ability to individually lock/unlock reads and writes rather + * than both at once, however it's a significant performance hit due to needing + * to do a read-modify-write, which adds a mfspr, which is slow. As a result, + * locking/unlocking both at once is preferred. + */ +static inline void __unlock_user_rd_access(void) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, 0); + isync(); +} + +static inline void __lock_user_rd_access(void) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, AMR_LOCKED); +} + +static inline void __unlock_user_wr_access(void) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, 0); + isync(); +} + +static inline void __lock_user_wr_access(void) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, AMR_LOCKED); +} +#endif /* CONFIG_PPC_KUAP */ #endif /* __ASSEMBLY__ */ #endif diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index 555e22d5e07f..bf25015834ee 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -215,5 +215,8 @@ exc_##label##_book3e: #define RFI_TO_USER \ rfi +#define UNLOCK_USER_ACCESS(reg) +#define LOCK_USER_ACCESS(reg) + #endif /* _ASM_POWERPC_EXCEPTION_64E_H */ diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 3b4767ed3ec5..d92614c66d87 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) +#define LOCK_USER_ACCESS(reg) \ +BEGIN_MMU_FTR_SECTION_NESTED(944) \ + LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \ + mtspr SPRN_AMR,reg; \ +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,944) + +#define UNLOCK_USER_ACCESS(reg) \ +BEGIN_MMU_FTR_SECTION_NESTED(945) \ + li reg,0; \ + mtspr SPRN_AMR,reg;
Re: [PATCH 2/4] powerpc/64: Setup KUP before feature fixups
On 11/22/2018 02:04 PM, Russell Currey wrote: The subsequent implementation of KUAP for radix makes use of a MMU feature in order to patch out assembly when KUAP is disabled or unsupported. This won't work unless there's an entry point for KUP support before the feature magic happens, so relocate setup_kup() earlier in setup. Signed-off-by: Russell Currey I squashed it in my RFC v2 Christophe --- arch/powerpc/kernel/setup_64.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 0f4e06ab70a5..cc20dc3e7b69 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -331,6 +331,12 @@ void __init early_setup(unsigned long dt_ptr) */ configure_exceptions(); + /* +* Configure Kernel Userspace Protection. This needs to happen before +* feature fixups for platforms that implement this using features. +*/ + setup_kup(); + /* Apply all the dynamic patching */ apply_feature_fixups(); setup_feature_keys(); @@ -372,7 +378,6 @@ void __init early_setup(unsigned long dt_ptr) */ btext_map(); #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */ - setup_kup(); } #ifdef CONFIG_SMP
Re: [PATCH 1/4] powerpc: Track KUAP state in the PACA
On 11/22/2018 02:04 PM, Russell Currey wrote: Necessary for subsequent patches that enable KUAP support for radix. Could plausibly be useful for other platforms too, if similar to the radix case, reading the register that manages these accesses is costly. Has the unfortunate downside of another layer of abstraction for platforms that implement the locks and unlocks, but this could be useful in future for other things too, like counters for benchmarking or smartly handling lots of small accesses at once. Signed-off-by: Russell Currey Build failure. [root@po14163vm linux-powerpc]# make mpc885_ads_defconfig # # configuration written to .config # [root@po14163vm linux-powerpc]# make scripts/kconfig/conf --syncconfig Kconfig UPD include/config/kernel.release UPD include/generated/utsrelease.h CC kernel/bounds.s CC arch/powerpc/kernel/asm-offsets.s In file included from ./include/linux/uaccess.h:14:0, from ./include/linux/compat.h:19, from arch/powerpc/kernel/asm-offsets.c:16: ./arch/powerpc/include/asm/uaccess.h: In function ‘unlock_user_rd_access’: ./arch/powerpc/include/asm/uaccess.h:70:2: error: implicit declaration of function ‘get_paca’ [-Werror=implicit-function-declaration] get_paca()->user_access_allowed = 1; ^ ./arch/powerpc/include/asm/uaccess.h:70:12: error: invalid type argument of ‘->’ (have ‘int’) get_paca()->user_access_allowed = 1; ^ ./arch/powerpc/include/asm/uaccess.h: In function ‘lock_user_rd_access’: ./arch/powerpc/include/asm/uaccess.h:75:12: error: invalid type argument of ‘->’ (have ‘int’) get_paca()->user_access_allowed = 0; ^ ./arch/powerpc/include/asm/uaccess.h: In function ‘unlock_user_wr_access’: ./arch/powerpc/include/asm/uaccess.h:80:12: error: invalid type argument of ‘->’ (have ‘int’) get_paca()->user_access_allowed = 1; ^ ./arch/powerpc/include/asm/uaccess.h: In function ‘lock_user_wr_access’: ./arch/powerpc/include/asm/uaccess.h:85:12: error: invalid type argument of ‘->’ (have ‘int’) get_paca()->user_access_allowed = 0; ^ cc1: some warnings being treated as errors make[1]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1 make: *** [prepare0] Error 2 Christophe --- this is all because I can't do PACA things from radix.h and I spent an hour figuring this out at midnight --- arch/powerpc/include/asm/nohash/32/pte-8xx.h | 8 +++ arch/powerpc/include/asm/paca.h | 3 +++ arch/powerpc/include/asm/uaccess.h | 23 +++- arch/powerpc/kernel/asm-offsets.c| 1 + 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index f1ec7cf949d5..7bc0955a56e9 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -137,22 +137,22 @@ static inline pte_t pte_mkhuge(pte_t pte) #define pte_mkhuge pte_mkhuge #ifdef CONFIG_PPC_KUAP -static inline void lock_user_wr_access(void) +static inline void __lock_user_wr_access(void) { mtspr(SPRN_MD_AP, MD_APG_KUAP); } -static inline void unlock_user_wr_access(void) +static inline void __unlock_user_wr_access(void) { mtspr(SPRN_MD_AP, MD_APG_INIT); } -static inline void lock_user_rd_access(void) +static inline void __lock_user_rd_access(void) { mtspr(SPRN_MD_AP, MD_APG_KUAP); } -static inline void unlock_user_rd_access(void) +static inline void __unlock_user_rd_access(void) { mtspr(SPRN_MD_AP, MD_APG_INIT); } diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index e843bc5d1a0f..56236f6d8c89 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -169,6 +169,9 @@ struct paca_struct { u64 saved_r1; /* r1 save for RTAS calls or PM or EE=0 */ u64 saved_msr; /* MSR saved here by enter_rtas */ u16 trap_save; /* Used when bad stack is encountered */ +#ifdef CONFIG_PPC_KUAP + u8 user_access_allowed; /* can the kernel access user memory? */ +#endif u8 irq_soft_mask; /* mask for irq soft masking */ u8 irq_happened;/* irq happened while soft-disabled */ u8 io_sync; /* writel() needs spin_unlock sync */ diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 2f3625cbfcee..76dae1095f7e 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -63,7 +63,28 @@ static inline int __access_ok(unsigned long addr, unsigned long size, #endif -#ifndef CONFIG_PPC_KUAP +#ifdef CONFIG_PPC_KUAP +static inline void unlock_user_rd_access(void) +{ + __unlock_user_rd_access(); + get_paca()->user_access_allowed = 1; +} +static inline void
[RFC PATCH v2 11/11] powerpc/book3s32: Implement Kernel Userspace Access Protection
This patch implements Kernel Userspace Access Protection for book3s/32. Due to limitations of the processor page protection capabilities, the protection is only against writing. read protection cannot be achieved using page protection. In order to provide the protection, Ku and Ks keys are modified in Userspace Segment registers, and different PP bits are used to: PP01 provides RW for Key 0 and RO for Key 1 PP10 provides RW for all PP11 provides RO for all Today PP10 is used for RW pages and PP11 for RO pages. This patch modifies page protection to PP01 for RW pages. Then segment registers are set to Ku 0 and Ks 1. When kernel needs to write to RW pages, the associated segment register is changed to Ks 0 in order to allow write access to the kernel. In order to avoid having the read all segment registers when locking/unlocking the access, some data is kept in the thread_struct and saved on stack on exceptions. The field identifies both the first unlocked segment and the first segment following the last unlocked one. When no segment is unlocked, it contains value 0. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/kup.h | 98 arch/powerpc/include/asm/kup.h | 3 + arch/powerpc/kernel/head_32.S| 2 +- arch/powerpc/mm/ppc_mmu_32.c | 10 arch/powerpc/platforms/Kconfig.cputype | 1 + 5 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/book3s/32/kup.h diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h new file mode 100644 index ..7455ecaab3f9 --- /dev/null +++ b/arch/powerpc/include/asm/book3s/32/kup.h @@ -0,0 +1,98 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_BOOK3S_32_KUP_H +#define _ASM_POWERPC_BOOK3S_32_KUP_H + +#ifdef CONFIG_PPC_KUAP +#define LOCK_USER_ACCESS(val, sp, sr, srmax, thread) \ + lwz sr, KUAP(thread); \ + stw sr, _KUAP(sp); \ + cmpli cr7, sr, 0; \ + beq+cr7, 102f; \ + li val, 0; \ + stw val, KUAP(thread); \ + rlwinm srmax, sr, 28, 0xf000; \ + mfsrin val, sr;\ + orisval, val ,0x4000; /* Set Ks */\ +101: \ + mtsrin val, sr;\ + addival, val, 0x111;/* next VSID */ \ + rlwinm val, val, 0, 8, 3; /* clear VSID overflow */ \ + addis sr, sr, 0x1000; /* address of next segment */ \ + cmplcr7, sr, srmax; \ + blt-cr7, 101b; \ +102: + +#define REST_USER_ACCESS(val, sp, sr, srmax, curr) \ + lwz sr, _KUAP(sp); \ + stw sr, THREAD+KUAP(curr); \ + cmpli cr7, sr, 0; \ + beq+cr7, 102f; \ + rlwinm srmax, sr, 28, 0xf000; \ + mfsrin val, sr;\ + rlwinm val, val ,0, ~0x4000; /* Clear Ks */ \ +101: \ + mtsrin val, sr;\ + addival, val, 0x111;/* next VSID */ \ + rlwinm val, val, 0, 8, 3; /* clear VSID overflow */ \ + addis sr, sr, 0x1000; /* address of next segment */ \ + cmplcr7, sr, srmax; \ + blt-cr7, 101b; \ +102: + +#define KUAP_START 0 +#endif + +#ifndef __ASSEMBLY__ +#ifdef CONFIG_PPC_KUAP + +#include + +static inline void lock_user_access(void __user *to, const void __user *from, + unsigned long size) +{ + unsigned long addr = (unsigned long)to; + unsigned long end = addr + size; + unsigned long sr; + + if (!to) + return; + + current->thread.kuap = 0; + sr = mfsrin(addr); + sr |= 0x4000; /* set Ks */ + mb(); /* make sure all writes are done before SR are updated */ + while (addr < end) { + mtsrin(sr, addr); + sr += 0x111;/* next VSID */ +
[RFC PATCH v2 10/11] powerpc/book3s32: Prepare Kernel Userspace Access Protection
This patch prepares Kernel Userspace Access Protection for book3s/32. Due to limitations of the processor page protection capabilities, the protection is only against writing. read protection cannot be achieved using page protection. In order to provide the protection, Ku and Ks keys are modified in Userspace Segment registers, and different PP bits are used to: PP01 provides RW for Key 0 and RO for Key 1 PP10 provides RW for all PP11 provides RO for all Today PP10 is used for RW pages and PP11 for RO pages, SR Ku and Ks set to 1. This patch modifies page protection to user PP01 for RW pages. Then segment registers are set to Ku 0 and Ks 0. This will allow to setup Userspace write access protection by settng Ks to 1 in the following patch. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 20 +++- arch/powerpc/mm/hash_low_32.S | 6 +++--- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 61ca27929355..1aca0dba0ec1 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -522,13 +522,13 @@ InstructionTLBMiss: */ stw r0,0(r2)/* update PTE (accessed bit) */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwinm r1,r0,32-10,31,31 /* _PAGE_RW -> PP lsb */ - rlwinm r2,r0,32-7,31,31/* _PAGE_DIRTY -> PP lsb */ + rlwinm r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */ + rlwinm r2,r0,32-6,30,30/* _PAGE_DIRTY -> PP msb */ and r1,r1,r2/* writable if _RW and _DIRTY */ rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ rlwimi r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */ ori r1,r1,0xe04 /* clear out reserved bits */ - andcr1,r0,r1/* PP = user? (rw? 2: 3): 0 */ + andcr1,r0,r1/* PP = user? (rw? 1: 3): 0 */ BEGIN_FTR_SECTION rlwinm r1,r1,0,~_PAGE_COHERENT /* clear M (coherence not required) */ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT) @@ -596,8 +596,8 @@ DataLoadTLBMiss: */ stw r0,0(r2)/* update PTE (accessed bit) */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwinm r1,r0,32-10,31,31 /* _PAGE_RW -> PP lsb */ - rlwinm r2,r0,32-7,31,31/* _PAGE_DIRTY -> PP lsb */ + rlwinm r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */ + rlwinm r2,r0,32-6,30,30/* _PAGE_DIRTY -> PP msb */ and r1,r1,r2/* writable if _RW and _DIRTY */ rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ rlwimi r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */ @@ -680,9 +680,9 @@ DataStoreTLBMiss: */ stw r0,0(r2)/* update PTE (accessed/dirty bits) */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ - li r1,0xe05/* clear out reserved bits & PP lsb */ - andcr1,r0,r1/* PP = user? 2: 0 */ + rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ + li r1,0xe06/* clear out reserved bits & PP msb */ + andcr1,r0,r1/* PP = user? 1: 0 */ BEGIN_FTR_SECTION rlwinm r1,r1,0,~_PAGE_COHERENT /* clear M (coherence not required) */ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT) @@ -1014,7 +1014,9 @@ _ENTRY(switch_mmu_context) blt-4f mulli r3,r3,897 /* multiply context by skew factor */ rlwinm r3,r3,4,8,27/* VSID = (context & 0xf) << 4 */ - addis r3,r3,0x6000/* Set Ks, Ku bits */ +#ifdef CONFIG_PPC_KUAP + addis r3,r3,0x4000/* Set Ks, clear Ku bits */ +#endif li r0,NUM_USER_SEGMENTS mtctr r0 diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S index 26acf6c8c20c..0e549eb91823 100644 --- a/arch/powerpc/mm/hash_low_32.S +++ b/arch/powerpc/mm/hash_low_32.S @@ -316,13 +316,13 @@ Hash_msk = (((1 << Hash_bits) - 1) * 64) _GLOBAL(create_hpte) /* Convert linux-style PTE (r5) to low word of PPC-style PTE (r8) */ - rlwinm r8,r5,32-10,31,31 /* _PAGE_RW -> PP lsb */ - rlwinm r0,r5,32-7,31,31/* _PAGE_DIRTY -> PP lsb */ + rlwinm r8,r5,32-9,30,30/* _PAGE_RW -> PP msb */ + rlwinm r0,r5,32-6,30,30/* _PAGE_DIRTY -> PP msb */ and r8,r8,r0/* writable if _RW & _DIRTY */ rlwimi r5,r5,32-1,30,30/* _PAGE_USER -> PP msb */ rlwimi r5,r5,32-2,31,31/* _PAGE_USER -> PP lsb */ ori r8,r8,0xe04 /* clear out reserved bits */ - andcr8,r5,r8/* PP = user? (rw? 2: 3): 0 */ + andcr8,r5,r8/* PP =
[RFC PATCH v2 09/11] powerpc/32: add helper to write into segment registers
This patch add an helper which wraps 'mtsrin' instruction to write into segment registers. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/reg.h | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index d9598e6790d8..6b5d2a61af5a 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1424,6 +1424,11 @@ static inline void msr_check_and_clear(unsigned long bits) #define mfsrin(v) ({unsigned int rval; \ asm volatile("mfsrin %0,%1" : "=r" (rval) : "r" (v)); \ rval;}) + +static inline void mtsrin(u32 val, u32 idx) +{ + asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx)); +} #endif #define proc_trap()asm volatile("trap") -- 2.13.3
[RFC PATCH v2 08/11] powerpc/64s: Implement KUAP for Radix MMU
Kernel Userspace Access Prevention utilises a feature of the Radix MMU which disallows read and write access to userspace addresses. By utilising this, the kernel is prevented from accessing user data from outside of trusted paths that perform proper safety checks, such as copy_{to/from}_user() and friends. Userspace access is disabled from early boot and is only enabled when: - exiting the kernel and entering userspace - performing an operation like copy_{to/from}_user() - context switching to a process that has access enabled and similarly, access is disabled again when exiting userspace and entering the kernel. This feature has a slight performance impact which I roughly measured to be 3% slower in the worst case (performing 1GB of 1 byte read()/write() syscalls), and is gated behind the CONFIG_PPC_KUAP option for performance-critical builds. This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and performing the following: echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT if enabled, this should send SIGSEGV to the thread. The KUAP state is tracked in the PACA because reading the register that manages these accesses is costly. This Has the unfortunate downside of another layer of abstraction for platforms that implement the locks and unlocks, but this could be useful in future for other things too, like counters for benchmarking or smartly handling lots of small accesses at once. Signed-off-by: Russell Currey Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/kup-radix.h | 36 ++ arch/powerpc/include/asm/exception-64s.h | 14 -- arch/powerpc/include/asm/kup.h | 3 +++ arch/powerpc/include/asm/mmu.h | 9 ++- arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/mm/pgtable-radix.c| 12 + arch/powerpc/mm/pkeys.c| 7 +++-- arch/powerpc/platforms/Kconfig.cputype | 1 + 8 files changed, 78 insertions(+), 5 deletions(-) create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h new file mode 100644 index ..93273ca99310 --- /dev/null +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_KUP_RADIX_H +#define _ASM_POWERPC_KUP_RADIX_H + +#ifndef __ASSEMBLY__ +#ifdef CONFIG_PPC_KUAP +#include +/* + * We do have the ability to individually lock/unlock reads and writes rather + * than both at once, however it's a significant performance hit due to needing + * to do a read-modify-write, which adds a mfspr, which is slow. As a result, + * locking/unlocking both at once is preferred. + */ +static inline void unlock_user_access(void __user *to, const void __user *from, + unsigned long size) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, 0); + isync(); + get_paca()->user_access_allowed = 1; +} + +static inline void lock_user_access(void __user *to, const void __user *from, + unsigned long size) +{ + if (!mmu_has_feature(MMU_FTR_RADIX_KUAP)) + return; + + mtspr(SPRN_AMR, AMR_LOCKED); + get_paca()->user_access_allowed = 0; +} +#endif /* CONFIG_PPC_KUAP */ +#endif /* __ASSEMBLY__ */ +#endif diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 4d971ca1e69b..d92614c66d87 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -264,8 +264,18 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) -#define LOCK_USER_ACCESS(reg) -#define UNLOCK_USER_ACCESS(reg) +#define LOCK_USER_ACCESS(reg) \ +BEGIN_MMU_FTR_SECTION_NESTED(944) \ + LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \ + mtspr SPRN_AMR,reg; \ +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,944) + +#define UNLOCK_USER_ACCESS(reg) \ +BEGIN_MMU_FTR_SECTION_NESTED(945) \ + li reg,0; \ + mtspr SPRN_AMR,reg; \ + isync; \ +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KUAP,MMU_FTR_RADIX_KUAP,945) #define EXCEPTION_PROLOG_0(area) \ GET_PACA(r13);
[RFC PATCH v2 07/11] powerpc/mm/radix: Use KUEP API for Radix MMU
Execution protection already exists on radix, this just refactors the radix init to provide the KUEP setup function instead. Thus, the only functional change is that it can now be disabled. Signed-off-by: Russell Currey Signed-off-by: Christophe Leroy --- arch/powerpc/mm/pgtable-radix.c| 9 ++--- arch/powerpc/platforms/Kconfig.cputype | 1 + 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c index 931156069a81..45aa9e501e76 100644 --- a/arch/powerpc/mm/pgtable-radix.c +++ b/arch/powerpc/mm/pgtable-radix.c @@ -535,8 +535,13 @@ static void radix_init_amor(void) mtspr(SPRN_AMOR, (3ul << 62)); } -static void radix_init_iamr(void) +void setup_kuep(bool disabled) { + if (disabled) + return; + + pr_info("Activating Kernel Userspace Execution Prevention\n"); + /* * Radix always uses key0 of the IAMR to determine if an access is * allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction @@ -605,7 +610,6 @@ void __init radix__early_init_mmu(void) memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE); - radix_init_iamr(); radix_init_pgtable(); /* Switch to the guard PID before turning on MMU */ radix__switch_mmu_context(NULL, _mm); @@ -627,7 +631,6 @@ void radix__early_init_mmu_secondary(void) __pa(partition_tb) | (PATB_SIZE_SHIFT - 12)); radix_init_amor(); } - radix_init_iamr(); radix__switch_mmu_context(NULL, _mm); if (cpu_has_feature(CPU_FTR_HVMODE)) diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index a20669a9ec13..e6831d0ec159 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -334,6 +334,7 @@ config PPC_RADIX_MMU bool "Radix MMU Support" depends on PPC_BOOK3S_64 select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA + select PPC_HAVE_KUEP default y help Enable support for the Power ISA 3.0 Radix style MMU. Currently this -- 2.13.3
[RFC PATCH v2 06/11] powerpc/8xx: Add Kernel Userspace Access Protection
This patch adds Kernel Userspace Access Protection on the 8xx. When a page is RO or RW, it is set RO or RW for Key 0 and NA for Key 1. Up to now, the User group is defined with Key 0 for both User and Supervisor. By changing the group to Key 0 for User and Key 1 for Supervisor, this patch prevents the Kernel from being able to access user data. At exception entry, the kernel saves SPRN_MD_AP in the regs struct, and reapply the protection. At exception exit it restore SPRN_MD_AP with the value it had on exception entry. For the time being, the unused mq field of pt_regs struct is used for that. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/kup.h | 4 arch/powerpc/include/asm/mmu-8xx.h | 6 + arch/powerpc/include/asm/nohash/32/kup-8xx.h | 34 arch/powerpc/mm/8xx_mmu.c| 12 ++ arch/powerpc/platforms/Kconfig.cputype | 1 + 5 files changed, 57 insertions(+) create mode 100644 arch/powerpc/include/asm/nohash/32/kup-8xx.h diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index 2ac540fb488f..f7262f4c427e 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -2,6 +2,10 @@ #ifndef _ASM_POWERPC_KUP_H_ #define _ASM_POWERPC_KUP_H_ +#ifdef CONFIG_PPC_8xx +#include +#endif + #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h index 53dbf0788fce..01a0a1694ebd 100644 --- a/arch/powerpc/include/asm/mmu-8xx.h +++ b/arch/powerpc/include/asm/mmu-8xx.h @@ -120,6 +120,12 @@ */ #define MD_APG_INIT0x +/* + * 0 => No user => 01 (all accesses performed according to page definition) + * 1 => User => 10 (all accesses performed according to swaped page definition) + */ +#define MD_APG_KUAP0x + /* The effective page number register. When read, contains the information * about the last instruction TLB miss. When MD_RPN is written, bits in * this register are used to create the TLB entry. diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h new file mode 100644 index ..8f4975c0de22 --- /dev/null +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_KUP_8XX_H_ +#define _ASM_POWERPC_KUP_8XX_H_ + +#ifdef CONFIG_PPC_KUAP +#define LOCK_USER_ACCESS(val, sp, sr, srmax, current) \ + mfspr val, SPRN_MD_AP;\ + stw val, _KUAP(sp); \ + lis val, MD_APG_KUAP@h; \ + ori val, val, MD_APG_KUAP@l;\ + mtspr SPRN_MD_AP, val + +#define REST_USER_ACCESS(val, sp, sr, srmax, current) \ + lwz val, _KUAP(sp); \ + mtspr SPRN_MD_AP, val + +#define KUAP_START MD_APG_KUAP + +#ifndef __ASSEMBLY__ +static inline void lock_user_access(void __user *to, const void __user *from, + unsigned long size) +{ + mtspr(SPRN_MD_AP, MD_APG_KUAP); +} + +static inline void unlock_user_access(void __user *to, const void __user *from, + unsigned long size) +{ + mtspr(SPRN_MD_AP, MD_APG_INIT); +} +#endif /* !__ASSEMBLY__ */ +#endif /* CONFIG_PPC_KUAP */ + +#endif /* _ASM_POWERPC_KUP_8XX_H_ */ diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c index f14ceb507d98..2bba4fd2eed7 100644 --- a/arch/powerpc/mm/8xx_mmu.c +++ b/arch/powerpc/mm/8xx_mmu.c @@ -206,3 +206,15 @@ void __init setup_kuep(bool disabled) mtspr(SPRN_MI_AP, MI_APG_KUEP); } #endif + +#ifdef CONFIG_PPC_KUAP +void __init setup_kuap(bool disabled) +{ + pr_info("Activating Kernel Userspace Access Protection\n"); + + if (disabled) + pr_warn("KUAP cannot be disabled yet on 8xx when compiled in\n"); + + mtspr(SPRN_MD_AP, MD_APG_KUAP); +} +#endif diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index d1757cedf60b..a20669a9ec13 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -34,6 +34,7 @@ config PPC_8xx select FSL_SOC select SYS_SUPPORTS_HUGETLBFS select PPC_HAVE_KUEP + select PPC_HAVE_KUAP config 40x bool "AMCC 40x" -- 2.13.3
[RFC PATCH v2 05/11] powerpc/8xx: Add Kernel Userspace Execution Prevention
This patch adds Kernel Userspace Execution Prevention on the 8xx. When a page is Executable, it is set Executable for Key 0 and NX for Key 1. Up to now, the User group is defined with Key 0 for both User and Supervisor. By changing the group to Key 0 for User and Key 1 for Supervisor, this patch prevents the Kernel from being able to execute user code. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/mmu-8xx.h | 6 ++ arch/powerpc/mm/8xx_mmu.c | 12 arch/powerpc/platforms/Kconfig.cputype | 1 + 3 files changed, 19 insertions(+) diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h index fa05aa566ece..53dbf0788fce 100644 --- a/arch/powerpc/include/asm/mmu-8xx.h +++ b/arch/powerpc/include/asm/mmu-8xx.h @@ -41,6 +41,12 @@ */ #define MI_APG_INIT0x +/* + * 0 => No user => 01 (all accesses performed according to page definition) + * 1 => User => 10 (all accesses performed according to swaped page definition) + */ +#define MI_APG_KUEP0x + /* The effective page number register. When read, contains the information * about the last instruction TLB miss. When MI_RPN is written, bits in * this register are used to create the TLB entry. diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c index 01b7f5107c3a..f14ceb507d98 100644 --- a/arch/powerpc/mm/8xx_mmu.c +++ b/arch/powerpc/mm/8xx_mmu.c @@ -194,3 +194,15 @@ void flush_instruction_cache(void) mtspr(SPRN_IC_CST, IDC_INVALL); isync(); } + +#ifdef CONFIG_PPC_KUEP +void __init setup_kuep(bool disabled) +{ + if (disabled) + return; + + pr_info("Activating Kernel Userspace Execution Prevention\n"); + + mtspr(SPRN_MI_AP, MI_APG_KUEP); +} +#endif diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 68eaafd54aca..d1757cedf60b 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -33,6 +33,7 @@ config PPC_8xx bool "Freescale 8xx" select FSL_SOC select SYS_SUPPORTS_HUGETLBFS + select PPC_HAVE_KUEP config 40x bool "AMCC 40x" -- 2.13.3
[RFC PATCH v2 04/11] powerpc/mm: Add a framework for Kernel Userspace Access Protection
This patch implements a framework for Kernel Userspace Access Protection. Then subarches will have to possibility to provide their own implementation by providing setup_kuap() and lock/unlock_user_access() Some platform will need to know the area accessed and whether it is accessed from read, write or both. Therefore source, destination and size and handed over to the two functions. Signed-off-by: Christophe Leroy --- Documentation/admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/exception-64e.h| 3 ++ arch/powerpc/include/asm/exception-64s.h| 9 +- arch/powerpc/include/asm/futex.h| 4 +++ arch/powerpc/include/asm/kup.h | 21 ++ arch/powerpc/include/asm/paca.h | 3 ++ arch/powerpc/include/asm/processor.h| 3 ++ arch/powerpc/include/asm/ptrace.h | 3 ++ arch/powerpc/include/asm/uaccess.h | 38 +++-- arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kernel/entry_32.S | 8 +- arch/powerpc/kernel/entry_64.S | 16 +-- arch/powerpc/kernel/process.c | 3 ++ arch/powerpc/lib/checksum_wrappers.c| 4 +++ arch/powerpc/mm/fault.c | 17 --- arch/powerpc/mm/init-common.c | 10 +++ arch/powerpc/platforms/Kconfig.cputype | 12 17 files changed, 146 insertions(+), 17 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1103549363bb..0d059b141ff8 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2792,7 +2792,7 @@ noexec=on: enable non-executable mappings (default) noexec=off: disable non-executable mappings - nosmap [X86] + nosmap [X86,PPC] Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by processor. diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index 555e22d5e07f..bf25015834ee 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -215,5 +215,8 @@ exc_##label##_book3e: #define RFI_TO_USER\ rfi +#define UNLOCK_USER_ACCESS(reg) +#define LOCK_USER_ACCESS(reg) + #endif /* _ASM_POWERPC_EXCEPTION_64E_H */ diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 3b4767ed3ec5..4d971ca1e69b 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -264,6 +264,9 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) +#define LOCK_USER_ACCESS(reg) +#define UNLOCK_USER_ACCESS(reg) + #define EXCEPTION_PROLOG_0(area) \ GET_PACA(r13); \ std r9,area+EX_R9(r13); /* save r9 */ \ @@ -500,7 +503,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) beq 4f; /* if from kernel mode */ \ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ SAVE_PPR(area, r9);\ -4: EXCEPTION_PROLOG_COMMON_2(area)\ +4: lbz r9,PACA_USER_ACCESS_ALLOWED(r13); \ + cmpwi cr1,r9,0; \ + beq 5f;\ + LOCK_USER_ACCESS(r9); \ +5: EXCEPTION_PROLOG_COMMON_2(area) \ EXCEPTION_PROLOG_COMMON_3(n) \ ACCOUNT_STOLEN_TIME diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h index 94542776a62d..32230f9a1c32 100644 --- a/arch/powerpc/include/asm/futex.h +++ b/arch/powerpc/include/asm/futex.h @@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, { int oldval = 0, ret; + unlock_user_access(uaddr, NULL, sizeof(*uaddr)); pagefault_disable(); switch (op) { @@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, if (!ret) *oval = oldval; + lock_user_access(uaddr, NULL, sizeof(*uaddr)); return ret; } @@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, if (!access_ok(VERIFY_WRITE,
[RFC PATCH v2 03/11] powerpc: Add skeleton for Kernel Userspace Execution Prevention
This patch adds a skeleton for Kernel Userspace Execution Prevention. Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP and provide setup_kuep() function. Signed-off-by: Christophe Leroy --- Documentation/admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kup.h | 6 ++ arch/powerpc/mm/fault.c | 3 ++- arch/powerpc/mm/init-common.c | 11 +++ arch/powerpc/platforms/Kconfig.cputype | 12 5 files changed, 32 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 81d1d5a74728..1103549363bb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2796,7 +2796,7 @@ Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by processor. - nosmep [X86] + nosmep [X86,PPC] Disable SMEP (Supervisor Mode Execution Prevention) even if it is supported by processor. diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index 7a88b8b9b54d..af4b5f854ca4 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -6,6 +6,12 @@ void setup_kup(void); +#ifdef CONFIG_PPC_KUEP +void setup_kuep(bool disabled); +#else +static inline void setup_kuep(bool disabled) { } +#endif + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_POWERPC_KUP_H_ */ diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 50e5c790d11e..e57bd46cf25b 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -230,8 +230,9 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code, if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT | DSISR_PROTFAULT))) { printk_ratelimited(KERN_CRIT "kernel tried to execute" - " exec-protected page (%lx) -" + " %s page (%lx) -" "exploit attempt? (uid: %d)\n", + address >= TASK_SIZE ? "exec-protected" : "user", address, from_kuid(_user_ns, current_uid())); } diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c index a72bbfc3add6..37f84a43b822 100644 --- a/arch/powerpc/mm/init-common.c +++ b/arch/powerpc/mm/init-common.c @@ -26,8 +26,19 @@ #include #include +static bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP); + +static int __init parse_nosmep(char *p) +{ + disable_kuep = true; + pr_warn("Disabling Kernel Userspace Execution Prevention\n"); + return 0; +} +early_param("nosmep", parse_nosmep); + void __init setup_kup(void) { + setup_kuep(disable_kuep); } static void pgd_ctor(void *addr) diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index f4e2c5729374..70830cb3c18a 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -351,6 +351,18 @@ config PPC_RADIX_MMU_DEFAULT If you're unsure, say Y. +config PPC_HAVE_KUEP + bool + +config PPC_KUEP + bool "Kernel Userspace Execution Prevention" + depends on PPC_HAVE_KUEP + default y + help + Enable support for Kernel Userspace Execution Prevention (KUEP) + + If you're unsure, say Y. + config ARCH_ENABLE_HUGEPAGE_MIGRATION def_bool y depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION -- 2.13.3
[RFC PATCH v2 02/11] powerpc: Add framework for Kernel Userspace Protection
This patch adds a skeleton for Kernel Userspace Protection functionnalities like Kernel Userspace Access Protection and Kernel Userspace Execution Prevention The subsequent implementation of KUAP for radix makes use of a MMU feature in order to patch out assembly when KUAP is disabled or unsupported. This won't work unless there's an entry point for KUP support before the feature magic happens, so for PPC64 setup_kup() is called early in setup. On PPC32, feature_fixup is done too early to allow the same. Suggested-by: Russell Currey Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/kup.h | 11 +++ arch/powerpc/kernel/setup_64.c | 7 +++ arch/powerpc/mm/init-common.c | 5 + arch/powerpc/mm/init_32.c | 3 +++ 4 files changed, 26 insertions(+) create mode 100644 arch/powerpc/include/asm/kup.h diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h new file mode 100644 index ..7a88b8b9b54d --- /dev/null +++ b/arch/powerpc/include/asm/kup.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_KUP_H_ +#define _ASM_POWERPC_KUP_H_ + +#ifndef __ASSEMBLY__ + +void setup_kup(void); + +#endif /* !__ASSEMBLY__ */ + +#endif /* _ASM_POWERPC_KUP_H_ */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 236c1151a3a7..771f280a6bf6 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -68,6 +68,7 @@ #include #include #include +#include #include "setup.h" @@ -331,6 +332,12 @@ void __init early_setup(unsigned long dt_ptr) */ configure_exceptions(); + /* +* Configure Kernel Userspace Protection. This needs to happen before +* feature fixups for platforms that implement this using features. +*/ + setup_kup(); + /* Apply all the dynamic patching */ apply_feature_fixups(); setup_feature_keys(); diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c index 2b656e67f2ea..a72bbfc3add6 100644 --- a/arch/powerpc/mm/init-common.c +++ b/arch/powerpc/mm/init-common.c @@ -24,6 +24,11 @@ #include #include #include +#include + +void __init setup_kup(void) +{ +} static void pgd_ctor(void *addr) { diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c index 3e59e5d64b01..93cfa8cf015d 100644 --- a/arch/powerpc/mm/init_32.c +++ b/arch/powerpc/mm/init_32.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "mmu_decl.h" @@ -182,6 +183,8 @@ void __init MMU_init(void) btext_unmap(); #endif + setup_kup(); + /* Shortly after that, the entire linear mapping will be available */ memblock_set_current_limit(lowmem_end_addr); } -- 2.13.3
[PATCH v2 01/11] powerpc/mm: Fix reporting of kernel execute faults on the 8xx
On the 8xx, no-execute is set via PPP bits in the PTE. Therefore a no-exec fault generates DSISR_PROTFAULT error bits, not DSISR_NOEXEC_OR_G. This patch adds DSISR_PROTFAULT in the test mask. Fixes: d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute faults") Signed-off-by: Christophe Leroy --- arch/powerpc/mm/fault.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 1697e903bbf2..50e5c790d11e 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -226,7 +226,9 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, static bool bad_kernel_fault(bool is_exec, unsigned long error_code, unsigned long address) { - if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) { + /* NX faults set DSISR_PROTFAULT on the 8xx, DSISR_NOEXEC_OR_G on others */ + if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT | + DSISR_PROTFAULT))) { printk_ratelimited(KERN_CRIT "kernel tried to execute" " exec-protected page (%lx) -" "exploit attempt? (uid: %d)\n", -- 2.13.3