Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Wed, Jan 29, 2020 at 11:20:44PM +0100, Gerald Schaefer wrote: > On Mon, 27 Jan 2020 22:33:08 -0500 > > For example, who would have thought that pXd_bad() is supposed to > report large entries as bad? It's not really documented anywhere, A bit off-topic, @Anshuman, maybe you could start a Documentation/ patch that describes at least some of the pXd_whaterver()? Or that would be too much to ask? ;-) > so we just checked them for sanity like normal entries, which > apparently worked fine so far, but for how long? -- Sincerely yours, Mike.
[PATCH 1/2] selftests: vm: Do not override definition of ARCH
Independent builds of the vm selftests is currently broken because commit 7549b3364201 overrides the value of ARCH with the machine name from uname. This does not always match the architecture names used for tasks like header installation. E.g. for building tests on powerpc64, we need ARCH=powerpc and not ARCH=ppc64 or ARCH=ppc64le. Otherwise, the build fails as shown below. $ uname -m ppc64le $ make -C tools/testing/selftests/vm make: Entering directory '/home/sandipan/linux/tools/testing/selftests/vm' make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install make[1]: Entering directory '/home/sandipan/linux' Makefile:653: arch/ppc64le/Makefile: No such file or directory make[1]: *** No rule to make target 'arch/ppc64le/Makefile'. Stop. make[1]: Leaving directory '/home/sandipan/linux' ../lib.mk:50: recipe for target 'khdr' failed make: *** [khdr] Error 2 make: Leaving directory '/home/sandipan/linux/tools/testing/selftests/vm' Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch") Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..3f2e2f0ccbc9 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for vm selftests uname_M := $(shell uname -m 2>/dev/null || echo not) -ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/') +MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/') CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS) LDLIBS = -lrt @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd -ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) +ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range endif -- 2.17.1
[PATCH 2/2] selftests: vm: Fix 64-bit test builds for powerpc64le
Some tests are built only for 64-bit systems. This makes sure that these tests are built for both big and little endian variants of powerpc64. Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch") Reviewed-by: Kamalesh Babulal Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/Makefile| 2 +- tools/testing/selftests/vm/run_vmtests | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 3f2e2f0ccbc9..8074340c6b3a 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd -ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) +ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range endif diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index a692ea828317..db8e0d1c7b39 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -59,7 +59,7 @@ else fi #filter 64bit architectures -ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64" +ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64" if [ -z $ARCH ]; then ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/'` fi -- 2.17.1
[PATCH 0/2] selftests: vm: Build fixes for powerpc64
The second patch was already posted independently but because of the changes in the first patch, the second one now depends on it. Hence posting it now as a part of this series. The last version (v2) of the second patch can be found at: https://patchwork.ozlabs.org/patch/1225969/ Sandipan Das (2): selftests: vm: Do not override definition of ARCH selftests: vm: Fix 64-bit test builds for powerpc64le tools/testing/selftests/vm/Makefile| 4 ++-- tools/testing/selftests/vm/run_vmtests | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) -- 2.17.1
Re: [PATCH 2/2] pseries/makefile: Remove CONFIG_PPC_PSERIES check
On Thu, Jan 30, 2020 at 5:32 PM Oliver O'Halloran wrote: > > The platform makefile (arch/powerpc/platforms/pseries/Makefile) is only doh s/platform/pseries/ > included by the platform makefile (arch/powerpc/platform/Makefile) when > CONFIG_PPC_PSERIES is selected, so checking for CONFIG_PPC_PSERIES in the > pseries makefile is pointless. > > Signed-off-by: Oliver O'Halloran > --- > arch/powerpc/platforms/pseries/Makefile | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/Makefile > b/arch/powerpc/platforms/pseries/Makefile > index a3c74a5..c8a2b0b 100644 > --- a/arch/powerpc/platforms/pseries/Makefile > +++ b/arch/powerpc/platforms/pseries/Makefile > @@ -29,6 +29,4 @@ obj-$(CONFIG_PPC_SPLPAR) += vphn.o > obj-$(CONFIG_PPC_SVM) += svm.o > obj-$(CONFIG_FA_DUMP) += rtas-fadump.o > > -ifdef CONFIG_PPC_PSERIES > obj-$(CONFIG_SUSPEND) += suspend.o > -endif > -- > 2.9.5 >
[PATCH v18 24/24] selftests: vm: pkeys: Fix multilib builds for x86
This ensures that both 32-bit and 64-bit binaries are generated when this is built on a x86_64 system. Most of the changes have been borrowed from tools/testing/selftests/x86/Makefile. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/Makefile | 72 + 1 file changed, 72 insertions(+) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 4e9c741be6af..82031f84af21 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,7 +18,30 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd + +ifeq ($(ARCH),x86_64) +CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32) +CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_64bit_program.c) +CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_program.c -no-pie) + +TARGETS := protection_keys +BINARIES_32 := $(TARGETS:%=%_32) +BINARIES_64 := $(TARGETS:%=%_64) + +ifeq ($(CAN_BUILD_WITH_NOPIE),1) +CFLAGS += -no-pie +endif + +ifeq ($(CAN_BUILD_I386),1) +TEST_GEN_FILES += $(BINARIES_32) +endif + +ifeq ($(CAN_BUILD_X86_64),1) +TEST_GEN_FILES += $(BINARIES_64) +endif +else TEST_GEN_FILES += protection_keys +endif ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch @@ -32,6 +55,55 @@ TEST_FILES := test_vmalloc.sh KSFT_KHDR_INSTALL := 1 include ../lib.mk +ifeq ($(ARCH),x86_64) +BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) +BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) + +define gen-target-rule-32 +$(1) $(1)_32: $(OUTPUT)/$(1)_32 +.PHONY: $(1) $(1)_32 +endef + +define gen-target-rule-64 +$(1) $(1)_64: $(OUTPUT)/$(1)_64 +.PHONY: $(1) $(1)_64 +endef + +ifeq ($(CAN_BUILD_I386),1) +$(BINARIES_32): CFLAGS += -m32 +$(BINARIES_32): LDLIBS += -lrt -ldl -lm +$(BINARIES_32): %_32: %.c + $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ +$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t +endif + +ifeq ($(CAN_BUILD_X86_64),1) +$(BINARIES_64): CFLAGS += -m64 +$(BINARIES_64): LDLIBS += -lrt -ldl +$(BINARIES_64): %_64: %.c + $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ +$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t +endif + +# x86_64 users should be encouraged to install 32-bit libraries +ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01) +all: warn_32bit_failure + +warn_32bit_failure: + @echo "Warning: you seem to have a broken 32-bit build" 2>&1; \ + echo "environment. This will reduce test coverage of 64-bit" 2>&1; \ + echo "kernels. If you are using a Debian-like distribution," 2>&1; \ + echo "try:"; 2>&1; \ + echo ""; \ + echo " apt-get install gcc-multilib libc6-i386 libc6-dev-i386"; \ + echo ""; \ + echo "If you are using a Fedora-like distribution, try:"; \ + echo ""; \ + echo " yum install glibc-devel.*i686"; \ + exit 0; +endif +endif + $(OUTPUT)/userfaultfd: LDLIBS += -lpthread $(OUTPUT)/mlock-random-test: LDLIBS += -lcap -- 2.17.1
[PATCH v18 23/24] selftests: vm: pkeys: Use the correct page size on powerpc
Both 4K and 64K pages are supported on powerpc. Parts of the selftest code perform alignment computations based on the PAGE_SIZE macro which is currently hardcoded to 64K for powerpc. This causes some test failures on kernels configured with 4K page size. In some cases, we need to enforce function alignment on page size. Since this can only be done at build time, 64K is used as the alignment factor as that also ensures 4K alignment. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-powerpc.h| 2 +- tools/testing/selftests/vm/protection_keys.c | 5 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index 02bd4dd7d467..3a761e51a587 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -36,7 +36,7 @@ pkey-31 and exec-only key */ #define PKEY_BITS_PER_PKEY 2 #define HPAGE_SIZE (1UL << 24) -#define PAGE_SIZE (1UL << 16) +#define PAGE_SIZE sysconf(_SC_PAGESIZE) static inline u32 pkey_bit_position(int pkey) { diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index a1cb9a71e77c..fc19addcb5c8 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -146,7 +146,12 @@ void abort_hooks(void) * will then fault, which makes sure that the fault code handles * execute-only memory properly. */ +#ifdef __powerpc64__ +/* This way, both 4K and 64K alignment are maintained */ +__attribute__((__aligned__(65536))) +#else __attribute__((__aligned__(PAGE_SIZE))) +#endif void lots_o_noops_around_write(int *write_to_me) { dprintf3("running %s()\n", __func__); -- 2.17.1
[PATCH v18 22/24] selftests/vm/pkeys: Override access right definitions on powerpc
From: Ram Pai Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE such as those in: /usr/include/bits/mman-shared.h. This overrides the definitions with correct values for powerpc. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-powerpc.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index d31665c48f5e..02bd4dd7d467 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -16,11 +16,13 @@ #define fpregs fp_regs #define si_pkey_offset 0x20 -#ifndef PKEY_DISABLE_ACCESS +#ifdef PKEY_DISABLE_ACCESS +#undef PKEY_DISABLE_ACCESS # define PKEY_DISABLE_ACCESS 0x3 /* disable read and write */ #endif -#ifndef PKEY_DISABLE_WRITE +#ifdef PKEY_DISABLE_WRITE +#undef PKEY_DISABLE_WRITE # define PKEY_DISABLE_WRITE0x2 #endif -- 2.17.1
[PATCH v18 21/24] selftests/vm/pkeys: Test correct behaviour of pkey-0
From: Ram Pai Ensure that pkey-0 is allocated on start and that it can be attached dynamically in various modes, without failures. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 53 1 file changed, 53 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index d4952b57cc90..a1cb9a71e77c 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -964,6 +964,58 @@ __attribute__((noinline)) int read_ptr(int *ptr) return *ptr; } +void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey) +{ + int i, err; + int max_nr_pkey_allocs; + int alloced_pkeys[NR_PKEYS]; + int nr_alloced = 0; + long size; + + pkey_assert(pkey_last_malloc_record); + size = pkey_last_malloc_record->size; + /* +* This is a bit of a hack. But mprotect() requires +* huge-page-aligned sizes when operating on hugetlbfs. +* So, make sure that we use something that's a multiple +* of a huge page when we can. +*/ + if (size >= HPAGE_SIZE) + size = HPAGE_SIZE; + + /* allocate every possible key and make sure key-0 never got allocated */ + max_nr_pkey_allocs = NR_PKEYS; + for (i = 0; i < max_nr_pkey_allocs; i++) { + int new_pkey = alloc_pkey(); + pkey_assert(new_pkey != 0); + + if (new_pkey < 0) + break; + alloced_pkeys[nr_alloced++] = new_pkey; + } + /* free all the allocated keys */ + for (i = 0; i < nr_alloced; i++) { + int free_ret; + + if (!alloced_pkeys[i]) + continue; + free_ret = sys_pkey_free(alloced_pkeys[i]); + pkey_assert(!free_ret); + } + + /* attach key-0 in various modes */ + err = sys_mprotect_pkey(ptr, size, PROT_READ, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_WRITE, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_EXEC, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE|PROT_EXEC, 0); + pkey_assert(!err); +} + void test_read_of_write_disabled_region(int *ptr, u16 pkey) { int ptr_contents; @@ -1448,6 +1500,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_pkey_syscalls_on_non_allocated_pkey, test_pkey_syscalls_bad_args, test_pkey_alloc_exhaust, + test_pkey_alloc_free_attach_pkey0, }; void run_tests_once(void) -- 2.17.1
[PATCH v18 16/24] selftests/vm/pkeys: Improve checks to determine pkey support
From: Ram Pai For the pkeys subsystem to work, both the CPU and the kernel need to have support. So, additionally check if the kernel supports pkeys apart from the CPU feature checks. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 30 tools/testing/selftests/vm/pkey-powerpc.h| 3 +- tools/testing/selftests/vm/pkey-x86.h| 2 +- tools/testing/selftests/vm/protection_keys.c | 7 +++-- 4 files changed, 37 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 2f4b1eb3a680..59ccdff18214 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -76,6 +76,8 @@ extern void abort_hooks(void); __attribute__((noinline)) int read_ptr(int *ptr); void expected_pkey_fault(int pkey); +int sys_pkey_alloc(unsigned long flags, unsigned long init_val); +int sys_pkey_free(unsigned long pkey); #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" @@ -186,4 +188,32 @@ static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si) #endif } +static inline int kernel_has_pkeys(void) +{ + /* try allocating a key and see if it succeeds */ + int ret = sys_pkey_alloc(0, 0); + if (ret <= 0) { + return 0; + } + sys_pkey_free(ret); + return 1; +} + +static inline int is_pkeys_supported(void) +{ + /* check if the cpu supports pkeys */ + if (!cpu_has_pkeys()) { + dprintf1("SKIP: %s: no CPU support\n", __func__); + return 0; + } + + /* check if the kernel supports pkeys */ + if (!kernel_has_pkeys()) { + dprintf1("SKIP: %s: no kernel support\n", __func__); + return 0; + } + + return 1; +} + #endif /* _PKEYS_HELPER_H */ diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index 319673bbab0b..7d7c3ffafdd9 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -63,8 +63,9 @@ static inline void __write_pkey_reg(u64 pkey_reg) __func__, __read_pkey_reg(), pkey_reg); } -static inline int cpu_has_pku(void) +static inline int cpu_has_pkeys(void) { + /* No simple way to determine this */ return 1; } diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index a0c59d4f7af2..6421b846aa16 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -97,7 +97,7 @@ static inline void __cpuid(unsigned int *eax, unsigned int *ebx, #define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (1<<4) /* OS Protection Keys Enable */ -static inline int cpu_has_pku(void) +static inline int cpu_has_pkeys(void) { unsigned int eax; unsigned int ebx; diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 5fcbbc525364..95f173049f43 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1378,7 +1378,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey) int size = PAGE_SIZE; int sret; - if (cpu_has_pku()) { + if (cpu_has_pkeys()) { dprintf1("SKIP: %s: no CPU support\n", __func__); return; } @@ -1447,12 +1447,13 @@ void pkey_setup_shadow(void) int main(void) { int nr_iterations = 22; + int pkeys_supported = is_pkeys_supported(); setup_handlers(); - printf("has pku: %d\n", cpu_has_pku()); + printf("has pkeys: %d\n", pkeys_supported); - if (!cpu_has_pku()) { + if (!pkeys_supported) { int size = PAGE_SIZE; int *ptr; -- 2.17.1
[PATCH v18 20/24] selftests/vm/pkeys: Introduce a sub-page allocator
From: Ram Pai This introduces a new allocator that allocates 4K hardware pages to back 64K linux pages. This allocator is available only on powerpc. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 6 + tools/testing/selftests/vm/pkey-powerpc.h| 25 tools/testing/selftests/vm/pkey-x86.h| 5 tools/testing/selftests/vm/protection_keys.c | 1 + 4 files changed, 37 insertions(+) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 59ccdff18214..622a85848f61 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -28,6 +28,9 @@ extern int dprint_in_signal; extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; +extern int test_nr; +extern int iteration_nr; + #ifdef __GNUC__ __attribute__((format(printf, 1, 2))) #endif @@ -78,6 +81,9 @@ __attribute__((noinline)) int read_ptr(int *ptr); void expected_pkey_fault(int pkey); int sys_pkey_alloc(unsigned long flags, unsigned long init_val); int sys_pkey_free(unsigned long pkey); +int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot, + unsigned long pkey); +void record_pkey_malloc(void *ptr, long size, int prot); #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index 7d7c3ffafdd9..d31665c48f5e 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -106,4 +106,29 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey) /* 4-byte instructions * 16384 = 64K page */ #define __page_o_noops() asm(".rept 16384 ; nop; .endr") +void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey) +{ + void *ptr; + int ret; + + dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__, + size, prot, pkey); + pkey_assert(pkey < NR_PKEYS); + ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + pkey_assert(ptr != (void *)-1); + + ret = syscall(__NR_subpage_prot, ptr, size, NULL); + if (ret) { + perror("subpage_perm"); + return PTR_ERR_ENOTSUP; + } + + ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey); + pkey_assert(!ret); + record_pkey_malloc(ptr, size, prot); + + dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr); + return ptr; +} + #endif /* _PKEYS_POWERPC_H */ diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index 6421b846aa16..3be20f5d5275 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -173,4 +173,9 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey) expected_pkey_fault(pkey); } +void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey) +{ + return PTR_ERR_ENOTSUP; +} + #endif /* _PKEYS_X86_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 8bb4de103874..d4952b57cc90 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -845,6 +845,7 @@ void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey) void *(*pkey_malloc[])(long size, int prot, u16 pkey) = { malloc_pkey_with_mprotect, + malloc_pkey_with_mprotect_subpage, malloc_pkey_anon_huge, malloc_pkey_hugetlb /* can not do direct with the pkey_mprotect() API: -- 2.17.1
[PATCH v18 19/24] selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page
From: Ram Pai Detect write-violation on a page to which access-disabled key is associated much after the page is mapped. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 13 + 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index cb31a5cdf6d9..8bb4de103874 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1027,6 +1027,18 @@ void test_write_of_access_disabled_region(int *ptr, u16 pkey) *ptr = __LINE__; expected_pkey_fault(pkey); } + +void test_write_of_access_disabled_region_with_page_already_mapped(int *ptr, + u16 pkey) +{ + *ptr = __LINE__; + dprintf1("disabling access; after accessing the page, " + " to PKEY[%02d], doing write\n", pkey); + pkey_access_deny(pkey); + *ptr = __LINE__; + expected_pkey_fault(pkey); +} + void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey) { int ret; @@ -1423,6 +1435,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_write_of_write_disabled_region, test_write_of_write_disabled_region_with_page_already_mapped, test_write_of_access_disabled_region, + test_write_of_access_disabled_region_with_page_already_mapped, test_kernel_write_of_access_disabled_region, test_kernel_write_of_write_disabled_region, test_kernel_gup_of_access_disabled_region, -- 2.17.1
[PATCH v18 18/24] selftests/vm/pkeys: Associate key on a mapped page and detect write violation
From: Ram Pai Detect write-violation on a page to which write-disabled key is associated much after the page is mapped. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 12 1 file changed, 12 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index f65d384ef6a0..cb31a5cdf6d9 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1002,6 +1002,17 @@ void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr, expected_pkey_fault(pkey); } +void test_write_of_write_disabled_region_with_page_already_mapped(int *ptr, + u16 pkey) +{ + *ptr = __LINE__; + dprintf1("disabling write access; after accessing the page, " + "to PKEY[%02d], doing write\n", pkey); + pkey_write_deny(pkey); + *ptr = __LINE__; + expected_pkey_fault(pkey); +} + void test_write_of_write_disabled_region(int *ptr, u16 pkey) { dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey); @@ -1410,6 +1421,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_read_of_access_disabled_region, test_read_of_access_disabled_region_with_page_already_mapped, test_write_of_write_disabled_region, + test_write_of_write_disabled_region_with_page_already_mapped, test_write_of_access_disabled_region, test_kernel_write_of_access_disabled_region, test_kernel_write_of_write_disabled_region, -- 2.17.1
[PATCH v18 17/24] selftests/vm/pkeys: Associate key on a mapped page and detect access violation
From: Ram Pai Detect access-violation on a page to which access-disabled key is associated much after the page is mapped. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 95f173049f43..f65d384ef6a0 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -984,6 +984,24 @@ void test_read_of_access_disabled_region(int *ptr, u16 pkey) dprintf1("*ptr: %d\n", ptr_contents); expected_pkey_fault(pkey); } + +void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr, + u16 pkey) +{ + int ptr_contents; + + dprintf1("disabling access to PKEY[%02d], doing read @ %p\n", + pkey, ptr); + ptr_contents = read_ptr(ptr); + dprintf1("reading ptr before disabling the read : %d\n", + ptr_contents); + read_pkey_reg(); + pkey_access_deny(pkey); + ptr_contents = read_ptr(ptr); + dprintf1("*ptr: %d\n", ptr_contents); + expected_pkey_fault(pkey); +} + void test_write_of_write_disabled_region(int *ptr, u16 pkey) { dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey); @@ -1390,6 +1408,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey) void (*pkey_tests[])(int *ptr, u16 pkey) = { test_read_of_write_disabled_region, test_read_of_access_disabled_region, + test_read_of_access_disabled_region_with_page_already_mapped, test_write_of_write_disabled_region, test_write_of_access_disabled_region, test_kernel_write_of_access_disabled_region, -- 2.17.1
[PATCH v18 15/24] selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
From: Ram Pai Some pkeys which are valid on the hardware are reserved and not available for application use. These keys cannot be allocated. test_pkey_alloc_exhaust() tries to account for these and has an assertion which validates if all available pkeys have been exahaustively allocated. However, the expression that is currently used is only valid for x86. On powerpc, a pkey is additionally reserved as compared to x86. Hence, the assertion is made to use an arch-specific helper to get the correct count of reserved pkeys. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index e6de078a9196..5fcbbc525364 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1153,6 +1153,7 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey) dprintf3("%s()::%d\n", __func__, __LINE__); /* +* On x86: * There are 16 pkeys supported in hardware. Three are * allocated by the time we get here: * 1. The default key (0) @@ -1160,8 +1161,16 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey) * 3. One allocated by the test code and passed in via * 'pkey' to this function. * Ensure that we can allocate at least another 13 (16-3). +* +* On powerpc: +* There are either 5, 28, 29 or 32 pkeys supported in +* hardware depending on the page size (4K or 64K) and +* platform (powernv or powervm). Four are allocated by +* the time we get here. These include pkey-0, pkey-1, +* exec-only pkey and the one allocated by the test code. +* Ensure that we can allocate the remaining. */ - pkey_assert(i >= NR_PKEYS-3); + pkey_assert(i >= (NR_PKEYS - get_arch_reserved_keys() - 1)); for (i = 0; i < nr_allocated_pkeys; i++) { err = sys_pkey_free(allocated_pkeys[i]); -- 2.17.1
[PATCH v18 14/24] selftests/vm/pkeys: Fix number of reserved powerpc pkeys
From: "Desnes A. Nunes do Rosario" The number of reserved pkeys in a PowerNV environment is different from that on PowerVM or KVM. Tested on PowerVM and PowerNV environments. Signed-off-by: "Desnes A. Nunes do Rosario" Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-powerpc.h | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index c79f4160a6a0..319673bbab0b 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -28,7 +28,10 @@ #define NR_RESERVED_PKEYS_4K 27 /* pkey-0, pkey-1, exec-only-pkey and 24 other keys that cannot be represented in the PTE */ -#define NR_RESERVED_PKEYS_64K 3 /* pkey-0, pkey-1 and exec-only-pkey */ +#define NR_RESERVED_PKEYS_64K_3KEYS3 /* PowerNV and KVM: pkey-0, +pkey-1 and exec-only key */ +#define NR_RESERVED_PKEYS_64K_4KEYS4 /* PowerVM: pkey-0, pkey-1, +pkey-31 and exec-only key */ #define PKEY_BITS_PER_PKEY 2 #define HPAGE_SIZE (1UL << 24) #define PAGE_SIZE (1UL << 16) @@ -65,12 +68,27 @@ static inline int cpu_has_pku(void) return 1; } +static inline bool arch_is_powervm() +{ + struct stat buf; + + if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) && + (stat("/sys/firmware/devicetree/base/hmc-managed?", &buf) == 0) && + (stat("/sys/firmware/devicetree/base/chosen/qemu,graphic-width", &buf) == -1) ) + return true; + + return false; +} + static inline int get_arch_reserved_keys(void) { if (sysconf(_SC_PAGESIZE) == 4096) return NR_RESERVED_PKEYS_4K; else - return NR_RESERVED_PKEYS_64K; + if (arch_is_powervm()) + return NR_RESERVED_PKEYS_64K_4KEYS; + else + return NR_RESERVED_PKEYS_64K_3KEYS; } void expect_fault_on_read_execonly_key(void *p1, int pkey) -- 2.17.1
[PATCH v18 13/24] selftests/vm/pkeys: Introduce powerpc support
From: Ram Pai This makes use of the abstractions added earlier and introduces support for powerpc. For powerpc, after receiving the SIGSEGV, the signal handler must explicitly restore access permissions for the faulting pkey to allow the test to continue. As this makes use of pkey_access_allow(), all of its dependencies and other similar functions have been moved ahead of the signal handler. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 2 + tools/testing/selftests/vm/pkey-powerpc.h| 90 +++ tools/testing/selftests/vm/protection_keys.c | 269 ++- 3 files changed, 233 insertions(+), 128 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 621fb2a0a5ef..2f4b1eb3a680 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -79,6 +79,8 @@ void expected_pkey_fault(int pkey); #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" +#elif defined(__powerpc64__) /* arch */ +#include "pkey-powerpc.h" #else /* arch */ #error Architecture not supported #endif /* arch */ diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h new file mode 100644 index ..c79f4160a6a0 --- /dev/null +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -0,0 +1,90 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _PKEYS_POWERPC_H +#define _PKEYS_POWERPC_H + +#ifndef SYS_mprotect_key +# define SYS_mprotect_key 386 +#endif +#ifndef SYS_pkey_alloc +# define SYS_pkey_alloc384 +# define SYS_pkey_free 385 +#endif +#define REG_IP_IDX PT_NIP +#define REG_TRAPNO PT_TRAP +#define gregs gp_regs +#define fpregs fp_regs +#define si_pkey_offset 0x20 + +#ifndef PKEY_DISABLE_ACCESS +# define PKEY_DISABLE_ACCESS 0x3 /* disable read and write */ +#endif + +#ifndef PKEY_DISABLE_WRITE +# define PKEY_DISABLE_WRITE0x2 +#endif + +#define NR_PKEYS 32 +#define NR_RESERVED_PKEYS_4K 27 /* pkey-0, pkey-1, exec-only-pkey + and 24 other keys that cannot be + represented in the PTE */ +#define NR_RESERVED_PKEYS_64K 3 /* pkey-0, pkey-1 and exec-only-pkey */ +#define PKEY_BITS_PER_PKEY 2 +#define HPAGE_SIZE (1UL << 24) +#define PAGE_SIZE (1UL << 16) + +static inline u32 pkey_bit_position(int pkey) +{ + return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY; +} + +static inline u64 __read_pkey_reg(void) +{ + u64 pkey_reg; + + asm volatile("mfspr %0, 0xd" : "=r" (pkey_reg)); + + return pkey_reg; +} + +static inline void __write_pkey_reg(u64 pkey_reg) +{ + u64 amr = pkey_reg; + + dprintf4("%s() changing %016llx to %016llx\n", +__func__, __read_pkey_reg(), pkey_reg); + + asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(amr)) : "memory"); + + dprintf4("%s() pkey register after changing %016llx to %016llx\n", + __func__, __read_pkey_reg(), pkey_reg); +} + +static inline int cpu_has_pku(void) +{ + return 1; +} + +static inline int get_arch_reserved_keys(void) +{ + if (sysconf(_SC_PAGESIZE) == 4096) + return NR_RESERVED_PKEYS_4K; + else + return NR_RESERVED_PKEYS_64K; +} + +void expect_fault_on_read_execonly_key(void *p1, int pkey) +{ + /* +* powerpc does not allow userspace to change permissions of exec-only +* keys since those keys are not allocated by userspace. The signal +* handler wont be able to reset the permissions, which means the code +* will infinitely continue to segfault here. +*/ + return; +} + +/* 4-byte instructions * 16384 = 64K page */ +#define __page_o_noops() asm(".rept 16384 ; nop; .endr") + +#endif /* _PKEYS_POWERPC_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 57c71056c93d..e6de078a9196 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -169,6 +169,125 @@ void dump_mem(void *dumpme, int len_bytes) } } +static u32 hw_pkey_get(int pkey, unsigned long flags) +{ + u64 pkey_reg = __read_pkey_reg(); + + dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n", + __func__, pkey, flags, 0, 0); + dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg); + + return (u32) get_pkey_bits(pkey_reg, pkey); +} + +static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) +{ + u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); + u64 old_pkey_reg = __read_pkey_reg(); +
[PATCH v18 12/24] selftests/vm/pkeys: Introduce generic pkey abstractions
From: Ram Pai This introduces some generic abstractions and provides the corresponding architecture-specfic implementations for these abstractions. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 12 tools/testing/selftests/vm/pkey-x86.h| 15 +++ tools/testing/selftests/vm/protection_keys.c | 8 ++-- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 0e3da7c8d628..621fb2a0a5ef 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -74,6 +74,9 @@ extern void abort_hooks(void); } \ } while (0) +__attribute__((noinline)) int read_ptr(int *ptr); +void expected_pkey_fault(int pkey); + #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" #else /* arch */ @@ -172,4 +175,13 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) #define __stringify_1(x...) #x #define __stringify(x...) __stringify_1(x) +static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si) +{ +#ifdef si_pkey + return &si->si_pkey; +#else + return (u32 *)(((u8 *)si) + si_pkey_offset); +#endif +} + #endif /* _PKEYS_HELPER_H */ diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index def2a1bcf6a5..a0c59d4f7af2 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -42,6 +42,7 @@ #endif #define NR_PKEYS 16 +#define NR_RESERVED_PKEYS 2 /* pkey-0 and exec-only-pkey */ #define PKEY_BITS_PER_PKEY 2 #define HPAGE_SIZE (1UL<<21) #define PAGE_SIZE 4096 @@ -158,4 +159,18 @@ int pkey_reg_xstate_offset(void) return xstate_offset; } +static inline int get_arch_reserved_keys(void) +{ + return NR_RESERVED_PKEYS; +} + +void expect_fault_on_read_execonly_key(void *p1, int pkey) +{ + int ptr_contents; + + ptr_contents = read_ptr(p1); + dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents); + expected_pkey_fault(pkey); +} + #endif /* _PKEYS_X86_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 535e464e27e9..57c71056c93d 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1307,9 +1307,7 @@ void test_executing_on_unreadable_memory(int *ptr, u16 pkey) madvise(p1, PAGE_SIZE, MADV_DONTNEED); lots_o_noops_around_write(&scratch); do_not_expect_pkey_fault("executing on PROT_EXEC memory"); - ptr_contents = read_ptr(p1); - dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents); - expected_pkey_fault(pkey); + expect_fault_on_read_execonly_key(p1, pkey); } void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey) @@ -1336,9 +1334,7 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey) madvise(p1, PAGE_SIZE, MADV_DONTNEED); lots_o_noops_around_write(&scratch); do_not_expect_pkey_fault("executing on PROT_EXEC memory"); - ptr_contents = read_ptr(p1); - dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents); - expected_pkey_fault(UNKNOWN_PKEY); + expect_fault_on_read_execonly_key(p1, UNKNOWN_PKEY); /* * Put the memory back to non-PROT_EXEC. Should clear the -- 2.17.1
[PATCH v18 11/24] selftests: vm: pkeys: Use the correct huge page size
The huge page size can vary across architectures. This will ensure that the correct huge page size is used when accessing the hugetlb controls under sysfs. Instead of using a hardcoded page size (i.e. 2MB), this now uses the HPAGE_SIZE macro which is arch-specific. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 23 ++-- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 9cc82b65f828..535e464e27e9 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -739,12 +739,15 @@ void *malloc_pkey_anon_huge(long size, int prot, u16 pkey) } int hugetlb_setup_ok; +#define SYSFS_FMT_NR_HUGE_PAGES "/sys/kernel/mm/hugepages/hugepages-%ldkB/nr_hugepages" #define GET_NR_HUGE_PAGES 10 void setup_hugetlbfs(void) { int err; int fd; - char buf[] = "123"; + char buf[256]; + long hpagesz_kb; + long hpagesz_mb; if (geteuid() != 0) { fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n"); @@ -755,11 +758,16 @@ void setup_hugetlbfs(void) /* * Now go make sure that we got the pages and that they -* are 2M pages. Someone might have made 1G the default. +* are PMD-level pages. Someone might have made PUD-level +* pages the default. */ - fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY); + hpagesz_kb = HPAGE_SIZE / 1024; + hpagesz_mb = hpagesz_kb / 1024; + sprintf(buf, SYSFS_FMT_NR_HUGE_PAGES, hpagesz_kb); + fd = open(buf, O_RDONLY); if (fd < 0) { - perror("opening sysfs 2M hugetlb config"); + fprintf(stderr, "opening sysfs %ldM hugetlb config: %s\n", + hpagesz_mb, strerror(errno)); return; } @@ -767,13 +775,14 @@ void setup_hugetlbfs(void) err = read(fd, buf, sizeof(buf)-1); close(fd); if (err <= 0) { - perror("reading sysfs 2M hugetlb config"); + fprintf(stderr, "reading sysfs %ldM hugetlb config: %s\n", + hpagesz_mb, strerror(errno)); return; } if (atoi(buf) != GET_NR_HUGE_PAGES) { - fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n", - buf, GET_NR_HUGE_PAGES); + fprintf(stderr, "could not confirm %ldM pages, got: '%s' expected %d\n", + hpagesz_mb, buf, GET_NR_HUGE_PAGES); return; } -- 2.17.1
[PATCH v18 10/24] selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
From: Ram Pai alloc_random_pkey() was allocating the same pkey every time. Not all pkeys were geting tested. This fixes it. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 7fd52d5c4bfd..9cc82b65f828 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -25,6 +25,7 @@ #define __SANE_USERSPACE_TYPES__ #include #include +#include #include #include #include @@ -546,10 +547,10 @@ int alloc_random_pkey(void) int nr_alloced = 0; int random_index; memset(alloced_pkeys, 0, sizeof(alloced_pkeys)); + srand((unsigned int)time(NULL)); /* allocate every possible key and make a note of which ones we got */ max_nr_pkey_allocs = NR_PKEYS; - max_nr_pkey_allocs = 1; for (i = 0; i < max_nr_pkey_allocs; i++) { int new_pkey = alloc_pkey(); if (new_pkey < 0) -- 2.17.1
[PATCH v18 09/24] selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
From: Ram Pai In some cases, a pkey's bits need not necessarily change in a way that the value of the pkey register increases when performing a pkey_disable_set() or decreases when performing a pkey_disable_clear(). For example, on powerpc, if a pkey's current state is PKEY_DISABLE_ACCESS and we perform a pkey_write_disable() on it, the bits still remain the same. We will observe something similar when the pkey's current state is 0 and a pkey_access_enable() is performed on it. Either case would cause some assertions to fail. This fixes the problem. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 4b1ddb526228..7fd52d5c4bfd 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -400,7 +400,7 @@ void pkey_disable_set(int pkey, int flags) dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__, pkey, read_pkey_reg()); if (flags) - pkey_assert(read_pkey_reg() > orig_pkey_reg); + pkey_assert(read_pkey_reg() >= orig_pkey_reg); dprintf1("END<---%s(%d, 0x%x)\n", __func__, pkey, flags); } @@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags) dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__, pkey, read_pkey_reg()); if (flags) - assert(read_pkey_reg() < orig_pkey_reg); + assert(read_pkey_reg() <= orig_pkey_reg); } void pkey_write_allow(int pkey) -- 2.17.1
[PATCH v18 08/24] selftests/vm/pkeys: Fix pkey_disable_clear()
From: Ram Pai Currently, pkey_disable_clear() sets the specified bits instead clearing them. This has been dead code up to now because its only callers i.e. pkey_access/write_allow() are also unused. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index bed9d4de12b4..4b1ddb526228 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -418,7 +418,7 @@ void pkey_disable_clear(int pkey, int flags) pkey, pkey, pkey_rights); pkey_assert(pkey_rights >= 0); - pkey_rights |= flags; + pkey_rights &= ~flags; ret = hw_pkey_set(pkey, pkey_rights, 0); shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights); @@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags) dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__, pkey, read_pkey_reg()); if (flags) - assert(read_pkey_reg() > orig_pkey_reg); + assert(read_pkey_reg() < orig_pkey_reg); } void pkey_write_allow(int pkey) -- 2.17.1
[PATCH v18 00/24] selftests, powerpc, x86: Memory Protection Keys
Memory protection keys enables an application to protect its address space from inadvertent access by its own code. This feature is now enabled on powerpc and has been available since 4.16-rc1. The patches move the selftests to arch neutral directory and enhance their test coverage. Tested on powerpc64 and x86_64 (Skylake-SP). Link to development branch: https://github.com/sandip4n/linux/tree/pkey-selftests Changelog - Link to previous version (v17): https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=154174 v18: (1) Fixed issues with x86 multilib builds based on feedback from Dave. (2) Moved patch 2 to the end of the series. v17: (1) Fixed issues with i386 builds when running on x86_64 based on feedback from Dave. (2) Replaced patch 6 from previous version with patch 7. This addresses u64 format specifier related concerns that Michael had raised in v15. v16: (1) Rebased on top of latest master. (2) Switched to u64 instead of using an arch-dependent pkey_reg_t type for references to the pkey register based on suggestions from Dave, Michal and Michael. (3) Removed build time determination of page size based on suggestion from Michael. (4) Fixed comment before the definition of __page_o_noops() from patch 13 ("selftests/vm/pkeys: Introduce powerpc support"). v15: (1) Rebased on top of latest master. (2) Addressed review comments from Dave Hansen. (3) Moved code for getting or setting pkey bits to new helpers. These changes replace patch 7 of v14. (4) Added a fix which ensures that the correct count of reserved keys is used across different platforms. (5) Added a fix which ensures that the correct page size is used as powerpc supports both 4K and 64K pages. v14: (1) Incorporated another round of comments from Dave Hansen. v13: (1) Incorporated comments for Dave Hansen. (2) Added one more test for correct pkey-0 behavior. v12: (1) Fixed the offset of pkey field in the siginfo structure for x86_64 and powerpc. And tries to use the actual field if the headers have it defined. v11: (1) Fixed a deadlock in the ptrace testcase. v10 and prior: (1) Moved the testcase to arch neutral directory. (2) Split the changes into incremental patches. Desnes A. Nunes do Rosario (1): selftests/vm/pkeys: Fix number of reserved powerpc pkeys Ram Pai (16): selftests/x86/pkeys: Move selftests to arch-neutral directory selftests/vm/pkeys: Rename all references to pkru to a generic name selftests/vm/pkeys: Move generic definitions to header file selftests/vm/pkeys: Fix pkey_disable_clear() selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random selftests/vm/pkeys: Introduce generic pkey abstractions selftests/vm/pkeys: Introduce powerpc support selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: Improve checks to determine pkey support selftests/vm/pkeys: Associate key on a mapped page and detect access violation selftests/vm/pkeys: Associate key on a mapped page and detect write violation selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page selftests/vm/pkeys: Introduce a sub-page allocator selftests/vm/pkeys: Test correct behaviour of pkey-0 selftests/vm/pkeys: Override access right definitions on powerpc Sandipan Das (5): selftests: vm: pkeys: Use sane types for pkey register selftests: vm: pkeys: Add helpers for pkey bits selftests: vm: pkeys: Use the correct huge page size selftests: vm: pkeys: Use the correct page size on powerpc selftests: vm: pkeys: Fix multilib builds for x86 Thiago Jung Bauermann (2): selftests/vm/pkeys: Move some definitions to arch-specific header selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf() tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 73 ++ tools/testing/selftests/vm/pkey-helpers.h | 225 ++ tools/testing/selftests/vm/pkey-powerpc.h | 136 tools/testing/selftests/vm/pkey-x86.h | 181 + .../selftests/{x86 => vm}/protection_keys.c | 696 ++ tools/testing/selftests/x86/.gitignore| 1 - tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/pkey-helpers.h| 219 -- 9 files changed, 1002 insertions(+), 532 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-helpers.h create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h create mode 100644 tools/testing/selftests/vm/pkey-x86.h rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%) delete mode 100644 tools/test
[PATCH v18 07/24] selftests: vm: pkeys: Add helpers for pkey bits
This introduces some functions that help with setting or clearing bits of a particular pkey. This also adds an abstraction for getting a pkey's bit position in the pkey register as this may vary across architectures. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 22 ++ tools/testing/selftests/vm/pkey-x86.h| 5 +++ tools/testing/selftests/vm/protection_keys.c | 32 ++-- 3 files changed, 36 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index dfbce49269ce..0e3da7c8d628 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -80,6 +80,28 @@ extern void abort_hooks(void); #error Architecture not supported #endif /* arch */ +#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + +static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags) +{ + u32 shift = pkey_bit_position(pkey); + /* mask out bits from pkey in old value */ + reg &= ~((u64)PKEY_MASK << shift); + /* OR in new bits for pkey */ + reg |= (flags & PKEY_MASK) << shift; + return reg; +} + +static inline u64 get_pkey_bits(u64 reg, int pkey) +{ + u32 shift = pkey_bit_position(pkey); + /* +* shift down the relevant bits to the lowest two, then +* mask off all the other higher bits +*/ + return ((reg >> shift) & PKEY_MASK); +} + extern u64 shadow_pkey_reg; static inline u64 _read_pkey_reg(int line) diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index 6ffea27e2d2d..def2a1bcf6a5 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -118,6 +118,11 @@ static inline int cpu_has_pku(void) return 1; } +static inline u32 pkey_bit_position(int pkey) +{ + return pkey * PKEY_BITS_PER_PKEY; +} + #define XSTATE_PKEY_BIT(9) #define XSTATE_PKEY0x200 diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index efa35cc6f6b9..bed9d4de12b4 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -334,25 +334,13 @@ pid_t fork_lazy_child(void) static u32 hw_pkey_get(int pkey, unsigned long flags) { - u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); u64 pkey_reg = __read_pkey_reg(); - u64 shifted_pkey_reg; - u32 masked_pkey_reg; dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n", __func__, pkey, flags, 0, 0); dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg); - shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY)); - dprintf2("%s() shifted_pkey_reg: %016llx\n", __func__, - shifted_pkey_reg); - masked_pkey_reg = shifted_pkey_reg & mask; - dprintf2("%s() masked pkey_reg: %x\n", __func__, masked_pkey_reg); - /* -* shift down the relevant bits to the lowest two, then -* mask off all the other high bits. -*/ - return masked_pkey_reg; + return (u32) get_pkey_bits(pkey_reg, pkey); } static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) @@ -364,12 +352,8 @@ static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) /* make sure that 'rights' only contains the bits we expect: */ assert(!(rights & ~mask)); - /* copy old pkey_reg */ - new_pkey_reg = old_pkey_reg; - /* mask out bits from pkey in old value: */ - new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY)); - /* OR in new bits for pkey: */ - new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY)); + /* modify bits accordingly in old pkey_reg and assign it */ + new_pkey_reg = set_pkey_bits(old_pkey_reg, pkey, rights); __write_pkey_reg(new_pkey_reg); @@ -403,7 +387,7 @@ void pkey_disable_set(int pkey, int flags) ret = hw_pkey_set(pkey, pkey_rights, syscall_flags); assert(!ret); /* pkey_reg and flags have the same format */ - shadow_pkey_reg |= flags << (pkey * 2); + shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights); dprintf1("%s(%d) shadow: 0x%016llx\n", __func__, pkey, shadow_pkey_reg); @@ -437,7 +421,7 @@ void pkey_disable_clear(int pkey, int flags) pkey_rights |= flags; ret = hw_pkey_set(pkey, pkey_rights, 0); - shadow_pkey_reg &= ~(flags << (pkey * 2)); + shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights); pkey_assert(ret >= 0); pkey_rights = hw_pkey_get(pkey, syscall_flags); @@ -513,7 +497,8 @@ int alloc_pkey(void) shadow_pkey_reg); if (ret) { /* clear both the bits: */ - shadow_pkey_reg &= ~(0x3
[PATCH v18 06/24] selftests: vm: pkeys: Use sane types for pkey register
The size of the pkey register can vary across architectures. This converts the data type of all its references to u64 in preparation for multi-arch support. To keep the definition of the u64 type consistent and remove format specifier related warnings, __SANE_USERSPACE_TYPES__ is defined as suggested by Michael Ellerman. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 31 +++ tools/testing/selftests/vm/pkey-x86.h| 8 +- tools/testing/selftests/vm/protection_keys.c | 86 3 files changed, 72 insertions(+), 53 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 7f18a82e54fc..dfbce49269ce 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -14,10 +14,10 @@ #include /* Define some kernel-like types */ -#define u8 uint8_t -#define u16 uint16_t -#define u32 uint32_t -#define u64 uint64_t +#define u8 __u8 +#define u16 __u16 +#define u32 __u32 +#define u64 __u64 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) @@ -80,13 +80,14 @@ extern void abort_hooks(void); #error Architecture not supported #endif /* arch */ -extern unsigned int shadow_pkey_reg; +extern u64 shadow_pkey_reg; -static inline unsigned int _read_pkey_reg(int line) +static inline u64 _read_pkey_reg(int line) { - unsigned int pkey_reg = __read_pkey_reg(); + u64 pkey_reg = __read_pkey_reg(); - dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n", + dprintf4("read_pkey_reg(line=%d) pkey_reg: %016llx" + " shadow: %016llx\n", line, pkey_reg, shadow_pkey_reg); assert(pkey_reg == shadow_pkey_reg); @@ -95,15 +96,15 @@ static inline unsigned int _read_pkey_reg(int line) #define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void write_pkey_reg(unsigned int pkey_reg) +static inline void write_pkey_reg(u64 pkey_reg) { - dprintf4("%s() changing %08x to %08x\n", __func__, + dprintf4("%s() changing %016llx to %016llx\n", __func__, __read_pkey_reg(), pkey_reg); /* will do the shadow check for us: */ read_pkey_reg(); __write_pkey_reg(pkey_reg); shadow_pkey_reg = pkey_reg; - dprintf4("%s(%08x) pkey_reg: %08x\n", __func__, + dprintf4("%s(%016llx) pkey_reg: %016llx\n", __func__, pkey_reg, __read_pkey_reg()); } @@ -113,7 +114,7 @@ static inline void write_pkey_reg(unsigned int pkey_reg) */ static inline void __pkey_access_allow(int pkey, int do_allow) { - unsigned int pkey_reg = read_pkey_reg(); + u64 pkey_reg = read_pkey_reg(); int bit = pkey * 2; if (do_allow) @@ -121,13 +122,13 @@ static inline void __pkey_access_allow(int pkey, int do_allow) else pkey_reg |= (1< #include #include @@ -48,7 +49,7 @@ int iteration_nr = 1; int test_nr; -unsigned int shadow_pkey_reg; +u64 shadow_pkey_reg; int dprint_in_signal; char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; @@ -163,7 +164,7 @@ void dump_mem(void *dumpme, int len_bytes) for (i = 0; i < len_bytes; i += sizeof(u64)) { u64 *ptr = (u64 *)(c + i); - dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr); + dprintf1("dump[%03d][@%p]: %016llx\n", i, ptr, *ptr); } } @@ -205,7 +206,8 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) dprint_in_signal = 1; dprintf1("===SIGSEGV\n"); - dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__, + dprintf1("%s()::%d, pkey_reg: 0x%016llx shadow: %016llx\n", + __func__, __LINE__, __read_pkey_reg(), shadow_pkey_reg); trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO]; @@ -213,8 +215,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) fpregset = uctxt->uc_mcontext.fpregs; fpregs = (void *)fpregset; - dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__, - trapno, ip, si_code_str(si->si_code), si->si_code); + dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n", + __func__, trapno, ip, si_code_str(si->si_code), + si->si_code); #ifdef __i386__ /* * 32-bit has some extra padding so that userspace can tell whether @@ -256,8 +259,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) * need __read_pkey_reg() version so we do not do shadow_pkey_reg * checking */ - dprintf1("signal pkey_reg from pkey_reg: %08x\n", __read_pkey_reg()); - dprintf1("pkey from siginfo: %jx\n", siginfo_pkey); + dprintf1("signal pkey_reg from pkey_reg: %016llx\n", + __read_pkey_reg());
[PATCH v18 05/24] selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()
From: Thiago Jung Bauermann This will help us ensure we print pkey_reg_t values correctly in different architectures. Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h | 4 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 3ed2f021bf7a..7f18a82e54fc 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -27,6 +27,10 @@ #define DPRINT_IN_SIGNAL_BUF_SIZE 4096 extern int dprint_in_signal; extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; + +#ifdef __GNUC__ +__attribute__((format(printf, 1, 2))) +#endif static inline void sigsafe_printf(const char *format, ...) { va_list ap; -- 2.17.1
[PATCH v18 04/24] selftests/vm/pkeys: Move some definitions to arch-specific header
From: Thiago Jung Bauermann In preparation for multi-arch support, move definitions which have arch-specific values to x86-specific header. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 111 + tools/testing/selftests/vm/pkey-x86.h| 156 +++ tools/testing/selftests/vm/protection_keys.c | 47 -- 3 files changed, 162 insertions(+), 152 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-x86.h diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 6ad1bd54ef94..3ed2f021bf7a 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -21,9 +21,6 @@ #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) -#define NR_PKEYS 16 -#define PKEY_BITS_PER_PKEY 2 - #ifndef DEBUG_LEVEL #define DEBUG_LEVEL 0 #endif @@ -73,19 +70,13 @@ extern void abort_hooks(void); } \ } while (0) +#if defined(__i386__) || defined(__x86_64__) /* arch */ +#include "pkey-x86.h" +#else /* arch */ +#error Architecture not supported +#endif /* arch */ + extern unsigned int shadow_pkey_reg; -static inline unsigned int __read_pkey_reg(void) -{ - unsigned int eax, edx; - unsigned int ecx = 0; - unsigned int pkey_reg; - - asm volatile(".byte 0x0f,0x01,0xee\n\t" -: "=a" (eax), "=d" (edx) -: "c" (ecx)); - pkey_reg = eax; - return pkey_reg; -} static inline unsigned int _read_pkey_reg(int line) { @@ -100,19 +91,6 @@ static inline unsigned int _read_pkey_reg(int line) #define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void __write_pkey_reg(unsigned int pkey_reg) -{ - unsigned int eax = pkey_reg; - unsigned int ecx = 0; - unsigned int edx = 0; - - dprintf4("%s() changing %08x to %08x\n", __func__, - __read_pkey_reg(), pkey_reg); - asm volatile(".byte 0x0f,0x01,0xef\n\t" -: : "a" (eax), "c" (ecx), "d" (edx)); - assert(pkey_reg == __read_pkey_reg()); -} - static inline void write_pkey_reg(unsigned int pkey_reg) { dprintf4("%s() changing %08x to %08x\n", __func__, @@ -157,83 +135,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) dprintf4("pkey_reg now: %08x\n", read_pkey_reg()); } -#define PAGE_SIZE 4096 -#define MB (1<<20) - -static inline void __cpuid(unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - /* ecx is often an input as well as an output. */ - asm volatile( - "cpuid;" - : "=a" (*eax), - "=b" (*ebx), - "=c" (*ecx), - "=d" (*edx) - : "0" (*eax), "2" (*ecx)); -} - -/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */ -#define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */ -#define X86_FEATURE_OSPKE (1<<4) /* OS Protection Keys Enable */ - -static inline int cpu_has_pku(void) -{ - unsigned int eax; - unsigned int ebx; - unsigned int ecx; - unsigned int edx; - - eax = 0x7; - ecx = 0x0; - __cpuid(&eax, &ebx, &ecx, &edx); - - if (!(ecx & X86_FEATURE_PKU)) { - dprintf2("cpu does not have PKU\n"); - return 0; - } - if (!(ecx & X86_FEATURE_OSPKE)) { - dprintf2("cpu does not have OSPKE\n"); - return 0; - } - return 1; -} - -#define XSTATE_PKEY_BIT(9) -#define XSTATE_PKEY0x200 - -int pkey_reg_xstate_offset(void) -{ - unsigned int eax; - unsigned int ebx; - unsigned int ecx; - unsigned int edx; - int xstate_offset; - int xstate_size; - unsigned long XSTATE_CPUID = 0xd; - int leaf; - - /* assume that XSTATE_PKEY is set in XCR0 */ - leaf = XSTATE_PKEY_BIT; - { - eax = XSTATE_CPUID; - ecx = leaf; - __cpuid(&eax, &ebx, &ecx, &edx); - - if (leaf == XSTATE_PKEY_BIT) { - xstate_offset = ebx; - xstate_size = eax; - } - } - - if (xstate_size == 0) { - printf("could not find size/offset of PKEY in xsave state\n"); - return 0; - } - - return xstate_offset; -} - #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) #define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) #define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h new file mode 100644 index ..2f04ade8ca9c --- /dev/null +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -0,0 +1,15
[PATCH v18 03/24] selftests/vm/pkeys: Move generic definitions to header file
From: Ram Pai Moved all the generic definition and helper functions to the header file. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 35 +--- tools/testing/selftests/vm/protection_keys.c | 27 --- 2 files changed, 30 insertions(+), 32 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index d5779be4793f..6ad1bd54ef94 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -13,6 +13,14 @@ #include #include +/* Define some kernel-like types */ +#define u8 uint8_t +#define u16 uint16_t +#define u32 uint32_t +#define u64 uint64_t + +#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) + #define NR_PKEYS 16 #define PKEY_BITS_PER_PKEY 2 @@ -53,6 +61,18 @@ static inline void sigsafe_printf(const char *format, ...) #define dprintf3(args...) dprintf_level(3, args) #define dprintf4(args...) dprintf_level(4, args) +extern void abort_hooks(void); +#define pkey_assert(condition) do {\ + if (!(condition)) { \ + dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \ + __FILE__, __LINE__, \ + test_nr, iteration_nr); \ + dprintf0("errno at assert: %d", errno); \ + abort_hooks(); \ + exit(__LINE__); \ + } \ +} while (0) + extern unsigned int shadow_pkey_reg; static inline unsigned int __read_pkey_reg(void) { @@ -137,11 +157,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) dprintf4("pkey_reg now: %08x\n", read_pkey_reg()); } -#define PROT_PKEY0 0x10/* protection key value (bit 0) */ -#define PROT_PKEY1 0x20/* protection key value (bit 1) */ -#define PROT_PKEY2 0x40/* protection key value (bit 2) */ -#define PROT_PKEY3 0x80/* protection key value (bit 3) */ - #define PAGE_SIZE 4096 #define MB (1<<20) @@ -219,4 +234,14 @@ int pkey_reg_xstate_offset(void) return xstate_offset; } +#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) +#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) +#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) +#define ALIGN_PTR_UP(p, ptr_align_to) \ + ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to)) +#define ALIGN_PTR_DOWN(p, ptr_align_to)\ + ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to)) +#define __stringify_1(x...) #x +#define __stringify(x...) __stringify_1(x) + #endif /* _PKEYS_HELPER_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 2f4ab81c570d..42ffb58810f2 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -51,31 +51,10 @@ int test_nr; unsigned int shadow_pkey_reg; #define HPAGE_SIZE (1UL<<21) -#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) -#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) -#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) -#define ALIGN_PTR_UP(p, ptr_align_to) ((typeof(p))ALIGN_UP((unsigned long)(p),ptr_align_to)) -#define ALIGN_PTR_DOWN(p, ptr_align_to) ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to)) -#define __stringify_1(x...) #x -#define __stringify(x...) __stringify_1(x) - -#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) int dprint_in_signal; char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; -extern void abort_hooks(void); -#define pkey_assert(condition) do {\ - if (!(condition)) { \ - dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \ - __FILE__, __LINE__, \ - test_nr, iteration_nr); \ - dprintf0("errno at assert: %d", errno); \ - abort_hooks(); \ - exit(__LINE__); \ - } \ -} while (0) - void cat_into_file(char *str, char *file) { int fd = open(file, O_RDWR); @@ -186,12 +165,6 @@ void lots_o_noops_around_write(int *write_to_me) dprintf3("%s() done\n", __func__); } -/* Define some kernel-like types */ -#define u8 uint8_t -#define u16 uint16_t -#define u32 uint32_t -#define u64 uint64_t - #ifdef __i386__ #ifndef SYS_mprotect_key -- 2.17.1
[PATCH v18 02/24] selftests/vm/pkeys: Rename all references to pkru to a generic name
From: Ram Pai This renames PKRU references to "pkey_reg" or "pkey" based on the usage. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Reviewed-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 85 +++ tools/testing/selftests/vm/protection_keys.c | 240 ++- 2 files changed, 170 insertions(+), 155 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 254e5436bdd9..d5779be4793f 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -14,7 +14,7 @@ #include #define NR_PKEYS 16 -#define PKRU_BITS_PER_PKEY 2 +#define PKEY_BITS_PER_PKEY 2 #ifndef DEBUG_LEVEL #define DEBUG_LEVEL 0 @@ -53,85 +53,88 @@ static inline void sigsafe_printf(const char *format, ...) #define dprintf3(args...) dprintf_level(3, args) #define dprintf4(args...) dprintf_level(4, args) -extern unsigned int shadow_pkru; -static inline unsigned int __rdpkru(void) +extern unsigned int shadow_pkey_reg; +static inline unsigned int __read_pkey_reg(void) { unsigned int eax, edx; unsigned int ecx = 0; - unsigned int pkru; + unsigned int pkey_reg; asm volatile(".byte 0x0f,0x01,0xee\n\t" : "=a" (eax), "=d" (edx) : "c" (ecx)); - pkru = eax; - return pkru; + pkey_reg = eax; + return pkey_reg; } -static inline unsigned int _rdpkru(int line) +static inline unsigned int _read_pkey_reg(int line) { - unsigned int pkru = __rdpkru(); + unsigned int pkey_reg = __read_pkey_reg(); - dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n", - line, pkru, shadow_pkru); - assert(pkru == shadow_pkru); + dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n", + line, pkey_reg, shadow_pkey_reg); + assert(pkey_reg == shadow_pkey_reg); - return pkru; + return pkey_reg; } -#define rdpkru() _rdpkru(__LINE__) +#define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void __wrpkru(unsigned int pkru) +static inline void __write_pkey_reg(unsigned int pkey_reg) { - unsigned int eax = pkru; + unsigned int eax = pkey_reg; unsigned int ecx = 0; unsigned int edx = 0; - dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru); + dprintf4("%s() changing %08x to %08x\n", __func__, + __read_pkey_reg(), pkey_reg); asm volatile(".byte 0x0f,0x01,0xef\n\t" : : "a" (eax), "c" (ecx), "d" (edx)); - assert(pkru == __rdpkru()); + assert(pkey_reg == __read_pkey_reg()); } -static inline void wrpkru(unsigned int pkru) +static inline void write_pkey_reg(unsigned int pkey_reg) { - dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru); + dprintf4("%s() changing %08x to %08x\n", __func__, + __read_pkey_reg(), pkey_reg); /* will do the shadow check for us: */ - rdpkru(); - __wrpkru(pkru); - shadow_pkru = pkru; - dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru()); + read_pkey_reg(); + __write_pkey_reg(pkey_reg); + shadow_pkey_reg = pkey_reg; + dprintf4("%s(%08x) pkey_reg: %08x\n", __func__, + pkey_reg, __read_pkey_reg()); } /* * These are technically racy. since something could - * change PKRU between the read and the write. + * change PKEY register between the read and the write. */ static inline void __pkey_access_allow(int pkey, int do_allow) { - unsigned int pkru = rdpkru(); + unsigned int pkey_reg = read_pkey_reg(); int bit = pkey * 2; if (do_allow) - pkru &= (1<>>>===SIGSEGV\n"); - dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__, - __rdpkru(), shadow_pkru); + dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__, + __read_pkey_reg(), shadow_pkey_reg); trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO]; ip = uctxt->uc_mcontext.gregs[REG_IP_IDX]; @@ -289,19 +289,19 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) */ fpregs += 0x70; #endif - pkru_offset = pkru_xstate_offset(); - pkru_ptr = (void *)(&fpregs[pkru_offset]); + pkey_reg_offset = pkey_reg_xstate_offset(); + pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]); dprintf1("siginfo: %p\n", si); dprintf1(" fpregs: %p\n", fpregs); /* -* If we got a PKRU fault, we *HAVE* to have at least one bit set in +* If we got a PKEY fault, we *HAVE* to have at least one bit set in * here. */ - dprintf1("pkru_xstate_offset: %d\n", pkru_
[PATCH v18 01/24] selftests/x86/pkeys: Move selftests to arch-neutral directory
From: Ram Pai cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Ingo Molnar Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/{x86 => vm}/pkey-helpers.h| 0 tools/testing/selftests/{x86 => vm}/protection_keys.c | 0 tools/testing/selftests/x86/.gitignore| 1 - tools/testing/selftests/x86/Makefile | 2 +- 6 files changed, 3 insertions(+), 2 deletions(-) rename tools/testing/selftests/{x86 => vm}/pkey-helpers.h (100%) rename tools/testing/selftests/{x86 => vm}/protection_keys.c (100%) diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31b3c98b6d34..c55837bf39fa 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -14,3 +14,4 @@ virtual_address_range gup_benchmark va_128TBswitch map_fixed_noreplace +protection_keys diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..4e9c741be6af 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd +TEST_GEN_FILES += protection_keys ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch diff --git a/tools/testing/selftests/x86/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h similarity index 100% rename from tools/testing/selftests/x86/pkey-helpers.h rename to tools/testing/selftests/vm/pkey-helpers.h diff --git a/tools/testing/selftests/x86/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c similarity index 100% rename from tools/testing/selftests/x86/protection_keys.c rename to tools/testing/selftests/vm/protection_keys.c diff --git a/tools/testing/selftests/x86/.gitignore b/tools/testing/selftests/x86/.gitignore index 7757f73ff9a3..eb30ffd83876 100644 --- a/tools/testing/selftests/x86/.gitignore +++ b/tools/testing/selftests/x86/.gitignore @@ -11,5 +11,4 @@ ldt_gdt iopl mpx-mini-test ioperm -protection_keys test_vdso diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 5d49bfec1e9a..5f16821c7f63 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ - protection_keys test_vdso test_vsyscall mov_ss_trap \ + test_vdso test_vsyscall mov_ss_trap \ syscall_arg_fault TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ -- 2.17.1
[PATCH 1/2] pseries/vio: Remove stray #ifdef CONFIG_PPC_PSERIES
vio.c requires CONFIG_IBMVIO which in turn depends on PPC_PSERIES. In other words, this ifdef is pointless. At a guess it's a carry-over from pre-history. Signed-off-by: Oliver O'Halloran --- arch/powerpc/platforms/pseries/vio.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index f682b7b..37f1f25 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -1628,7 +1628,6 @@ const void *vio_get_attribute(struct vio_dev *vdev, char *which, int *length) } EXPORT_SYMBOL(vio_get_attribute); -#ifdef CONFIG_PPC_PSERIES /* vio_find_name() - internal because only vio.c knows how we formatted the * kobject name */ @@ -1698,7 +1697,6 @@ int vio_disable_interrupts(struct vio_dev *dev) return rc; } EXPORT_SYMBOL(vio_disable_interrupts); -#endif /* CONFIG_PPC_PSERIES */ static int __init vio_init(void) { -- 2.9.5
[PATCH 2/2] pseries/makefile: Remove CONFIG_PPC_PSERIES check
The platform makefile (arch/powerpc/platforms/pseries/Makefile) is only included by the platform makefile (arch/powerpc/platform/Makefile) when CONFIG_PPC_PSERIES is selected, so checking for CONFIG_PPC_PSERIES in the pseries makefile is pointless. Signed-off-by: Oliver O'Halloran --- arch/powerpc/platforms/pseries/Makefile | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index a3c74a5..c8a2b0b 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -29,6 +29,4 @@ obj-$(CONFIG_PPC_SPLPAR) += vphn.o obj-$(CONFIG_PPC_SVM) += svm.o obj-$(CONFIG_FA_DUMP) += rtas-fadump.o -ifdef CONFIG_PPC_PSERIES obj-$(CONFIG_SUSPEND) += suspend.o -endif -- 2.9.5
Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys
Hi Dave, On 30/01/20 12:29 am, Dave Hansen wrote: > On 1/28/20 1:38 AM, Sandipan Das wrote: >> On 27/01/20 9:12 pm, Dave Hansen wrote: >>> How have you tested this patch (and the whole series for that matter)? >>> >> I replaced the second patch with this one and did a build test. >> Till v16, I had tested the whole series (build + run) on both a POWER8 >> system (with 4K and 64K page sizes) and a Skylake SP system but for >> x86_64 only. > > Do you have any idea why I was seeing x86 build errors and you were not? > There were problems with patch 2 from v17. The fixed patch is what I replied with previously in this thread. The test results that I posted were with that patch included. Will post out v18 today with the fix. - Sandipan
[PATCH] powerpc/papr_scm: Mark papr_scm_ndctl() as static
Function papr_scm_ndctl() is neither exported from the module nor called directly from outside 'papr.c' hence should be marked 'static'. Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0b4467e378e5..e4606100e286 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -246,8 +246,9 @@ static int papr_scm_meta_set(struct papr_scm_priv *p, return 0; } -int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, - unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) +static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, + struct nvdimm *nvdimm, unsigned int cmd, void *buf, + unsigned int buf_len, int *cmd_rc) { struct nd_cmd_get_config_size *get_size_hdr; struct papr_scm_priv *p; -- 2.24.1
Re: [PATCH FIX] KVM: PPC: Book3S HV: Release lock on page-out failure path
On Wed, Jan 22, 2020 at 10:25:42AM +0530, Bharata B Rao wrote: > When migrate_vma_setup() fails in kvmppc_svm_page_out(), > release kvm->arch.uvmem_lock before returning. > > Fixes: ca9f4942670 ("KVM: PPC: Book3S HV: Support for running secure guests") > Signed-off-by: Bharata B Rao Thanks, applied to my kvm-ppc-next branch. Paul.
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Mon, 27 Jan 2020 22:33:08 -0500 Qian Cai wrote: > > > >> Did those tests ever find any regression or this is almost only useful for > >> new > > > > The test has already found problems with s390 page table helpers. > > Hmm, that is pretty weak where s390 is not even official supported with this > version. > I first had to get the three patches upstream, each fixing non-conform behavior on s390, and each issue was found by this extremely useful test: 2416cefc504b s390/mm: add mm_pxd_folded() checks to pxd_free() 2d1fc1eb9b54 s390/mm: simplify page table helpers for large entries 1c27a4bc817b s390/mm: make pmd/pud_bad() report large entries as bad I did not see any direct effect of this misbehavior yet, but I am very happy that this could be found and fixed in order to prevent future issues. And this is exactly the value of this test, to make sure that all architectures have a common understanding of how the various page table helpers are supposed to work. For example, who would have thought that pXd_bad() is supposed to report large entries as bad? It's not really documented anywhere, so we just checked them for sanity like normal entries, which apparently worked fine so far, but for how long?
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Tue, 28 Jan 2020 06:57:53 +0530 Anshuman Khandual wrote: > This adds tests which will validate architecture page table helpers and > other accessors in their compliance with expected generic MM semantics. > This will help various architectures in validating changes to existing > page table helpers or addition of new ones. > > This test covers basic page table entry transformations including but not > limited to old, young, dirty, clean, write, write protect etc at various > level along with populating intermediate entries with next page table page > and validating them. > > Test page table pages are allocated from system memory with required size > and alignments. The mapped pfns at page table levels are derived from a > real pfn representing a valid kernel text symbol. This test gets called > right after page_alloc_init_late(). > > This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with > CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to > select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and > arm64. Going forward, other architectures too can enable this after fixing > build or runtime problems (if any) with their page table helpers. > > Folks interested in making sure that a given platform's page table helpers > conform to expected generic MM semantics should enable the above config > which will just trigger this test during boot. Any non conformity here will > be reported as an warning which would need to be fixed. This test will help > catch any changes to the agreed upon semantics expected from generic MM and > enable platforms to accommodate it thereafter. > [...] > > Tested-by: Christophe Leroy #PPC32 Tested-by: Gerald Schaefer # s390 Thanks again for this effort, and for keeping up the spirit against all odds and even after 12 iterations :-) > > diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > new file mode 100644 > index ..f3f8111edbe3 > --- /dev/null > +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt > @@ -0,0 +1,35 @@ > +# > +# Feature name: debug-vm-pgtable > +# Kconfig: ARCH_HAS_DEBUG_VM_PGTABLE > +# description: arch supports pgtable tests for semantics compliance > +# > +--- > +| arch |status| > +--- > +| alpha: | TODO | > +| arc: | ok | > +| arm: | TODO | > +| arm64: | ok | > +| c6x: | TODO | > +|csky: | TODO | > +| h8300: | TODO | > +| hexagon: | TODO | > +|ia64: | TODO | > +|m68k: | TODO | > +| microblaze: | TODO | > +|mips: | TODO | > +| nds32: | TODO | > +| nios2: | TODO | > +|openrisc: | TODO | > +| parisc: | TODO | > +| powerpc/32: | ok | > +| powerpc/64: | TODO | > +| riscv: | TODO | > +|s390: | TODO | s390 is ok now, with my patches included in v5.5-rc1. So you can now add --- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt @@ -25,7 +25,7 @@ | powerpc/32: | ok | | powerpc/64: | TODO | | riscv: | TODO | -|s390: | TODO | +|s390: | ok | | sh: | TODO | | sparc: | TODO | | um: | TODO | --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET config S390 def_bool y select ARCH_BINFMT_ELF_STATE + select ARCH_HAS_DEBUG_VM_PGTABLE select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE
Re: [PATCH] powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()
On Wed, Jan 29, 2020 at 07:50:07PM +, Christophe Leroy wrote: > is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery. > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/process.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index fad50db9dcf2..e730b8e522b0 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -1634,11 +1634,9 @@ int copy_thread_tls(unsigned long clone_flags, > unsigned long usp, > p->thread.regs = childregs; > childregs->gpr[3] = 0; /* Result from fork() */ > if (clone_flags & CLONE_SETTLS) { > -#ifdef CONFIG_PPC64 > if (!is_32bit_task()) > childregs->gpr[13] = tls; > else > -#endif > childregs->gpr[2] = tls; > } > Reviewed-by: Michal Suchanek Thanks Michal
[PATCH] powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()
is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/process.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index fad50db9dcf2..e730b8e522b0 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1634,11 +1634,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, p->thread.regs = childregs; childregs->gpr[3] = 0; /* Result from fork() */ if (clone_flags & CLONE_SETTLS) { -#ifdef CONFIG_PPC64 if (!is_32bit_task()) childregs->gpr[13] = tls; else -#endif childregs->gpr[2] = tls; } -- 2.25.0
Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys
* Dave Hansen: > Still doesn't build for me: > >> # make >> make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install >> make[1]: Entering directory '/home/dave/linux.git' >> INSTALL ./usr/include >> make[1]: Leaving directory '/home/dave/linux.git' >> make: *** No rule to make target >> '/home/dave/linux.git/tools/testing/selftests/vm/protection_keys_32', needed >> by 'all'. Stop. Do you have 32-bit libraries installed? Thanks, Florian
Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys
On 1/28/20 1:38 AM, Sandipan Das wrote: > On 27/01/20 9:12 pm, Dave Hansen wrote: >> How have you tested this patch (and the whole series for that matter)? >> > I replaced the second patch with this one and did a build test. > Till v16, I had tested the whole series (build + run) on both a POWER8 > system (with 4K and 64K page sizes) and a Skylake SP system but for > x86_64 only. Do you have any idea why I was seeing x86 build errors and you were not?
Re: [PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup
On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote: > Scott Cheloha writes: > > LMB lookup is currently an O(n) linear search. This scales poorly when > > there are many LMBs. > > > > If we cache each LMB by both its base address and its DRC index > > in an xarray we can cut lookups to O(log n), greatly accelerating > > drmem initialization and memory hotplug. > > > > This patch introduces two xarrays of of LMBs and fills them during > > drmem initialization. The patch also adds two interfaces for LMB > > lookup. > > Good but can you replace the array of LMBs altogether > (drmem_info->lmbs)? xarray allows iteration over the members if needed. I don't think we can without potentially changing the current behavior. The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance linearly through the array from the LMB with the matching DRC index. Iteration through the xarray via xa_for_each_start() will return LMBs indexed with monotonically increasing DRC indices. Are they equivalent? Or can we have an LMB with a smaller DRC index appear at a greater offset in the array? If the following condition is possible: drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index where i < j, then we have a possible behavior change because xa_for_each_start() may not return a contiguous array slice. It might "leap backwards" in the array. Or it might skip over a chunk of LMBs.
Re: powerpc Linux scv support and scv system call ABI proposal
On Wed, Jan 29, 2020 at 06:02:34PM +0100, Florian Weimer wrote: > * Segher Boessenkool: > > > On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote: > >> * Segher Boessenkool: > >> >> But GCC doesn't expose them as integers to C code, so you can't do much > >> >> without them. > >> > > >> > Sure, it doesn't expose any other registers directly, either. > >> > >> I can use r0 & 1 with a register variable r0 to check a bit. > > > > That is not reliable, or supported, and it *will* break. This is > > explicit for local register asm, and global register asm is > > underdefined. > > Ugh. I did not know that. And neither did the person who wrote > powerpc64/sysdep.h because it uses register variables in regular C > expressions. 8-( Other architectures are affected as well. Where? I don't see any? Ah, the other one, heh (there are two). No, that *is* supported: as input to or output from an asm, a local register asm variable *is* guaranteed to live in the specified register. This is the *only* supported use. Other uses may sometimes still work, but they never worked reliably, and it cannot be made reliable; it has been documented as not supported since ages, and it will not work at all anymore some day. Segher
Re: powerpc Linux scv support and scv system call ABI proposal
* Segher Boessenkool: > On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote: >> * Segher Boessenkool: >> >> But GCC doesn't expose them as integers to C code, so you can't do much >> >> without them. >> > >> > Sure, it doesn't expose any other registers directly, either. >> >> I can use r0 & 1 with a register variable r0 to check a bit. > > That is not reliable, or supported, and it *will* break. This is > explicit for local register asm, and global register asm is > underdefined. Ugh. I did not know that. And neither did the person who wrote powerpc64/sysdep.h because it uses register variables in regular C expressions. 8-( Other architectures are affected as well. One set of issues is less of a problem if system call arguments are variables and not complex expressions, so that side effects do not clobber registers in the initialization: register long __a0 asm ("$4") = (long) (arg1); But I wasn't aware of that constraint on the macro users at all. Thanks, Florian
Re: powerpc Linux scv support and scv system call ABI proposal
On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote: > * Segher Boessenkool: > >> But GCC doesn't expose them as integers to C code, so you can't do much > >> without them. > > > > Sure, it doesn't expose any other registers directly, either. > > I can use r0 & 1 with a register variable r0 to check a bit. That is not reliable, or supported, and it *will* break. This is explicit for local register asm, and global register asm is underdefined. > I don't > think writing a similar check against a condition register is possible > today. That's right. You cannot express CCmode (MODE_CC) values in C at all. Segher
Re: powerpc Linux scv support and scv system call ABI proposal
* Segher Boessenkool: > On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote: >> * Segher Boessenkool: >> >> >> > I don't think we can save LR in a regular register around the system >> >> > call, explicitly in the inline asm statement, because we still have to >> >> > generate proper unwinding information using CFI directives, something >> >> > that you cannot do from within the asm statement. >> > >> > Why not? >> >> As far as I knowm there isn't a CFI directive that allows us to restore >> the CFI state at the end of the inline assembly. If we say that LR is >> stored in a different register than what the rest of the function uses, >> that would lead to incorrect CFI after the exit of the inline assembler >> fragment. >> >> At least that's what I think. Compilers aren't really my thing. > > .cfi_restore? Or .cfi_remember_state / .cfi_restore_state, that is > probably easiest in inline assembler. Oh, right, .cfi_remember_state and .cfi_restore_state should work, as long as -fno-dwarf2-cfi-asm isn't used (but that's okay given that we need this only for the glibc build). But it looks like we can use an explicit "lr" clobber, so we should be good anyway. >> >> > GCC does not model the condition registers, >> > >> > Huh? It does model the condition register, as 8 registers in GCC's >> > internal model (one each for CR0..CR7). >> >> But GCC doesn't expose them as integers to C code, so you can't do much >> without them. > > Sure, it doesn't expose any other registers directly, either. I can use r0 & 1 with a register variable r0 to check a bit. I don't think writing a similar check against a condition register is possible today. Thanks, Florian
Re: powerpc Linux scv support and scv system call ABI proposal
Nicholas Piggin writes: > Adhemerval Zanella's on January 29, 2020 3:26 am: >> >> We already had to push a similar hack where glibc used to abort transactions >> prior syscalls to avoid some side-effects on kernel (commit 56cf2763819d2f). >> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where >> we only enable TLE if kernel suppors PPC_FEATURE2_HTM_NOSC. >> >> The transaction syscall abort used to read a variable directly from TCB, >> so this could be an option. I would expect that we could optimize it where >> if glibc is building against a recent kernel and compiler is building >> for a ISA 3.0+ cpu we could remove the 'sc' code. >> > > We would just have to be careful of running on ISA 3.0 CPUs on older > kernels which do not support scv. Can we assume that, if a syscall is available through sc it's also available in scv 0? Because if that's true, I believe your suggestion to interpret PPC_FEATURE2_SCV as scv 0 support would be helpful to provide this support via IFUNC even when glibc is built using --with-cpu=power8, which is the most common scenario in ppc64le. In that scenario, it seems new HWCAP bits for new vectors wouldn't be too frequent, which was the only downside of this proposal. -- Tulio Magno
[RFC PATCH 6/6] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_STATS
The DSM 'DSM_PAPR_SCM_STATS' should return the PAPR defined buffer that holds various dimm performance attributes as defined in Ref[1] back to user-space in response to ND_CMD_CALL. Presently the module doesn't interpret nor consume these stat as they are only intended to be consumer and reported by 'libndctl'[2]. PAPR sped [1] states that input buffer to be provided to H_SCM_PERFORMANCE_HEALTH hcall should be 4KiB in size. However due to limitations of the libnvdimm envelop (which is 256 bytes in size) such a large buffer cannot be copied back to user-space. Hence we do an optimization of querying the size of the output buffer from H_SCM_PERFORMANCE_HEALTH hcall and copy only the needed portion of the buffer to the user-space payload package. This patch implements this DSM by implementing a new function papr_scm_get_stats that queries the DIMM stat information and then copies PHYP provided buffer that holds these stat attributes to the package payload whose layout is defined by 'struct papr_scm_perf_stats' References: [1]: https://patchwork.ozlabs.org/patch/1154292/ [2]: https://github.com/pmem/ndctl Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 6c0bc8f027db..ac50968453b0 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -34,6 +34,7 @@ enum { DSM_PAPR_MIN = 0x1, DSM_PAPR_SCM_HEALTH, + DSM_PAPR_SCM_STATS, DSM_PAPR_MAX, }; @@ -410,6 +411,51 @@ static int papr_scm_get_health(struct papr_scm_priv *p, return 0; } +/* Fetch the DIMM stats and populate it in provided papr_scm package */ +static int papr_scm_get_stats(struct papr_scm_priv *p, + struct nd_pkg_papr_scm *pkg) +{ + struct papr_scm_perf_stats *retbuffer; + int rc; + size_t copysize; + + /* Return buffer for phyp where stats are written */ + retbuffer = kzalloc(PAPR_SCM_MAX_PERF_STAT, GFP_KERNEL); + + if (!retbuffer) + return -ENOMEM; + + rc = drc_pmem_query_stats(p, retbuffer); + if (rc) + goto out; + + /* +* Parse the retbuffer, fetch the size returned and return the +* first nd_size_out bytes back to userspce. +*/ + pkg->hdr.nd_fw_size = be16_to_cpu(retbuffer->size); + copysize = min_t(__u32, pkg->hdr.nd_fw_size, pkg->hdr.nd_size_out); + + memcpy(pkg->payload, retbuffer, copysize); + + /* Verify if the returned buffer was copied completely */ + if (pkg->hdr.nd_fw_size > copysize) { + rc = -ENOSPC; + goto out; + } + +out: + kfree(retbuffer); + /* +* Put the error in out package and return success from function +* so that errors if any are propogated back to userspace. +*/ + pkg->cmd_status = rc; + dev_dbg(&p->pdev->dev, "%s completion code = %d\n", __func__, rc); + + return 0; +} + int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) { @@ -460,6 +506,11 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, *cmd_rc = papr_scm_get_health(p, call_pkg); break; + case DSM_PAPR_SCM_STATS: + call_pkg = (struct nd_pkg_papr_scm *) buf; + *cmd_rc = papr_scm_get_stats(p, call_pkg); + break; + default: dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd_in); *cmd_rc = -EINVAL; -- 2.24.1
[RFC PATCH 5/6] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH
The DSM 'DSM_PAPR_SCM_HEALTH' should return the health-bitmap and health-valid-bitmap information for a dimm back to userspace in response to ND_CMD_CALL. This patch implements this DSM by implementing a new function papr_scm_get_health() that queries the DIMM health information and then copies these bitmaps to the package payload whose layout is defined by 'struct papr_scm_ndctl_health'. Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 46 +++ 1 file changed, 46 insertions(+) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index e1a2c0e61077..6c0bc8f027db 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -33,9 +33,16 @@ */ enum { DSM_PAPR_MIN = 0x1, + DSM_PAPR_SCM_HEALTH, DSM_PAPR_MAX, }; +/* Struct as returned by kernel in response to PAPR_DSM_PAPR_SMART_HEALTH */ +struct papr_scm_ndctl_health { + __be64 health_bitmap; + __be64 health_bitmap_valid; +} __packed; + /* Payload expected with ND_CMD_CALL ioctl from libnvdimm */ struct nd_pkg_papr_scm { struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */ @@ -369,6 +376,40 @@ static int cmd_to_func(struct nvdimm *nvdimm, unsigned int cmd, void *buf, return pkg->hdr.nd_command; } +/* Fetch the DIMM health info and populate it in provided papr_scm package */ +static int papr_scm_get_health(struct papr_scm_priv *p, + struct nd_pkg_papr_scm *pkg) +{ + int rc; + struct papr_scm_ndctl_health *health = + (struct papr_scm_ndctl_health *)pkg->payload; + + pkg->hdr.nd_fw_size = sizeof(struct papr_scm_ndctl_health); + + if (pkg->hdr.nd_size_out < sizeof(struct papr_scm_ndctl_health)) { + rc = -ENOSPC; + goto out; + } + + rc = drc_pmem_query_health(p); + if (rc) + goto out; + + /* Copy the health data to the payload */ + health->health_bitmap = p->health_bitmap; + health->health_bitmap_valid = p->health_bitmap_valid; + +out: + /* +* Put the error in out package and return success from function +* so that errors if any are propogated back to userspace. +*/ + pkg->cmd_status = rc; + dev_dbg(&p->pdev->dev, "%s completion code = %d\n", __func__, rc); + + return 0; +} + int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) { @@ -414,6 +455,11 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, *cmd_rc = 0; break; + case DSM_PAPR_SCM_HEALTH: + call_pkg = (struct nd_pkg_papr_scm *) buf; + *cmd_rc = papr_scm_get_health(p, call_pkg); + break; + default: dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd_in); *cmd_rc = -EINVAL; -- 2.24.1
[RFC PATCH 4/6] powerpc/papr_scm: Add support for handling PAPR DSM commands
Implement support for handling PAPR DSM commands in papr_scm module. We advertise support for ND_CMD_CALL for the dimm command mask and implement necessary scaffolding in the module to handle ND_CMD_CALL ioctl and DSM commands that we receive. The layout of the DSM commands as we expect from libnvdimm/libndctl is defined in 'struct nd_pkg_papr_scm' which contains a 'struct nd_cmd_pkg' as header. This header is used to communicate the DSM command via 'nd_pkg_papr_scm->nd_command' and size of payload that need to be sent/received for servicing the DSM. The PAPR DSM commands are assigned indexes started from 0x1 to prevent them from overlapping ND_CMD_* values and also makes handling dimm commands in papr_scm_ndctl() easier via a simplified switch-case block. For this a new function cmd_to_func() is implemented that reads the args to papr_scm_ndctl() , performs sanity tests on them and converts them to PAPR DSM commands which can then be handled via the switch-case block. Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 96 +-- 1 file changed, 89 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index deaece6e4d18..e1a2c0e61077 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -20,10 +20,29 @@ #define PAPR_SCM_DIMM_CMD_MASK \ ((1ul << ND_CMD_GET_CONFIG_SIZE) | \ (1ul << ND_CMD_GET_CONFIG_DATA) | \ -(1ul << ND_CMD_SET_CONFIG_DATA)) +(1ul << ND_CMD_SET_CONFIG_DATA) | \ +(1ul << ND_CMD_CALL)) #define PAPR_SCM_MAX_PERF_STAT 4096 +/* + * Sub commands for ND_CMD_CALL. To prevent overlap from ND_CMD_*, values for + * these enums start at 0x1. These values are then returned from + * cmd_to_func() making it easy to implement the switch-case block in + * papr_scm_ndctl() + */ +enum { + DSM_PAPR_MIN = 0x1, + DSM_PAPR_MAX, +}; + +/* Payload expected with ND_CMD_CALL ioctl from libnvdimm */ +struct nd_pkg_papr_scm { + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */ + uint32_t cmd_status;/* Out: Sub-cmd status returned back */ + uint8_t payload[]; /* Out: Sub-cmd data buffer */ +} __packed; + /* Buffer layout returned by phyp when reporting drc perf stats */ struct papr_scm_perf_stats { uint8_t version;/* Should be 0x01 */ @@ -303,19 +322,74 @@ static int papr_scm_meta_set(struct papr_scm_priv *p, return 0; } +/* + * Validate the input to dimm-control function and return papr_scm specific + * commands. This does sanity validation to ND_CMD_CALL sub-command packages. + */ +static int cmd_to_func(struct nvdimm *nvdimm, unsigned int cmd, void *buf, + unsigned int buf_len) +{ + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK; + struct nd_pkg_papr_scm *pkg = (struct nd_pkg_papr_scm *)buf; + + /* Only dimm-specific calls are supported atm */ + if (!nvdimm) + return -EINVAL; + + if (!test_bit(cmd, &cmd_mask)) { + + pr_debug("%s: Unsupported cmd=%u\n", __func__, cmd); + return -EINVAL; + + } else if (cmd != ND_CMD_CALL) { + + return cmd; + + } else if (buf_len < sizeof(struct nd_pkg_papr_scm)) { + + pr_debug("%s: Invalid pkg size=%u\n", __func__, buf_len); + return -EINVAL; + + } else if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) { + + pr_debug("%s: Invalid pkg family=0x%llx\n", __func__, +pkg->hdr.nd_family); + return -EINVAL; + + } else if (pkg->hdr.nd_command <= DSM_PAPR_MIN || + pkg->hdr.nd_command >= DSM_PAPR_MAX) { + + /* for unknown subcommands return ND_CMD_CALL */ + pr_debug("%s: Unknown sub-command=0x%llx\n", __func__, +pkg->hdr.nd_command); + return ND_CMD_CALL; + } + + /* Return the DSM_PAPR_SCM_* command */ + return pkg->hdr.nd_command; +} + int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) { struct nd_cmd_get_config_size *get_size_hdr; struct papr_scm_priv *p; + struct nd_pkg_papr_scm *call_pkg = NULL; + int cmd_in, rc; - /* Only dimm-specific calls are supported atm */ - if (!nvdimm) - return -EINVAL; + /* Use a local variable in case cmd_rc pointer is NULL */ + if (cmd_rc == NULL) + cmd_rc = &rc; + + cmd_in = cmd_to_func(nvdimm, cmd, buf, buf_len); + if (cmd_in < 0) { + pr_debug("%s: Invalid cmd=%u. Err=%d\n", __func__, cmd, cmd_in); + return cmd_in; + } p = nvdimm_provider_data(nvdimm); - switch
[RFC PATCH 3/6] UAPI: ndctl: Introduce NVDIMM_FAMILY_PAPR as a new NVDIMM DSM family
Add PAPR-scm family of DSM command-set to to the white list of NVDIMM command sets. Signed-off-by: Vaibhav Jain --- include/uapi/linux/ndctl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h index de5d90212409..0e09dc5cec19 100644 --- a/include/uapi/linux/ndctl.h +++ b/include/uapi/linux/ndctl.h @@ -244,6 +244,7 @@ struct nd_cmd_pkg { #define NVDIMM_FAMILY_HPE2 2 #define NVDIMM_FAMILY_MSFT 3 #define NVDIMM_FAMILY_HYPERV 4 +#define NVDIMM_FAMILY_PAPR 5 #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\ struct nd_cmd_pkg) -- 2.24.1
[RFC PATCH 2/6] powerpc/papr_scm: Fetch dimm performance stats from PHYP
Implement support for fetching dimm performance metrics via H_SCM_PERFORMANCE_HEALTH hcall as documented in Ref[1]. The hcall returns a structure as described in Ref[1] and defined as newly introduced 'struct papr_scm_perf_stats'. The struct has a header followed by key-value pairs of performance attributes. The module 'papr_scm' however doesn't interpret or parses these performance attributes as it will be returning this buffer as it is to user-space for further parsing. For now we only implement a dimm specific sysfs attribute named 'papr_stats_version' that returns the version of the structure returned from H_SCM_PERFORMANCE_HEALTH hcall. Hence this patch implements a new function drc_pmem_query_stats() that issues hcall H_SCM_PERFORMANCE_HEALTH ,requesting PHYP to store performance stats in pre-allocated 'struct papr_scm_perf_stats' buffer 'stats'. References: [1]: https://patchwork.ozlabs.org/patch/1154292/ Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 56 +++ 1 file changed, 56 insertions(+) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 1a0cc66f3dc9..deaece6e4d18 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -22,6 +22,16 @@ (1ul << ND_CMD_GET_CONFIG_DATA) | \ (1ul << ND_CMD_SET_CONFIG_DATA)) +#define PAPR_SCM_MAX_PERF_STAT 4096 + +/* Buffer layout returned by phyp when reporting drc perf stats */ +struct papr_scm_perf_stats { + uint8_t version;/* Should be 0x01 */ + uint8_t reserved1; + __be16 size;/* Size of this struct in bytes */ + uint8_t buffer[]; /* Performance matrics */ +} __packed; + struct papr_scm_priv { struct platform_device *pdev; struct device_node *dn; @@ -148,6 +158,25 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p) return drc_pmem_bind(p); } +static int drc_pmem_query_stats(struct papr_scm_priv *p, + struct papr_scm_perf_stats *stats) +{ + unsigned long ret[PLPAR_HCALL_BUFSIZE]; + int64_t rc; + + if (!stats) + return -EINVAL; + + rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index, +__pa(stats)); + if (rc != H_SUCCESS) { + dev_err(&p->pdev->dev, +"Failed to query performance stats, Err:%lld\n", rc); + return -ENXIO; + } else + return 0; +} + static int drc_pmem_query_health(struct papr_scm_priv *p) { unsigned long ret[PLPAR_HCALL_BUFSIZE]; @@ -332,6 +361,32 @@ static inline int papr_scm_node(int node) return min_node; } +static ssize_t papr_stats_version_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *dimm = to_nvdimm(dev); + struct papr_scm_priv *p = nvdimm_provider_data(dimm); + struct papr_scm_perf_stats *retbuffer; + int rc; + + /* Return buffer for phyp where stats are written */ + retbuffer = kzalloc(PAPR_SCM_MAX_PERF_STAT, GFP_KERNEL); + if (!retbuffer) + return -ENOMEM; + + rc = drc_pmem_query_stats(p, retbuffer); + if (rc) + goto out; + else + rc = sprintf(buf, "%d\n", retbuffer->version); + +out: + kfree(retbuffer); + return rc; + +} +DEVICE_ATTR_RO(papr_stats_version); + static ssize_t papr_health_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -353,6 +408,7 @@ DEVICE_ATTR_RO(papr_health); /* papr_scm specific dimm attributes */ static struct attribute *papr_scm_nd_attributes[] = { &dev_attr_papr_health.attr, + &dev_attr_papr_stats_version.attr, NULL, }; -- 2.24.1
[RFC PATCH 1/6] powerpc/papr_scm: Provide support for fetching dimm health information
Implement support for fetching dimm health information via H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair of 64-bit big-endian integers which are then stored in 'struct papr_scm_priv' and subsequently exposed to userspace via dimm attribute 'papr_health'. References: [1]: https://patchwork.ozlabs.org/patch/1154292/ Signed-off-by: Vaibhav Jain --- arch/powerpc/platforms/pseries/papr_scm.c | 65 ++- 1 file changed, 63 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0b4467e378e5..1a0cc66f3dc9 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -39,6 +39,10 @@ struct papr_scm_priv { struct resource res; struct nd_region *region; struct nd_interleave_set nd_set; + + /* Health information for the dimm */ + __be64 health_bitmap; + __be64 health_bitmap_valid; }; static int drc_pmem_bind(struct papr_scm_priv *p) @@ -144,6 +148,30 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p) return drc_pmem_bind(p); } +static int drc_pmem_query_health(struct papr_scm_priv *p) +{ + unsigned long ret[PLPAR_HCALL_BUFSIZE]; + int64_t rc; + + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index); + if (rc != H_SUCCESS) { + dev_err(&p->pdev->dev, +"Failed to query health information, Err:%lld\n", rc); + return -ENXIO; + } + + /* Store the retrieved health information in dimm platform data */ + + p->health_bitmap = ret[0]; + p->health_bitmap_valid = ret[1]; + + dev_dbg(&p->pdev->dev, + "Queried dimm health info. Bitmap:0x%016llx Mask:0x%016llx\n", + be64_to_cpu(p->health_bitmap), + be64_to_cpu(p->health_bitmap_valid)); + + return 0; +} static int papr_scm_meta_get(struct papr_scm_priv *p, struct nd_cmd_get_config_data_hdr *hdr) @@ -304,6 +332,39 @@ static inline int papr_scm_node(int node) return min_node; } +static ssize_t papr_health_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *dimm = to_nvdimm(dev); + struct papr_scm_priv *p = nvdimm_provider_data(dimm); + int rc; + + rc = drc_pmem_query_health(p); + + if (rc) + return rc; + else + return sprintf(buf, "0x%016llX 0x%016llX\n", + be64_to_cpu(p->health_bitmap), + be64_to_cpu(p->health_bitmap_valid)); +} +DEVICE_ATTR_RO(papr_health); + +/* papr_scm specific dimm attributes */ +static struct attribute *papr_scm_nd_attributes[] = { + &dev_attr_papr_health.attr, + NULL, +}; + +static struct attribute_group papr_scm_nd_attribute_group = { + .attrs = papr_scm_nd_attributes, +}; + +static const struct attribute_group *papr_scm_dimm_attr_groups[] = { + &papr_scm_nd_attribute_group, + NULL, +}; + static int papr_scm_nvdimm_init(struct papr_scm_priv *p) { struct device *dev = &p->pdev->dev; @@ -330,8 +391,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) dimm_flags = 0; set_bit(NDD_ALIASING, &dimm_flags); - p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags, - PAPR_SCM_DIMM_CMD_MASK, 0, NULL); + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups, + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL); if (!p->nvdimm) { dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn); goto err; -- 2.24.1
[RFC PATCH 0/6] powerpc/papr_scm: Implement support for reporting DIMM health and stats
The PAPR standard provides suitable mechanisms to query the health and performance stats of an NVDIMM via various hcalls as described in Ref[2]. Until now these stats were never fetched in the papr_scm modules nor exposed to the user-space tools like 'ndctl'. This is partly due to PAPR platform not having support for ACPI and NFIT. Hence 'ndctl' is unable to query and report the dimm health status and a user had no way to determine the current health status of a NDVIMM. To overcome this limitation this RFC patch-set proposes a new set of Dimm-Specific-Methods(DSM) for querying and fetching health and stat information for dimms that support PAPR and provides an implementation in kernel for these DSM in papr_scm modules. These changes coupled with draft/proposed ndtcl changes located at Ref[4] should provide a way for user to retributive NVDIMM status using ndtcl. Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM in an emulation environment: # ndctl list -DH [ { "dev":"nmem0", "health":{ "health_state":"fatal", "life_used_percentage":60, "shutdown_state":"dirty" } } ] PAPR Dimm-Specific-Methods(DSM) As the name suggests DSMs are used by vendor specific code in libndctl to execute certain operations or fetch certain information for NVDIMMS. DSMs can be sent to papr_scm module via libndctl (userspace) and libnvdimm(kernel) using the ND_CMD_CALL ioctl which can be handled in the dimm control function papr_scm_ndctl(). For PAPR this RFC proposes two DSMs that directly map to hcalls provided by PHYP to query NVDIMM health and stats. These DSMs are: * DSM_PAPR_SCM_HEALTH: Which map to hcall H_SCM_HEALTH and returns dimm health. * DSM_PAPR_SCM_STATS: Which map to hcall H_SCM_PERFORMANCE_STATS and returns dimm performance stats. The ioctl ND_CMD_CALL can also transfer data between user-space and kernel via 'envelopes'. The envelop is part of a 'struct nd_cmd_pkg' which in return is wrapped in a user defined struct which in our case is called 'struct nd_pkg_papr_scm' (packaged). These struct is defined as: struct nd_pkg_papr_scm { struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */ uint32_t cmd_status;/* Out: Sub-cmd status returned back */ uint8_t payload[]; /* Out: Sub-cmd data buffer */ }; The 'payload' field of the package holds the libnvdimm defined 'envelope' which is used to send/receive data from userspace buffer (libndctl). This RFC uses this field to copy the results of hcalls that are executed in response to the DSM commands. Please note that results of hcalls are not interpreted (with few exceptions) in papr_scm module at all. Instead they are directly copied to the 'payload' field and sent to userspace(libndctl) for interpretation. This essentially means that the papr_scm module simply acts as a conduit for libndctl to issue hcalls and fetch its output. This should make parsing and interpreting with output buffers of hcalls easier as it can be performed in userspace. References: [1]: "Power Architecture Platform Reference" https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference [2]: "[DOC,v2] powerpc: Provide initial documentation for PAPR hcalls" https://patchwork.ozlabs.org/patch/1154292/ [3]: "Linux on Power Architecture Platform Reference" https://members.openpowerfoundation.org/document/dl/469 [4]: https://github.com/vaibhav92/ndctl/tree/papr_scm_health Vaibhav Jain (6): powerpc/papr_scm: Provide support for fetching dimm health information powerpc/papr_scm: Fetch dimm performance stats from PHYP UAPI: ndctl: Introduce NVDIMM_FAMILY_PAPR as a new NVDIMM DSM family powerpc/papr_scm: Add support for handling PAPR DSM commands powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH powerpc/papr_scm: Implement support for DSM_PAPR_SCM_STATS arch/powerpc/platforms/pseries/papr_scm.c | 314 +- include/uapi/linux/ndctl.h| 1 + 2 files changed, 306 insertions(+), 9 deletions(-) -- 2.24.1
[PATCH v6 4/5] powerpc/numa: Early request for home node associativity
Currently the kernel detects if its running on a shared lpar platform and requests home node associativity before the scheduler sched_domains are setup. However between the time NUMA setup is initialized and the request for home node associativity, workqueue initializes its per node cpumask. The per node workqueue possible cpumask may turn invalid after home node associativity resulting in weird situations like workqueue possible cpumask being a subset of workqueue online cpumask. This can be fixed by requesting home node associativity earlier just before NUMA setup. However at the NUMA setup time, kernel may not be in a position to detect if its running on a shared lpar platform. So request for home node associativity and if the request fails, fallback on the device tree property. Signed-off-by: Srikar Dronamraju Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Satheesh Rajendran Reported-by: Abdul Haleem --- Changelog (v5->v6): - Handled comments from Michael Ellerman * hardware-id is now deciphered in vphn_get_nid * No need to pass extra bool Changelog (v2->v3): - Handled comments from Nathan Lynch * Use first thread of the core for cpu-to-node map. * get hardware-id in numa_setup_cpu Changelog (v1->v2): - Handled comments from Nathan Lynch * Dont depend on pacas to be setup for the hwid arch/powerpc/mm/numa.c | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 63ec0c3..ee7400f 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -461,6 +461,41 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb) return nid; } +#ifdef CONFIG_PPC_SPLPAR +static int vphn_get_nid(long lcpu) +{ + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; + long rc, hwid; + + /* +* On a shared lpar, device tree will not have node associativity. +* At this time lppaca, or its __old_status field may not be +* updated. Hence kernel cannot detect if its on a shared lpar. So +* request an explicit associativity irrespective of whether the +* lpar is shared or dedicated. Use the device tree property as a +* fallback. cpu_to_phys_id is only valid between +* smp_setup_cpu_maps() and smp_setup_pacas(). +*/ + if (firmware_has_feature(FW_FEATURE_VPHN)) { + if (cpu_to_phys_id) + hwid = cpu_to_phys_id[lcpu]; + else + hwid = get_hard_smp_processor_id(lcpu); + + rc = hcall_vphn(hwid, VPHN_FLAG_VCPU, associativity); + if (rc == H_SUCCESS) + return associativity_to_nid(associativity); + } + + return NUMA_NO_NODE; +} +#else +static int vphn_get_nid(long unused) +{ + return NUMA_NO_NODE; +} +#endif /* CONFIG_PPC_SPLPAR */ + /* * Figure out to which domain a cpu belongs and stick it there. * Return the id of the domain used. @@ -485,6 +520,10 @@ static int numa_setup_cpu(unsigned long lcpu) return nid; } + nid = vphn_get_nid(lcpu); + if (nid != NUMA_NO_NODE) + goto out_present; + cpu = of_get_cpu_node(lcpu, NULL); if (!cpu) { @@ -496,6 +535,7 @@ static int numa_setup_cpu(unsigned long lcpu) } nid = of_node_to_nid_single(cpu); + of_node_put(cpu); out_present: if (nid < 0 || !node_possible(nid)) @@ -515,7 +555,6 @@ static int numa_setup_cpu(unsigned long lcpu) } map_cpu_to_node(lcpu, nid); - of_node_put(cpu); out: return nid; } -- 1.8.3.1
[PATCH v6 5/5] powerpc/numa: Remove late request for home node associativity
With commit ("powerpc/numa: Early request for home node associativity"), commit 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot") which was requesting home node associativity becomes redundant. Hence remove the late request for home node associativity. Signed-off-by: Srikar Dronamraju Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Abdul Haleem Cc: Satheesh Rajendran Reported-by: Abdul Haleem Reviewed-by: Nathan Lynch --- arch/powerpc/include/asm/topology.h | 4 arch/powerpc/kernel/smp.c | 5 - arch/powerpc/mm/numa.c | 9 - 3 files changed, 18 deletions(-) diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 2f7e1ea..9bd396f 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -98,7 +98,6 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) extern int prrn_is_enabled(void); extern int find_and_online_cpu_nid(int cpu); extern int timed_topology_update(int nsecs); -extern void __init shared_proc_topology_init(void); #else static inline int start_topology_update(void) { @@ -121,9 +120,6 @@ static inline int timed_topology_update(int nsecs) return 0; } -#ifdef CONFIG_SMP -static inline void shared_proc_topology_init(void) {} -#endif #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */ #include diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index ea6adbf..cdd39a0 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -1359,11 +1359,6 @@ void __init smp_cpus_done(unsigned int max_cpus) if (smp_ops && smp_ops->bringup_done) smp_ops->bringup_done(); - /* -* On a shared LPAR, associativity needs to be requested. -* Hence, get numa topology before dumping cpu topology -*/ - shared_proc_topology_init(); dump_numa_cpu_topology(); #ifdef CONFIG_SCHED_SMT diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index ee7400f..01a03f2 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1632,15 +1632,6 @@ int prrn_is_enabled(void) return prrn_enabled; } -void __init shared_proc_topology_init(void) -{ - if (lppaca_shared_proc(get_lppaca())) { - bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask), - nr_cpumask_bits); - numa_update_cpu_topology(false); - } -} - static int topology_read(struct seq_file *file, void *v) { if (vphn_enabled || prrn_enabled) -- 1.8.3.1
[PATCH v6 2/5] powerpc/numa: Handle extra hcall_vphn error cases
Currently code handles H_FUNCTION, H_SUCCESS, H_HARDWARE return codes. However hcall_vphn can return other return codes. Now it also handles H_PARAMETER return code. Also the rest return codes are handled under the default case. Signed-off-by: Srikar Dronamraju Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Abdul Haleem Cc: Satheesh Rajendran Reported-by: Abdul Haleem Reviewed-by: Nathan Lynch --- Changelog (v2->v1): Handled comments from Nathan: - Split patch from patch 1. - Corrected a problem where I missed calling stop_topology_update(). - Using pr_err_ratelimited instead of printk. arch/powerpc/mm/numa.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 50d68d2..8fbe57c 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1191,23 +1191,30 @@ static long vphn_get_associativity(unsigned long cpu, VPHN_FLAG_VCPU, associativity); switch (rc) { + case H_SUCCESS: + dbg("VPHN hcall succeeded. Reset polling...\n"); + timed_topology_update(0); + goto out; + case H_FUNCTION: - printk_once(KERN_INFO - "VPHN is not supported. Disabling polling...\n"); - stop_topology_update(); + pr_err_ratelimited("VPHN unsupported. Disabling polling...\n"); break; case H_HARDWARE: - printk(KERN_ERR - "hcall_vphn() experienced a hardware fault " + pr_err_ratelimited("hcall_vphn() experienced a hardware fault " "preventing VPHN. Disabling polling...\n"); - stop_topology_update(); break; - case H_SUCCESS: - dbg("VPHN hcall succeeded. Reset polling...\n"); - timed_topology_update(0); + case H_PARAMETER: + pr_err_ratelimited("hcall_vphn() was passed an invalid parameter. " + "Disabling polling...\n"); + break; + default: + pr_err_ratelimited("hcall_vphn() returned %ld. Disabling polling...\n" + , rc); break; } + stop_topology_update(); +out: return rc; } -- 1.8.3.1
[PATCH v6 3/5] powerpc/numa: Use cpu node map of first sibling thread
All the sibling threads of a core have to be part of the same node. To ensure that all the sibling threads map to the same node, always lookup/update the cpu-to-node map of the first thread in the core. Signed-off-by: Srikar Dronamraju Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Abdul Haleem Cc: Satheesh Rajendran Reported-by: Abdul Haleem Reviewed-by: Nathan Lynch --- Changelog: v3->v4 As suggested by Nathan, add a warning while mapping offline cpu. arch/powerpc/mm/numa.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 8fbe57c..63ec0c3 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb) */ static int numa_setup_cpu(unsigned long lcpu) { - int nid = NUMA_NO_NODE; struct device_node *cpu; + int fcpu = cpu_first_thread_sibling(lcpu); + int nid = NUMA_NO_NODE; /* * If a valid cpu-to-node mapping is already available, use it * directly instead of querying the firmware, since it represents * the most recent mapping notified to us by the platform (eg: VPHN). +* Since cpu_to_node binding remains the same for all threads in the +* core. If a valid cpu-to-node mapping is already available, for +* the first thread in the core, use it. */ - if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) { + nid = numa_cpu_lookup_table[fcpu]; + if (nid >= 0) { map_cpu_to_node(lcpu, nid); return nid; } @@ -496,6 +501,19 @@ static int numa_setup_cpu(unsigned long lcpu) if (nid < 0 || !node_possible(nid)) nid = first_online_node; + /* +* Update for the first thread of the core. All threads of a core +* have to be part of the same node. This not only avoids querying +* for every other thread in the core, but always avoids a case +* where virtual node associativity change causes subsequent threads +* of a core to be associated with different nid. However if first +* thread is already online, expect it to have a valid mapping. +*/ + if (fcpu != lcpu) { + WARN_ON(cpu_online(fcpu)); + map_cpu_to_node(fcpu, nid); + } + map_cpu_to_node(lcpu, nid); of_node_put(cpu); out: -- 1.8.3.1
[PATCH v6 1/5] powerpc/vphn: Check for error from hcall_vphn
There is no value in unpacking associativity, if H_HOME_NODE_ASSOCIATIVITY hcall has returned an error. Signed-off-by: Srikar Dronamraju Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Abdul Haleem Cc: Satheesh Rajendran Reported-by: Abdul Haleem Reviewed-by: Nathan Lynch --- Changelog (v2->v1): - Split the patch into 2(Suggested by Nathan). arch/powerpc/platforms/pseries/vphn.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/vphn.c b/arch/powerpc/platforms/pseries/vphn.c index 3f07bf6..cca474a 100644 --- a/arch/powerpc/platforms/pseries/vphn.c +++ b/arch/powerpc/platforms/pseries/vphn.c @@ -82,7 +82,8 @@ long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity) long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu); - vphn_unpack_associativity(retbuf, associativity); + if (rc == H_SUCCESS) + vphn_unpack_associativity(retbuf, associativity); return rc; } -- 1.8.3.1
[PATCH v6 0/5] Early node associativity
Note: (Only patch 4 changes from v5) Abdul reported a warning on a shared lpar. "WARNING: workqueue cpumask: online intersect > possible intersect". This is because per node workqueue possible mask is set very early in the boot process even before the system was querying the home node associativity. However per node workqueue online cpumask gets updated dynamically. Hence there is a chance when per node workqueue online cpumask is a superset of per node workqueue possible mask. Link for v5: http://lkml.kernel.org/r/20191216144904.6776-1-sri...@linux.vnet.ibm.com Changelog: v5->v6 - Handled comments from Michael Ellerman - Rebased to v5.5-rc6 Link for v4: https://patchwork.ozlabs.org/patch/1161979 Changelog: v4->v5: - Rebased to v5.5-rc2 Link for v3: http://lkml.kernel.org/r/20190906135020.19772-1-sri...@linux.vnet.ibm.com Changelog: v3->v4 - Added a warning as suggested by Nathan Lynch. Link for v2: http://lkml.kernel.org/r/20190829055023.6171-1-sri...@linux.vnet.ibm.com Changelog: v2->v3 - Handled comments from Nathan Lynch. Link for v1: https://patchwork.ozlabs.org/patch/1151658 Changelog: v1->v2 - Handled comments from Nathan Lynch. Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Abdul Haleem Cc: Nathan Lynch Cc: linuxppc-dev@lists.ozlabs.org Cc: Satheesh Rajendran Srikar Dronamraju (5): powerpc/vphn: Check for error from hcall_vphn powerpc/numa: Handle extra hcall_vphn error cases powerpc/numa: Use cpu node map of first sibling thread powerpc/numa: Early request for home node associativity powerpc/numa: Remove late request for home node associativity arch/powerpc/include/asm/topology.h | 4 -- arch/powerpc/kernel/smp.c | 5 -- arch/powerpc/mm/numa.c| 97 +++ arch/powerpc/platforms/pseries/vphn.c | 3 +- 4 files changed, 78 insertions(+), 31 deletions(-) -- 1.8.3.1
[PATCH v4] powerpc/smp: Use nid as fallback for package_id
Package_id is to find out all cores that are part of the same chip. On PowerNV machines, package_id defaults to chip_id. However ibm,chip_id property is not present in device-tree of PowerVM Lpars. Hence lscpu output shows one core per socket and multiple cores. To overcome this, use nid as the package_id on PowerVM Lpars. Before the patch. --- Architecture:ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 <-- Socket(s): 16 <-- NUMA node(s):2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache:512K L3 cache:10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id -1 # After the patch --- Architecture:ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 <-- Core(s) per socket: 8 <-- Socket(s): 2 NUMA node(s):2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache:512K L3 cache:10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id 0 # Now lscpu output is more in line with the system configuration. Signed-off-by: Srikar Dronamraju Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman Cc: Vasant Hegde Cc: Vaidyanathan Srinivasan --- Changelog from v3: https://lore.kernel.org/linuxppc-dev/20191216142119.5937-1-sri...@linux.vnet.ibm.com/t/#u - Handled comments from Michael Ellerman. - Rename function cpu_to_nid to get_physical_package_id - Moved code back to arch/powerpc/kernel/smp.c from pseries/lpar.c - Use export_symbol_gpl instead of export_symbol. - Rebased to v5.5-rc6 Changelog from v2: https://patchwork.ozlabs.org/patch/1151649 - Rebased to v5.5-rc2 - Renamed the new function to cpu_to_nid - Removed checks to fadump property. (Looked too excessive) - Moved pseries specific code to pseries/lpar.c Changelog from v1: https://patchwork.ozlabs.org/patch/1126145 - In v1 cpu_to_chip_id was overloaded to fallback on nid. Michael Ellerman wasn't comfortable with nid being shown up as chip_id. arch/powerpc/include/asm/topology.h | 6 ++ arch/powerpc/kernel/smp.c | 30 ++--- 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 2f7e1ea5089e..e2e1ccd4a18d 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -134,7 +134,13 @@ static inline void shared_proc_topology_init(void) {} #ifdef CONFIG_PPC64 #include +#ifdef CONFIG_PPC_SPLPAR +int get_physical_package_id(int cpu); +#define topology_physical_package_id(cpu) (get_physical_package_id(cpu)) +#else #define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu)) +#endif + #define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu)) #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu)) #define topology_core_id(cpu) (cpu_to_core_id(cpu)) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index ea6adbf6a221..d75fc8fd8168 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -1185,10 +1185,34 @@ static inline void add_cpu_to_smallcore_masks(int cpu) } } +int get_physical_package_id(int cpu) +{ + int ppid = cpu_to_chip_id(cpu); + +#ifdef CONFIG_PPC_SPLPAR + /* +* If the platform is PowerNV or Guest on KVM, ibm,chip-id is +* defined. Hence we would return the chip-id as the +* get_physical_package_id. +*/ + if (ppid == -1 && firmware_has_feature(FW_FEATURE_LPAR)) { + struct device_node *np = of_get_cpu_node(cpu, NULL); + + if (np) { + ppid = of_node_to_nid(np); + of_node_put(np); + } + } +#endif /* CONFIG_PPC_SPLPAR */ + + return ppid; +} +EXPORT_SYMBOL_GPL(get_physical_package_id); + static void add_cpu_to_masks(int cpu) { int first_thread = cpu_first_thread_sibling(cpu); - int chipid = cpu_to_chip_id(cpu); + int ppid = get_physical_package_id(cpu); int i; /* @@ -1217,11 +1241,11 @@ static void add_cpu_to_masks(int cpu) for_each_cpu(i, cpu_l2_cache_mask(cpu))
Re: powerpc Linux scv support and scv system call ABI proposal
On Wed, Jan 29, 2020 at 02:58:44PM +1000, Nicholas Piggin wrote: > Adhemerval Zanella's on January 29, 2020 3:26 am: > > __asm__ __volatile__\ > > ("sc\n\t" \ > >"bns+ 1f\n\t"\ > >"neg %1, %1\n\t" \ > >"1:\n\t" \ > True, but the taken branch would be a 1 cycle bubble in fetch. Could > avoid that by branching out of line then back for the error case. But > mfocrf is fine (only sources one register), that's what should be used > here I think. neg %9,%1 ; isel %1,%9,%1,so > That probably makes the performance argument for avoiding CR[SO] for > error return indication less significant. Commonality with other > architectures is probably the bigger reason for it. Yes, and to have the syscall calling convention closer to the normal function calling convention would be good, too. Segher
[PATCH] powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
On book3s/32 CPUs that are handling MMU through a hash table, MMU_init_hw() function was adapted for VMAP_STACK in order to handle virtual addresses instead of physical addresses in the low level hash functions. When using KASAN, the same adaptations are required for the early hash table set up by kasan_early_hash_table() function. Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK") Signed-off-by: Christophe Leroy --- arch/powerpc/mm/kasan/kasan_init_32.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c index d3cacd462560..16dd95bd0749 100644 --- a/arch/powerpc/mm/kasan/kasan_init_32.c +++ b/arch/powerpc/mm/kasan/kasan_init_32.c @@ -185,8 +185,11 @@ u8 __initdata early_hash[256 << 10] __aligned(256 << 10) = {0}; static void __init kasan_early_hash_table(void) { - modify_instruction_site(&patch__hash_page_A0, 0x, __pa(early_hash) >> 16); - modify_instruction_site(&patch__flush_hash_A0, 0x, __pa(early_hash) >> 16); + unsigned int hash = IS_ENABLED(CONFIG_VMAP_STACK) ? (unsigned int)early_hash : + __pa(early_hash); + + modify_instruction_site(&patch__hash_page_A0, 0x, hash >> 16); + modify_instruction_site(&patch__flush_hash_A0, 0x, hash >> 16); Hash = (struct hash_pte *)early_hash; } -- 2.25.0
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
> On Jan 29, 2020, at 5:36 AM, Catalin Marinas wrote: > > On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote: >> On Jan 28, 2020, at 12:47 PM, Catalin Marinas >> wrote: >>> The primary goal here is not finding regressions but having clearly >>> defined semantics of the page table accessors across architectures. x86 >>> and arm64 are a good starting point and other architectures will be >>> enabled as they are aligned to the same semantics. >> >> This still does not answer the fundamental question. If this test is >> simply inefficient to find bugs, > > Who said this is inefficient (other than you)? Inefficient of finding bugs. It said only found a bug or two in its lifetime? > >> who wants to spend time to use it regularly? > > Arch maintainers, mm maintainers introducing new macros or assuming > certain new semantics of the existing macros. > >> If this is just one off test that may get running once in a few years >> (when introducing a new arch), how does it justify the ongoing cost to >> maintain it? > > You are really missing the point. It's not only for a new arch but > changes to existing arch code. And if the arch code churn in this area > is relatively small, I'd expect a similarly small cost of maintaining > this test. > > If you only turn DEBUG_VM on once every few years, don't generalise this > to the rest of the kernel developers (as others pointed out, this test > is default y if DEBUG_VM). Quite the opposite, I am running DEBUG_VM almost daily for regression workload while I felt strongly this thing does not add any value mixing there. So, I would suggest to decouple this away from DEBUG_VM, and clearly document that this test is not something intended for automated regression workloads, so those people don’t need to waste time running this. > > Anyway, I think that's a pointless discussion, so not going to reply > further (unless you have technical content to add). > > -- > Catalin
[PATCH v2] powerpc/uaccess: simplify the get_fs() set_fs() logic
On powerpc, we only have USER_DS and KERNEL_DS Today, this is managed as an 'unsigned long' data space limit which is used to compare the passed address with, plus a bit in the thread_info flags that is set whenever modifying the limit to enable the verification in addr_limit_user_check() The limit is either the last address of user space when USER_DS is set, and the last address of address space when KERNEL_DS is set. In both cases, the limit is a compiletime constant. get_fs() returns the limit, which is part of thread_info struct set_fs() updates the limit then set the TI_FSCHECK flag. addr_limit_user_check() check the flag, and if it is set it checks the limit is the user limit, then unsets the TI_FSCHECK flag. In addition, when the flag is set the syscall exit work is involved. This exit work is heavy compared to normal syscall exit as it goes through normal exception exit instead of the fast syscall exit. Rename this TI_FSCHECK flag to TIF_KERNEL_DS flag which tells whether KERNEL_DS or USER_DS is set. Get mm_segment_t be redifined as a bool struct that is either false (for USER_DS) or true (for KERNEL_DS). When TIF_KERNEL_DS is set, the limit is ~0UL. Otherwise it is TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When KERNEL_DS is set, there is no range to check. Define TI_FSCHECK as an alias to TIF_KERNEL_DS. On exit, involve exit work when the bit is set, i.e. when KERNEL_DS is set. addr_limit_user_check() will clear the bit and kill the user process. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/processor.h | 5 +--- arch/powerpc/include/asm/thread_info.h | 9 --- arch/powerpc/include/asm/uaccess.h | 35 +- arch/powerpc/lib/sstep.c | 2 +- 4 files changed, 25 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 8387698bd5b6..e9e3c3b0f05e 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -84,7 +84,7 @@ void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp); void release_thread(struct task_struct *); typedef struct { - unsigned long seg; + bool is_kernel_ds; } mm_segment_t; #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET] @@ -148,7 +148,6 @@ struct thread_struct { unsigned long ksp_vsid; #endif struct pt_regs *regs; /* Pointer to saved register state */ - mm_segment_taddr_limit; /* for get_fs() validation */ #ifdef CONFIG_BOOKE /* BookE base exception scratch space; align on cacheline */ unsigned long normsave[8] cacheline_aligned; @@ -289,7 +288,6 @@ struct thread_struct { #define INIT_THREAD { \ .ksp = INIT_SP, \ .ksp_limit = INIT_SP_LIMIT, \ - .addr_limit = KERNEL_DS, \ .pgdir = swapper_pg_dir, \ .fpexc_mode = MSR_FE0 | MSR_FE1, \ SPEFSCR_INIT \ @@ -298,7 +296,6 @@ struct thread_struct { #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \ - .addr_limit = KERNEL_DS, \ .fpexc_mode = 0, \ .fscr = FSCR_TAR | FSCR_EBB \ } diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index a2270749b282..a681ce624ab7 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -69,7 +69,7 @@ struct thread_info { #define INIT_THREAD_INFO(tsk) \ { \ .preempt_count = INIT_PREEMPT_COUNT,\ - .flags =0, \ + .flags =_TIF_KERNEL_DS, \ } #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT) @@ -90,7 +90,8 @@ void arch_setup_new_exec(void); #define TIF_SYSCALL_TRACE 0 /* syscall trace active */ #define TIF_SIGPENDING 1 /* signal pending */ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ -#define TIF_FSCHECK3 /* Check FS is USER_DS on return */ +#define TIF_KERNEL_DS 3 /* KERNEL_DS is set */ +#define TIF_FSCHECKTIF_KERNEL_DS #define TIF_SYSCALL_EMU4 /* syscall emulation active */ #define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */ #define TIF_PATCH_PENDING 6 /* pending live patching update */ @@ -130,7 +131,7 @@ void arch_setup_new_exec(void); #define _TIF_SYSCALL_TRACEPOINT(1thread.addr_limit = fs; - /* On user-mode return check addr_limit (fs) is correct */ - set_thread_flag(TIF_FSCHECK); + update_thread_flag(TIF_KERNEL_DS, segment_eq(fs, KERNEL_DS)); } -#define segment_eq(a, b) ((a).seg == (b).seg) - -#define user_addr_max()
[PATCH v1] powerpc/uaccess: simplify the get_fs() set_fs() logic
On powerpc, we only have USER_DS and KERNEL_DS Today, this is managed as an 'unsigned long' data space limit which is used to compare the passed address with, plus a bit in the thread_info flags that is set whenever modifying the limit to enable the verification in addr_limit_user_check() The limit is either the last address of user space when USER_DS is set, and the last address of address space when KERNEL_DS is set. In both cases, the limit is a compiletime constant. get_fs() returns the limit, which is part of thread_info struct set_fs() updates the limit then set the TI_FSCHECK flag. addr_limit_user_check() check the flag, and if it is set it checks the limit is the user limit, then unsets the TI_FSCHECK flag. In addition, when the flag is set the syscall exit work is involved. Remove this TI_FSCHECK flag, and replace it by a TIF_KERNEL_DS flag which tells whether KERNEL_DS or USER_DS is set. When TIF_KERNEL_DS is set, the limit is ~0UL. Otherwise it is TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When KERNEL_DS is set, there is no range to check. On exit, involve exit work when the bit is set, i.e. when KERNEL_DS is set. As TI_FSCHECK is not set anymore, test will be done everytime exit work is run, but doing the check is now as costly as checking whether the check is to be done. Signed-off-by: Christophe Leroy --- This version is first version. The intention was to not modify things much, but the resulting assembly is bad, so lets take v2 instead. --- arch/powerpc/include/asm/processor.h | 3 --- arch/powerpc/include/asm/thread_info.h | 8 arch/powerpc/include/asm/uaccess.h | 10 -- 3 files changed, 8 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 8387698bd5b6..0747f930a680 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -148,7 +148,6 @@ struct thread_struct { unsigned long ksp_vsid; #endif struct pt_regs *regs; /* Pointer to saved register state */ - mm_segment_taddr_limit; /* for get_fs() validation */ #ifdef CONFIG_BOOKE /* BookE base exception scratch space; align on cacheline */ unsigned long normsave[8] cacheline_aligned; @@ -289,7 +288,6 @@ struct thread_struct { #define INIT_THREAD { \ .ksp = INIT_SP, \ .ksp_limit = INIT_SP_LIMIT, \ - .addr_limit = KERNEL_DS, \ .pgdir = swapper_pg_dir, \ .fpexc_mode = MSR_FE0 | MSR_FE1, \ SPEFSCR_INIT \ @@ -298,7 +296,6 @@ struct thread_struct { #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \ - .addr_limit = KERNEL_DS, \ .fpexc_mode = 0, \ .fscr = FSCR_TAR | FSCR_EBB \ } diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index a2270749b282..8980bbddc4d6 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -69,7 +69,7 @@ struct thread_info { #define INIT_THREAD_INFO(tsk) \ { \ .preempt_count = INIT_PREEMPT_COUNT,\ - .flags =0, \ + .flags =_TIF_KERNEL_DS, \ } #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT) @@ -90,7 +90,7 @@ void arch_setup_new_exec(void); #define TIF_SYSCALL_TRACE 0 /* syscall trace active */ #define TIF_SIGPENDING 1 /* signal pending */ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ -#define TIF_FSCHECK3 /* Check FS is USER_DS on return */ +#define TIF_KERNEL_DS 3 /* KERNEL_DS is set */ #define TIF_SYSCALL_EMU4 /* syscall emulation active */ #define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */ #define TIF_PATCH_PENDING 6 /* pending live patching update */ @@ -130,7 +130,7 @@ void arch_setup_new_exec(void); #define _TIF_SYSCALL_TRACEPOINT(1thread.addr_limit = fs; - /* On user-mode return check addr_limit (fs) is correct */ - set_thread_flag(TIF_FSCHECK); + update_thread_flag(TIF_KERNEL_DS, segment_eq(fs, KERNEL_DS)); } -#define segment_eq(a, b) ((a).seg == (b).seg) - #define user_addr_max()(get_fs().seg) #ifdef __powerpc64__ -- 2.25.0
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote: > On Jan 28, 2020, at 12:47 PM, Catalin Marinas wrote: > > The primary goal here is not finding regressions but having clearly > > defined semantics of the page table accessors across architectures. x86 > > and arm64 are a good starting point and other architectures will be > > enabled as they are aligned to the same semantics. > > This still does not answer the fundamental question. If this test is > simply inefficient to find bugs, Who said this is inefficient (other than you)? > who wants to spend time to use it regularly? Arch maintainers, mm maintainers introducing new macros or assuming certain new semantics of the existing macros. > If this is just one off test that may get running once in a few years > (when introducing a new arch), how does it justify the ongoing cost to > maintain it? You are really missing the point. It's not only for a new arch but changes to existing arch code. And if the arch code churn in this area is relatively small, I'd expect a similarly small cost of maintaining this test. If you only turn DEBUG_VM on once every few years, don't generalise this to the rest of the kernel developers (as others pointed out, this test is default y if DEBUG_VM). Anyway, I think that's a pointless discussion, so not going to reply further (unless you have technical content to add). -- Catalin
Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys
Hi Dave, On 28/01/20 3:08 pm, Sandipan Das wrote: > On 27/01/20 9:12 pm, Dave Hansen wrote: >> >> How have you tested this patch (and the whole series for that matter)? >> > > I replaced the second patch with this one and did a build test. > Till v16, I had tested the whole series (build + run) on both a POWER8 > system (with 4K and 64K page sizes) and a Skylake SP system but for > x86_64 only. Following that, I could only do a build test locally on > my laptop for i386 and x86_64 on my laptop as I did not have access to > the Skylake system anymore. > > I got a chance to use the Skylake system today and tested (build + run) the whole series (v17 with the fixed Makefile) for both i386 and x86_64. Everything passed. - Sandipan