Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-29 Thread Mike Rapoport
On Wed, Jan 29, 2020 at 11:20:44PM +0100, Gerald Schaefer wrote:
> On Mon, 27 Jan 2020 22:33:08 -0500
> 
> For example, who would have thought that pXd_bad() is supposed to
> report large entries as bad? It's not really documented anywhere,

A bit off-topic,

@Anshuman, maybe you could start a Documentation/ patch that describes at
least some of the pXd_whaterver()?
Or that would be too much to ask? ;-)

> so we just checked them for sanity like normal entries, which
> apparently worked fine so far, but for how long?

-- 
Sincerely yours,
Mike.



[PATCH 1/2] selftests: vm: Do not override definition of ARCH

2020-01-29 Thread Sandipan Das
Independent builds of the vm selftests is currently broken
because commit 7549b3364201 overrides the value of ARCH with
the machine name from uname. This does not always match the
architecture names used for tasks like header installation.

E.g. for building tests on powerpc64, we need ARCH=powerpc
and not ARCH=ppc64 or ARCH=ppc64le. Otherwise, the build
fails as shown below.

  $ uname -m
  ppc64le

  $ make -C tools/testing/selftests/vm
  make: Entering directory '/home/sandipan/linux/tools/testing/selftests/vm'
  make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install
  make[1]: Entering directory '/home/sandipan/linux'
  Makefile:653: arch/ppc64le/Makefile: No such file or directory
  make[1]: *** No rule to make target 'arch/ppc64le/Makefile'.  Stop.
  make[1]: Leaving directory '/home/sandipan/linux'
  ../lib.mk:50: recipe for target 'khdr' failed
  make: *** [khdr] Error 2
  make: Leaving directory '/home/sandipan/linux/tools/testing/selftests/vm'

Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch")
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 7f9a8a8c31da..3f2e2f0ccbc9 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 # Makefile for vm selftests
 uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/')
+MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/')
 
 CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS)
 LDLIBS = -lrt
@@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
 
-ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
+ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x 
sh64 sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
 TEST_GEN_FILES += virtual_address_range
 endif
-- 
2.17.1



[PATCH 2/2] selftests: vm: Fix 64-bit test builds for powerpc64le

2020-01-29 Thread Sandipan Das
Some tests are built only for 64-bit systems. This makes
sure that these tests are built for both big and little
endian variants of powerpc64.

Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch")
Reviewed-by: Kamalesh Babulal 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/Makefile| 2 +-
 tools/testing/selftests/vm/run_vmtests | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 3f2e2f0ccbc9..8074340c6b3a 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
 
-ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x 
sh64 sparc64 x86_64))
+ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 
s390x sh64 sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
 TEST_GEN_FILES += virtual_address_range
 endif
diff --git a/tools/testing/selftests/vm/run_vmtests 
b/tools/testing/selftests/vm/run_vmtests
index a692ea828317..db8e0d1c7b39 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -59,7 +59,7 @@ else
 fi
 
 #filter 64bit architectures
-ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64"
+ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 
x86_64"
 if [ -z $ARCH ]; then
   ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/'`
 fi
-- 
2.17.1



[PATCH 0/2] selftests: vm: Build fixes for powerpc64

2020-01-29 Thread Sandipan Das
The second patch was already posted independently but because
of the changes in the first patch, the second one now depends
on it. Hence posting it now as a part of this series.

The last version (v2) of the second patch can be found at:
https://patchwork.ozlabs.org/patch/1225969/

Sandipan Das (2):
  selftests: vm: Do not override definition of ARCH
  selftests: vm: Fix 64-bit test builds for powerpc64le

 tools/testing/selftests/vm/Makefile| 4 ++--
 tools/testing/selftests/vm/run_vmtests | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.1



Re: [PATCH 2/2] pseries/makefile: Remove CONFIG_PPC_PSERIES check

2020-01-29 Thread Oliver O'Halloran
On Thu, Jan 30, 2020 at 5:32 PM Oliver O'Halloran  wrote:
>
> The platform makefile (arch/powerpc/platforms/pseries/Makefile) is only

doh

s/platform/pseries/

> included by the platform makefile (arch/powerpc/platform/Makefile) when
> CONFIG_PPC_PSERIES is selected, so checking for CONFIG_PPC_PSERIES in the
> pseries makefile is pointless.
>
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/platforms/pseries/Makefile | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/Makefile 
> b/arch/powerpc/platforms/pseries/Makefile
> index a3c74a5..c8a2b0b 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -29,6 +29,4 @@ obj-$(CONFIG_PPC_SPLPAR)  += vphn.o
>  obj-$(CONFIG_PPC_SVM)  += svm.o
>  obj-$(CONFIG_FA_DUMP)  += rtas-fadump.o
>
> -ifdef CONFIG_PPC_PSERIES
>  obj-$(CONFIG_SUSPEND)  += suspend.o
> -endif
> --
> 2.9.5
>


[PATCH v18 24/24] selftests: vm: pkeys: Fix multilib builds for x86

2020-01-29 Thread Sandipan Das
This ensures that both 32-bit and 64-bit binaries are generated
when this is built on a x86_64 system. Most of the changes have
been borrowed from tools/testing/selftests/x86/Makefile.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/Makefile | 72 +
 1 file changed, 72 insertions(+)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 4e9c741be6af..82031f84af21 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,7 +18,30 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+
+ifeq ($(ARCH),x86_64)
+CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_32bit_program.c -m32)
+CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_64bit_program.c)
+CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_program.c -no-pie)
+
+TARGETS := protection_keys
+BINARIES_32 := $(TARGETS:%=%_32)
+BINARIES_64 := $(TARGETS:%=%_64)
+
+ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+CFLAGS += -no-pie
+endif
+
+ifeq ($(CAN_BUILD_I386),1)
+TEST_GEN_FILES += $(BINARIES_32)
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+TEST_GEN_FILES += $(BINARIES_64)
+endif
+else
 TEST_GEN_FILES += protection_keys
+endif
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
@@ -32,6 +55,55 @@ TEST_FILES := test_vmalloc.sh
 KSFT_KHDR_INSTALL := 1
 include ../lib.mk
 
+ifeq ($(ARCH),x86_64)
+BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+
+define gen-target-rule-32
+$(1) $(1)_32: $(OUTPUT)/$(1)_32
+.PHONY: $(1) $(1)_32
+endef
+
+define gen-target-rule-64
+$(1) $(1)_64: $(OUTPUT)/$(1)_64
+.PHONY: $(1) $(1)_64
+endef
+
+ifeq ($(CAN_BUILD_I386),1)
+$(BINARIES_32): CFLAGS += -m32
+$(BINARIES_32): LDLIBS += -lrt -ldl -lm
+$(BINARIES_32): %_32: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+$(BINARIES_64): CFLAGS += -m64
+$(BINARIES_64): LDLIBS += -lrt -ldl
+$(BINARIES_64): %_64: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t
+endif
+
+# x86_64 users should be encouraged to install 32-bit libraries
+ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+all: warn_32bit_failure
+
+warn_32bit_failure:
+   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;   
\
+   echo  "environment. This will reduce test coverage of 64-bit" 2>&1; 
\
+   echo  "kernels. If you are using a Debian-like distribution," 2>&1; 
\
+   echo  "try:"; 2>&1; 
\
+   echo  "";   
\
+   echo  "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386";   
\
+   echo  "";   
\
+   echo  "If you are using a Fedora-like distribution, try:";  
\
+   echo  "";   
\
+   echo  "  yum install glibc-devel.*i686";
\
+   exit 0;
+endif
+endif
+
 $(OUTPUT)/userfaultfd: LDLIBS += -lpthread
 
 $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
-- 
2.17.1



[PATCH v18 23/24] selftests: vm: pkeys: Use the correct page size on powerpc

2020-01-29 Thread Sandipan Das
Both 4K and 64K pages are supported on powerpc. Parts of
the selftest code perform alignment computations based on
the PAGE_SIZE macro which is currently hardcoded to 64K
for powerpc. This causes some test failures on kernels
configured with 4K page size.

In some cases, we need to enforce function alignment on
page size. Since this can only be done at build time,
64K is used as the alignment factor as that also ensures
4K alignment.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-powerpc.h| 2 +-
 tools/testing/selftests/vm/protection_keys.c | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 02bd4dd7d467..3a761e51a587 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -36,7 +36,7 @@
 pkey-31 and exec-only key */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL << 24)
-#define PAGE_SIZE  (1UL << 16)
+#define PAGE_SIZE  sysconf(_SC_PAGESIZE)
 
 static inline u32 pkey_bit_position(int pkey)
 {
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index a1cb9a71e77c..fc19addcb5c8 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -146,7 +146,12 @@ void abort_hooks(void)
  * will then fault, which makes sure that the fault code handles
  * execute-only memory properly.
  */
+#ifdef __powerpc64__
+/* This way, both 4K and 64K alignment are maintained */
+__attribute__((__aligned__(65536)))
+#else
 __attribute__((__aligned__(PAGE_SIZE)))
+#endif
 void lots_o_noops_around_write(int *write_to_me)
 {
dprintf3("running %s()\n", __func__);
-- 
2.17.1



[PATCH v18 22/24] selftests/vm/pkeys: Override access right definitions on powerpc

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS
and PKEY_DISABLE_WRITE such as those in:
 /usr/include/bits/mman-shared.h.

This overrides the definitions with correct values for powerpc.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-powerpc.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index d31665c48f5e..02bd4dd7d467 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -16,11 +16,13 @@
 #define fpregs fp_regs
 #define si_pkey_offset 0x20
 
-#ifndef PKEY_DISABLE_ACCESS
+#ifdef PKEY_DISABLE_ACCESS
+#undef PKEY_DISABLE_ACCESS
 # define PKEY_DISABLE_ACCESS   0x3  /* disable read and write */
 #endif
 
-#ifndef PKEY_DISABLE_WRITE
+#ifdef PKEY_DISABLE_WRITE
+#undef PKEY_DISABLE_WRITE
 # define PKEY_DISABLE_WRITE0x2
 #endif
 
-- 
2.17.1



[PATCH v18 21/24] selftests/vm/pkeys: Test correct behaviour of pkey-0

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Ensure that pkey-0 is allocated on start and that it can
be attached dynamically in various modes, without failures.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index d4952b57cc90..a1cb9a71e77c 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -964,6 +964,58 @@ __attribute__((noinline)) int read_ptr(int *ptr)
return *ptr;
 }
 
+void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey)
+{
+   int i, err;
+   int max_nr_pkey_allocs;
+   int alloced_pkeys[NR_PKEYS];
+   int nr_alloced = 0;
+   long size;
+
+   pkey_assert(pkey_last_malloc_record);
+   size = pkey_last_malloc_record->size;
+   /*
+* This is a bit of a hack.  But mprotect() requires
+* huge-page-aligned sizes when operating on hugetlbfs.
+* So, make sure that we use something that's a multiple
+* of a huge page when we can.
+*/
+   if (size >= HPAGE_SIZE)
+   size = HPAGE_SIZE;
+
+   /* allocate every possible key and make sure key-0 never got allocated 
*/
+   max_nr_pkey_allocs = NR_PKEYS;
+   for (i = 0; i < max_nr_pkey_allocs; i++) {
+   int new_pkey = alloc_pkey();
+   pkey_assert(new_pkey != 0);
+
+   if (new_pkey < 0)
+   break;
+   alloced_pkeys[nr_alloced++] = new_pkey;
+   }
+   /* free all the allocated keys */
+   for (i = 0; i < nr_alloced; i++) {
+   int free_ret;
+
+   if (!alloced_pkeys[i])
+   continue;
+   free_ret = sys_pkey_free(alloced_pkeys[i]);
+   pkey_assert(!free_ret);
+   }
+
+   /* attach key-0 in various modes */
+   err = sys_mprotect_pkey(ptr, size, PROT_READ, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_WRITE, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_EXEC, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE|PROT_EXEC, 0);
+   pkey_assert(!err);
+}
+
 void test_read_of_write_disabled_region(int *ptr, u16 pkey)
 {
int ptr_contents;
@@ -1448,6 +1500,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_pkey_syscalls_on_non_allocated_pkey,
test_pkey_syscalls_bad_args,
test_pkey_alloc_exhaust,
+   test_pkey_alloc_free_attach_pkey0,
 };
 
 void run_tests_once(void)
-- 
2.17.1



[PATCH v18 16/24] selftests/vm/pkeys: Improve checks to determine pkey support

2020-01-29 Thread Sandipan Das
From: Ram Pai 

For the pkeys subsystem to work, both the CPU and the
kernel need to have support. So, additionally check if
the kernel supports pkeys apart from the CPU feature
checks.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 30 
 tools/testing/selftests/vm/pkey-powerpc.h|  3 +-
 tools/testing/selftests/vm/pkey-x86.h|  2 +-
 tools/testing/selftests/vm/protection_keys.c |  7 +++--
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 2f4b1eb3a680..59ccdff18214 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -76,6 +76,8 @@ extern void abort_hooks(void);
 
 __attribute__((noinline)) int read_ptr(int *ptr);
 void expected_pkey_fault(int pkey);
+int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
+int sys_pkey_free(unsigned long pkey);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
@@ -186,4 +188,32 @@ static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
 #endif
 }
 
+static inline int kernel_has_pkeys(void)
+{
+   /* try allocating a key and see if it succeeds */
+   int ret = sys_pkey_alloc(0, 0);
+   if (ret <= 0) {
+   return 0;
+   }
+   sys_pkey_free(ret);
+   return 1;
+}
+
+static inline int is_pkeys_supported(void)
+{
+   /* check if the cpu supports pkeys */
+   if (!cpu_has_pkeys()) {
+   dprintf1("SKIP: %s: no CPU support\n", __func__);
+   return 0;
+   }
+
+   /* check if the kernel supports pkeys */
+   if (!kernel_has_pkeys()) {
+   dprintf1("SKIP: %s: no kernel support\n", __func__);
+   return 0;
+   }
+
+   return 1;
+}
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 319673bbab0b..7d7c3ffafdd9 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -63,8 +63,9 @@ static inline void __write_pkey_reg(u64 pkey_reg)
__func__, __read_pkey_reg(), pkey_reg);
 }
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
+   /* No simple way to determine this */
return 1;
 }
 
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index a0c59d4f7af2..6421b846aa16 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -97,7 +97,7 @@ static inline void __cpuid(unsigned int *eax, unsigned int 
*ebx,
 #define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (1<<4) /* OS Protection Keys Enable */
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
unsigned int eax;
unsigned int ebx;
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 5fcbbc525364..95f173049f43 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1378,7 +1378,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 
pkey)
int size = PAGE_SIZE;
int sret;
 
-   if (cpu_has_pku()) {
+   if (cpu_has_pkeys()) {
dprintf1("SKIP: %s: no CPU support\n", __func__);
return;
}
@@ -1447,12 +1447,13 @@ void pkey_setup_shadow(void)
 int main(void)
 {
int nr_iterations = 22;
+   int pkeys_supported = is_pkeys_supported();
 
setup_handlers();
 
-   printf("has pku: %d\n", cpu_has_pku());
+   printf("has pkeys: %d\n", pkeys_supported);
 
-   if (!cpu_has_pku()) {
+   if (!pkeys_supported) {
int size = PAGE_SIZE;
int *ptr;
 
-- 
2.17.1



[PATCH v18 20/24] selftests/vm/pkeys: Introduce a sub-page allocator

2020-01-29 Thread Sandipan Das
From: Ram Pai 

This introduces a new allocator that allocates 4K hardware
pages to back 64K linux pages. This allocator is available
only on powerpc.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h|  6 +
 tools/testing/selftests/vm/pkey-powerpc.h| 25 
 tools/testing/selftests/vm/pkey-x86.h|  5 
 tools/testing/selftests/vm/protection_keys.c |  1 +
 4 files changed, 37 insertions(+)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 59ccdff18214..622a85848f61 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -28,6 +28,9 @@
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
+extern int test_nr;
+extern int iteration_nr;
+
 #ifdef __GNUC__
 __attribute__((format(printf, 1, 2)))
 #endif
@@ -78,6 +81,9 @@ __attribute__((noinline)) int read_ptr(int *ptr);
 void expected_pkey_fault(int pkey);
 int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
 int sys_pkey_free(unsigned long pkey);
+int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+   unsigned long pkey);
+void record_pkey_malloc(void *ptr, long size, int prot);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 7d7c3ffafdd9..d31665c48f5e 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -106,4 +106,29 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey)
 /* 4-byte instructions * 16384 = 64K page */
 #define __page_o_noops() asm(".rept 16384 ; nop; .endr")
 
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+   void *ptr;
+   int ret;
+
+   dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+   size, prot, pkey);
+   pkey_assert(pkey < NR_PKEYS);
+   ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+   pkey_assert(ptr != (void *)-1);
+
+   ret = syscall(__NR_subpage_prot, ptr, size, NULL);
+   if (ret) {
+   perror("subpage_perm");
+   return PTR_ERR_ENOTSUP;
+   }
+
+   ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
+   pkey_assert(!ret);
+   record_pkey_malloc(ptr, size, prot);
+
+   dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
+   return ptr;
+}
+
 #endif /* _PKEYS_POWERPC_H */
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index 6421b846aa16..3be20f5d5275 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -173,4 +173,9 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey)
expected_pkey_fault(pkey);
 }
 
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+   return PTR_ERR_ENOTSUP;
+}
+
 #endif /* _PKEYS_X86_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 8bb4de103874..d4952b57cc90 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -845,6 +845,7 @@ void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey)
 void *(*pkey_malloc[])(long size, int prot, u16 pkey) = {
 
malloc_pkey_with_mprotect,
+   malloc_pkey_with_mprotect_subpage,
malloc_pkey_anon_huge,
malloc_pkey_hugetlb
 /* can not do direct with the pkey_mprotect() API:
-- 
2.17.1



[PATCH v18 19/24] selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Detect write-violation on a page to which access-disabled
key is associated much after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index cb31a5cdf6d9..8bb4de103874 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1027,6 +1027,18 @@ void test_write_of_access_disabled_region(int *ptr, u16 
pkey)
*ptr = __LINE__;
expected_pkey_fault(pkey);
 }
+
+void test_write_of_access_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   *ptr = __LINE__;
+   dprintf1("disabling access; after accessing the page, "
+   " to PKEY[%02d], doing write\n", pkey);
+   pkey_access_deny(pkey);
+   *ptr = __LINE__;
+   expected_pkey_fault(pkey);
+}
+
 void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
 {
int ret;
@@ -1423,6 +1435,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_write_of_write_disabled_region,
test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
+   test_write_of_access_disabled_region_with_page_already_mapped,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
test_kernel_gup_of_access_disabled_region,
-- 
2.17.1



[PATCH v18 18/24] selftests/vm/pkeys: Associate key on a mapped page and detect write violation

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Detect write-violation on a page to which write-disabled
key is associated much after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index f65d384ef6a0..cb31a5cdf6d9 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1002,6 +1002,17 @@ void 
test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
expected_pkey_fault(pkey);
 }
 
+void test_write_of_write_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   *ptr = __LINE__;
+   dprintf1("disabling write access; after accessing the page, "
+   "to PKEY[%02d], doing write\n", pkey);
+   pkey_write_deny(pkey);
+   *ptr = __LINE__;
+   expected_pkey_fault(pkey);
+}
+
 void test_write_of_write_disabled_region(int *ptr, u16 pkey)
 {
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1410,6 +1421,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_read_of_access_disabled_region,
test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
+   test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
-- 
2.17.1



[PATCH v18 17/24] selftests/vm/pkeys: Associate key on a mapped page and detect access violation

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Detect access-violation on a page to which access-disabled
key is associated much after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 95f173049f43..f65d384ef6a0 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -984,6 +984,24 @@ void test_read_of_access_disabled_region(int *ptr, u16 
pkey)
dprintf1("*ptr: %d\n", ptr_contents);
expected_pkey_fault(pkey);
 }
+
+void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   int ptr_contents;
+
+   dprintf1("disabling access to PKEY[%02d], doing read @ %p\n",
+   pkey, ptr);
+   ptr_contents = read_ptr(ptr);
+   dprintf1("reading ptr before disabling the read : %d\n",
+   ptr_contents);
+   read_pkey_reg();
+   pkey_access_deny(pkey);
+   ptr_contents = read_ptr(ptr);
+   dprintf1("*ptr: %d\n", ptr_contents);
+   expected_pkey_fault(pkey);
+}
+
 void test_write_of_write_disabled_region(int *ptr, u16 pkey)
 {
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1390,6 +1408,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 
pkey)
 void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_read_of_write_disabled_region,
test_read_of_access_disabled_region,
+   test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
-- 
2.17.1



[PATCH v18 15/24] selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Some pkeys which are valid on the hardware are reserved
and not available for application use. These keys cannot
be allocated.

test_pkey_alloc_exhaust() tries to account for these and
has an assertion which validates if all available pkeys
have been exahaustively allocated. However, the expression
that is currently used is only valid for x86. On powerpc,
a pkey is additionally reserved as compared to x86. Hence,
the assertion is made to use an arch-specific helper to
get the correct count of reserved pkeys.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index e6de078a9196..5fcbbc525364 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1153,6 +1153,7 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
dprintf3("%s()::%d\n", __func__, __LINE__);
 
/*
+* On x86:
 * There are 16 pkeys supported in hardware.  Three are
 * allocated by the time we get here:
 *   1. The default key (0)
@@ -1160,8 +1161,16 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
 *   3. One allocated by the test code and passed in via
 *  'pkey' to this function.
 * Ensure that we can allocate at least another 13 (16-3).
+*
+* On powerpc:
+* There are either 5, 28, 29 or 32 pkeys supported in
+* hardware depending on the page size (4K or 64K) and
+* platform (powernv or powervm). Four are allocated by
+* the time we get here. These include pkey-0, pkey-1,
+* exec-only pkey and the one allocated by the test code.
+* Ensure that we can allocate the remaining.
 */
-   pkey_assert(i >= NR_PKEYS-3);
+   pkey_assert(i >= (NR_PKEYS - get_arch_reserved_keys() - 1));
 
for (i = 0; i < nr_allocated_pkeys; i++) {
err = sys_pkey_free(allocated_pkeys[i]);
-- 
2.17.1



[PATCH v18 14/24] selftests/vm/pkeys: Fix number of reserved powerpc pkeys

2020-01-29 Thread Sandipan Das
From: "Desnes A. Nunes do Rosario" 

The number of reserved pkeys in a PowerNV environment is
different from that on PowerVM or KVM.

Tested on PowerVM and PowerNV environments.

Signed-off-by: "Desnes A. Nunes do Rosario" 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-powerpc.h | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index c79f4160a6a0..319673bbab0b 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -28,7 +28,10 @@
 #define NR_RESERVED_PKEYS_4K   27 /* pkey-0, pkey-1, exec-only-pkey
  and 24 other keys that cannot be
  represented in the PTE */
-#define NR_RESERVED_PKEYS_64K  3  /* pkey-0, pkey-1 and exec-only-pkey */
+#define NR_RESERVED_PKEYS_64K_3KEYS3 /* PowerNV and KVM: pkey-0,
+pkey-1 and exec-only key */
+#define NR_RESERVED_PKEYS_64K_4KEYS4 /* PowerVM: pkey-0, pkey-1,
+pkey-31 and exec-only key */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL << 24)
 #define PAGE_SIZE  (1UL << 16)
@@ -65,12 +68,27 @@ static inline int cpu_has_pku(void)
return 1;
 }
 
+static inline bool arch_is_powervm()
+{
+   struct stat buf;
+
+   if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 
0) &&
+   (stat("/sys/firmware/devicetree/base/hmc-managed?", &buf) == 0) &&
+   (stat("/sys/firmware/devicetree/base/chosen/qemu,graphic-width", 
&buf) == -1) )
+   return true;
+
+   return false;
+}
+
 static inline int get_arch_reserved_keys(void)
 {
if (sysconf(_SC_PAGESIZE) == 4096)
return NR_RESERVED_PKEYS_4K;
else
-   return NR_RESERVED_PKEYS_64K;
+   if (arch_is_powervm())
+   return NR_RESERVED_PKEYS_64K_4KEYS;
+   else
+   return NR_RESERVED_PKEYS_64K_3KEYS;
 }
 
 void expect_fault_on_read_execonly_key(void *p1, int pkey)
-- 
2.17.1



[PATCH v18 13/24] selftests/vm/pkeys: Introduce powerpc support

2020-01-29 Thread Sandipan Das
From: Ram Pai 

This makes use of the abstractions added earlier and
introduces support for powerpc.

For powerpc, after receiving the SIGSEGV, the signal
handler must explicitly restore access permissions
for the faulting pkey to allow the test to continue.
As this makes use of pkey_access_allow(), all of its
dependencies and other similar functions have been
moved ahead of the signal handler.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h|   2 +
 tools/testing/selftests/vm/pkey-powerpc.h|  90 +++
 tools/testing/selftests/vm/protection_keys.c | 269 ++-
 3 files changed, 233 insertions(+), 128 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 621fb2a0a5ef..2f4b1eb3a680 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -79,6 +79,8 @@ void expected_pkey_fault(int pkey);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
+#elif defined(__powerpc64__) /* arch */
+#include "pkey-powerpc.h"
 #else /* arch */
 #error Architecture not supported
 #endif /* arch */
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
new file mode 100644
index ..c79f4160a6a0
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _PKEYS_POWERPC_H
+#define _PKEYS_POWERPC_H
+
+#ifndef SYS_mprotect_key
+# define SYS_mprotect_key  386
+#endif
+#ifndef SYS_pkey_alloc
+# define SYS_pkey_alloc384
+# define SYS_pkey_free 385
+#endif
+#define REG_IP_IDX PT_NIP
+#define REG_TRAPNO PT_TRAP
+#define gregs  gp_regs
+#define fpregs fp_regs
+#define si_pkey_offset 0x20
+
+#ifndef PKEY_DISABLE_ACCESS
+# define PKEY_DISABLE_ACCESS   0x3  /* disable read and write */
+#endif
+
+#ifndef PKEY_DISABLE_WRITE
+# define PKEY_DISABLE_WRITE0x2
+#endif
+
+#define NR_PKEYS   32
+#define NR_RESERVED_PKEYS_4K   27 /* pkey-0, pkey-1, exec-only-pkey
+ and 24 other keys that cannot be
+ represented in the PTE */
+#define NR_RESERVED_PKEYS_64K  3  /* pkey-0, pkey-1 and exec-only-pkey */
+#define PKEY_BITS_PER_PKEY 2
+#define HPAGE_SIZE (1UL << 24)
+#define PAGE_SIZE  (1UL << 16)
+
+static inline u32 pkey_bit_position(int pkey)
+{
+   return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
+}
+
+static inline u64 __read_pkey_reg(void)
+{
+   u64 pkey_reg;
+
+   asm volatile("mfspr %0, 0xd" : "=r" (pkey_reg));
+
+   return pkey_reg;
+}
+
+static inline void __write_pkey_reg(u64 pkey_reg)
+{
+   u64 amr = pkey_reg;
+
+   dprintf4("%s() changing %016llx to %016llx\n",
+__func__, __read_pkey_reg(), pkey_reg);
+
+   asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(amr)) : "memory");
+
+   dprintf4("%s() pkey register after changing %016llx to %016llx\n",
+   __func__, __read_pkey_reg(), pkey_reg);
+}
+
+static inline int cpu_has_pku(void)
+{
+   return 1;
+}
+
+static inline int get_arch_reserved_keys(void)
+{
+   if (sysconf(_SC_PAGESIZE) == 4096)
+   return NR_RESERVED_PKEYS_4K;
+   else
+   return NR_RESERVED_PKEYS_64K;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+   /*
+* powerpc does not allow userspace to change permissions of exec-only
+* keys since those keys are not allocated by userspace. The signal
+* handler wont be able to reset the permissions, which means the code
+* will infinitely continue to segfault here.
+*/
+   return;
+}
+
+/* 4-byte instructions * 16384 = 64K page */
+#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+
+#endif /* _PKEYS_POWERPC_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 57c71056c93d..e6de078a9196 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -169,6 +169,125 @@ void dump_mem(void *dumpme, int len_bytes)
}
 }
 
+static u32 hw_pkey_get(int pkey, unsigned long flags)
+{
+   u64 pkey_reg = __read_pkey_reg();
+
+   dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
+   __func__, pkey, flags, 0, 0);
+   dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg);
+
+   return (u32) get_pkey_bits(pkey_reg, pkey);
+}
+
+static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags)
+{
+   u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
+   u64 old_pkey_reg = __read_pkey_reg();
+   

[PATCH v18 12/24] selftests/vm/pkeys: Introduce generic pkey abstractions

2020-01-29 Thread Sandipan Das
From: Ram Pai 

This introduces some generic abstractions and provides
the corresponding architecture-specfic implementations
for these abstractions.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 12 
 tools/testing/selftests/vm/pkey-x86.h| 15 +++
 tools/testing/selftests/vm/protection_keys.c |  8 ++--
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 0e3da7c8d628..621fb2a0a5ef 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -74,6 +74,9 @@ extern void abort_hooks(void);
}   \
 } while (0)
 
+__attribute__((noinline)) int read_ptr(int *ptr);
+void expected_pkey_fault(int pkey);
+
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
 #else /* arch */
@@ -172,4 +175,13 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
 #define __stringify_1(x...) #x
 #define __stringify(x...)   __stringify_1(x)
 
+static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
+{
+#ifdef si_pkey
+   return &si->si_pkey;
+#else
+   return (u32 *)(((u8 *)si) + si_pkey_offset);
+#endif
+}
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index def2a1bcf6a5..a0c59d4f7af2 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -42,6 +42,7 @@
 #endif
 
 #define NR_PKEYS   16
+#define NR_RESERVED_PKEYS  2 /* pkey-0 and exec-only-pkey */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL<<21)
 #define PAGE_SIZE  4096
@@ -158,4 +159,18 @@ int pkey_reg_xstate_offset(void)
return xstate_offset;
 }
 
+static inline int get_arch_reserved_keys(void)
+{
+   return NR_RESERVED_PKEYS;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+   int ptr_contents;
+
+   ptr_contents = read_ptr(p1);
+   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
+   expected_pkey_fault(pkey);
+}
+
 #endif /* _PKEYS_X86_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 535e464e27e9..57c71056c93d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1307,9 +1307,7 @@ void test_executing_on_unreadable_memory(int *ptr, u16 
pkey)
madvise(p1, PAGE_SIZE, MADV_DONTNEED);
lots_o_noops_around_write(&scratch);
do_not_expect_pkey_fault("executing on PROT_EXEC memory");
-   ptr_contents = read_ptr(p1);
-   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
-   expected_pkey_fault(pkey);
+   expect_fault_on_read_execonly_key(p1, pkey);
 }
 
 void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
@@ -1336,9 +1334,7 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, 
u16 pkey)
madvise(p1, PAGE_SIZE, MADV_DONTNEED);
lots_o_noops_around_write(&scratch);
do_not_expect_pkey_fault("executing on PROT_EXEC memory");
-   ptr_contents = read_ptr(p1);
-   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
-   expected_pkey_fault(UNKNOWN_PKEY);
+   expect_fault_on_read_execonly_key(p1, UNKNOWN_PKEY);
 
/*
 * Put the memory back to non-PROT_EXEC.  Should clear the
-- 
2.17.1



[PATCH v18 11/24] selftests: vm: pkeys: Use the correct huge page size

2020-01-29 Thread Sandipan Das
The huge page size can vary across architectures. This will
ensure that the correct huge page size is used when accessing
the hugetlb controls under sysfs. Instead of using a hardcoded
page size (i.e. 2MB), this now uses the HPAGE_SIZE macro which
is arch-specific.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 23 ++--
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 9cc82b65f828..535e464e27e9 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -739,12 +739,15 @@ void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
 }
 
 int hugetlb_setup_ok;
+#define SYSFS_FMT_NR_HUGE_PAGES 
"/sys/kernel/mm/hugepages/hugepages-%ldkB/nr_hugepages"
 #define GET_NR_HUGE_PAGES 10
 void setup_hugetlbfs(void)
 {
int err;
int fd;
-   char buf[] = "123";
+   char buf[256];
+   long hpagesz_kb;
+   long hpagesz_mb;
 
if (geteuid() != 0) {
fprintf(stderr, "WARNING: not run as root, can not do hugetlb 
test\n");
@@ -755,11 +758,16 @@ void setup_hugetlbfs(void)
 
/*
 * Now go make sure that we got the pages and that they
-* are 2M pages.  Someone might have made 1G the default.
+* are PMD-level pages. Someone might have made PUD-level
+* pages the default.
 */
-   fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", 
O_RDONLY);
+   hpagesz_kb = HPAGE_SIZE / 1024;
+   hpagesz_mb = hpagesz_kb / 1024;
+   sprintf(buf, SYSFS_FMT_NR_HUGE_PAGES, hpagesz_kb);
+   fd = open(buf, O_RDONLY);
if (fd < 0) {
-   perror("opening sysfs 2M hugetlb config");
+   fprintf(stderr, "opening sysfs %ldM hugetlb config: %s\n",
+   hpagesz_mb, strerror(errno));
return;
}
 
@@ -767,13 +775,14 @@ void setup_hugetlbfs(void)
err = read(fd, buf, sizeof(buf)-1);
close(fd);
if (err <= 0) {
-   perror("reading sysfs 2M hugetlb config");
+   fprintf(stderr, "reading sysfs %ldM hugetlb config: %s\n",
+   hpagesz_mb, strerror(errno));
return;
}
 
if (atoi(buf) != GET_NR_HUGE_PAGES) {
-   fprintf(stderr, "could not confirm 2M pages, got: '%s' expected 
%d\n",
-   buf, GET_NR_HUGE_PAGES);
+   fprintf(stderr, "could not confirm %ldM pages, got: '%s' 
expected %d\n",
+   hpagesz_mb, buf, GET_NR_HUGE_PAGES);
return;
}
 
-- 
2.17.1



[PATCH v18 10/24] selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random

2020-01-29 Thread Sandipan Das
From: Ram Pai 

alloc_random_pkey() was allocating the same pkey every
time. Not all pkeys were geting tested. This fixes it.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 7fd52d5c4bfd..9cc82b65f828 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -25,6 +25,7 @@
 #define __SANE_USERSPACE_TYPES__
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -546,10 +547,10 @@ int alloc_random_pkey(void)
int nr_alloced = 0;
int random_index;
memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
+   srand((unsigned int)time(NULL));
 
/* allocate every possible key and make a note of which ones we got */
max_nr_pkey_allocs = NR_PKEYS;
-   max_nr_pkey_allocs = 1;
for (i = 0; i < max_nr_pkey_allocs; i++) {
int new_pkey = alloc_pkey();
if (new_pkey < 0)
-- 
2.17.1



[PATCH v18 09/24] selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()

2020-01-29 Thread Sandipan Das
From: Ram Pai 

In some cases, a pkey's bits need not necessarily change
in a way that the value of the pkey register increases
when performing a pkey_disable_set() or decreases when
performing a pkey_disable_clear().

For example, on powerpc, if a pkey's current state is
PKEY_DISABLE_ACCESS and we perform a pkey_write_disable()
on it, the bits still remain the same. We will observe
something similar when the pkey's current state is 0 and
a pkey_access_enable() is performed on it.

Either case would cause some assertions to fail. This
fixes the problem.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 4b1ddb526228..7fd52d5c4bfd 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -400,7 +400,7 @@ void pkey_disable_set(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n",
__func__, pkey, read_pkey_reg());
if (flags)
-   pkey_assert(read_pkey_reg() > orig_pkey_reg);
+   pkey_assert(read_pkey_reg() >= orig_pkey_reg);
dprintf1("END<---%s(%d, 0x%x)\n", __func__,
pkey, flags);
 }
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
pkey, read_pkey_reg());
if (flags)
-   assert(read_pkey_reg() < orig_pkey_reg);
+   assert(read_pkey_reg() <= orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1



[PATCH v18 08/24] selftests/vm/pkeys: Fix pkey_disable_clear()

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Currently, pkey_disable_clear() sets the specified bits
instead clearing them. This has been dead code up to now
because its only callers i.e. pkey_access/write_allow()
are also unused.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index bed9d4de12b4..4b1ddb526228 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -418,7 +418,7 @@ void pkey_disable_clear(int pkey, int flags)
pkey, pkey, pkey_rights);
pkey_assert(pkey_rights >= 0);
 
-   pkey_rights |= flags;
+   pkey_rights &= ~flags;
 
ret = hw_pkey_set(pkey, pkey_rights, 0);
shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
pkey, read_pkey_reg());
if (flags)
-   assert(read_pkey_reg() > orig_pkey_reg);
+   assert(read_pkey_reg() < orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1



[PATCH v18 00/24] selftests, powerpc, x86: Memory Protection Keys

2020-01-29 Thread Sandipan Das
Memory protection keys enables an application to protect its address
space from inadvertent access by its own code.

This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory
and enhance their test coverage.

Tested on powerpc64 and x86_64 (Skylake-SP).

Link to development branch:
https://github.com/sandip4n/linux/tree/pkey-selftests

Changelog
-
Link to previous version (v17):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=154174

v18:
(1) Fixed issues with x86 multilib builds based on
feedback from Dave.
(2) Moved patch 2 to the end of the series.

v17:
(1) Fixed issues with i386 builds when running on x86_64
based on feedback from Dave.
(2) Replaced patch 6 from previous version with patch 7.
This addresses u64 format specifier related concerns
that Michael had raised in v15.

v16:
(1) Rebased on top of latest master.
(2) Switched to u64 instead of using an arch-dependent
pkey_reg_t type for references to the pkey register
based on suggestions from Dave, Michal and Michael.
(3) Removed build time determination of page size based
on suggestion from Michael.
(4) Fixed comment before the definition of __page_o_noops()
from patch 13 ("selftests/vm/pkeys: Introduce powerpc
support").

v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.

v14:
(1) Incorporated another round of comments from Dave Hansen.

v13:
(1) Incorporated comments for Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.

v12:
(1) Fixed the offset of pkey field in the siginfo structure for
x86_64 and powerpc. And tries to use the actual field
if the headers have it defined.

v11:
(1) Fixed a deadlock in the ptrace testcase.

v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.

Desnes A. Nunes do Rosario (1):
  selftests/vm/pkeys: Fix number of reserved powerpc pkeys

Ram Pai (16):
  selftests/x86/pkeys: Move selftests to arch-neutral directory
  selftests/vm/pkeys: Rename all references to pkru to a generic name
  selftests/vm/pkeys: Move generic definitions to header file
  selftests/vm/pkeys: Fix pkey_disable_clear()
  selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: Introduce generic pkey abstractions
  selftests/vm/pkeys: Introduce powerpc support
  selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: Improve checks to determine pkey support
  selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
  selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
  selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
  selftests/vm/pkeys: Introduce a sub-page allocator
  selftests/vm/pkeys: Test correct behaviour of pkey-0
  selftests/vm/pkeys: Override access right definitions on powerpc

Sandipan Das (5):
  selftests: vm: pkeys: Use sane types for pkey register
  selftests: vm: pkeys: Add helpers for pkey bits
  selftests: vm: pkeys: Use the correct huge page size
  selftests: vm: pkeys: Use the correct page size on powerpc
  selftests: vm: pkeys: Fix multilib builds for x86

Thiago Jung Bauermann (2):
  selftests/vm/pkeys: Move some definitions to arch-specific header
  selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()

 tools/testing/selftests/vm/.gitignore |   1 +
 tools/testing/selftests/vm/Makefile   |  73 ++
 tools/testing/selftests/vm/pkey-helpers.h | 225 ++
 tools/testing/selftests/vm/pkey-powerpc.h | 136 
 tools/testing/selftests/vm/pkey-x86.h | 181 +
 .../selftests/{x86 => vm}/protection_keys.c   | 696 ++
 tools/testing/selftests/x86/.gitignore|   1 -
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/pkey-helpers.h| 219 --
 9 files changed, 1002 insertions(+), 532 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
 create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h
 create mode 100644 tools/testing/selftests/vm/pkey-x86.h
 rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%)
 delete mode 100644 tools/test

[PATCH v18 07/24] selftests: vm: pkeys: Add helpers for pkey bits

2020-01-29 Thread Sandipan Das
This introduces some functions that help with setting
or clearing bits of a particular pkey. This also adds
an abstraction for getting a pkey's bit position in
the pkey register as this may vary across architectures.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 22 ++
 tools/testing/selftests/vm/pkey-x86.h|  5 +++
 tools/testing/selftests/vm/protection_keys.c | 32 ++--
 3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index dfbce49269ce..0e3da7c8d628 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -80,6 +80,28 @@ extern void abort_hooks(void);
 #error Architecture not supported
 #endif /* arch */
 
+#define PKEY_MASK  (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+
+static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
+{
+   u32 shift = pkey_bit_position(pkey);
+   /* mask out bits from pkey in old value */
+   reg &= ~((u64)PKEY_MASK << shift);
+   /* OR in new bits for pkey */
+   reg |= (flags & PKEY_MASK) << shift;
+   return reg;
+}
+
+static inline u64 get_pkey_bits(u64 reg, int pkey)
+{
+   u32 shift = pkey_bit_position(pkey);
+   /*
+* shift down the relevant bits to the lowest two, then
+* mask off all the other higher bits
+*/
+   return ((reg >> shift) & PKEY_MASK);
+}
+
 extern u64 shadow_pkey_reg;
 
 static inline u64 _read_pkey_reg(int line)
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index 6ffea27e2d2d..def2a1bcf6a5 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -118,6 +118,11 @@ static inline int cpu_has_pku(void)
return 1;
 }
 
+static inline u32 pkey_bit_position(int pkey)
+{
+   return pkey * PKEY_BITS_PER_PKEY;
+}
+
 #define XSTATE_PKEY_BIT(9)
 #define XSTATE_PKEY0x200
 
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index efa35cc6f6b9..bed9d4de12b4 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -334,25 +334,13 @@ pid_t fork_lazy_child(void)
 
 static u32 hw_pkey_get(int pkey, unsigned long flags)
 {
-   u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
u64 pkey_reg = __read_pkey_reg();
-   u64 shifted_pkey_reg;
-   u32 masked_pkey_reg;
 
dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
__func__, pkey, flags, 0, 0);
dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg);
 
-   shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
-   dprintf2("%s() shifted_pkey_reg: %016llx\n", __func__,
-   shifted_pkey_reg);
-   masked_pkey_reg = shifted_pkey_reg & mask;
-   dprintf2("%s() masked  pkey_reg: %x\n", __func__, masked_pkey_reg);
-   /*
-* shift down the relevant bits to the lowest two, then
-* mask off all the other high bits.
-*/
-   return masked_pkey_reg;
+   return (u32) get_pkey_bits(pkey_reg, pkey);
 }
 
 static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags)
@@ -364,12 +352,8 @@ static int hw_pkey_set(int pkey, unsigned long rights, 
unsigned long flags)
/* make sure that 'rights' only contains the bits we expect: */
assert(!(rights & ~mask));
 
-   /* copy old pkey_reg */
-   new_pkey_reg = old_pkey_reg;
-   /* mask out bits from pkey in old value: */
-   new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY));
-   /* OR in new bits for pkey: */
-   new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY));
+   /* modify bits accordingly in old pkey_reg and assign it */
+   new_pkey_reg = set_pkey_bits(old_pkey_reg, pkey, rights);
 
__write_pkey_reg(new_pkey_reg);
 
@@ -403,7 +387,7 @@ void pkey_disable_set(int pkey, int flags)
ret = hw_pkey_set(pkey, pkey_rights, syscall_flags);
assert(!ret);
/* pkey_reg and flags have the same format */
-   shadow_pkey_reg |= flags << (pkey * 2);
+   shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
dprintf1("%s(%d) shadow: 0x%016llx\n",
__func__, pkey, shadow_pkey_reg);
 
@@ -437,7 +421,7 @@ void pkey_disable_clear(int pkey, int flags)
pkey_rights |= flags;
 
ret = hw_pkey_set(pkey, pkey_rights, 0);
-   shadow_pkey_reg &= ~(flags << (pkey * 2));
+   shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
pkey_assert(ret >= 0);
 
pkey_rights = hw_pkey_get(pkey, syscall_flags);
@@ -513,7 +497,8 @@ int alloc_pkey(void)
shadow_pkey_reg);
if (ret) {
/* clear both the bits: */
-   shadow_pkey_reg &= ~(0x3 

[PATCH v18 06/24] selftests: vm: pkeys: Use sane types for pkey register

2020-01-29 Thread Sandipan Das
The size of the pkey register can vary across architectures.
This converts the data type of all its references to u64 in
preparation for multi-arch support.

To keep the definition of the u64 type consistent and remove
format specifier related warnings, __SANE_USERSPACE_TYPES__
is defined as suggested by Michael Ellerman.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 31 +++
 tools/testing/selftests/vm/pkey-x86.h|  8 +-
 tools/testing/selftests/vm/protection_keys.c | 86 
 3 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 7f18a82e54fc..dfbce49269ce 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -14,10 +14,10 @@
 #include 
 
 /* Define some kernel-like types */
-#define  u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
+#define  u8 __u8
+#define u16 __u16
+#define u32 __u32
+#define u64 __u64
 
 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
@@ -80,13 +80,14 @@ extern void abort_hooks(void);
 #error Architecture not supported
 #endif /* arch */
 
-extern unsigned int shadow_pkey_reg;
+extern u64 shadow_pkey_reg;
 
-static inline unsigned int _read_pkey_reg(int line)
+static inline u64 _read_pkey_reg(int line)
 {
-   unsigned int pkey_reg = __read_pkey_reg();
+   u64 pkey_reg = __read_pkey_reg();
 
-   dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+   dprintf4("read_pkey_reg(line=%d) pkey_reg: %016llx"
+   " shadow: %016llx\n",
line, pkey_reg, shadow_pkey_reg);
assert(pkey_reg == shadow_pkey_reg);
 
@@ -95,15 +96,15 @@ static inline unsigned int _read_pkey_reg(int line)
 
 #define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void write_pkey_reg(unsigned int pkey_reg)
+static inline void write_pkey_reg(u64 pkey_reg)
 {
-   dprintf4("%s() changing %08x to %08x\n", __func__,
+   dprintf4("%s() changing %016llx to %016llx\n", __func__,
__read_pkey_reg(), pkey_reg);
/* will do the shadow check for us: */
read_pkey_reg();
__write_pkey_reg(pkey_reg);
shadow_pkey_reg = pkey_reg;
-   dprintf4("%s(%08x) pkey_reg: %08x\n", __func__,
+   dprintf4("%s(%016llx) pkey_reg: %016llx\n", __func__,
pkey_reg, __read_pkey_reg());
 }
 
@@ -113,7 +114,7 @@ static inline void write_pkey_reg(unsigned int pkey_reg)
  */
 static inline void __pkey_access_allow(int pkey, int do_allow)
 {
-   unsigned int pkey_reg = read_pkey_reg();
+   u64 pkey_reg = read_pkey_reg();
int bit = pkey * 2;
 
if (do_allow)
@@ -121,13 +122,13 @@ static inline void __pkey_access_allow(int pkey, int 
do_allow)
else
pkey_reg |= (1<
 #include 
 #include 
@@ -48,7 +49,7 @@
 int iteration_nr = 1;
 int test_nr;
 
-unsigned int shadow_pkey_reg;
+u64 shadow_pkey_reg;
 int dprint_in_signal;
 char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
@@ -163,7 +164,7 @@ void dump_mem(void *dumpme, int len_bytes)
 
for (i = 0; i < len_bytes; i += sizeof(u64)) {
u64 *ptr = (u64 *)(c + i);
-   dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr);
+   dprintf1("dump[%03d][@%p]: %016llx\n", i, ptr, *ptr);
}
 }
 
@@ -205,7 +206,8 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 
dprint_in_signal = 1;
dprintf1("===SIGSEGV\n");
-   dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+   dprintf1("%s()::%d, pkey_reg: 0x%016llx shadow: %016llx\n",
+   __func__, __LINE__,
__read_pkey_reg(), shadow_pkey_reg);
 
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
@@ -213,8 +215,9 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
fpregset = uctxt->uc_mcontext.fpregs;
fpregs = (void *)fpregset;
 
-   dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__,
-   trapno, ip, si_code_str(si->si_code), si->si_code);
+   dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
+   __func__, trapno, ip, si_code_str(si->si_code),
+   si->si_code);
 #ifdef __i386__
/*
 * 32-bit has some extra padding so that userspace can tell whether
@@ -256,8 +259,9 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 * need __read_pkey_reg() version so we do not do shadow_pkey_reg
 * checking
 */
-   dprintf1("signal pkey_reg from  pkey_reg: %08x\n", __read_pkey_reg());
-   dprintf1("pkey from siginfo: %jx\n", siginfo_pkey);
+   dprintf1("signal pkey_reg from  pkey_reg: %016llx\n",
+   __read_pkey_reg());

[PATCH v18 05/24] selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()

2020-01-29 Thread Sandipan Das
From: Thiago Jung Bauermann 

This will help us ensure we print pkey_reg_t values correctly in
different architectures.

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 3ed2f021bf7a..7f18a82e54fc 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -27,6 +27,10 @@
 #define DPRINT_IN_SIGNAL_BUF_SIZE 4096
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+
+#ifdef __GNUC__
+__attribute__((format(printf, 1, 2)))
+#endif
 static inline void sigsafe_printf(const char *format, ...)
 {
va_list ap;
-- 
2.17.1



[PATCH v18 04/24] selftests/vm/pkeys: Move some definitions to arch-specific header

2020-01-29 Thread Sandipan Das
From: Thiago Jung Bauermann 

In preparation for multi-arch support, move definitions which
have arch-specific values to x86-specific header.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 111 +
 tools/testing/selftests/vm/pkey-x86.h| 156 +++
 tools/testing/selftests/vm/protection_keys.c |  47 --
 3 files changed, 162 insertions(+), 152 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-x86.h

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 6ad1bd54ef94..3ed2f021bf7a 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -21,9 +21,6 @@
 
 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
-#define NR_PKEYS 16
-#define PKEY_BITS_PER_PKEY 2
-
 #ifndef DEBUG_LEVEL
 #define DEBUG_LEVEL 0
 #endif
@@ -73,19 +70,13 @@ extern void abort_hooks(void);
}   \
 } while (0)
 
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+#include "pkey-x86.h"
+#else /* arch */
+#error Architecture not supported
+#endif /* arch */
+
 extern unsigned int shadow_pkey_reg;
-static inline unsigned int __read_pkey_reg(void)
-{
-   unsigned int eax, edx;
-   unsigned int ecx = 0;
-   unsigned int pkey_reg;
-
-   asm volatile(".byte 0x0f,0x01,0xee\n\t"
-: "=a" (eax), "=d" (edx)
-: "c" (ecx));
-   pkey_reg = eax;
-   return pkey_reg;
-}
 
 static inline unsigned int _read_pkey_reg(int line)
 {
@@ -100,19 +91,6 @@ static inline unsigned int _read_pkey_reg(int line)
 
 #define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void __write_pkey_reg(unsigned int pkey_reg)
-{
-   unsigned int eax = pkey_reg;
-   unsigned int ecx = 0;
-   unsigned int edx = 0;
-
-   dprintf4("%s() changing %08x to %08x\n", __func__,
-   __read_pkey_reg(), pkey_reg);
-   asm volatile(".byte 0x0f,0x01,0xef\n\t"
-: : "a" (eax), "c" (ecx), "d" (edx));
-   assert(pkey_reg == __read_pkey_reg());
-}
-
 static inline void write_pkey_reg(unsigned int pkey_reg)
 {
dprintf4("%s() changing %08x to %08x\n", __func__,
@@ -157,83 +135,6 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
dprintf4("pkey_reg now: %08x\n", read_pkey_reg());
 }
 
-#define PAGE_SIZE 4096
-#define MB (1<<20)
-
-static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
-   unsigned int *ecx, unsigned int *edx)
-{
-   /* ecx is often an input as well as an output. */
-   asm volatile(
-   "cpuid;"
-   : "=a" (*eax),
- "=b" (*ebx),
- "=c" (*ecx),
- "=d" (*edx)
-   : "0" (*eax), "2" (*ecx));
-}
-
-/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
-#define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */
-#define X86_FEATURE_OSPKE  (1<<4) /* OS Protection Keys Enable */
-
-static inline int cpu_has_pku(void)
-{
-   unsigned int eax;
-   unsigned int ebx;
-   unsigned int ecx;
-   unsigned int edx;
-
-   eax = 0x7;
-   ecx = 0x0;
-   __cpuid(&eax, &ebx, &ecx, &edx);
-
-   if (!(ecx & X86_FEATURE_PKU)) {
-   dprintf2("cpu does not have PKU\n");
-   return 0;
-   }
-   if (!(ecx & X86_FEATURE_OSPKE)) {
-   dprintf2("cpu does not have OSPKE\n");
-   return 0;
-   }
-   return 1;
-}
-
-#define XSTATE_PKEY_BIT(9)
-#define XSTATE_PKEY0x200
-
-int pkey_reg_xstate_offset(void)
-{
-   unsigned int eax;
-   unsigned int ebx;
-   unsigned int ecx;
-   unsigned int edx;
-   int xstate_offset;
-   int xstate_size;
-   unsigned long XSTATE_CPUID = 0xd;
-   int leaf;
-
-   /* assume that XSTATE_PKEY is set in XCR0 */
-   leaf = XSTATE_PKEY_BIT;
-   {
-   eax = XSTATE_CPUID;
-   ecx = leaf;
-   __cpuid(&eax, &ebx, &ecx, &edx);
-
-   if (leaf == XSTATE_PKEY_BIT) {
-   xstate_offset = ebx;
-   xstate_size = eax;
-   }
-   }
-
-   if (xstate_size == 0) {
-   printf("could not find size/offset of PKEY in xsave state\n");
-   return 0;
-   }
-
-   return xstate_offset;
-}
-
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
 #define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
 #define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
new file mode 100644
index ..2f04ade8ca9c
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -0,0 +1,15

[PATCH v18 03/24] selftests/vm/pkeys: Move generic definitions to header file

2020-01-29 Thread Sandipan Das
From: Ram Pai 

Moved all the generic definition and helper functions to the
header file.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 35 +---
 tools/testing/selftests/vm/protection_keys.c | 27 ---
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index d5779be4793f..6ad1bd54ef94 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -13,6 +13,14 @@
 #include 
 #include 
 
+/* Define some kernel-like types */
+#define  u8 uint8_t
+#define u16 uint16_t
+#define u32 uint32_t
+#define u64 uint64_t
+
+#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
+
 #define NR_PKEYS 16
 #define PKEY_BITS_PER_PKEY 2
 
@@ -53,6 +61,18 @@ static inline void sigsafe_printf(const char *format, ...)
 #define dprintf3(args...) dprintf_level(3, args)
 #define dprintf4(args...) dprintf_level(4, args)
 
+extern void abort_hooks(void);
+#define pkey_assert(condition) do {\
+   if (!(condition)) { \
+   dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
+   __FILE__, __LINE__, \
+   test_nr, iteration_nr); \
+   dprintf0("errno at assert: %d", errno); \
+   abort_hooks();  \
+   exit(__LINE__); \
+   }   \
+} while (0)
+
 extern unsigned int shadow_pkey_reg;
 static inline unsigned int __read_pkey_reg(void)
 {
@@ -137,11 +157,6 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
dprintf4("pkey_reg now: %08x\n", read_pkey_reg());
 }
 
-#define PROT_PKEY0 0x10/* protection key value (bit 0) */
-#define PROT_PKEY1 0x20/* protection key value (bit 1) */
-#define PROT_PKEY2 0x40/* protection key value (bit 2) */
-#define PROT_PKEY3 0x80/* protection key value (bit 3) */
-
 #define PAGE_SIZE 4096
 #define MB (1<<20)
 
@@ -219,4 +234,14 @@ int pkey_reg_xstate_offset(void)
return xstate_offset;
 }
 
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+#define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
+#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
+#define ALIGN_PTR_UP(p, ptr_align_to)  \
+   ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
+#define ALIGN_PTR_DOWN(p, ptr_align_to)\
+   ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
+#define __stringify_1(x...) #x
+#define __stringify(x...)   __stringify_1(x)
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 2f4ab81c570d..42ffb58810f2 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -51,31 +51,10 @@ int test_nr;
 unsigned int shadow_pkey_reg;
 
 #define HPAGE_SIZE (1UL<<21)
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
-#define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
-#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
-#define ALIGN_PTR_UP(p, ptr_align_to)  ((typeof(p))ALIGN_UP((unsigned 
long)(p),ptr_align_to))
-#define ALIGN_PTR_DOWN(p, ptr_align_to)
((typeof(p))ALIGN_DOWN((unsigned long)(p),  ptr_align_to))
-#define __stringify_1(x...) #x
-#define __stringify(x...)   __stringify_1(x)
-
-#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
 int dprint_in_signal;
 char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
-extern void abort_hooks(void);
-#define pkey_assert(condition) do {\
-   if (!(condition)) { \
-   dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
-   __FILE__, __LINE__, \
-   test_nr, iteration_nr); \
-   dprintf0("errno at assert: %d", errno); \
-   abort_hooks();  \
-   exit(__LINE__); \
-   }   \
-} while (0)
-
 void cat_into_file(char *str, char *file)
 {
int fd = open(file, O_RDWR);
@@ -186,12 +165,6 @@ void lots_o_noops_around_write(int *write_to_me)
dprintf3("%s() done\n", __func__);
 }
 
-/* Define some kernel-like types */
-#define  u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
-
 #ifdef __i386__
 
 #ifndef SYS_mprotect_key
-- 
2.17.1



[PATCH v18 02/24] selftests/vm/pkeys: Rename all references to pkru to a generic name

2020-01-29 Thread Sandipan Das
From: Ram Pai 

This renames PKRU references to "pkey_reg" or "pkey" based on
the usage.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h|  85 +++
 tools/testing/selftests/vm/protection_keys.c | 240 ++-
 2 files changed, 170 insertions(+), 155 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 254e5436bdd9..d5779be4793f 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -14,7 +14,7 @@
 #include 
 
 #define NR_PKEYS 16
-#define PKRU_BITS_PER_PKEY 2
+#define PKEY_BITS_PER_PKEY 2
 
 #ifndef DEBUG_LEVEL
 #define DEBUG_LEVEL 0
@@ -53,85 +53,88 @@ static inline void sigsafe_printf(const char *format, ...)
 #define dprintf3(args...) dprintf_level(3, args)
 #define dprintf4(args...) dprintf_level(4, args)
 
-extern unsigned int shadow_pkru;
-static inline unsigned int __rdpkru(void)
+extern unsigned int shadow_pkey_reg;
+static inline unsigned int __read_pkey_reg(void)
 {
unsigned int eax, edx;
unsigned int ecx = 0;
-   unsigned int pkru;
+   unsigned int pkey_reg;
 
asm volatile(".byte 0x0f,0x01,0xee\n\t"
 : "=a" (eax), "=d" (edx)
 : "c" (ecx));
-   pkru = eax;
-   return pkru;
+   pkey_reg = eax;
+   return pkey_reg;
 }
 
-static inline unsigned int _rdpkru(int line)
+static inline unsigned int _read_pkey_reg(int line)
 {
-   unsigned int pkru = __rdpkru();
+   unsigned int pkey_reg = __read_pkey_reg();
 
-   dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
-   line, pkru, shadow_pkru);
-   assert(pkru == shadow_pkru);
+   dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+   line, pkey_reg, shadow_pkey_reg);
+   assert(pkey_reg == shadow_pkey_reg);
 
-   return pkru;
+   return pkey_reg;
 }
 
-#define rdpkru() _rdpkru(__LINE__)
+#define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void __wrpkru(unsigned int pkru)
+static inline void __write_pkey_reg(unsigned int pkey_reg)
 {
-   unsigned int eax = pkru;
+   unsigned int eax = pkey_reg;
unsigned int ecx = 0;
unsigned int edx = 0;
 
-   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   dprintf4("%s() changing %08x to %08x\n", __func__,
+   __read_pkey_reg(), pkey_reg);
asm volatile(".byte 0x0f,0x01,0xef\n\t"
 : : "a" (eax), "c" (ecx), "d" (edx));
-   assert(pkru == __rdpkru());
+   assert(pkey_reg == __read_pkey_reg());
 }
 
-static inline void wrpkru(unsigned int pkru)
+static inline void write_pkey_reg(unsigned int pkey_reg)
 {
-   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   dprintf4("%s() changing %08x to %08x\n", __func__,
+   __read_pkey_reg(), pkey_reg);
/* will do the shadow check for us: */
-   rdpkru();
-   __wrpkru(pkru);
-   shadow_pkru = pkru;
-   dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
+   read_pkey_reg();
+   __write_pkey_reg(pkey_reg);
+   shadow_pkey_reg = pkey_reg;
+   dprintf4("%s(%08x) pkey_reg: %08x\n", __func__,
+   pkey_reg, __read_pkey_reg());
 }
 
 /*
  * These are technically racy. since something could
- * change PKRU between the read and the write.
+ * change PKEY register between the read and the write.
  */
 static inline void __pkey_access_allow(int pkey, int do_allow)
 {
-   unsigned int pkru = rdpkru();
+   unsigned int pkey_reg = read_pkey_reg();
int bit = pkey * 2;
 
if (do_allow)
-   pkru &= (1<>>>===SIGSEGV\n");
-   dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__,
-   __rdpkru(), shadow_pkru);
+   dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+   __read_pkey_reg(), shadow_pkey_reg);
 
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
@@ -289,19 +289,19 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 */
fpregs += 0x70;
 #endif
-   pkru_offset = pkru_xstate_offset();
-   pkru_ptr = (void *)(&fpregs[pkru_offset]);
+   pkey_reg_offset = pkey_reg_xstate_offset();
+   pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
 
dprintf1("siginfo: %p\n", si);
dprintf1(" fpregs: %p\n", fpregs);
/*
-* If we got a PKRU fault, we *HAVE* to have at least one bit set in
+* If we got a PKEY fault, we *HAVE* to have at least one bit set in
 * here.
 */
-   dprintf1("pkru_xstate_offset: %d\n", pkru_

[PATCH v18 01/24] selftests/x86/pkeys: Move selftests to arch-neutral directory

2020-01-29 Thread Sandipan Das
From: Ram Pai 

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Ingo Molnar 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/.gitignore | 1 +
 tools/testing/selftests/vm/Makefile   | 1 +
 tools/testing/selftests/{x86 => vm}/pkey-helpers.h| 0
 tools/testing/selftests/{x86 => vm}/protection_keys.c | 0
 tools/testing/selftests/x86/.gitignore| 1 -
 tools/testing/selftests/x86/Makefile  | 2 +-
 6 files changed, 3 insertions(+), 2 deletions(-)
 rename tools/testing/selftests/{x86 => vm}/pkey-helpers.h (100%)
 rename tools/testing/selftests/{x86 => vm}/protection_keys.c (100%)

diff --git a/tools/testing/selftests/vm/.gitignore 
b/tools/testing/selftests/vm/.gitignore
index 31b3c98b6d34..c55837bf39fa 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -14,3 +14,4 @@ virtual_address_range
 gup_benchmark
 va_128TBswitch
 map_fixed_noreplace
+protection_keys
diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 7f9a8a8c31da..4e9c741be6af 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_FILES += protection_keys
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
diff --git a/tools/testing/selftests/x86/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
similarity index 100%
rename from tools/testing/selftests/x86/pkey-helpers.h
rename to tools/testing/selftests/vm/pkey-helpers.h
diff --git a/tools/testing/selftests/x86/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
similarity index 100%
rename from tools/testing/selftests/x86/protection_keys.c
rename to tools/testing/selftests/vm/protection_keys.c
diff --git a/tools/testing/selftests/x86/.gitignore 
b/tools/testing/selftests/x86/.gitignore
index 7757f73ff9a3..eb30ffd83876 100644
--- a/tools/testing/selftests/x86/.gitignore
+++ b/tools/testing/selftests/x86/.gitignore
@@ -11,5 +11,4 @@ ldt_gdt
 iopl
 mpx-mini-test
 ioperm
-protection_keys
 test_vdso
diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
index 5d49bfec1e9a..5f16821c7f63 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) 
trivial_program.c -no-pie)
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt 
test_mremap_vdso \
check_initial_reg_state sigreturn iopl ioperm \
-   protection_keys test_vdso test_vsyscall mov_ss_trap \
+   test_vdso test_vsyscall mov_ss_trap \
syscall_arg_fault
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
-- 
2.17.1



[PATCH 1/2] pseries/vio: Remove stray #ifdef CONFIG_PPC_PSERIES

2020-01-29 Thread Oliver O'Halloran
vio.c requires CONFIG_IBMVIO which in turn depends on PPC_PSERIES.
In other words, this ifdef is pointless. At a guess it's a carry-over
from pre-history.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/vio.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/vio.c 
b/arch/powerpc/platforms/pseries/vio.c
index f682b7b..37f1f25 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -1628,7 +1628,6 @@ const void *vio_get_attribute(struct vio_dev *vdev, char 
*which, int *length)
 }
 EXPORT_SYMBOL(vio_get_attribute);
 
-#ifdef CONFIG_PPC_PSERIES
 /* vio_find_name() - internal because only vio.c knows how we formatted the
  * kobject name
  */
@@ -1698,7 +1697,6 @@ int vio_disable_interrupts(struct vio_dev *dev)
return rc;
 }
 EXPORT_SYMBOL(vio_disable_interrupts);
-#endif /* CONFIG_PPC_PSERIES */
 
 static int __init vio_init(void)
 {
-- 
2.9.5



[PATCH 2/2] pseries/makefile: Remove CONFIG_PPC_PSERIES check

2020-01-29 Thread Oliver O'Halloran
The platform makefile (arch/powerpc/platforms/pseries/Makefile) is only
included by the platform makefile (arch/powerpc/platform/Makefile) when
CONFIG_PPC_PSERIES is selected, so checking for CONFIG_PPC_PSERIES in the
pseries makefile is pointless.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index a3c74a5..c8a2b0b 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -29,6 +29,4 @@ obj-$(CONFIG_PPC_SPLPAR)  += vphn.o
 obj-$(CONFIG_PPC_SVM)  += svm.o
 obj-$(CONFIG_FA_DUMP)  += rtas-fadump.o
 
-ifdef CONFIG_PPC_PSERIES
 obj-$(CONFIG_SUSPEND)  += suspend.o
-endif
-- 
2.9.5



Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-29 Thread Sandipan Das
Hi Dave,

On 30/01/20 12:29 am, Dave Hansen wrote:
> On 1/28/20 1:38 AM, Sandipan Das wrote:
>> On 27/01/20 9:12 pm, Dave Hansen wrote:
>>> How have you tested this patch (and the whole series for that matter)?
>>>
>> I replaced the second patch with this one and did a build test.
>> Till v16, I had tested the whole series (build + run) on both a POWER8
>> system (with 4K and 64K page sizes) and a Skylake SP system but for
>> x86_64 only.
> 
> Do you have any idea why I was seeing x86 build errors and you were not?
> 

There were problems with patch 2 from v17. The fixed patch is what I replied
with previously in this thread. The test results that I posted were with that
patch included. Will post out v18 today with the fix.

- Sandipan



[PATCH] powerpc/papr_scm: Mark papr_scm_ndctl() as static

2020-01-29 Thread Vaibhav Jain
Function papr_scm_ndctl() is neither exported from the module nor
called directly from outside 'papr.c' hence should be marked 'static'.

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 0b4467e378e5..e4606100e286 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -246,8 +246,9 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
 }
 
-int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm 
*nvdimm,
-   unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
+static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len, int *cmd_rc)
 {
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
-- 
2.24.1



Re: [PATCH FIX] KVM: PPC: Book3S HV: Release lock on page-out failure path

2020-01-29 Thread Paul Mackerras
On Wed, Jan 22, 2020 at 10:25:42AM +0530, Bharata B Rao wrote:
> When migrate_vma_setup() fails in kvmppc_svm_page_out(),
> release kvm->arch.uvmem_lock before returning.
> 
> Fixes: ca9f4942670 ("KVM: PPC: Book3S HV: Support for running secure guests")
> Signed-off-by: Bharata B Rao 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-29 Thread Gerald Schaefer
On Mon, 27 Jan 2020 22:33:08 -0500
Qian Cai  wrote:

> > 
> >> Did those tests ever find any regression or this is almost only useful for 
> >> new
> > 
> > The test has already found problems with s390 page table helpers.
> 
> Hmm, that is pretty weak where s390 is not even official supported with this 
> version.
> 

I first had to get the three patches upstream, each fixing non-conform
behavior on s390, and each issue was found by this extremely useful test:

2416cefc504b s390/mm: add mm_pxd_folded() checks to pxd_free()
2d1fc1eb9b54 s390/mm: simplify page table helpers for large entries
1c27a4bc817b s390/mm: make pmd/pud_bad() report large entries as bad

I did not see any direct effect of this misbehavior yet, but I am
very happy that this could be found and fixed in order to prevent
future issues. And this is exactly the value of this test, to make
sure that all architectures have a common understanding of how
the various page table helpers are supposed to work.

For example, who would have thought that pXd_bad() is supposed to
report large entries as bad? It's not really documented anywhere,
so we just checked them for sanity like normal entries, which
apparently worked fine so far, but for how long?



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-29 Thread Gerald Schaefer
On Tue, 28 Jan 2020 06:57:53 +0530
Anshuman Khandual  wrote:

> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.
> 
> Folks interested in making sure that a given platform's page table helpers
> conform to expected generic MM semantics should enable the above config
> which will just trigger this test during boot. Any non conformity here will
> be reported as an warning which would need to be fixed. This test will help
> catch any changes to the agreed upon semantics expected from generic MM and
> enable platforms to accommodate it thereafter.
> 

[...]

> 
> Tested-by: Christophe Leroy  #PPC32

Tested-by: Gerald Schaefer  # s390

Thanks again for this effort, and for keeping up the spirit against
all odds and even after 12 iterations :-)

> 
> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt 
> b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> new file mode 100644
> index ..f3f8111edbe3
> --- /dev/null
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -0,0 +1,35 @@
> +#
> +# Feature name:  debug-vm-pgtable
> +# Kconfig:   ARCH_HAS_DEBUG_VM_PGTABLE
> +# description:   arch supports pgtable tests for semantics compliance
> +#
> +---
> +| arch |status|
> +---
> +|   alpha: | TODO |
> +| arc: |  ok  |
> +| arm: | TODO |
> +|   arm64: |  ok  |
> +| c6x: | TODO |
> +|csky: | TODO |
> +|   h8300: | TODO |
> +| hexagon: | TODO |
> +|ia64: | TODO |
> +|m68k: | TODO |
> +|  microblaze: | TODO |
> +|mips: | TODO |
> +|   nds32: | TODO |
> +|   nios2: | TODO |
> +|openrisc: | TODO |
> +|  parisc: | TODO |
> +|  powerpc/32: |  ok  |
> +|  powerpc/64: | TODO |
> +|   riscv: | TODO |
> +|s390: | TODO |

s390 is ok now, with my patches included in v5.5-rc1. So you can now add

--- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -25,7 +25,7 @@
 |  powerpc/32: |  ok  |
 |  powerpc/64: | TODO |
 |   riscv: | TODO |
-|s390: | TODO |
+|s390: |  ok  |
 |  sh: | TODO |
 |   sparc: | TODO |
 |  um: | TODO |
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET
 config S390
def_bool y
select ARCH_BINFMT_ELF_STATE
+   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE



Re: [PATCH] powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()

2020-01-29 Thread Michal Suchánek
On Wed, Jan 29, 2020 at 07:50:07PM +, Christophe Leroy wrote:
> is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/process.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..e730b8e522b0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1634,11 +1634,9 @@ int copy_thread_tls(unsigned long clone_flags, 
> unsigned long usp,
>   p->thread.regs = childregs;
>   childregs->gpr[3] = 0;  /* Result from fork() */
>   if (clone_flags & CLONE_SETTLS) {
> -#ifdef CONFIG_PPC64
>   if (!is_32bit_task())
>   childregs->gpr[13] = tls;
>   else
> -#endif
>   childregs->gpr[2] = tls;
>   }
>  

Reviewed-by: Michal Suchanek 

Thanks

Michal


[PATCH] powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()

2020-01-29 Thread Christophe Leroy
is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/process.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..e730b8e522b0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1634,11 +1634,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long usp,
p->thread.regs = childregs;
childregs->gpr[3] = 0;  /* Result from fork() */
if (clone_flags & CLONE_SETTLS) {
-#ifdef CONFIG_PPC64
if (!is_32bit_task())
childregs->gpr[13] = tls;
else
-#endif
childregs->gpr[2] = tls;
}
 
-- 
2.25.0



Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-29 Thread Florian Weimer
* Dave Hansen:

> Still doesn't build for me:
>
>> # make
>> make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install
>> make[1]: Entering directory '/home/dave/linux.git'
>>   INSTALL ./usr/include
>> make[1]: Leaving directory '/home/dave/linux.git'
>> make: *** No rule to make target 
>> '/home/dave/linux.git/tools/testing/selftests/vm/protection_keys_32', needed 
>> by 'all'.  Stop.

Do you have 32-bit libraries installed?

Thanks,
Florian



Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-29 Thread Dave Hansen
On 1/28/20 1:38 AM, Sandipan Das wrote:
> On 27/01/20 9:12 pm, Dave Hansen wrote:
>> How have you tested this patch (and the whole series for that matter)?
>>
> I replaced the second patch with this one and did a build test.
> Till v16, I had tested the whole series (build + run) on both a POWER8
> system (with 4K and 64K page sizes) and a Skylake SP system but for
> x86_64 only.

Do you have any idea why I was seeing x86 build errors and you were not?


Re: [PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup

2020-01-29 Thread Scott Cheloha
On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
> Scott Cheloha  writes:
> > LMB lookup is currently an O(n) linear search.  This scales poorly when
> > there are many LMBs.
> >
> > If we cache each LMB by both its base address and its DRC index
> > in an xarray we can cut lookups to O(log n), greatly accelerating
> > drmem initialization and memory hotplug.
> >
> > This patch introduces two xarrays of of LMBs and fills them during
> > drmem initialization.  The patch also adds two interfaces for LMB
> > lookup.
> 
> Good but can you replace the array of LMBs altogether
> (drmem_info->lmbs)? xarray allows iteration over the members if needed.

I don't think we can without potentially changing the current behavior.

The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance
linearly through the array from the LMB with the matching DRC index.

Iteration through the xarray via xa_for_each_start() will return LMBs
indexed with monotonically increasing DRC indices.

Are they equivalent?  Or can we have an LMB with a smaller DRC index
appear at a greater offset in the array?

If the following condition is possible:

drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index

where i < j, then we have a possible behavior change because
xa_for_each_start() may not return a contiguous array slice.  It might
"leap backwards" in the array.  Or it might skip over a chunk of LMBs.


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Segher Boessenkool
On Wed, Jan 29, 2020 at 06:02:34PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
> >> * Segher Boessenkool:
> >> >> But GCC doesn't expose them as integers to C code, so you can't do much
> >> >> without them.
> >> >
> >> > Sure, it doesn't expose any other registers directly, either.
> >> 
> >> I can use r0 & 1 with a register variable r0 to check a bit.
> >
> > That is not reliable, or supported, and it *will* break.  This is
> > explicit for local register asm, and global register asm is
> > underdefined.
> 
> Ugh.  I did not know that.  And neither did the person who wrote
> powerpc64/sysdep.h because it uses register variables in regular C
> expressions. 8-(  Other architectures are affected as well.

Where?  I don't see any?  Ah, the other one, heh (there are two).

No, that *is* supported: as input to or output from an asm, a local
register asm variable *is* guaranteed to live in the specified register.
This is the *only* supported use.  Other uses may sometimes still work,
but they never worked reliably, and it cannot be made reliable; it has
been documented as not supported since ages, and it will not work at all
anymore some day.


Segher


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Florian Weimer
* Segher Boessenkool:

> On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
>> * Segher Boessenkool:
>> >> But GCC doesn't expose them as integers to C code, so you can't do much
>> >> without them.
>> >
>> > Sure, it doesn't expose any other registers directly, either.
>> 
>> I can use r0 & 1 with a register variable r0 to check a bit.
>
> That is not reliable, or supported, and it *will* break.  This is
> explicit for local register asm, and global register asm is
> underdefined.

Ugh.  I did not know that.  And neither did the person who wrote
powerpc64/sysdep.h because it uses register variables in regular C
expressions. 8-(  Other architectures are affected as well.

One set of issues is less of a problem if system call arguments are
variables and not complex expressions, so that side effects do not
clobber registers in the initialization:

register long __a0 asm ("$4") = (long) (arg1);

But I wasn't aware of that constraint on the macro users at all.

Thanks,
Florian



Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Segher Boessenkool
On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> >> But GCC doesn't expose them as integers to C code, so you can't do much
> >> without them.
> >
> > Sure, it doesn't expose any other registers directly, either.
> 
> I can use r0 & 1 with a register variable r0 to check a bit.

That is not reliable, or supported, and it *will* break.  This is
explicit for local register asm, and global register asm is
underdefined.

> I don't
> think writing a similar check against a condition register is possible
> today.

That's right.  You cannot express CCmode (MODE_CC) values in C at all.


Segher


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Florian Weimer
* Segher Boessenkool:

> On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote:
>> * Segher Boessenkool:
>> 
>> >> > I don't think we can save LR in a regular register around the system
>> >> > call, explicitly in the inline asm statement, because we still have to
>> >> > generate proper unwinding information using CFI directives, something
>> >> > that you cannot do from within the asm statement.
>> >
>> > Why not?
>> 
>> As far as I knowm there isn't a CFI directive that allows us to restore
>> the CFI state at the end of the inline assembly.  If we say that LR is
>> stored in a different register than what the rest of the function uses,
>> that would lead to incorrect CFI after the exit of the inline assembler
>> fragment.
>> 
>> At least that's what I think.  Compilers aren't really my thing.
>
> .cfi_restore?  Or .cfi_remember_state / .cfi_restore_state, that is
> probably easiest in inline assembler.

Oh, right, .cfi_remember_state and .cfi_restore_state should work, as
long as -fno-dwarf2-cfi-asm isn't used (but that's okay given that we
need this only for the glibc build).

But it looks like we can use an explicit "lr" clobber, so we should be
good anyway.

>> >> > GCC does not model the condition registers,
>> >
>> > Huh?  It does model the condition register, as 8 registers in GCC's
>> > internal model (one each for CR0..CR7).
>> 
>> But GCC doesn't expose them as integers to C code, so you can't do much
>> without them.
>
> Sure, it doesn't expose any other registers directly, either.

I can use r0 & 1 with a register variable r0 to check a bit.  I don't
think writing a similar check against a condition register is possible
today.

Thanks,
Florian



Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Tulio Magno Quites Machado Filho
Nicholas Piggin  writes:

> Adhemerval Zanella's on January 29, 2020 3:26 am:
>> 
>> We already had to push a similar hack where glibc used to abort transactions
>> prior syscalls to avoid some side-effects on kernel (commit 56cf2763819d2f).
>> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
>> we only enable TLE if kernel suppors PPC_FEATURE2_HTM_NOSC.
>> 
>> The transaction syscall abort used to read a variable directly from TCB,
>> so this could be an option. I would expect that we could optimize it where
>> if glibc is building against a recent kernel and compiler is building
>> for a ISA 3.0+ cpu we could remove the 'sc' code.
>> 
>
> We would just have to be careful of running on ISA 3.0 CPUs on older
> kernels which do not support scv.

Can we assume that, if a syscall is available through sc it's also available
in scv 0?

Because if that's true, I believe your suggestion to interpret PPC_FEATURE2_SCV
as scv 0 support would be helpful to provide this support via IFUNC even
when glibc is built using --with-cpu=power8, which is the most common scenario
in ppc64le.

In that scenario, it seems new HWCAP bits for new vectors wouldn't be too
frequent, which was the only downside of this proposal.

-- 
Tulio Magno


[RFC PATCH 6/6] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_STATS

2020-01-29 Thread Vaibhav Jain
The DSM 'DSM_PAPR_SCM_STATS' should return the PAPR defined buffer
that holds various dimm performance attributes as defined in Ref[1]
back to user-space in response to ND_CMD_CALL.

Presently the module doesn't interpret nor consume these stat as they
are only intended to be consumer and reported by
'libndctl'[2]. PAPR sped [1] states that input buffer to be provided
to H_SCM_PERFORMANCE_HEALTH hcall should be 4KiB in size. However due
to limitations of the libnvdimm envelop (which is 256 bytes in size)
such a large buffer cannot be copied back to user-space.

Hence we do an optimization of querying the size of the output buffer
from H_SCM_PERFORMANCE_HEALTH hcall and copy only the needed portion
of the buffer to the user-space payload package.

This patch implements this DSM by implementing a new function
papr_scm_get_stats that queries the DIMM stat information and then
copies PHYP provided buffer that holds these stat attributes to the
package payload whose layout is defined by 'struct
papr_scm_perf_stats'

References:
[1]: https://patchwork.ozlabs.org/patch/1154292/
[2]: https://github.com/pmem/ndctl

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 6c0bc8f027db..ac50968453b0 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -34,6 +34,7 @@
 enum {
DSM_PAPR_MIN =  0x1,
DSM_PAPR_SCM_HEALTH,
+   DSM_PAPR_SCM_STATS,
DSM_PAPR_MAX,
 };
 
@@ -410,6 +411,51 @@ static int papr_scm_get_health(struct papr_scm_priv *p,
return 0;
 }
 
+/* Fetch the DIMM stats and populate it in provided papr_scm package */
+static int papr_scm_get_stats(struct papr_scm_priv *p,
+ struct nd_pkg_papr_scm *pkg)
+{
+   struct papr_scm_perf_stats *retbuffer;
+   int rc;
+   size_t copysize;
+
+   /* Return buffer for phyp where stats are written */
+   retbuffer = kzalloc(PAPR_SCM_MAX_PERF_STAT, GFP_KERNEL);
+
+   if (!retbuffer)
+   return -ENOMEM;
+
+   rc = drc_pmem_query_stats(p, retbuffer);
+   if (rc)
+   goto out;
+
+   /*
+* Parse the retbuffer, fetch the size returned and return the
+* first nd_size_out bytes back to userspce.
+*/
+   pkg->hdr.nd_fw_size = be16_to_cpu(retbuffer->size);
+   copysize = min_t(__u32, pkg->hdr.nd_fw_size, pkg->hdr.nd_size_out);
+
+   memcpy(pkg->payload, retbuffer, copysize);
+
+   /* Verify if the returned buffer was copied completely */
+   if (pkg->hdr.nd_fw_size > copysize) {
+   rc = -ENOSPC;
+   goto out;
+   }
+
+out:
+   kfree(retbuffer);
+   /*
+* Put the error in out package and return success from function
+* so that errors if any are propogated back to userspace.
+*/
+   pkg->cmd_status = rc;
+   dev_dbg(&p->pdev->dev, "%s completion code = %d\n", __func__, rc);
+
+   return 0;
+}
+
 int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm 
*nvdimm,
unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
 {
@@ -460,6 +506,11 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, 
struct nvdimm *nvdimm,
*cmd_rc = papr_scm_get_health(p, call_pkg);
break;
 
+   case DSM_PAPR_SCM_STATS:
+   call_pkg = (struct nd_pkg_papr_scm *) buf;
+   *cmd_rc = papr_scm_get_stats(p, call_pkg);
+   break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd_in);
*cmd_rc = -EINVAL;
-- 
2.24.1



[RFC PATCH 5/6] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

2020-01-29 Thread Vaibhav Jain
The DSM 'DSM_PAPR_SCM_HEALTH' should return the health-bitmap and
health-valid-bitmap information for a dimm back to userspace in
response to ND_CMD_CALL.

This patch implements this DSM by implementing a new function
papr_scm_get_health() that queries the DIMM health information and
then copies these bitmaps to the package payload whose layout is
defined by 'struct papr_scm_ndctl_health'.

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 46 +++
 1 file changed, 46 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index e1a2c0e61077..6c0bc8f027db 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -33,9 +33,16 @@
  */
 enum {
DSM_PAPR_MIN =  0x1,
+   DSM_PAPR_SCM_HEALTH,
DSM_PAPR_MAX,
 };
 
+/* Struct as returned by kernel in response to PAPR_DSM_PAPR_SMART_HEALTH */
+struct papr_scm_ndctl_health {
+   __be64 health_bitmap;
+   __be64 health_bitmap_valid;
+} __packed;
+
 /* Payload expected with ND_CMD_CALL ioctl from libnvdimm */
 struct nd_pkg_papr_scm {
struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
@@ -369,6 +376,40 @@ static int cmd_to_func(struct nvdimm *nvdimm, unsigned int 
cmd, void *buf,
return pkg->hdr.nd_command;
 }
 
+/* Fetch the DIMM health info and populate it in provided papr_scm package */
+static int papr_scm_get_health(struct papr_scm_priv *p,
+  struct nd_pkg_papr_scm *pkg)
+{
+   int rc;
+   struct papr_scm_ndctl_health *health =
+   (struct papr_scm_ndctl_health *)pkg->payload;
+
+   pkg->hdr.nd_fw_size = sizeof(struct papr_scm_ndctl_health);
+
+   if (pkg->hdr.nd_size_out < sizeof(struct papr_scm_ndctl_health)) {
+   rc = -ENOSPC;
+   goto out;
+   }
+
+   rc = drc_pmem_query_health(p);
+   if (rc)
+   goto out;
+
+   /* Copy the health data to the payload */
+   health->health_bitmap = p->health_bitmap;
+   health->health_bitmap_valid = p->health_bitmap_valid;
+
+out:
+   /*
+* Put the error in out package and return success from function
+* so that errors if any are propogated back to userspace.
+*/
+   pkg->cmd_status = rc;
+   dev_dbg(&p->pdev->dev, "%s completion code = %d\n", __func__, rc);
+
+   return 0;
+}
+
 int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm 
*nvdimm,
unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
 {
@@ -414,6 +455,11 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, 
struct nvdimm *nvdimm,
*cmd_rc = 0;
break;
 
+   case DSM_PAPR_SCM_HEALTH:
+   call_pkg = (struct nd_pkg_papr_scm *) buf;
+   *cmd_rc = papr_scm_get_health(p, call_pkg);
+   break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd_in);
*cmd_rc = -EINVAL;
-- 
2.24.1



[RFC PATCH 4/6] powerpc/papr_scm: Add support for handling PAPR DSM commands

2020-01-29 Thread Vaibhav Jain
Implement support for handling PAPR DSM commands in papr_scm
module. We advertise support for ND_CMD_CALL for the dimm command mask
and implement necessary scaffolding in the module to handle ND_CMD_CALL
ioctl and DSM commands that we receive.

The layout of the DSM commands as we expect from libnvdimm/libndctl is
defined in 'struct nd_pkg_papr_scm' which contains a 'struct
nd_cmd_pkg' as header. This header is used to communicate the DSM
command via 'nd_pkg_papr_scm->nd_command' and size of payload that
need to be sent/received for servicing the DSM.

The PAPR DSM commands are assigned indexes started from 0x1 to
prevent them from overlapping ND_CMD_* values and also makes handling
dimm commands in papr_scm_ndctl() easier via a simplified switch-case
block. For this a new function cmd_to_func() is implemented that reads
the args to papr_scm_ndctl() , performs sanity tests on them and
converts them to PAPR DSM commands which can then be handled via the
switch-case block.

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 96 +--
 1 file changed, 89 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index deaece6e4d18..e1a2c0e61077 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -20,10 +20,29 @@
 #define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
 (1ul << ND_CMD_GET_CONFIG_DATA) | \
-(1ul << ND_CMD_SET_CONFIG_DATA))
+(1ul << ND_CMD_SET_CONFIG_DATA) | \
+(1ul << ND_CMD_CALL))
 
 #define PAPR_SCM_MAX_PERF_STAT 4096
 
+/*
+ * Sub commands for ND_CMD_CALL. To prevent overlap from ND_CMD_*, values for
+ * these enums start at 0x1. These values are then returned from
+ * cmd_to_func() making it easy to implement the switch-case block in
+ * papr_scm_ndctl()
+ */
+enum {
+   DSM_PAPR_MIN =  0x1,
+   DSM_PAPR_MAX,
+};
+
+/* Payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pkg_papr_scm {
+   struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
+   uint32_t cmd_status;/* Out: Sub-cmd status returned back */
+   uint8_t payload[];  /* Out: Sub-cmd data buffer */
+} __packed;
+
 /* Buffer layout returned by phyp when reporting drc perf stats */
 struct papr_scm_perf_stats {
uint8_t version;/* Should be 0x01 */
@@ -303,19 +322,74 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
 }
 
+/*
+ * Validate the input to dimm-control function and return papr_scm specific
+ * commands. This does sanity validation to ND_CMD_CALL sub-command packages.
+ */
+static int cmd_to_func(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+  unsigned int buf_len)
+{
+   unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+   struct nd_pkg_papr_scm *pkg = (struct nd_pkg_papr_scm *)buf;
+
+   /* Only dimm-specific calls are supported atm */
+   if (!nvdimm)
+   return -EINVAL;
+
+   if (!test_bit(cmd, &cmd_mask)) {
+
+   pr_debug("%s: Unsupported cmd=%u\n", __func__, cmd);
+   return -EINVAL;
+
+   } else if (cmd != ND_CMD_CALL) {
+
+   return cmd;
+
+   } else if (buf_len < sizeof(struct nd_pkg_papr_scm)) {
+
+   pr_debug("%s: Invalid pkg size=%u\n", __func__, buf_len);
+   return -EINVAL;
+
+   } else if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
+
+   pr_debug("%s: Invalid pkg family=0x%llx\n", __func__,
+pkg->hdr.nd_family);
+   return -EINVAL;
+
+   } else if (pkg->hdr.nd_command <= DSM_PAPR_MIN ||
+  pkg->hdr.nd_command >= DSM_PAPR_MAX) {
+
+   /* for unknown subcommands return ND_CMD_CALL */
+   pr_debug("%s: Unknown sub-command=0x%llx\n", __func__,
+pkg->hdr.nd_command);
+   return ND_CMD_CALL;
+   }
+
+   /* Return the DSM_PAPR_SCM_* command */
+   return pkg->hdr.nd_command;
+}
+
 int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm 
*nvdimm,
unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
 {
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+   struct nd_pkg_papr_scm *call_pkg = NULL;
+   int cmd_in, rc;
 
-   /* Only dimm-specific calls are supported atm */
-   if (!nvdimm)
-   return -EINVAL;
+   /* Use a local variable in case cmd_rc pointer is NULL */
+   if (cmd_rc == NULL)
+   cmd_rc = &rc;
+
+   cmd_in = cmd_to_func(nvdimm, cmd, buf, buf_len);
+   if (cmd_in < 0) {
+   pr_debug("%s: Invalid cmd=%u. Err=%d\n", __func__, cmd, cmd_in);
+   return cmd_in;
+   }
 
p = nvdimm_provider_data(nvdimm);
 
-   switch

[RFC PATCH 3/6] UAPI: ndctl: Introduce NVDIMM_FAMILY_PAPR as a new NVDIMM DSM family

2020-01-29 Thread Vaibhav Jain
Add PAPR-scm family of DSM command-set to to the white list of NVDIMM
command sets.

Signed-off-by: Vaibhav Jain 
---
 include/uapi/linux/ndctl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
 #define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
 
 #define ND_IOCTL_CALL  _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
-- 
2.24.1



[RFC PATCH 2/6] powerpc/papr_scm: Fetch dimm performance stats from PHYP

2020-01-29 Thread Vaibhav Jain
Implement support for fetching dimm performance metrics via
H_SCM_PERFORMANCE_HEALTH hcall as documented in Ref[1]. The hcall
returns a structure as described in Ref[1] and defined as newly
introduced 'struct papr_scm_perf_stats'. The struct has a header
followed by key-value pairs of performance attributes.

The module 'papr_scm' however doesn't interpret or parses these
performance attributes as it will be returning this buffer as it is to
user-space for further parsing. For now we only implement a dimm
specific sysfs attribute named 'papr_stats_version' that returns the
version of the structure returned from H_SCM_PERFORMANCE_HEALTH hcall.

Hence this patch implements a new function drc_pmem_query_stats() that
issues hcall H_SCM_PERFORMANCE_HEALTH ,requesting PHYP to store
performance stats in pre-allocated 'struct papr_scm_perf_stats' buffer
'stats'.

References:
[1]: https://patchwork.ozlabs.org/patch/1154292/

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 1a0cc66f3dc9..deaece6e4d18 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -22,6 +22,16 @@
 (1ul << ND_CMD_GET_CONFIG_DATA) | \
 (1ul << ND_CMD_SET_CONFIG_DATA))
 
+#define PAPR_SCM_MAX_PERF_STAT 4096
+
+/* Buffer layout returned by phyp when reporting drc perf stats */
+struct papr_scm_perf_stats {
+   uint8_t version;/* Should be 0x01 */
+   uint8_t reserved1;
+   __be16 size;/* Size of this struct in bytes */
+   uint8_t buffer[];   /* Performance matrics */
+} __packed;
+
 struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -148,6 +158,25 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
 }
 
+static int drc_pmem_query_stats(struct papr_scm_priv *p,
+   struct papr_scm_perf_stats *stats)
+{
+   unsigned long ret[PLPAR_HCALL_BUFSIZE];
+   int64_t rc;
+
+   if (!stats)
+   return -EINVAL;
+
+   rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
+__pa(stats));
+   if (rc != H_SUCCESS) {
+   dev_err(&p->pdev->dev,
+"Failed to query performance stats, Err:%lld\n", rc);
+   return -ENXIO;
+   } else
+   return 0;
+}
+
 static int drc_pmem_query_health(struct papr_scm_priv *p)
 {
unsigned long ret[PLPAR_HCALL_BUFSIZE];
@@ -332,6 +361,32 @@ static inline int papr_scm_node(int node)
return min_node;
 }
 
+static ssize_t papr_stats_version_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct nvdimm *dimm = to_nvdimm(dev);
+   struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+   struct papr_scm_perf_stats *retbuffer;
+   int rc;
+
+   /* Return buffer for phyp where stats are written */
+   retbuffer = kzalloc(PAPR_SCM_MAX_PERF_STAT, GFP_KERNEL);
+   if (!retbuffer)
+   return -ENOMEM;
+
+   rc = drc_pmem_query_stats(p, retbuffer);
+   if (rc)
+   goto out;
+   else
+   rc = sprintf(buf, "%d\n", retbuffer->version);
+
+out:
+   kfree(retbuffer);
+   return rc;
+
+}
+DEVICE_ATTR_RO(papr_stats_version);
+
 static ssize_t papr_health_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -353,6 +408,7 @@ DEVICE_ATTR_RO(papr_health);
 /* papr_scm specific dimm attributes */
 static struct attribute *papr_scm_nd_attributes[] = {
&dev_attr_papr_health.attr,
+   &dev_attr_papr_stats_version.attr,
NULL,
 };
 
-- 
2.24.1



[RFC PATCH 1/6] powerpc/papr_scm: Provide support for fetching dimm health information

2020-01-29 Thread Vaibhav Jain
Implement support for fetching dimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair of
64-bit big-endian integers which are then stored in 'struct
papr_scm_priv' and subsequently exposed to userspace via dimm
attribute 'papr_health'.

References:
[1]: https://patchwork.ozlabs.org/patch/1154292/

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 65 ++-
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 0b4467e378e5..1a0cc66f3dc9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -39,6 +39,10 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+   /* Health information for the dimm */
+   __be64 health_bitmap;
+   __be64 health_bitmap_valid;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +148,30 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
 }
 
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+   unsigned long ret[PLPAR_HCALL_BUFSIZE];
+   int64_t rc;
+
+   rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+   if (rc != H_SUCCESS) {
+   dev_err(&p->pdev->dev,
+"Failed to query health information, Err:%lld\n", rc);
+   return -ENXIO;
+   }
+
+   /* Store the retrieved health information in dimm platform data */
+
+   p->health_bitmap = ret[0];
+   p->health_bitmap_valid = ret[1];
+
+   dev_dbg(&p->pdev->dev,
+   "Queried dimm health info. Bitmap:0x%016llx Mask:0x%016llx\n",
+   be64_to_cpu(p->health_bitmap),
+   be64_to_cpu(p->health_bitmap_valid));
+
+   return 0;
+}
 
 static int papr_scm_meta_get(struct papr_scm_priv *p,
 struct nd_cmd_get_config_data_hdr *hdr)
@@ -304,6 +332,39 @@ static inline int papr_scm_node(int node)
return min_node;
 }
 
+static ssize_t papr_health_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct nvdimm *dimm = to_nvdimm(dev);
+   struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+   int rc;
+
+   rc = drc_pmem_query_health(p);
+
+   if (rc)
+   return rc;
+   else
+   return sprintf(buf, "0x%016llX 0x%016llX\n",
+  be64_to_cpu(p->health_bitmap),
+  be64_to_cpu(p->health_bitmap_valid));
+}
+DEVICE_ATTR_RO(papr_health);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_scm_nd_attributes[] = {
+   &dev_attr_papr_health.attr,
+   NULL,
+};
+
+static struct attribute_group papr_scm_nd_attribute_group = {
+   .attrs = papr_scm_nd_attributes,
+};
+
+static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
+   &papr_scm_nd_attribute_group,
+   NULL,
+};
+
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
struct device *dev = &p->pdev->dev;
@@ -330,8 +391,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_ALIASING, &dimm_flags);
 
-   p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+   p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
-- 
2.24.1



[RFC PATCH 0/6] powerpc/papr_scm: Implement support for reporting DIMM health and stats

2020-01-29 Thread Vaibhav Jain
The PAPR standard provides suitable mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in Ref[2]. Until
now these stats were never fetched in the papr_scm modules nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not having
support for ACPI and NFIT. Hence 'ndctl' is unable to query and report the dimm
health status and a user had no way to determine the current health status of
a NDVIMM.

To overcome this limitation this RFC patch-set proposes a new set of
Dimm-Specific-Methods(DSM) for querying and fetching health and stat information
for dimms that support PAPR and provides an implementation in kernel for these
DSM in papr_scm modules. These changes coupled with draft/proposed ndtcl changes
located at Ref[4] should provide a way for user to retributive NVDIMM status
using ndtcl. Below is a sample output using proposed kernel + ndctl for PAPR
NVDIMM in an emulation environment:

 # ndctl list -DH
[
  {
"dev":"nmem0",
"health":{
  "health_state":"fatal",
  "life_used_percentage":60,
  "shutdown_state":"dirty"
}
  }
]

PAPR Dimm-Specific-Methods(DSM)


As the name suggests DSMs are used by vendor specific code in libndctl to
execute certain operations or fetch certain information for NVDIMMS. DSMs
can be sent to papr_scm module via libndctl (userspace) and libnvdimm(kernel)
using the ND_CMD_CALL ioctl which can be handled in the dimm control function
papr_scm_ndctl(). For PAPR this RFC proposes two DSMs that directly map to
hcalls provided by PHYP to query NVDIMM health and stats. These DSMs are:

* DSM_PAPR_SCM_HEALTH: Which map to hcall H_SCM_HEALTH and returns dimm health.

* DSM_PAPR_SCM_STATS: Which map to hcall H_SCM_PERFORMANCE_STATS and returns
  dimm performance stats.

The ioctl ND_CMD_CALL can also transfer data between
user-space and kernel via 'envelopes'. The envelop is part of a
'struct nd_cmd_pkg' which in return is wrapped in a user defined struct which
in our case is called 'struct nd_pkg_papr_scm' (packaged). These struct is
defined as:

struct nd_pkg_papr_scm {
struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
uint32_t cmd_status;/* Out: Sub-cmd status returned back */
uint8_t payload[];  /* Out: Sub-cmd data buffer */
};

The 'payload' field of the package holds the libnvdimm defined 'envelope' which
is used to send/receive data from userspace buffer (libndctl). This RFC uses 
this
field to copy the results of hcalls that are executed in response to the DSM
commands.

Please note that results of hcalls are not interpreted (with few exceptions) in
papr_scm module at all. Instead they are directly copied to the 'payload' field
and sent to userspace(libndctl) for interpretation. This essentially means that
the papr_scm module simply acts as a conduit for libndctl to issue hcalls and
fetch its output. This should make parsing and interpreting with output buffers
of hcalls easier as it can be performed in userspace.

References:
[1]: "Power Architecture Platform Reference"
  https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2]: "[DOC,v2] powerpc: Provide initial documentation for PAPR hcalls"
 https://patchwork.ozlabs.org/patch/1154292/
[3]: "Linux on Power Architecture Platform Reference"
 https://members.openpowerfoundation.org/document/dl/469
[4]: https://github.com/vaibhav92/ndctl/tree/papr_scm_health

Vaibhav Jain (6):
  powerpc/papr_scm: Provide support for fetching dimm health information
  powerpc/papr_scm: Fetch dimm performance stats from PHYP
  UAPI: ndctl: Introduce NVDIMM_FAMILY_PAPR as a new NVDIMM DSM family
  powerpc/papr_scm: Add support for handling PAPR DSM commands
  powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH
  powerpc/papr_scm: Implement support for DSM_PAPR_SCM_STATS

 arch/powerpc/platforms/pseries/papr_scm.c | 314 +-
 include/uapi/linux/ndctl.h|   1 +
 2 files changed, 306 insertions(+), 9 deletions(-)

-- 
2.24.1



[PATCH v6 4/5] powerpc/numa: Early request for home node associativity

2020-01-29 Thread Srikar Dronamraju
Currently the kernel detects if its running on a shared lpar platform
and requests home node associativity before the scheduler sched_domains
are setup. However between the time NUMA setup is initialized and the
request for home node associativity, workqueue initializes its per node
cpumask. The per node workqueue possible cpumask may turn invalid
after home node associativity resulting in weird situations like
workqueue possible cpumask being a subset of workqueue online cpumask.

This can be fixed by requesting home node associativity earlier just
before NUMA setup. However at the NUMA setup time, kernel may not be in
a position to detect if its running on a shared lpar platform. So
request for home node associativity and if the request fails, fallback
on the device tree property.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
---
Changelog (v5->v6):
- Handled comments from Michael Ellerman
  * hardware-id is now deciphered in vphn_get_nid
  * No need to pass extra bool

Changelog (v2->v3):
- Handled comments from Nathan Lynch
  * Use first thread of the core for cpu-to-node map.
  * get hardware-id in numa_setup_cpu

Changelog (v1->v2):
- Handled comments from Nathan Lynch
  * Dont depend on pacas to be setup for the hwid

 arch/powerpc/mm/numa.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 63ec0c3..ee7400f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -461,6 +461,41 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
return nid;
 }
 
+#ifdef CONFIG_PPC_SPLPAR
+static int vphn_get_nid(long lcpu)
+{
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+   long rc, hwid;
+
+   /*
+* On a shared lpar, device tree will not have node associativity.
+* At this time lppaca, or its __old_status field may not be
+* updated. Hence kernel cannot detect if its on a shared lpar. So
+* request an explicit associativity irrespective of whether the
+* lpar is shared or dedicated. Use the device tree property as a
+* fallback. cpu_to_phys_id is only valid between
+* smp_setup_cpu_maps() and smp_setup_pacas().
+*/
+   if (firmware_has_feature(FW_FEATURE_VPHN)) {
+   if (cpu_to_phys_id)
+   hwid = cpu_to_phys_id[lcpu];
+   else
+   hwid = get_hard_smp_processor_id(lcpu);
+
+   rc = hcall_vphn(hwid, VPHN_FLAG_VCPU, associativity);
+   if (rc == H_SUCCESS)
+   return associativity_to_nid(associativity);
+   }
+
+   return NUMA_NO_NODE;
+}
+#else
+static int vphn_get_nid(long unused)
+{
+   return NUMA_NO_NODE;
+}
+#endif  /* CONFIG_PPC_SPLPAR */
+
 /*
  * Figure out to which domain a cpu belongs and stick it there.
  * Return the id of the domain used.
@@ -485,6 +520,10 @@ static int numa_setup_cpu(unsigned long lcpu)
return nid;
}
 
+   nid = vphn_get_nid(lcpu);
+   if (nid != NUMA_NO_NODE)
+   goto out_present;
+
cpu = of_get_cpu_node(lcpu, NULL);
 
if (!cpu) {
@@ -496,6 +535,7 @@ static int numa_setup_cpu(unsigned long lcpu)
}
 
nid = of_node_to_nid_single(cpu);
+   of_node_put(cpu);
 
 out_present:
if (nid < 0 || !node_possible(nid))
@@ -515,7 +555,6 @@ static int numa_setup_cpu(unsigned long lcpu)
}
 
map_cpu_to_node(lcpu, nid);
-   of_node_put(cpu);
 out:
return nid;
 }
-- 
1.8.3.1



[PATCH v6 5/5] powerpc/numa: Remove late request for home node associativity

2020-01-29 Thread Srikar Dronamraju
With commit ("powerpc/numa: Early request for home node associativity"),
commit 2ea626306810 ("powerpc/topology: Get topology for shared
processors at boot") which was requesting home node associativity
becomes redundant.

Hence remove the late request for home node associativity.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Abdul Haleem 
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
Reviewed-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h | 4 
 arch/powerpc/kernel/smp.c   | 5 -
 arch/powerpc/mm/numa.c  | 9 -
 3 files changed, 18 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 2f7e1ea..9bd396f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,7 +98,6 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 
*cpu2_assoc)
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 extern int timed_topology_update(int nsecs);
-extern void __init shared_proc_topology_init(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -121,9 +120,6 @@ static inline int timed_topology_update(int nsecs)
return 0;
 }
 
-#ifdef CONFIG_SMP
-static inline void shared_proc_topology_init(void) {}
-#endif
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf..cdd39a0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1359,11 +1359,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
if (smp_ops && smp_ops->bringup_done)
smp_ops->bringup_done();
 
-   /*
-* On a shared LPAR, associativity needs to be requested.
-* Hence, get numa topology before dumping cpu topology
-*/
-   shared_proc_topology_init();
dump_numa_cpu_topology();
 
 #ifdef CONFIG_SCHED_SMT
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ee7400f..01a03f2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1632,15 +1632,6 @@ int prrn_is_enabled(void)
return prrn_enabled;
 }
 
-void __init shared_proc_topology_init(void)
-{
-   if (lppaca_shared_proc(get_lppaca())) {
-   bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
-   nr_cpumask_bits);
-   numa_update_cpu_topology(false);
-   }
-}
-
 static int topology_read(struct seq_file *file, void *v)
 {
if (vphn_enabled || prrn_enabled)
-- 
1.8.3.1



[PATCH v6 2/5] powerpc/numa: Handle extra hcall_vphn error cases

2020-01-29 Thread Srikar Dronamraju
Currently code handles H_FUNCTION, H_SUCCESS, H_HARDWARE return codes.
However hcall_vphn can return other return codes. Now it also handles
H_PARAMETER return code.  Also the rest return codes are handled under the
default case.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Abdul Haleem 
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
Reviewed-by: Nathan Lynch 
---
Changelog (v2->v1):
 Handled comments from Nathan:
  - Split patch from patch 1.
  - Corrected a problem where I missed calling stop_topology_update().
  - Using pr_err_ratelimited instead of printk.

 arch/powerpc/mm/numa.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d2..8fbe57c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1191,23 +1191,30 @@ static long vphn_get_associativity(unsigned long cpu,
VPHN_FLAG_VCPU, associativity);
 
switch (rc) {
+   case H_SUCCESS:
+   dbg("VPHN hcall succeeded. Reset polling...\n");
+   timed_topology_update(0);
+   goto out;
+
case H_FUNCTION:
-   printk_once(KERN_INFO
-   "VPHN is not supported. Disabling polling...\n");
-   stop_topology_update();
+   pr_err_ratelimited("VPHN unsupported. Disabling polling...\n");
break;
case H_HARDWARE:
-   printk(KERN_ERR
-   "hcall_vphn() experienced a hardware fault "
+   pr_err_ratelimited("hcall_vphn() experienced a hardware fault "
"preventing VPHN. Disabling polling...\n");
-   stop_topology_update();
break;
-   case H_SUCCESS:
-   dbg("VPHN hcall succeeded. Reset polling...\n");
-   timed_topology_update(0);
+   case H_PARAMETER:
+   pr_err_ratelimited("hcall_vphn() was passed an invalid 
parameter. "
+   "Disabling polling...\n");
+   break;
+   default:
+   pr_err_ratelimited("hcall_vphn() returned %ld. Disabling 
polling...\n"
+   , rc);
break;
}
 
+   stop_topology_update();
+out:
return rc;
 }
 
-- 
1.8.3.1



[PATCH v6 3/5] powerpc/numa: Use cpu node map of first sibling thread

2020-01-29 Thread Srikar Dronamraju
All the sibling threads of a core have to be part of the same node.
To ensure that all the sibling threads map to the same node, always
lookup/update the cpu-to-node map of the first thread in the core.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Abdul Haleem 
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
Reviewed-by: Nathan Lynch 
---
Changelog: v3->v4
As suggested by Nathan, add a warning while mapping offline cpu.

 arch/powerpc/mm/numa.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8fbe57c..63ec0c3 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
  */
 static int numa_setup_cpu(unsigned long lcpu)
 {
-   int nid = NUMA_NO_NODE;
struct device_node *cpu;
+   int fcpu = cpu_first_thread_sibling(lcpu);
+   int nid = NUMA_NO_NODE;
 
/*
 * If a valid cpu-to-node mapping is already available, use it
 * directly instead of querying the firmware, since it represents
 * the most recent mapping notified to us by the platform (eg: VPHN).
+* Since cpu_to_node binding remains the same for all threads in the
+* core. If a valid cpu-to-node mapping is already available, for
+* the first thread in the core, use it.
 */
-   if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
+   nid = numa_cpu_lookup_table[fcpu];
+   if (nid >= 0) {
map_cpu_to_node(lcpu, nid);
return nid;
}
@@ -496,6 +501,19 @@ static int numa_setup_cpu(unsigned long lcpu)
if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
+   /*
+* Update for the first thread of the core. All threads of a core
+* have to be part of the same node. This not only avoids querying
+* for every other thread in the core, but always avoids a case
+* where virtual node associativity change causes subsequent threads
+* of a core to be associated with different nid. However if first
+* thread is already online, expect it to have a valid mapping.
+*/
+   if (fcpu != lcpu) {
+   WARN_ON(cpu_online(fcpu));
+   map_cpu_to_node(fcpu, nid);
+   }
+
map_cpu_to_node(lcpu, nid);
of_node_put(cpu);
 out:
-- 
1.8.3.1



[PATCH v6 1/5] powerpc/vphn: Check for error from hcall_vphn

2020-01-29 Thread Srikar Dronamraju
There is no value in unpacking associativity, if
H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Abdul Haleem 
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
Reviewed-by: Nathan Lynch 
---
Changelog (v2->v1):
- Split the patch into 2(Suggested by Nathan).

 arch/powerpc/platforms/pseries/vphn.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vphn.c 
b/arch/powerpc/platforms/pseries/vphn.c
index 3f07bf6..cca474a 100644
--- a/arch/powerpc/platforms/pseries/vphn.c
+++ b/arch/powerpc/platforms/pseries/vphn.c
@@ -82,7 +82,8 @@ long hcall_vphn(unsigned long cpu, u64 flags, __be32 
*associativity)
long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
 
rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu);
-   vphn_unpack_associativity(retbuf, associativity);
+   if (rc == H_SUCCESS)
+   vphn_unpack_associativity(retbuf, associativity);
 
return rc;
 }
-- 
1.8.3.1



[PATCH v6 0/5] Early node associativity

2020-01-29 Thread Srikar Dronamraju
Note: (Only patch 4 changes from v5)

Abdul reported  a warning on a shared lpar.
"WARNING: workqueue cpumask: online intersect > possible intersect".
This is because per node workqueue possible mask is set very early in the
boot process even before the system was querying the home node
associativity. However per node workqueue online cpumask gets updated
dynamically. Hence there is a chance when per node workqueue online cpumask
is a superset of per node workqueue possible mask.

Link for v5: 
http://lkml.kernel.org/r/20191216144904.6776-1-sri...@linux.vnet.ibm.com
Changelog: v5->v6
 - Handled comments from Michael Ellerman
 - Rebased to v5.5-rc6

Link for v4: https://patchwork.ozlabs.org/patch/1161979
Changelog: v4->v5:
- Rebased to v5.5-rc2

Link for v3: 
http://lkml.kernel.org/r/20190906135020.19772-1-sri...@linux.vnet.ibm.com
Changelog: v3->v4
 - Added a warning as suggested by Nathan Lynch.

Link for v2: 
http://lkml.kernel.org/r/20190829055023.6171-1-sri...@linux.vnet.ibm.com
Changelog: v2->v3
 - Handled comments from Nathan Lynch.

Link for v1: https://patchwork.ozlabs.org/patch/1151658
Changelog: v1->v2
 - Handled comments from Nathan Lynch.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Abdul Haleem 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran 

Srikar Dronamraju (5):
  powerpc/vphn: Check for error from hcall_vphn
  powerpc/numa: Handle extra hcall_vphn error cases
  powerpc/numa: Use cpu node map of first sibling thread
  powerpc/numa: Early request for home node associativity
  powerpc/numa: Remove late request for home node associativity

 arch/powerpc/include/asm/topology.h   |  4 --
 arch/powerpc/kernel/smp.c |  5 --
 arch/powerpc/mm/numa.c| 97 +++
 arch/powerpc/platforms/pseries/vphn.c |  3 +-
 4 files changed, 78 insertions(+), 31 deletions(-)

-- 
1.8.3.1



[PATCH v4] powerpc/smp: Use nid as fallback for package_id

2020-01-29 Thread Srikar Dronamraju
Package_id is to find out all cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However ibm,chip_id
property is not present in device-tree of PowerVM Lpars. Hence lscpu
output shows one core per socket and multiple cores.

To overcome this, use nid as the package_id on PowerVM Lpars.

Before the patch.
---
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1   <--
Socket(s):   16  <--
NUMA node(s):2
Model:   2.2 (pvr 004e 0202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
 #
 # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
 -1
 #

After the patch
---
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  8  <--
Core(s) per socket:  8  <--
Socket(s):   2
NUMA node(s):2
Model:   2.2 (pvr 004e 0202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
 #
 # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
 0
 #
Now lscpu output is more in line with the system configuration.

Signed-off-by: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Vasant Hegde 
Cc: Vaidyanathan Srinivasan 
---
Changelog from v3: 
https://lore.kernel.org/linuxppc-dev/20191216142119.5937-1-sri...@linux.vnet.ibm.com/t/#u
- Handled comments from Michael Ellerman.
  - Rename function cpu_to_nid to get_physical_package_id
  - Moved code back to arch/powerpc/kernel/smp.c from pseries/lpar.c
  - Use export_symbol_gpl instead of export_symbol.
  - Rebased to v5.5-rc6

Changelog from v2: https://patchwork.ozlabs.org/patch/1151649
- Rebased to v5.5-rc2
- Renamed the new function to cpu_to_nid
- Removed checks to fadump property. (Looked too excessive)
- Moved pseries specific code to pseries/lpar.c

Changelog from v1: https://patchwork.ozlabs.org/patch/1126145
- In v1 cpu_to_chip_id was overloaded to fallback on nid.  Michael
  Ellerman wasn't comfortable with nid being shown up as chip_id.

 arch/powerpc/include/asm/topology.h |  6 ++
 arch/powerpc/kernel/smp.c   | 30 ++---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 2f7e1ea5089e..e2e1ccd4a18d 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -134,7 +134,13 @@ static inline void shared_proc_topology_init(void) {}
 #ifdef CONFIG_PPC64
 #include 
 
+#ifdef CONFIG_PPC_SPLPAR
+int get_physical_package_id(int cpu);
+#define topology_physical_package_id(cpu)  (get_physical_package_id(cpu))
+#else
 #define topology_physical_package_id(cpu)  (cpu_to_chip_id(cpu))
+#endif
+
 #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)  (cpu_to_core_id(cpu))
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf6a221..d75fc8fd8168 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1185,10 +1185,34 @@ static inline void add_cpu_to_smallcore_masks(int cpu)
}
 }
 
+int get_physical_package_id(int cpu)
+{
+   int ppid = cpu_to_chip_id(cpu);
+
+#ifdef CONFIG_PPC_SPLPAR
+   /*
+* If the platform is PowerNV or Guest on KVM, ibm,chip-id is
+* defined. Hence we would return the chip-id as the
+* get_physical_package_id.
+*/
+   if (ppid == -1 && firmware_has_feature(FW_FEATURE_LPAR)) {
+   struct device_node *np = of_get_cpu_node(cpu, NULL);
+
+   if (np) {
+   ppid = of_node_to_nid(np);
+   of_node_put(np);
+   }
+   }
+#endif /* CONFIG_PPC_SPLPAR */
+
+   return ppid;
+}
+EXPORT_SYMBOL_GPL(get_physical_package_id);
+
 static void add_cpu_to_masks(int cpu)
 {
int first_thread = cpu_first_thread_sibling(cpu);
-   int chipid = cpu_to_chip_id(cpu);
+   int ppid = get_physical_package_id(cpu);
int i;
 
/*
@@ -1217,11 +1241,11 @@ static void add_cpu_to_masks(int cpu)
for_each_cpu(i, cpu_l2_cache_mask(cpu))

Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-29 Thread Segher Boessenkool
On Wed, Jan 29, 2020 at 02:58:44PM +1000, Nicholas Piggin wrote:
> Adhemerval Zanella's on January 29, 2020 3:26 am:
> > __asm__ __volatile__\
> >   ("sc\n\t" \
> >"bns+ 1f\n\t"\
> >"neg %1, %1\n\t" \
> >"1:\n\t" \

> True, but the taken branch would be a 1 cycle bubble in fetch. Could 
> avoid that by branching out of line then back for the error case. But
> mfocrf is fine (only sources one register), that's what should be used
> here I think.

neg %9,%1 ; isel %1,%9,%1,so

> That probably makes the performance argument for avoiding CR[SO] for
> error return indication less significant. Commonality with other
> architectures is probably the bigger reason for it.

Yes, and to have the syscall calling convention closer to the normal
function calling convention would be good, too.


Segher


[PATCH] powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK

2020-01-29 Thread Christophe Leroy
On book3s/32 CPUs that are handling MMU through a hash table,
MMU_init_hw() function was adapted for VMAP_STACK in order to
handle virtual addresses instead of physical addresses in the
low level hash functions.

When using KASAN, the same adaptations are required for the
early hash table set up by kasan_early_hash_table() function.

Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index d3cacd462560..16dd95bd0749 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -185,8 +185,11 @@ u8 __initdata early_hash[256 << 10] __aligned(256 << 10) = 
{0};
 
 static void __init kasan_early_hash_table(void)
 {
-   modify_instruction_site(&patch__hash_page_A0, 0x, __pa(early_hash) 
>> 16);
-   modify_instruction_site(&patch__flush_hash_A0, 0x, __pa(early_hash) 
>> 16);
+   unsigned int hash = IS_ENABLED(CONFIG_VMAP_STACK) ? (unsigned 
int)early_hash :
+   __pa(early_hash);
+
+   modify_instruction_site(&patch__hash_page_A0, 0x, hash >> 16);
+   modify_instruction_site(&patch__flush_hash_A0, 0x, hash >> 16);
 
Hash = (struct hash_pte *)early_hash;
 }
-- 
2.25.0



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-29 Thread Qian Cai



> On Jan 29, 2020, at 5:36 AM, Catalin Marinas  wrote:
> 
> On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote:
>> On Jan 28, 2020, at 12:47 PM, Catalin Marinas  
>> wrote:
>>> The primary goal here is not finding regressions but having clearly
>>> defined semantics of the page table accessors across architectures. x86
>>> and arm64 are a good starting point and other architectures will be
>>> enabled as they are aligned to the same semantics.
>> 
>> This still does not answer the fundamental question. If this test is
>> simply inefficient to find bugs,
> 
> Who said this is inefficient (other than you)?

Inefficient of finding bugs. It said only found a bug or two in its lifetime?

> 
>> who wants to spend time to use it regularly? 
> 
> Arch maintainers, mm maintainers introducing new macros or assuming
> certain new semantics of the existing macros.
> 
>> If this is just one off test that may get running once in a few years
>> (when introducing a new arch), how does it justify the ongoing cost to
>> maintain it?
> 
> You are really missing the point. It's not only for a new arch but
> changes to existing arch code. And if the arch code churn in this area
> is relatively small, I'd expect a similarly small cost of maintaining
> this test.
> 
> If you only turn DEBUG_VM on once every few years, don't generalise this
> to the rest of the kernel developers (as others pointed out, this test
> is default y if DEBUG_VM).

Quite the opposite, I am running DEBUG_VM almost daily for regression
workload while I felt strongly this thing does not add any value mixing there.

So, I would suggest to decouple this away from DEBUG_VM, and clearly
document that this test is not something intended for automated regression
workloads, so those people don’t need to waste time running this.

> 
> Anyway, I think that's a pointless discussion, so not going to reply
> further (unless you have technical content to add).
> 
> -- 
> Catalin



[PATCH v2] powerpc/uaccess: simplify the get_fs() set_fs() logic

2020-01-29 Thread Christophe Leroy
On powerpc, we only have USER_DS and KERNEL_DS

Today, this is managed as an 'unsigned long' data space limit
which is used to compare the passed address with, plus a bit
in the thread_info flags that is set whenever modifying the limit
to enable the verification in addr_limit_user_check()

The limit is either the last address of user space when USER_DS is
set, and the last address of address space when KERNEL_DS is set.
In both cases, the limit is a compiletime constant.

get_fs() returns the limit, which is part of thread_info struct
set_fs() updates the limit then set the TI_FSCHECK flag.
addr_limit_user_check() check the flag, and if it is set it checks
the limit is the user limit, then unsets the TI_FSCHECK flag.

In addition, when the flag is set the syscall exit work is involved.
This exit work is heavy compared to normal syscall exit as it goes
through normal exception exit instead of the fast syscall exit.

Rename this TI_FSCHECK flag to TIF_KERNEL_DS flag which tells whether
KERNEL_DS or USER_DS is set. Get mm_segment_t be redifined as a bool
struct that is either false (for USER_DS) or true (for KERNEL_DS).
When TIF_KERNEL_DS is set, the limit is ~0UL. Otherwise it is
TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When KERNEL_DS is
set, there is no range to check. Define TI_FSCHECK as an alias to
TIF_KERNEL_DS.

On exit, involve exit work when the bit is set, i.e. when KERNEL_DS
is set. addr_limit_user_check() will clear the bit and kill the
user process.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/processor.h   |  5 +---
 arch/powerpc/include/asm/thread_info.h |  9 ---
 arch/powerpc/include/asm/uaccess.h | 35 +-
 arch/powerpc/lib/sstep.c   |  2 +-
 4 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 8387698bd5b6..e9e3c3b0f05e 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -84,7 +84,7 @@ void start_thread(struct pt_regs *regs, unsigned long fdptr, 
unsigned long sp);
 void release_thread(struct task_struct *);
 
 typedef struct {
-   unsigned long seg;
+   bool is_kernel_ds;
 } mm_segment_t;
 
 #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
@@ -148,7 +148,6 @@ struct thread_struct {
unsigned long   ksp_vsid;
 #endif
struct pt_regs  *regs;  /* Pointer to saved register state */
-   mm_segment_taddr_limit; /* for get_fs() validation */
 #ifdef CONFIG_BOOKE
/* BookE base exception scratch space; align on cacheline */
unsigned long   normsave[8] cacheline_aligned;
@@ -289,7 +288,6 @@ struct thread_struct {
 #define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
-   .addr_limit = KERNEL_DS, \
.pgdir = swapper_pg_dir, \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
@@ -298,7 +296,6 @@ struct thread_struct {
 #define INIT_THREAD  { \
.ksp = INIT_SP, \
.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
-   .addr_limit = KERNEL_DS, \
.fpexc_mode = 0, \
.fscr = FSCR_TAR | FSCR_EBB \
 }
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index a2270749b282..a681ce624ab7 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -69,7 +69,7 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)  \
 {  \
.preempt_count = INIT_PREEMPT_COUNT,\
-   .flags =0,  \
+   .flags =_TIF_KERNEL_DS, \
 }
 
 #define THREAD_SIZE_ORDER  (THREAD_SHIFT - PAGE_SHIFT)
@@ -90,7 +90,8 @@ void arch_setup_new_exec(void);
 #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
-#define TIF_FSCHECK3   /* Check FS is USER_DS on return */
+#define TIF_KERNEL_DS  3   /* KERNEL_DS is set */
+#define TIF_FSCHECKTIF_KERNEL_DS
 #define TIF_SYSCALL_EMU4   /* syscall emulation active */
 #define TIF_RESTORE_TM 5   /* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
@@ -130,7 +131,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACEPOINT(1thread.addr_limit = fs;
-   /* On user-mode return check addr_limit (fs) is correct */
-   set_thread_flag(TIF_FSCHECK);
+   update_thread_flag(TIF_KERNEL_DS, segment_eq(fs, KERNEL_DS));
 }
 
-#define segment_eq(a, b)   ((a).seg == (b).seg)
-
-#define user_addr_max()   

[PATCH v1] powerpc/uaccess: simplify the get_fs() set_fs() logic

2020-01-29 Thread Christophe Leroy
On powerpc, we only have USER_DS and KERNEL_DS

Today, this is managed as an 'unsigned long' data space limit
which is used to compare the passed address with, plus a bit
in the thread_info flags that is set whenever modifying the limit
to enable the verification in addr_limit_user_check()

The limit is either the last address of user space when USER_DS is
set, and the last address of address space when KERNEL_DS is set.
In both cases, the limit is a compiletime constant.

get_fs() returns the limit, which is part of thread_info struct
set_fs() updates the limit then set the TI_FSCHECK flag.
addr_limit_user_check() check the flag, and if it is set it checks
the limit is the user limit, then unsets the TI_FSCHECK flag.

In addition, when the flag is set the syscall exit work is involved.

Remove this TI_FSCHECK flag, and replace it by a TIF_KERNEL_DS flag
which tells whether KERNEL_DS or USER_DS is set. When TIF_KERNEL_DS
is set, the limit is ~0UL. Otherwise it is TASK_SIZE_USER (resp
TASK_SIZE_USER64 on PPC64). When KERNEL_DS is set, there is no range
to check.

On exit, involve exit work when the bit is set, i.e. when KERNEL_DS
is set. As TI_FSCHECK is not set anymore, test will be done everytime
exit work is run, but doing the check is now as costly as checking
whether the check is to be done.

Signed-off-by: Christophe Leroy 
---
This version is first version. The intention was to not modify things
much, but the resulting assembly is bad, so lets take v2 instead.
---
 arch/powerpc/include/asm/processor.h   |  3 ---
 arch/powerpc/include/asm/thread_info.h |  8 
 arch/powerpc/include/asm/uaccess.h | 10 --
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 8387698bd5b6..0747f930a680 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -148,7 +148,6 @@ struct thread_struct {
unsigned long   ksp_vsid;
 #endif
struct pt_regs  *regs;  /* Pointer to saved register state */
-   mm_segment_taddr_limit; /* for get_fs() validation */
 #ifdef CONFIG_BOOKE
/* BookE base exception scratch space; align on cacheline */
unsigned long   normsave[8] cacheline_aligned;
@@ -289,7 +288,6 @@ struct thread_struct {
 #define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
-   .addr_limit = KERNEL_DS, \
.pgdir = swapper_pg_dir, \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
@@ -298,7 +296,6 @@ struct thread_struct {
 #define INIT_THREAD  { \
.ksp = INIT_SP, \
.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
-   .addr_limit = KERNEL_DS, \
.fpexc_mode = 0, \
.fscr = FSCR_TAR | FSCR_EBB \
 }
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index a2270749b282..8980bbddc4d6 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -69,7 +69,7 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)  \
 {  \
.preempt_count = INIT_PREEMPT_COUNT,\
-   .flags =0,  \
+   .flags =_TIF_KERNEL_DS, \
 }
 
 #define THREAD_SIZE_ORDER  (THREAD_SHIFT - PAGE_SHIFT)
@@ -90,7 +90,7 @@ void arch_setup_new_exec(void);
 #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
-#define TIF_FSCHECK3   /* Check FS is USER_DS on return */
+#define TIF_KERNEL_DS  3   /* KERNEL_DS is set */
 #define TIF_SYSCALL_EMU4   /* syscall emulation active */
 #define TIF_RESTORE_TM 5   /* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
@@ -130,7 +130,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACEPOINT(1thread.addr_limit = fs;
-   /* On user-mode return check addr_limit (fs) is correct */
-   set_thread_flag(TIF_FSCHECK);
+   update_thread_flag(TIF_KERNEL_DS, segment_eq(fs, KERNEL_DS));
 }
 
-#define segment_eq(a, b)   ((a).seg == (b).seg)
-
 #define user_addr_max()(get_fs().seg)
 
 #ifdef __powerpc64__
-- 
2.25.0



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-29 Thread Catalin Marinas
On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote:
> On Jan 28, 2020, at 12:47 PM, Catalin Marinas  wrote:
> > The primary goal here is not finding regressions but having clearly
> > defined semantics of the page table accessors across architectures. x86
> > and arm64 are a good starting point and other architectures will be
> > enabled as they are aligned to the same semantics.
> 
> This still does not answer the fundamental question. If this test is
> simply inefficient to find bugs,

Who said this is inefficient (other than you)?

> who wants to spend time to use it regularly? 

Arch maintainers, mm maintainers introducing new macros or assuming
certain new semantics of the existing macros.

> If this is just one off test that may get running once in a few years
> (when introducing a new arch), how does it justify the ongoing cost to
> maintain it?

You are really missing the point. It's not only for a new arch but
changes to existing arch code. And if the arch code churn in this area
is relatively small, I'd expect a similarly small cost of maintaining
this test.

If you only turn DEBUG_VM on once every few years, don't generalise this
to the rest of the kernel developers (as others pointed out, this test
is default y if DEBUG_VM).

Anyway, I think that's a pointless discussion, so not going to reply
further (unless you have technical content to add).

-- 
Catalin


Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-29 Thread Sandipan Das
Hi Dave,

On 28/01/20 3:08 pm, Sandipan Das wrote:
> On 27/01/20 9:12 pm, Dave Hansen wrote:
>>
>> How have you tested this patch (and the whole series for that matter)?
>>
> 
> I replaced the second patch with this one and did a build test.
> Till v16, I had tested the whole series (build + run) on both a POWER8
> system (with 4K and 64K page sizes) and a Skylake SP system but for
> x86_64 only. Following that, I could only do a build test locally on
> my laptop for i386 and x86_64 on my laptop as I did not have access to
> the Skylake system anymore.
> 
> 

I got a chance to use the Skylake system today and tested (build + run)
the whole series (v17 with the fixed Makefile) for both i386 and x86_64.
Everything passed.

- Sandipan